Michael Scriven on “Rethinking all of evaluation”

Jane posted the first draft of my proposed workshop for the summer institute at Claremont Graduate University, where I suggest it’s time to reconceptualize evaluation from the ground up. Here’s a little bit more to explain what I’m after with that theme…

Let’s define a Copernican revolution, somewhat along the lines of ‘paradigm shift’, as a radical shift in the framework of our thinking about a substantial subject matter area, i.e., a rejection and/or redefinition of the most fundamental assumptions involved in the theories and possibly the language and data formats of the area. In the physical sciences, geocentrism was an early example of a framework that got rocked, and absolute space another. In these terms, I want to say that the discipline of evaluation as it now stands has gone through two of these already. The first was the shift away from the basic concept of program evaluation that was part of those texts in the social sciences that mentioned it at all in the 1950s/60s era, namely the idea that evaluation consisted of: (i) formulating the goals of the program in behavioral terms, (ii) finding or creating a test of those behaviors, and (iii) applying the test to the targeted population so that one could determine how good the program was by finding out the extent to which it had met these goals. You still see professional evaluators giving that as the definition of evaluation on Evaltalk and (more often) elsewhere. It involves half a dozen gross errors, including the exclusion of goal critique, ethical process critique, side-effect determination, cost analysis, comparisons, etc. (See Hard-Won Lessons in Program Evaluation, an issue of New Directions in Program Evaluation, as it was then called.) It was replaced by a much more far-reaching paradigm, hinted at in the phrase ‘determining comparative cost-effectiveness within an ethical/pragmatic framework’.

The second Copernican revolution in evaluation, on my account, was the shift from the ‘geocentric’ fallacy of thinking that evaluation is program evaluation (exhibited in the titles of textbooks and theories/models which announce their coverage of evaluation but with a content discussing only program evaluation) to the ‘heliocentric’ approach that takes the key focal element to be evaluation as the sun around which rotates a solar system of planets including program evaluation, product evaluation, policy analysis, personnel evaluation, etc. This shift, like the first, leads to many practical as well as theoretical/conceptual results of great importance, but the details of those are not my present topic.

Now, I’m suggesting that it’s time to be more self-conscious about those past shifts in our thinking and practice AND to consider whether it’s time for a third shift. What might that be? Suggestions from you, and criticism of the above account, would now be appropriate, as would comments by you on those proposals, possibly including some from me if I’m not knocked unconscious by the brilliant insights themselves!


Michael Scriven

P.S. (A comment about e.g., transformative evaluation.) The shift away from assuming traditional Western European values to a framework more accepting of other cultural norms including those which fully accept the rights of women, ethnic, religious, GLBTG, and ‘disabled’ persons is part of the ethical framework referred to in the second revolution above, although it would certainly be an important revolution in its own right if there weren’t some other radical considerations that must also be incorporated.

19 comments to Michael Scriven on “Rethinking all of evaluation”

  • Dear Professor Scriven

    Yes, this is the right start from an evaluation-based problematic! Thank you for your brilliant shift! I knew it was time for a shift and reconceptualization. We must lead our evaluation thinking in a developmental and genuine manner toward the next shift or generation of evaluation! And, as in the past, you are the initiator.



  • Hello again Michael. I was privileged to be on a telephone conference with you a few years ago via Wayne Edwards (ex Massey University) while studying, and enjoyed engaging with you on the topic of evaluation; I have continued to explore this fascinating area. I agree that all involved in evaluation should continue critiquing their framework and practice to ensure that wider perspectives are listened to, and should also think about who is in the middle of an evaluation.

  • Michael Scriven

    I’m counting on kiwis and kangaroos to be contributing heavily to this discussion—after all, you stand on your heads down there, so you should have no trouble standing the Northerns’ thinking about evaluation on its head, right? So you don’t want me doing the only long comments, right?

    Here’s a further thought that might stimulate further reactions. Apart from the two ‘internal’ Copernican revolutions, there has been an even more profound ‘external’ one concerning the outsider’s take on evaluation. This went from the bad old days of seeing evaluation as simply a matter of taste, hence beyond the pale of scientific investigation and methods, to the recognition that in some odd way, it IS possible to give solid reasons for evaluative conclusions—sometimes, e.g., in the evaluation of programs and products. The third stage of that change is still incipient, but caps the change by completely reversing the original conception: it is the recognition that evaluation is in fact the logical backbone of every discipline, the indispensable guardian of quality that separates pseudo-science from real science, astrology from astronomy, creationism from evolution, etc. It’s no mere matter of taste that establishes the difference between good and bad explanations, data, and inferences, and the ability to make those distinctions—i.e., the ability to evaluate—is essential to science; without it, there’s only empty rhetoric and mere matters of taste. That’s a Copernican revolution in the perceived status of evaluation.

    Now, what are we still treating as obvious or true (or false) that doesn’t deserve that status? Or what needs to be seen in a radically different way?

  • Jane Davidson

    Michael, I’m not sure I can come up with a ‘Copernican’ revolution, but perhaps I can run a couple of ideas up the flagpole that have come as realizations or light-bulb moments for me and/or still seem to surprise other people I talk to …

    This idea got a little long, so I’ll start another post and see what you and others think.

  • Thorbjoern Mann

    A change of the kind suggested has been in the making for some time but in a different field. It is based on H. Rittel’s ‘argumentative model’ of design and planning, the insight that the information that should guide decision-making (in other words, evaluation of choices) is distributed in the population of potentially affected people, and is elicited in the argumentative discourse about the pros and cons of proposals. I have followed up on that by developing an approach for the evaluation of such planning arguments (it had been neglected by logic and rhetoric etc.) and made suggestions for integrating such methods in the overall planning (design, policy-making…) discourse. Details (book, articles) on request.
    Thorbjoern Mann

  • Nan Wehipeihana

    “It is the recognition that evaluation is in fact the logical backbone of every discipline, the indispensable guardian of quality that separates pseudo-science from real science, astrology from astronomy, creationism from evolution, etc.”

    I’d like to believe in the above but who gets to make that judgment? I’m not convinced that such a claim has yet been borne out in practice or in the way society, government, purchasers of evaluation conceive of evaluation and the extent to which they see it as the holy grail of quality and judgment (well, certainly not in NZ).

    It ‘feels’ somewhat self-serving despite the rationale and emerging evidence around the intent and use of evaluation. (We’re not big on blowing our own trumpet here in NZ, even when it’s patently obvious.)

    I think in NZ we are beginning to see a ‘valuing’ or an appreciation of the ‘functionality purpose’ of evaluation; as you say the ability to establish the difference between good and bad explanations, data, and inferences, and the ability to make those distinctions.

    And I think there is some/a good level of acceptance of evaluation as ‘science’ (when done well) and again as you say without it, there’s only empty rhetoric and mere matters of taste.

    For me, in the NZ context, the tide’s still out on the perceived status of evaluation being a Copernican revolution; the recognition that evaluation is in fact the logical backbone of every discipline. I’m not convinced it’s quite reached these shores.

  • Michael Scriven


    Indeed you’re right that few people, even in evaluation, accept this view. That’s why I said “The third stage of that change is still incipient,…” My grounds for saying it’s true are nothing to do with acceptance, only with logic. If the logic is sound, it’s not bragging to assert the conclusion, even in NZ. It is a logical fact that no discipline can lay claim to that title unless it has standards of quality for its data, hypotheses, theories, and methods. Applying those standards, or showing that they have not been met, is by definition evaluation. Hence it’s a logical fact that the key component of establishing and maintaining a discipline’s credentials is evaluation. QED

    Have I slipped from the standards of good logic in that argument? If so, please correct me. If not, we have every right to the title of king of the disciplines, or words to that effect, whether or not kiwis think we shouldn’t mention it. I’d rather like to see them pushing that position hard, in fact leading that part of the third revolution, so I hope to persuade you and them that this is true, highly important, and something we should all be pushing BECAUSE it’s the absence of good evaluation in science that has led to such shocking results as having the whole of science built on peer review, which fails to meet minimum standards for a good evaluation process (e.g., reliability) and, worse, has rarely been studied for meeting those standards, so that it is rarely known that it fails them. NZ also probably has the best national system for research evaluation in the world (see Chris Coryn’s work), and that’s another good example of world class achievements. So how about it, Aotearoa NZEA; you could reform the world of science by pushing to make it scientific (now there’s a thought!), i.e., following its own standards instead of just talking about them as if it did, like every other national science establishment?

    P.S. As someone who not long ago committed to spending the rest of my life in NZ, and would still be there if the government hadn’t made it seem impossible to run an evaluation doctorate at Auckland University (it was a minor side-effect of a pretty good plan, so no hard feelings) I have to add that I noticed some well-justified variations from the national standard of modesty, not just in yachting circles (for a while!), something perhaps best justified in relation to the creation and continued (considerable albeit incomplete) support of the Treaty of Waitangi, an achievement not irrelevant to my taking up a tenured job offer there. So I’m not pushing for a violation of universal modesty, only for ‘fair’s fair’–if you preach scientific method’s virtues, you ought to follow them. (Applies to evaluators too–hence meta-evaluation.)

    P.P.S. I’d be most grateful to Thorbjoern Mann if he’d send me some details on the very interesting results he mentions.

  • “You say you want a revolution. Well, you know…”

    This call seems to assume there is too much consensus and things need shaking up. On what grounds? You could argue the reverse, that these days there are A to Z views on evaluation and if there is any problem it is in the difficulty of accumulating an agreed corpus of knowledge. That objective itself might be questioned by many. In these circumstances how do you guide people through the choices available?

  • Achieving the type of Copernican revolution in evaluation that you describe, Michael, is certainly something that some of us here in NZ are steadily and perhaps still somewhat stealthily (in our humble way) working towards! There still abound many basic and fundamental misconceptions about evaluation here in NZ, in government, in academia, and in general. For example, that evaluation is about assessing whether a project or program met its objectives, or that evaluation is all about describing the program and then making illicit claims about its value, to name only a couple of the standouts for me. The idea that evaluation is about the judgment of merit, worth and significance, and that there might be systematic ways of doing this, is still only just gaining a foothold here. Our quiet revolution is happening on a number of fronts, and is evident in the recent development of a set of evaluator competencies that we hope will stimulate debate about not only the criteria for judging who might legitimately call themselves an evaluator (see http://www.anzea.org.nz), but also about what evaluation is vis-a-vis other activities that masquerade as evaluation. IMHO, there is truly transformative promise emerging here, in the application of evaluation-specific logic in many indigenous contexts, as Māori discover that applying truly evaluative processes ensures their values are at the heart of evaluative judgment. Who knows what could happen if Māori world views and values systematically underpinned evaluative judgments about the quality and value of services and programs designed for them?!!

  • Mark Dalgety

    I too do not have a Copernican revolution on hand, but I do have a comment about the significance of one of the planets in the evaluation solar system which has wider implications.

    Michael, you seem to state in your opening post that approaches such as transformative evaluation or culturally responsive (or based or informed?) evaluation do not seem significant enough to qualify as fundamentally revolutionary. And Jane’s experience that traditional evaluation logic (evaluative criteria, developing rubrics, judging the evidence) can accommodate varied cultural norms seems to support that notion. I agree in part and would locate culturally based approaches as a new and evolving sub-branch on the values branch of Alkin’s (2004) tree, which classifies the major evaluation theories.

    However, where I might differ is in your apparently seeing these approaches as primarily ‘ethical’ frameworks that address ‘the rights of various marginalised groups’. The implications are much broader than ethics. These approaches posit that the underlying ontology, epistemology, methodologies and practice of evaluation are fundamentally informed by a pervasive Western world view. They raise questions regarding the validity and therefore quality and usefulness of current evaluation practice, as well as ethical concerns about harm. That has a revolutionary sound to me for evaluation and science in general.

    Some examples of hidden cultural assumptions of ‘mainstream’ evaluation and science that come to mind are:
    The not surprising assumption in the evaluation literature, that the North American context is often the default, and so we little old Kiwis need to modify for our differing needs.

    I notice, as I prepare a workshop with a group of colleagues, that the traditional Western evaluation conference structures and norms of behaviour feel constricting for what we propose.

    In New Zealand discourse, the construct of family has begun to be modified by the Māori concept of whānau.

    Finally, remind me again why the northern hemisphere is on top of the globe? What are the mindsets that flow from that one in our post-Newtonian universe?

    These types of issues are about the culturally based nature of mainstream Western evaluation and science, rather than the rights of marginalised groups. So we are not considering here an asteroid or Pluto on the fringes of the evaluation solar system, we have instead perhaps an evolving Jupiter rising in the night sky.

    Alkin, M. C. and C. A. Christie (2004). An Evaluation Theory Tree. In M. C. Alkin (Ed.), Evaluation Roots: Tracing Theorists’ Views and Influences. Thousand Oaks: Sage Publications, Inc.

  • Jane Davidson

    Mark, I’m not sure I agree that conceptualizations of evaluation based on indigenous, minority, or other ‘non-Western’ ontology, epistemology, methodologies, and practice are a ‘planet’ in orbit around the evaluation ‘sun’. These are, at the very least, reconceptualizations of the sun (evaluation) itself. Or maybe of the nature of the 10-dimensional space-time in which the solar system is embedded. [I’m not sure about the space-time aspect just yet, but have the sense that Something Big is brewing and just hope that my tiny brain will be able to fathom at least some of it as it unfolds …]

    I do think (and was trying to explain in the other post, perhaps not too successfully) that the so-called “traditional” evaluation methodologies that are most dangerous (in terms of potential harm) and least likely to be valid or useful in indigenous, minority, and other non-Western contexts are the ones that skip over the ‘values’ piece too lightly. The most common approaches I see are either (a) avoiding it like the plague (because it’s ‘unscientific’) or (b) assuming a ‘default’ (Western/white/North American/whatever) definition of (for example) what outcomes should be considered ‘potentially valuable’ enough to include in an evaluation and with what relative emphasis.

    It is critically important for validity (as well as for social justice) that the right voices are at the values-definition table, and all too often there isn’t even a values-definition table at all. The values are there but they are ‘undiscussable’ because they are the ‘default’ that is invisible to those who hold them. That’s why I believe it’s critically important to incorporate evaluation methodologies and processes that are very explicit about the ‘values’ piece up front as well as in the evaluative interpretation/sensemaking process.

    And yes, I totally agree, it’s unfortunate that so few people understand that, actually, South is UP. Thank heavens the Wizard of New Zealand has put out a world map that helps correct this widely held misconception:

    Wizard's World Map

  • Michael Scriven

    Jane and Mark;

    Right, the view that people down under have an inverted view of what’s ‘really’ true is an excellent example of a loaded perception. That opens the mental door for replacing the Northern epistemology/ethics/concept of family etc. with a less biased one. BUT, apart from redoing maps so that NZ is on top, what exactly IS the new concept going to be, i.e., the new world view in the specific fields of epistemology, ethics, methodology, and what useful new results does it produce? Copernican revolutions are not JUST assertions that an alternative view is right, they are fully argued proofs that it’s better e.g., because it explains or predicts some phenomena that the original view did not. OR, not quite as good but very important, the revolutionary view provides an equally good general account. (The multihull yacht designers used to argue that their design was simply better than monohull design, a Copernican revolution; but it turns out that while it IS faster, it’s also not self-righting so it’s much more dangerous and so less reliable for transoceanic voyages; so we’re now more open minded about it, but not overwhelmed by it as with relativity and heliocentrism. Etc.)

    My take is that we have simply abandoned the old view of evaluation as mere expression of taste, or of program evaluation as mere determination of the extent to which goals have been met; so there were true revolutions in those cases, backed by proof of improvement. Now, can we show that evaluation or some sub-division of it, as now reconceived, rests on some other assumptions that should be discarded? Clearly it MIGHT (as the UP/DOWN points of view and maps show) but DOES it; exactly what is the new and better (or equally good) alternative view and what’s the evidence for improvement (or parity of merit)? Possibilities don’t qualify as proofs.

  • Michael Scriven

    Responding to Rick and Mark….

    Rick Davies says: “This call seems to assume there is too much consensus and things need shaking up. On what grounds? You could argue the reverse, that these days there are A to Z views on evaluation and if there is any problem it is in the difficulty of accumulating an agreed corpus of knowledge.”

    Good point to raise! But I don’t assume there is too much consensus etc., I only want us to think about that possibility. The alternative concern—the one that you raise—may be more important. We just need to avoid complacency in a discipline that has already had more than one basic assumption crumble under its feet, just when everything seemed secure.

    Mark suggests a different point: that I’m treating the move towards multicultural sensitivity in evaluation as not worth calling a revolution. It’s a great revolution, but not specifically one in evaluation, or so it seems to me, for a couple of reasons. (i) It was a revolution in the content and approaches of the social/behavioral sciences in general, including political science, psychology, and sociology, but there wasn’t that much program evaluation done before it began to take effect. I find it hard to think of examples of serious evaluations that suggest a period within the short history of evaluation in which the general climate in evaluation was condescending towards indigenous cultures. As to the idea that there was condescension towards Southern evaluators, my impression was that it was/is more common in the South than the North. If we extend our focus to not just ethical insularity but epistemology as well, this seems even more true. Karl Popper, and many others like me, came South via free choice because we thought the quality of thinking there was at least on a par with anything up North; my take on the situation in my many years at the U of Western Australia (and the same was true of Auckland) was that the general belief in the South that ‘North was where the action was’ meant that faculty in my areas (education and philosophy) got overseas leave and funding at a level that resulted in our staff being about twice as well informed about current quality research as the people at Harvard and Berkeley, where my recent appointments in the North had been. (Of course, since my first two degrees, in mathematics and philosophy, were from the South, I had other evidence for this.) Add Waitangi to that and the South looks at least as good as the North on cultural equity in the academic-social as well as the political-social domain.
    (ii) Throughout the academic world, just as in the rest of the world, there was demonstrable unethical bias against some groups, notably women, ethnic/religious minorities, the physically handicapped, and GLBTG. The substantial although incomplete improvement in that was/is indeed a great revolution, just not a revolution specific to evaluation. Perhaps more importantly, the demonstrable error in the prior practices was in their ethics. You like the idea that it was also true, and continues to be true, in the North-centered epistemology. That remains to be shown. It’s certainly possible, and indeed such a serious possibility that it should be carefully considered. But the possibility doesn’t justify giving any weight at all to alternative views just because they’re Southern and hence MIGHT have been discriminated against. They should get a serious hearing, though, and that’s exactly what I’m hoping to hear about here.

  • Patricia Rogers

    The World Map simply turns the Mercator projection upside down, using a projection which provides a distorted representation of the areas of the different continents and countries. As the website for the Peters Projection (http://www.petersmap.com/page3.html) explains it:

    “The Mercator projection creates increasing distortions of size as you move away from the equator. As you get closer to the poles the distortion becomes severe. Cartographers refer to the inability to compare size on a Mercator projection as “the Greenland Problem.” Greenland appears to be the same size as Africa, yet Africa’s land mass is actually fourteen times larger. Because the Mercator distorts size so much at the poles it is common to crop Antarctica off the map. This practice results in the Northern Hemisphere appearing much larger than it really is. Typically, the cropping technique results in a map showing the equator about 60% of the way down the map, diminishing the size and importance of the developing countries.

    Greenland: 0.8 million sq. miles

    Africa: 11.6 million sq. miles

    This was convenient, psychologically and practically, through the eras of colonial domination when most of the world powers were European. It suited them to maintain an image of the world with Europe at the center and looking much larger than it really was. Was this conscious or deliberate? Probably not, as most map users probably never realized the Eurocentric bias inherent in their world view. When there are so many other projections to choose from, why is it that today the Mercator projection is still such a widely recognized image used to represent the globe? The answer may be simply convention or habit. The inertia of habit is a powerful force.”

  • I’m coming into this discussion somewhat late. For some reason I’m reminded of Pirandello’s Six Characters in Search of an Author. Michael has a hunch and everyone else either harrumphs or clears their throat. However, if Michael is correct I’d suggest that we may need to look in unusual places. It seems to me arrogant and self-delusional (in a minor kind of way) to assume that the major changes in evaluative understanding will happen within the evaluation field. The evaluation field is, in my experience, a very conservative trade that moves slowly and cautiously. It follows revolutions; it does not create them. The reassessment of evaluation as being more than judging whether a program objective had been reached was more influenced by broader changes in social science and policy development than by any internal reassessment. There’s a character in Alan Bennett’s play The History Boys who says that history is women following behind with the bucket. Evaluation is a bit the same.

    So if we are looking for the next sea change in evaluation I’d not shine the light of my torch inside evaluation, but in some other domain. What are the big debates going on in policy, in organisational development, in action research, in the systems field, in the knowledge management domain, in community development, in the reorientation of overseas aid towards capacity development … I don’t know. But not, I think, in evaluation. Unless you take a leaf out of complexity theory’s book and start searching for very very small signals … the huge exceptions to the general rule.

    Goodness me. Harrumphing and clearing my throat.

  • Jane Davidson

    Bob, you may well be right, and I must confess I haven’t had my nose enough in the various other disciplines to be able to reflect much on what’s happening there.

    Evaluators may not (as a group) be early adopters, but most are also members of other professional communities, some of which are probably onto something interesting. I suppose the big question is which of these ideas get buy-in from a critical mass in the evaluation community – and which haven’t (or won’t) but should.

    When Michael asks a question like this, my first assumption is that he’s already thought of the next Copernican revolution and is just seeing whether people can either guess what it is or come up with something else interesting he hadn’t thought of yet. But perhaps this one really is up in the air still …

  • Haddy Njie

    Dear Prof. Scriven,
    I have been following the discussions on the need for a continuous Copernican revolution in the field, but I do not think I’ve yet gained the much-needed experience and knowledge of evaluation to partake in that debate. I found the discussions very enlightening, though. I am presently taking a class, “Evaluation of New Educational Programs”, and as part of our class activities we are doing a role play on the evaluative perspectives, views and positions of major contributors in the field. I will be trying to present some of your perspectives on evaluation. Fortunately, I stumbled on this site, and I am wondering if you could inform me about your position as far as the roles of stakeholders in the evaluation process are concerned. Thanks in advance for your reply.

    Haddy Njie

  • I don’t know about a Copernican revolution but it seems to me that a great deal of the wailing, flailing and gnashing of teeth with respect to evaluation is owed to a conflation of measurement and evaluation. They are not the same thing but are often treated as though they are.

    To evaluate something (anything) is to determine its value. That raises several related questions: Value in what terms? Value to whom? Value in relation to what?

    To evaluate a program is to determine its value. Again the questions: In what terms? To whom? In relation to what?

    To assess the efficacy of a program in achieving its objectives is not the same as evaluating the program (although assessing the efficacy of a program might be a step along the way in determining its value to a particular audience).

    So, for me, the evaluation waters and thinking are muddied by some very sloppy or careless or unthinking use of the language. Straighten that out and a Copernican revolution might be unnecessary.

    P.S. I’ll post this to the EvalTalk list, too, which is where my attention was drawn to this blog.

  • Daniel Ticehurst

    Dear Michael,

    I am sure you are making an important point, but I genuinely do not understand what it is. It is just that you are using an analogy based on a 200-year process that saw a heliocentric model replace the Ptolemaic model to explain, and understand, the need for another shift. Is this what Jane refers to in her blog: the difference between evaluation and research???

    Perhaps I am way behind the times, but (i) formulating the outcome of a programme in behavioural terms, (ii) finding or creating a test of those behaviours, or trying to develop signs with them (institutions, groups or individuals) regarding how their capacities or behaviours could change and look in the future (think theories of change and outcomes in a log frame), and (iii) applying the test to the targeted population so that one could determine how good the program was by finding out the extent to which it had met these outcomes, as, say, outcome mapping and realistic evaluation try to do, is surely not passé. If it was replaced by ‘determining comparative cost-effectiveness within an ethical/pragmatic framework’ please explain how and why. The flaws you outline (including the exclusion of goal critique, ethical process critique, side-effect determination, cost analysis, comparisons, etc.) are surely the results of misuse rather than anything intrinsic to defining the scope of behavioural changes, and how best to understand how and to what extent programme support is stimulating them and with what effect (impact). As the late and great Gil Scott-Heron said: this revolution will not be televised…….Thanks, Daniel