Why “What’s the best tool to measure the effectiveness of X?” is totally the wrong question

Time after time in online discussion groups I see questions like this one:

“What are the best tools to measure the effectiveness of [insert any program, policy, or initiative]?”

It’s a classic case of thinking evaluation is merely measurement, and measurement gives you the answers.

Many managers and non-evaluators think like this – that evaluation is merely a process of picking a few indicators and measuring them, and somehow the value or effectiveness of the evaluand (program, policy, initiative, etc) will be miraculously self-evident.

The reality is that many evaluators think like this too. If they are not of the “indicators” mindset, it’s so often a conversation about methods (qual, quant), or tools, or instruments.

Consider this quote (out of the latest NDE, #133) from Michael Quinn Patton:

… Moreover, and this is critically important, [Scriven] shows that valuing is fundamentally about reasoning and critical thinking.

Evaluation as a field has become methodologically manic-obsessive.

Too many of us, and those who commission us, think that it’s all about methods.

It’s not.

It’s all about reasoning.

(p. 105; quote broken into paragraphs for readability)

MQP makes the above comment in the latest brilliant edition of New Directions for Evaluation (#133, on Valuing, edited by George Julnes).

A great read from cover to cover no matter what, where, or how you evaluate; this issue goes right to the heart of one of the fundamental elements of genuine evaluation — values.

So, values and “valuing” are what makes evaluation, well, e-valu-ation.

And reasoning and critical thinking are central to “valuing”.

Evaluation, then, is about using evaluative reasoning and critical thinking to draw values-based conclusions from the evidence and from the definitions of “quality” and “value”. [These definitions were themselves developed with evidence-informed reasoning.]

Evidence is gathered using various methods and – yes – tools.

The evidence is but one ingredient in the evaluation; the evaluative reasoning is how we determine what evidence to gather in the first place, and how to interpret it once we have it.

Once we get a grip on this fundamental reality, it’s amazing how it can change the whole conversation and steer it in the direction of more genuine evaluation.

Instead of leaping to tools and instruments, we come right back to the big picture purpose of the evaluation, its intended users, and what they intend to (or, could potentially) use the evaluation for:

  • Who asked for this evaluation and why?
  • Who are the people who need answers, to what questions, for what purposes, and to inform what thinking or decision making?

At this point the reasoning elements are key:

  • How should “effectiveness” (or quality, performance, value, etc) be defined in this context? Based on what? And, whose expertise (including that local knowledge from recipients and community members) do we need in order to get this right?
  • Which outcomes should be considered valuable, and are some of them more valuable or important than the others? Why? Based on what?
  • How big an impact would be “enough” given the investment of time, effort, and resources that went into this?

… and it’s only then that the methods or tools even become relevant:

  • What mix of evidence would be convincing when answering those questions?
  • What data sources or tools might we use to capture that evidence?

Once the evidence is in, that doesn’t mean the answers to those big picture questions have miraculously materialized.

No; the evaluation team’s job is far from done. There’s more evaluative reasoning and critical thinking involved in making sense of the evidence:

  • What performance picture does the evidence paint, when we look at it alongside our earlier reasoning about what “good” or “effective” should look like?
  • Who might disagree with our interpretation, and what evidence or reasoning would they offer to support their conclusions? How would we know if they were right? How can we check?
  • What’s the most powerful and compelling way to present our findings in a way that (1) engages our audiences; (2) conveys the most important insights clearly and quickly; and (3) provokes genuine evaluative thinking about what it all means and what the organization should consider doing next?

What are some strategies that have worked – or failed! – for you when trying to get either evaluators or clients to understand the importance of evaluative reasoning and critical thinking?


5 comments to Why “What’s the best tool to measure the effectiveness of X?” is totally the wrong question

  • Gilles Mireault

    Hi Jane and Patricia,

    This post is really at the heart of the problem of doing evaluation. Unfortunately, I’m still waiting to read the fundamental text on how to do this correctly. Dr Davidson’s book covers a lot of content on this topic, but I’m still having a hard time starting a structured discussion with the stakeholders on this theme.

    I’m working right now on an evaluation for implementing a new way of treating youngsters in an out-of-home placement facility. Lots of good intentions, no level of performance set, and lots of things being monitored that I’m not sure are “valuable”.

    How do you start the conversation? How do you know it’s done right? That you’ve reached the real values of the people?

    Many questions, not so much know-how.

    Keep up the good work; you’re stimulating my critical thinking.

    P.S. The last NDE issue on valuing is indeed a pearl!

  • Hi Gilles,

    It’s an ongoing struggle for me too – you are not alone!

    One of the challenges I think we must respond to as evaluators is when clients push back on our work, saying we are “overcomplicating things”.

    Now, oftentimes this comment comes from someone who is looking for an overly simplistic approach (pick a few indicators) that will in the end tell either a misleading story or no particular story at all.

    But, rather than simply respond with that argument, I really do try to listen to what they are saying and see if I can find a way to authentically represent the reality about performance, value, or quality – but in a really clear and straightforward way.

    There are some things we can do with that conversation at the front end. One strategy I use is getting together a list of what evidence will be tracked and asking questions like: “What would these outcomes look like if they are just good enough but not brilliant? What would unacceptably weak look like, and why? What would a really good set of outcomes look like?”

    What I am trying to do here is get the client to paint a few evidence scenarios in evaluative terms, to get them to articulate what “good”, “just good enough”, and “not good enough” should look like.

    Sometimes it makes sense to work these scenarios up into evaluative rubrics. [These generally work best if they are developed with stakeholders, which takes some time to do but is invariably worth the effort because you can have those conversations up front rather than arguing about evidence interpretation after the fact.]

    One thing we can do with evaluative rubrics is create rich descriptions of what different levels of performance look like, based on the full mix of qualitative and quantitative evidence (not just one indicator).

    And, if you have levels, you now have a way of representing ‘snapshots’ of performance or progress over time against these levels.

    If you can crunch it down to a really good, succinct piece of data visualization* then quite often the reaction from the client is “Ohhh, now that’s what I’m talking about!”

    In other words, there are ways to crunch multiple pieces of mixed method evidence into a performance snapshot that is likely to satisfy those who just want something where they can track progress over time in a visual way.

    But the big advantage compared to the “pick an indicator” approach is that it is simple but not simplistic. It gives a succinct summary of where things are at, but underlying that summary is a richness of evidence that covers far more territory than one [usually narrow] indicator ever could.
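
    As a rough illustration of the rubric idea described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the criteria, the evidence notes, the reporting periods, and especially the synthesis rule (take the middle rating across criteria, but never rate overall performance higher than any criterion marked essential). None of these come from the post; a real rubric and its synthesis rule would be worked out with stakeholders.

```python
from dataclasses import dataclass
from statistics import median_low

# The three levels named in the discussion, ordered from weakest to strongest.
LEVELS = ["not good enough", "just good enough", "good"]


@dataclass
class Judgement:
    """One evaluative judgement: a criterion rated against the rubric."""
    criterion: str           # hypothetical criterion name
    level: str               # one of LEVELS
    evidence: str            # the mix of evidence behind the rating
    essential: bool = False  # assumed flag: an essential criterion caps the overall rating


def overall_level(judgements):
    """Combine judgements into one level (illustrative rule, not a standard)."""
    ranks = [LEVELS.index(j.level) for j in judgements]
    overall = median_low(ranks)  # middle rating across all criteria
    essential = [LEVELS.index(j.level) for j in judgements if j.essential]
    if essential:
        # Overall performance cannot exceed the weakest essential criterion.
        overall = min(overall, min(essential))
    return LEVELS[overall]


def snapshot_table(snapshots):
    """Return one text line per period: a small bar plus the level name."""
    lines = []
    for period, judgements in snapshots.items():
        rank = LEVELS.index(overall_level(judgements))
        lines.append(f"{period}: {'#' * (rank + 1):<3} {LEVELS[rank]}")
    return lines


# Invented example data for two reporting periods.
snapshots = {
    "Period 1": [
        Judgement("young people feel safe", "just good enough",
                  "interviews + incident logs", essential=True),
        Judgement("engagement in activities", "not good enough",
                  "attendance records + staff notes"),
        Judgement("quality of staff practice", "just good enough",
                  "observation + supervision records"),
    ],
    "Period 2": [
        Judgement("young people feel safe", "good",
                  "interviews + incident logs", essential=True),
        Judgement("engagement in activities", "just good enough",
                  "attendance records + staff notes"),
        Judgement("quality of staff practice", "good",
                  "observation + supervision records"),
    ],
}

for line in snapshot_table(snapshots):
    print(line)
```

    The point of the sketch is that the reasoning lives in the level definitions and the synthesis rule, not in the data collection: changing which criteria count as essential changes the overall picture even when the evidence stays the same.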

    Hmmm… you remind me of another post I have been meaning to do about indicators …

  • Irene Guijt

    Absolutely the crux, Jane. I recently blogged along similar lines in relation to ‘rigor’ and the US intelligence community. http://bigpushforward.net/archives/1432. It’s the quality of thought that makes for rigor.

  • Robert K. Walker

    Question: What do scholasticism and Calvinism, on the one hand, and the Cartesian-Newtonian worldview, on the other, have in common?
    Answer: They both downplay the role of reason. For the former, the answers are clear from traditional texts. For the latter, they are clear from sensory experience.
    The challenge to these worldviews is very modern, but with a rich historical tradition. Perhaps it is best called holism.
    Evidence-based means just that. It does not mean that evaluative conclusions can ever be perfectly clear from the evidence.

  • Nick York

    Excellent post – thanks Jane, and just to say I completely agree with your comment and am now getting hold of the NDE issue to have a good read. We have tried to push a greater focus on theories of change and hypothesis generation here, rather than starting from the measurement end. Otherwise, when you get a result like “high quality RCT shows that intervention x doesn’t have an impact” but you don’t know why, what do you do next?