“No dodgy stats” – highlights and lowlights of the year

Pic by Will http://invisiblepics.com/invisible-measuring-tape/

As we run down to the end of our first year of Genuine Evaluation (the blog), we’ve decided to have a series of weeks focusing on what are emerging as key aspects of genuine evaluation (the theory and practice).

This week we are focusing on the quality of evidence used in evaluations. The first standard of evaluation is labelled “accuracy”. As Thomas Schwandt commented in a recent AEA conference session on evaluation in international development, it might be better to clearly broaden this beyond statistical accuracy and look at the quality of evidence for the warrants made in the evaluation – that is, the empirical base for claims about how things are (description) and what produces results (causal inference). In our jingle we’ve referred to this as “no dodgy stats”, but the same concerns about validity apply to qualitative data.

Some of the common problems in this area are:

using inappropriate measures, including uncritically reporting people’s opinions as if they were completely adequate assessments of a situation – for example their assessments of the success of an intervention, when they have little information about its outcomes

poor sampling – taking a sample that is highly likely to be unrepresentative and then reporting it as if it were representative, including fancy inferential statistics that dress up the results with the mystique of science, or taking the results from a focus group (a method for exploring issues and understanding points of view) and then generalizing them to the population

constrained focus that does not look at unintended outcomes or costs to other parties, producing a distorted cost-benefit assessment

high-stakes performance measurement leading to data corruption and gaming.
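
The sampling problem in the list above can be made concrete with a small sketch. Everything in it is invented for illustration: the scores, the group sizes, and the helper `mean_with_ci` are our assumptions, not data from any real evaluation. It surveys only the clients who stayed in a hypothetical program and then dresses up the result with a naive 95% confidence interval.

```python
import math
import statistics

# Hypothetical population of 1,000 clients rating a program from 0 to 10.
# The 300 who stayed rate it highly; the 700 who dropped out do not.
stayers = [8, 9, 7, 8, 9, 8] * 50        # 300 invented scores
leavers = [3, 4, 2, 3, 5, 4, 3] * 100    # 700 invented scores
population = stayers + leavers


def mean_with_ci(scores):
    """Return the mean and a naive 95% interval (normal approximation)."""
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return m, (m - 1.96 * se, m + 1.96 * se)


true_mean = statistics.mean(population)

# Convenience sample: only people still attending get surveyed.
convenience_sample = stayers[:60]
sample_mean, (low, high) = mean_with_ci(convenience_sample)

print(f"True population mean: {true_mean:.2f}")
print(f"Convenience sample mean: {sample_mean:.2f} "
      f"(95% CI {low:.2f} to {high:.2f})")
print("Interval contains the true mean:", low <= true_mean <= high)
```

The interval is narrow and looks rigorous, yet it sits nowhere near the population value. The arithmetic describes only the people who were asked, which is exactly how inferential statistics can lend an unrepresentative sample an authority it has not earned.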

So, what are the exemplars or examples from 2010 we should highlight and learn from for the future?  Good and bad examples.  What comes to mind?

Here are some nominations:

1. Victorian health service performance monitoring framework

This sorry saga seems to be continuing as an exemplar of goal displacement, where organizations achieve the required targets but at the cost of the real goals.

A survey of 124 emergency department doctors last month found that hospital chiefs were not allowing doctors to activate ambulance bypass procedures when emergency departments were full, because they did not want to fail to achieve a government benchmark that says hospitals should be on bypass less than 3 per cent of each year.

Seventy per cent of the doctors said this had been a problem for them and one said it had cost lives because ambulances were delivering seriously ill patients to overcrowded emergency departments that were unable to care for them. (The Melbourne Age, 11 Nov 2010; click through for Tandberg’s cartoon, which sums up the situation eloquently.)

2. UK accident and emergency waiting time targets 

By contrast, the UK government has scrapped its targets for waiting times in accident and emergency departments after evidence that this was leading to worse care:

Hospital waiting time targets, including the four-hour wait in accident and emergency (A&E) departments, are to be scrapped. The Health Secretary Andrew Lansley made the announcement as he told MPs that a public inquiry would be held into the scandal-hit Mid-Staffordshire NHS Trust. Lansley said the “far-reaching reforms of the NHS [would] go to the very heart of the failures at Mid Staffs”, which he called “one of the darkest chapters in our national health service”.

At least 400 more people died at the hospital in Mid Staffordshire between 2005 and 2008 than would otherwise have been expected. An earlier inquiry into events at Mid Staffs found that patients endured “unimaginable distress and suffering” from poor care and were left “sobbing and humiliated” at the hospital, which had become focused on targets and cost-cutting.

Lansley told the Commons that the culture led to a situation where patients “were being discharged when they should not have been, and patients were being transferred to inappropriate wards where there was no provision to look after them”. He said that Robert Francis QC, who chaired the first inquiry, had been crystal clear that the obsessive chasing after targets had created a situation in which managers and frontline staff lived in fear of losing their jobs. “We will scrap such process targets and replace them with a new focus on patients’ outcomes – the only outcomes that matter,” he said. (Hansard, 9th June 2010: Column 333. via WebMD Health News) 

10 comments to “No dodgy stats” – highlights and lowlights of the year

  • This has absolutely nothing to do with your post for Nov 24, but, then again, I’m writing it yesterday…Greetings from Edmonton, Alberta, Canada! Hello Patricia and Jane. What one green button can do. I’m spreading the word in my part of the world with the hopes I can cause enough stir for the 2011 Canadian Evaluation Society’s Conference, “Generations: Multi-Generational Approach to Evaluation.” Dr. Robert Stake is our keynote speaker. What a dear-heart that man is!

    Perhaps there is a thread of commonality with your posts of late with my post (a shameful plug here, but it’s for evaluation folks!): The CES 2011 conference theme is Generations: a multigenerational approach to evaluation. Over the past thirty years (one generation) the field of evaluation has changed. This conference will highlight:

    >changes in evaluation techniques and tools,
    >changes in evaluators including education, training, experiences and perspectives,
    >changes in the populations served by the programs and services we evaluate

    Better yet here is the link (I hope it works…):

    Here is the link to the Canadian Evaluation Society’s Home Page (you will be able to find the link to the conference on the left-hand side. I hope the link works; if not, just google “Canadian Evaluation Society” and you will find us):

    Hope to see you in Edmonton in May, 2011 and trust me, we do not have mountains of snow at that time of year!

    Donna McBey
    Edmonton, Alberta, Canada

  • Patricia Rogers

    Sounds like it will be a great conference. Hmmm, May….

  • Patricia Rogers

    Emergency waiting times are now in the news in Canada. It’s easy to be distracted by the furore over the “I’m eating my cookie” refusal to talk to the media (http://www.theage.com.au/national/backlash-over-cookie-monster-sacking-20101126-189io.html) but of more relevance is the proposed legislative amendment being debated, which would put limitations on waiting times in emergency departments (http://edmonton.ctv.ca/servlet/an/local/CTVNews/20101125/edm_sherman_politics_101125/20101125/?hub=EdmontonHome). Should such performance targets be in legislation? Is there any evidence of learning from the UK and Australian experiences?

  • Donna McBey

    We would love to have you, Patricia (and Jane too)! We are an amazingly friendly bunch in Canada and especially Edmonton. Bring your buttons and I’ll help you promote “Genuine Evaluation”…perhaps it will address your post about health care waiting times. As for the “cookie” thing – media hype to overshadow the real issues, sorry, but true.

    PS…this is the first “blog” for me to “blog”…I’m a neophyte-blogger, so patience on behalf of my fellow bloggers would be most appreciated. :o)

  • High-stakes performance measures assume all good things are measurable. They create incentives for ‘juking the stats’ (portrayed brilliantly in ‘The Wire’ in both the Baltimore police and education departments). They turn career-minded people working in complex fields, with all the skills and good intentions required to be effective, into automatons operating on a time horizon of a few minutes.

    Mark Friedman subtitled his book ‘Results Based Accountability’ ‘trying hard is not good enough’, but when ‘measures’ are used for ‘accountability’ we may wish to stop and consider that ‘trying hard to measure results is not good enough’. As Yoda might say, ‘do or do not; there is no try’.

    Finally a quote about the need to remove the blinkers of a results focus to do good work…note the emphasis on personal relations to get things done.

    Thomas Merton, “The Hope of Results”

    Do not depend on the hope of results. When you are doing the sort of work you have taken on . . . you may have to face the fact that your work will be apparently worthless and even achieve no result at all, if not perhaps results opposite to what you expect. As you get used to this idea you start more and more to concentrate not on the results but on the value, the rightness, the truth of the work itself. And there, too, a great deal has to be gone through, as gradually you struggle less and less for an idea and more and more for specific people. The range tends to narrow down, but it gets much more real. In the end, it is the reality of personal relationships that saves everything.

    Thomas Merton, “Letter to a Young Activist”
    cited in “A Jesuit Off-Broadway” by James Martin

  • Patricia Rogers

    Thanks, Andrew, for those links.

    Does the Merton quote go further than acknowledging that not all results can be easily or even possibly measured, and suggest that even when there is evidence you are being ineffective or even harmful, you should persist because you know you are doing the right thing? Hopefully this is not what was meant.

  • Patricia Rogers

    Another example to add to the list – the emerging story about faked test scores in Atlanta (http://www.ajc.com/news/atlanta/atlanta-public-schools-cheating-758757.html) (Hat-tip to GreeneBarrett http://twitter.com/GreeneBarrett)

  • Chad Green

    Since the year has not ended yet, I would like to propose a redefinition of the word “accuracy” in celebration of the recent breakthroughs in stem-cell research, in particular, Harvard researchers’ reversion of mature cells back into the adult stem-cell stage. It requires a series of transformations of this concept as follows:

    Picture a measurement system within the context of the scientific method that defines accuracy and precision using the traditional analogy of a target (i.e., bullseye). Now replace that image with an image of the cross-section of the Earth with three layers: core, mantle, and crust. Now label each of the three layers, starting at the core, with Peirce’s concepts of firstness, secondness, and thirdness (i.e., the why, how, and what). Now replace that background image one more time with the image of the three qualities of stem cells pictured as follows: http://bit.ly/gWnQQY.

    Using this framework, can we have a more sophisticated conversation about the meaning of validity in the context of evaluation? For example, with the stem cell as the analogy of choice, could we perchance transform the ongoing debate about the dynamics of truth (i.e., the search for firstness) within our field and beyond?


  • Mamnoon Chad

    Your timely attention inspired me; now is a good time for a revolution in evaluation.



  • Chad Green

    Thank you, Moein.

    How about we show how this framework contributes to the science of logic by incorporating beauty and justice as processes?

    Beauty as process

    Classical logic involves the process of reducing a whole into its parts. Intuitionistic logic includes classical logic but also the law of the included middle. Together these two logics form the ends of a continuum with many intermediate logics in between. How can we map this continuum onto the framework above?

    Bob Williams’ presentation on truth, beauty, and justice at the AEA conference in San Antonio provides some inspiration. For example, one of his examples of beauty, the triangle of activity theory, is the same thinking device that I use to seek firstness in my transdisciplinary research, except that I focus cognitive effort on the top-most triangle. Let’s assume that the shape of a triangle represents both ends of the logic continuum, similar in nature to the bifurcation patterns in the framework.

    Using the triangle as a simplexity thinking device, you can now think of the top of the triangle (i.e., psychological tools such as language) as the source of dualities if you work from the top down, and the base of the triangle (subject and object) as the disintegrators of dualities if you work from the bottom up. What I have found is that in order to deeply understand dualities no matter how complex, you need to use both processes of knowledge creation and disintegration simultaneously. In other words, to truly understand duality means that you must dissolve it in order to make it your own.

    Justice as process

    Bob’s presentation on justice dealt with the importance of boundaries, and it is here where we have a little disagreement. I agree that boundaries serve to protect and sustain the integrity of cultural traditions (i.e., to ensure their survival); however, according to this framework, there is a single source at the center that transcends all boundaries. I don’t know what exactly that is, but Richard Baxter’s quote below provides a clue: “Special mercy arouses more gratitude than universal mercy.”

    Perhaps you could say the same thing about truth and justice as well. :)