A recent conversation with a colleague reminded me how traditional social science training has hardwired some default thinking into our brains that needs to be questioned.
Obviously, there are a lot of places one could go with this as an opening statement, but for now, let’s look at the design of survey questions.
Are we unknowingly being caught in a Likert scale trap?
I mentioned a while ago, in a discussion following a Friday Funny post (Data as “the truth”), that I generally avoid the use of Likert scales because they are evaluatively uninterpretable. Andrew Hawkins asked me why. So, belatedly, here’s why.
Real, genuine evaluation is evaluative. In other words, it doesn’t just report descriptive evidence for others to interpret; it combines this evidence with appropriate definitions of ‘quality’ and ‘value’ and draws conclusions about such things as:
- the quality of program/policy/product/etc design and implementation
- the value and practical significance of outcomes
- whether the entire evaluand was a good (or the best possible) use of time/money/resources
Now, as I’ve mentioned before (in the ‘No Value-Free’ post), many so-called evaluations are what we call ‘value-free’, a.k.a. “evaluations NOT”! They skip this whole evaluative inference step. My view: This is not acceptable. JMHO.
The usual alternative is to take descriptive evidence, often gathered using traditional social science methods, and attempt to interpret it relative to the relevant definitions of quality and value.
This sounds straightforward enough, but it’s actually quite tricky. Over the years, I have worked on some ways of making it easier.
Like, why not build evaluative elements into survey questions themselves?
Building evaluative elements into survey questions
A typical survey/questionnaire might ask questions like:
To what extent do you agree or disagree with the following:
|  | 1 | 2 | 3 | 4 | 5 |
| --- | :-: | :-: | :-: | :-: | :-: |
| The course was well organized |  |  |  |  |  |
OK, so what exactly does a mean of 3.8 (for example) mean? Is that well organized? Excellently? Mediocrely?
In some cases, the mean score (and distribution of scores) is provided alongside the mean and s.d. across a range of programs (this is very common with training programs). This does give an inkling of relative merit compared to others. In other words, you can see if you are generally doing better or worse than ‘the rest of the pack’.
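Just to make that ‘rest of the pack’ reading concrete, here’s a rough sketch in Python of the sort of comparison involved (every number below is made up purely for illustration):

```python
# Minimal sketch of the "relative merit" reading: where does our course's mean
# sit among the means reported for comparison programs? (Hypothetical data.)
from statistics import mean, stdev

our_ratings = [4, 5, 3, 4, 4, 3, 5, 4]           # one course's 1-5 responses
other_program_means = [3.2, 3.6, 3.9, 4.1, 3.4]  # means reported for other programs

our_mean = mean(our_ratings)
our_sd = stdev(our_ratings)
beaten = sum(m < our_mean for m in other_program_means)
percentile = 100 * beaten / len(other_program_means)

print(f"Our mean: {our_mean:.2f} (sd {our_sd:.2f})")
print(f"Higher than {percentile:.0f}% of comparison programs")
```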
But what if we want to know how good something was in some absolute sense? We might be better than average, but is that actually any good?
Most ‘evaluations’ seem to use the so-called Rorschach inkblot approach (a.k.a. the value-free, or “you work it out” approach). Basically, this means presenting descriptive data such as the above (possibly including comparison data) and letting clients and stakeholders draw their own conclusions about how good the findings are.
If we do want to take it that one step further, to say something explicit about the quality or value of something, how do we do that? Invent cut-offs, e.g. saying that 3.5-3.8 is good, 3.8-4.2 is very good, etc? What would be the basis for these?
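Just to show how thin that is, here’s what the cut-off approach looks like once you write it down (a sketch only, using the illustrative bands from the previous sentence):

```python
# A sketch of the "invented cut-offs" approach. The bands are the illustrative
# ones mentioned above; anything outside them simply has no agreed label.
def label_mean(m: float) -> str:
    if 3.8 <= m < 4.2:
        return "very good"
    if 3.5 <= m < 3.8:
        return "good"
    return "?"  # no defensible basis for a verdict here

print(label_mean(3.8))  # "very good" -- but on what basis?
```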
Or, what if we ditch the agree-disagree response scale and opt for something that has evaluative terms built right in? Like, for example …
How would you rate the following:
|  | 1 | 2 | 3 | 4 | 5 |
| --- | :-: | :-: | :-: | :-: | :-: |
| How well the course was organized |  |  |  |  |  |
Now, it has to be said, simply reporting summaries of participants’ ratings is not, by itself, “doing” an evaluation. The evaluator still needs to draw an overall conclusion.
However, by using evaluative terms right in the questionnaire, the participant ratings become a lot easier to interpret in terms of quality or value.
I use item designs like these primarily for process evaluation – getting a handle on the quality of content, design, and implementation. They are far more evaluatively interpretable than the traditional Likert scale-type items.
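As a rough illustration of that interpretability (the anchor wording and the responses below are assumptions for the sake of the example, not taken from an actual instrument), ratings on an evaluatively worded item can be reported straight against their own labels:

```python
# Sketch of reporting an evaluatively worded item against its own scale labels.
# The anchor wording and responses are assumed for illustration only.
from collections import Counter

labels = {1: "very poorly", 2: "poorly", 3: "adequately",
          4: "well", 5: "extremely well"}
responses = [4, 5, 3, 4, 4, 5, 2, 4]  # hypothetical answers to the item above

counts = Counter(responses)
n = len(responses)
for score in sorted(labels):
    share = 100 * counts.get(score, 0) / n
    print(f"{labels[score]:>15}: {share:.0f}%")

well_or_better = 100 * sum(r >= 4 for r in responses) / n
print(f"Rated the organization 'well' or better: {well_or_better:.0f}%")
```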
Stand by later this week for ideas on outcome/impact evaluation items that build causation right into the questions.