Building causation into survey items about outcomes

Earlier this week I posted about the uninterpretability of the standard Likert scale that asks people to agree or disagree with a statement. I suggested a more evaluative scale that is more interpretable, particularly for survey items that go after process evaluation.

Now, let’s look at survey items for outcome evaluation and a few ideas for overhauling those.

Over the years, having started from the usual applied social science approach to research design and found the results frustratingly hard to interpret and use, I have evolved my methods to incorporate some other ideas. Two in particular …

  1. have the response scale reflect something about the value and practical significance of the outcomes
  2. build causal language into the stem

So, to illustrate, let’s take a typical item we might see fishing for outcomes from a training and development program.

To what extent do you agree or disagree with the following:

I have applied the knowledge and skills I learned in my job.

1 = strongly disagree, 2 = disagree, 3 = neither agree nor disagree, 4 = agree, 5 = strongly agree

OK, what’s wrong with this survey item? Well, it does tap into transfer of training (application on the job), but isn’t the main point whether the knowledge and skills actually added value on the job? For example, enhancing performance, or some other valuable outcome?

Second, the agree/disagree format is, to me, not that interpretable. I’d rather know something about how substantial the impact was – something that asks more directly about the program’s value added.

So, here are a few ideas from a questionnaire I developed a couple of years ago to evaluate outcomes from a leadership development program. Bear in mind the participant surveys were NOT the only source of evidence here. And, I am not listing ALL the questions asked, just a sample.

How much impact so far has the program had on:

Response scale (1 to 4): from “no impact (as yet)*” and “slight impact (so far)” up to “very strong”
Your access to useful networks across the sector 1 2 3 4
Your visibility as a candidate for higher level positions 1 2 3 4
Your ability to take on more complex work and/or an even more senior role 1 2 3 4
Your career potential or career success 1 2 3 4

[* Please note that your responses will be analysed taking into account the time you have been in the program, therefore lower levels of impact are expected on several of these outcomes for managers who have entered the program more recently. Such findings are not necessarily a poor reflection on the program.]
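
As a minimal sketch of how an analysis like the one described in the note above might be set up, the following groups the 1 to 4 impact ratings by time in the program before looking at their spread. The data, the band cut-offs, and the names `tenure_band` and `ratings_by_band` are all assumptions made up for illustration; they are not taken from the original evaluation.

```python
from collections import Counter, defaultdict

# Hypothetical responses: (months in program, rating on the 1-4 impact scale).
# These values are invented purely for illustration.
responses = [(3, 1), (5, 2), (8, 2), (14, 3), (16, 3), (20, 4), (26, 4), (30, 3)]


def tenure_band(months):
    """Assign a respondent to a time-in-program band (cut-offs are assumptions)."""
    if months < 12:
        return "under 12 months"
    if months < 24:
        return "12 to 23 months"
    return "24 months or more"


def ratings_by_band(rows):
    """Count how often each rating was given within each time-in-program band."""
    grouped = defaultdict(Counter)
    for months, rating in rows:
        grouped[tenure_band(months)][rating] += 1
    return {band: dict(counts) for band, counts in grouped.items()}


for band, counts in ratings_by_band(responses).items():
    print(band, counts)
```

Reporting counts per band, rather than one overall average, keeps lower ratings from recent entrants from being read as a poor reflection on the program.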

To supplement just some of the above, the following open-ended items were also used …

Since completing the program, have you changed job roles or responsibilities at all? If so, would you consider this change a career advancement? To what extent did the program contribute to your getting the new role or responsibilities? How do you know?



What differences (if any) has the program made for your organization? How has it benefitted by sending you on the program?



How would you rate the program overall as a worthwhile use of your time? (circle one letter grade)

Excellent Good Adequate Poor


8 comments to Building causation into survey items about outcomes

  • David Onder

    I really like these suggestions, and they seem to clarify the intent of the question. One issue I would mention, though, is there is no way to know, based on the wording of the questions, whether the impact was positive or negative. You might consider rewording the questions, providing a wider scale with negative responses, or making the response compound with positive/negative as an additional selection.


  • Jane Davidson

    Great point, David, and I now recall we did actually get one case like this, where taking time out for professional development was viewed in the organization as negative, and the manager felt this had held back his career progression. Interesting!

    So, yes, I would be inclined (in hindsight) to add just one response option for negative impact (cases are too rare to warrant a scale of negativity; the open-ended comments fill in the story) – or just dovetail that into “no (or negative) impact” and ask people to comment especially if negative.

    Some of the most interesting stuff came in response to the “have you changed job roles” open-ended question. Managers were actually amazingly clear in most cases about whether the program had contributed to their promotion or not, and could cite very convincing causal evidence. One example was feedback they got from the selection panel about how they’d responded to tough questions, drawing directly on the knowledge they’d gained from the program.

    Thanks again for the comment/suggestion!


  • Sue Williams

    Just a very simple question on “How would you rate the program overall as a worthwhile use of your time?” Why are there two boxes marked D and F for Poor rather than D+ D D- ?

  • Jane Davidson

    Sue, thanks for the question! Basically, just mirroring the usual grading system used in some universities, for example, which goes A, B, C, D, F (in the U.S.). For this particular survey, I actually used E rather than F because it was a NZ program I was evaluating.

    As for why I didn’t put +/- options on the ‘fail’ grades, a couple of reasons.

    First, users/participants, in my experience, tend not to be making such fine distinctions down at that less used end of the scale, so it seemed unwise to increase their cognitive load by asking them to think in a more fine-grained way than they usually did.

    Second, from the perspective of the client, such fine differences at the poor end of the scale are not important – it is that there is anything there at all!

    People do, in my experience, tend to use +/- options up the top end of the scale, particularly for generally good quality programs, so it’s easy for them to answer and the client finds it useful to see the spread as well.

    Naturally, all this is supplemented with responses to an open-ended “why?” item …

    Thanks again!

  • David Earle

    I think this is a really good example of exploring what an evaluative rating means for respondents.

    The opening example is a good case of blind use of a scale, without even thinking about whether it is a statement that encourages a range of agreement or disagreement, let alone what that might mean.

    I also like the way you have set out different types of impact. Although personally I would go for something other than “impact”. It is a bit jargony to me and unclear what it could mean. I would use a wording such as “difference” – or ideally have questions about both frequency and importance.

    The really interesting analysis would be to see how the overall grades lined up with the responses to the individual questions. My guess would be that there would be quite a lot of variation and some apparent contradictions. And that is where it starts to get interesting :-)
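
A rough sketch of the comparison suggested here, assuming each respondent gave one overall rating and one 1 to 4 impact rating. The data are invented, and the rule for what counts as an "apparent contradiction" (a high overall rating with an item rating of 2 or below, or a low overall rating with an item rating of 4) is an assumption chosen only for illustration.

```python
from collections import Counter

# Hypothetical respondents: (overall rating, rating on one 1-4 impact item).
# These values are invented purely for illustration.
respondents = [
    ("Excellent", 4), ("Excellent", 2), ("Good", 3), ("Good", 3),
    ("Adequate", 4), ("Poor", 1),
]


def cross_tab(rows):
    """Count respondents for each (overall rating, item rating) pair."""
    return Counter(rows)


def apparent_contradictions(rows):
    """Flag cases where the overall rating and the item rating point in
    opposite directions (thresholds are assumptions, not from the post)."""
    flagged = []
    for overall, item in rows:
        high_overall = overall in ("Excellent", "Good")
        if (high_overall and item <= 2) or (not high_overall and item >= 4):
            flagged.append((overall, item))
    return flagged


print(cross_tab(respondents))
print(apparent_contradictions(respondents))
```

The flagged cases are the ones worth reading alongside the open-ended comments, since the numbers alone do not explain the mismatch.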

  • david cohen

    I agree with David Onder that there is no way to know, based on the wording of the question related to “How much impact so far has the program had on”, whether the impact was positive or negative. You might consider rewording the questions and providing a wider scale.

    New suggested question:
    How much benefit so far has the program had on

    Replace the word impact by benefit in the title of the 4 columns.

    If there is a reasonable chance of a negative impact, add a first column to the other 4 columns titled: Negative impact (benefit) with a -3 value (average of -2, -3 and -4).
    Your open-ended question on benefits should help in understanding the nature of the benefits (positive and negative). However, it does not provide a clear indication of the degree of the negative impact. If there is a strong chance of negative impacts, you could add 3 additional columns (slight, noticeable, strong) on the left of the existing 4 columns, with the following values: -2, -3, -4.
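
The arithmetic behind the two coding options in this comment can be sketched as follows. The variable names are hypothetical, and the positive codes 1 to 4 are assumed to match the four existing columns in the post.

```python
from statistics import mean

# Graded negative columns and their values, as proposed in the comment.
graded_negative = {"slight": -2, "noticeable": -3, "strong": -4}

# Existing four columns from the post, assumed to be coded 1 to 4.
positive_codes = [1, 2, 3, 4]

# Option 1: a single negative column coded as the average of the graded values.
single_negative_code = mean(list(graded_negative.values()))

# Five-column scale: one negative column plus the existing four.
compact_scale = [single_negative_code] + positive_codes

# Option 2: seven-column scale with three graded negative columns on the left.
wide_scale = sorted(graded_negative.values()) + positive_codes

print(single_negative_code)
print(compact_scale)
print(wide_scale)
```

Neither option has a zero point, so "no impact" sits at 1 in both versions; that follows from keeping the post's existing codes unchanged.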

  • Mikkel Møldrup-Lakjer

    Thanks very much for addressing a problem that evaluators are likely to come across. I see that one of your suggestions is to use the research question about causation – basically: Was it the program that caused the observed effect? – directly as a survey question to the respondents.

    What are your thoughts about the ability of respondents to make the causal connection? I am thinking about the “How do you know?” question. What is your experience with this kind of question?

    Maybe it will help if we give some kind of guidance about the kinds of evidence we are looking for.

    Is it the counterfactual: “If I had not participated in the program, I would (not) have…”

    Or is it a specific indication of linkage: “The specific abilities I acquired were a requirement for applying to my new job”?


  • Jane Davidson

    Mikkel, thanks so much for your question.

    The “How do you know?” question is absolutely key, and how strong the answer is depends on who you are asking and how much insight they have. In this case I was asking quite senior managers, and they had some very good insights.

    I just left it open-ended for their explanations, and generally they were talking about linkages and causal pathways, how they knew it had made a difference.

    For another part of the same evaluation I asked the “where would you be now?” question, probing specifically for the counterfactual. I have also used an expert-estimated counterfactual, where I asked leadership development experts to estimate where people would have been now given how they were tracking.

    So, yes, it is possible to probe for a specific kind of evidence, and for the counterfactual I think you do need to ask it specifically. Another project I’ve been involved in has used this with teens and their parents or other significant adults, and they understand and answer it very clearly.

    The “how do you know?” question works quite well without further guidance, although I suspect it depends on who your key informants are. But actually, my experience to date has been that participants know more than people sometimes assume, and can often tell you quite clearly.

    I must get on with writing my Causation minibook, so I can drop in all these examples!