Critiquing a partial evaluation – is a half-full glass better or worse than no drink at all?

Where do you strike the balance?  Should we stop doing small evaluations that only look at a few pieces of data, to avoid the risk of misinterpretation?  Or should we work harder to ensure their findings can be appropriately combined with other information?

In a recent post on the George Mason University website, Trevor Butterworth makes a trenchant critique of a recent evaluation of a program to reduce childhood obesity, under the pointed title New York Public School’s Whole Milk Swindle:

“Instead of monitoring children’s weights after removing whole milk from school cafeterias and replacing it with low fat or no-fat milk, it just measured the difference in overall calories entering the cafeteria system and concluded that as the children couldn’t eat them, they couldn’t turn them into fat, and therefore they were better off.”

The report (summarized in the CDC newsletter) acknowledges its limitations, including:

“no data were collected on total food consumption during the school day, so the effect of the milk switch on overall diet is unknown. Students might compensate for the averted calories/fat from milk by changing their consumption patterns.”

So should all evaluations be required to collect data all along the causal chain – inputs (food supplied), activities (students actually consuming it), and results in terms of total calorie and fat intake (including outside school) and changes in BMI?

Or does the fault lie in how this has been interpreted?  Some of the reporting refers to a more modest aim for the study, to see if low fat/fat free milk could be introduced without either reducing the amount of milk consumed, or leading to substitution of sweetened, flavored milk instead.  The graph included in the report shows that these more modest aims seem to have been met.



One of the dilemmas of evaluation is balancing the evaluation standards of:

  • Accuracy – which would suggest expanding a study to add more of this data
  • Feasibility – since not every evaluation study can be expanded within resource constraints
  • Utility – having timely information on this was clearly important to inform policy
  • Propriety – in this case, being very clear about the limitations of the study, not just in a formal section but in the media releases produced, to minimize the risk of the results being misinterpreted

This data leaves a lot of important questions unanswered.  Did this change occur among students who are overweight or obese?  Did it lead to reductions in their overall daily calorie intake?  Did it contribute to significant and lasting reductions in the percentages of children who are overweight or obese?

It can’t answer these questions. But does it need to?  Is this OK for a research project but not for an evaluation? What do you think?

4 comments to Critiquing a partial evaluation – is a half-full glass better or worse than no drink at all?

  • Jane Davidson

    The general question is a tough one alright. Is something, anything, better than nothing at all? To me, two questions come to mind:

    1. What is this piece of work being presented as? An evaluation of Program X? Or something much more modest?

    2. What sort of value for money has the evaluation itself delivered? How valuable and useful are the findings given (a) what the client really needed and (b) how much they forked out for this piece of work?

    Let’s look at the specific example.

    The primary presenting need for this program was childhood obesity levels. Therefore, at a bare minimum, any evaluation needs to include the outcomes of weight loss/gain and BMI (or similar) of the children in the schools where the milk substitution was taking place (preferably with a comparison or control group, i.e. other schools, but if this isn’t feasible then there are other ways to infer causation). Side effects (e.g. on kids’ behavior, learning, wellbeing) are also important but not covered at all, as far as we know from the report.

    Calorie intake is a very weak substitute for what the evaluators really should have been going after. They appear to try and justify this with reference to [equally weak] goals: “The goal of the milk policy change for NYC public schools was to reduce a key source of dietary calories and fat without reducing the total amount of milk purchased per student …” (from the CDC summary). This “goal” was but one intermediate outcome en route to the greater *purpose* of the program, which was to address the identified need of reducing obesity levels. It’s a nice example of the importance of needs assessment in evaluation for figuring out what outcomes really matter – stated goals are no substitute!

    Just because the program managers/funders/whoever have narrowed their focus to such a level doesn’t mean that the evaluators should now follow suit. They may be delivering what was asked for, but they aren’t delivering what was needed.

    In sum, I think the so-called “evaluation” flunks on #1 – it is presented as an evaluation of the program but hasn’t gone anywhere near even THE most important outcome, let alone any other effects on the kids themselves. And, come on, it’s not THAT hard an outcome to get at.

    It also flunks on #2. It may have delivered what was wanted/asked for, but that wasn’t what was needed (whether the client realized that or not).

    I know that it’s common practice to soften such shortcomings with a lot of caveats about limitations. But for the average reader, this is very fine print and usually missed. The main implicit message was “this thing works”, “this is a good idea”, but the evidence wasn’t truly there to back that claim. And that is not what I would call genuine evaluation.

    I’m very interested in others’ comments on this too.


  • Alidou

    A partial evaluation contributes to progress in a specific area. It is an opening through which to explore further perspectives and go deeper. An added value of a partial evaluation is the opportunity it gives to critics: in fact, if we have nothing (not even a partial evaluation), we have no progress and nothing to critique. So a partial evaluation has to be modest in its statements and conclusions; that is one of the characteristics of a genuine evaluation. It also has to make its context precise, in terms of its purpose and the resources (finances and time) available for it. We have to be careful when using the media for dissemination, because the media look for sensational facts and can easily extrapolate. On balance, it’s better to have a partial evaluation – let’s say a half-full glass – than no water at all. It’s up to us to fill the gap and complete the work started. It’s important to value a partial evaluation, because critiquing an evaluation (even a partial one) is also an e-valu-ation.

  • Doug

    I agree with Alidou, and I think others are being too harsh on the evaluation in question. Expecting the change in menu to reduce weight in the children is to expect it to influence an outcome that too many other forces affect. Instead, the evaluators looked at a specific goal (reducing the fat and calories available at school in a typical menu, so to speak); that goal can logically help children if it is part of a larger effort. This is just a review of a program that plays its part in that larger effort.

  • Patricia

Doug and Alidou have raised important issues. It’s always possible to point at an evaluation and find issues that were not addressed, but doing so is not the point of this blog.

    What we are exploring is what is ‘good enough’ evaluation, along the lines of the research on the ‘good enough’ mother, who provides the type and extent of nurturing required in a feasible, sustainable and effective way.

    In the school milk example, what is clear is that the researchers’ cautious, limited conclusions were not adequately represented in the reporting of their findings. A single evaluation cannot answer all the questions, and these researchers did not have access to data on the impact on total calorie intake and BMI. What is needed is a portfolio of planned evaluations that together could assemble some or all of this information, even on a small scale, since the overall impact is clearly the information needed.

    Doug’s other point – that total calorie intake and BMI are affected by other factors – opens up another important issue: appropriate ways to evaluate interventions that work through a complicated causal path, heavily influenced by other programs and external factors. This is an issue we will definitely explore on this blog.