The new funding rules for the US Department of Education’s $650 million Investing in Innovation Fund, reported in Education Week, show the department is still using an out-of-date model of evidence-based policy built on a problematic hierarchy of evidence.
The purpose of the funding is described as follows:
Program Description: The Investing in Innovation Fund, established under section 14007 of the American Recovery and Reinvestment Act of 2009 (ARRA), provides funding to support (1) local educational agencies (LEAs), and (2) nonprofit organizations in partnership with (a) one or more LEAs or (b) a consortium of schools. The purpose of this program is to provide competitive grants to applicants with a record of improving student achievement and attainment in order to expand the implementation of, and investment in, innovative practices that are demonstrated to have an impact on improving student achievement or student growth, closing achievement gaps, decreasing dropout rates, increasing high school graduation rates, or increasing college enrollment and completion rates.
These grants will (1) allow eligible entities to expand and develop innovative practices that can serve as models of best practices, (2) allow eligible entities to work in partnership with the private sector and the philanthropic community, and (3) identify and document best practices that can be shared and taken to scale based on demonstrated success.
This already sounds like the “leaky pipeline” model of evidence-based policy, where we find out “what works” and then scale it up:
The largest, or “scale up,” grants—worth up to $50 million each—will require “strong” evidence, such as program evaluations that used random assignment of students.
The second-tier, “validation” grants of up to $30 million each will go to proposals that show “moderate” evidence, such as those that use sophisticated statistical techniques to try to measure the true effects of a program.
The final-tier, “development” grants are wild cards to a degree; they are $5 million awards to proposals that are each based on a “reasonable” hypothesis or theory. The department made one change to the timing of that third tier of grants: No longer will applicants need to get prescreened before submitting their applications; their applications will be due at the same time as all the other i3 proposals.
“The overall design of the competition tries to account for the importance of evidence at each stage of innovation,” Mr. Shelton said.
For each tier, the level of evidence required is an all-or-nothing eligibility requirement; an applicant that doesn’t have the research to back up a proposal for that particular tier should not bother applying.
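To make the second tier’s “sophisticated statistical techniques” concrete, here is a minimal sketch of one such technique: matching program students to similar non-program students on a baseline score and comparing outcomes. The data and the nearest-neighbour matching rule are invented for illustration; a real study would match on many covariates and assess balance.

```python
# Hypothetical "moderate evidence" design: match each program student
# to the most similar non-program student on a baseline test score,
# then compare outcomes. All numbers are invented.

treated = [  # (baseline_score, outcome_score)
    (52, 61), (60, 70), (71, 78), (80, 86),
]
controls = [
    (50, 55), (58, 63), (70, 72), (79, 80), (65, 68),
]

def nearest_control(baseline, pool):
    """Pick the control student whose baseline score is closest."""
    return min(pool, key=lambda c: abs(c[0] - baseline))

# Estimate the effect as the mean outcome gap across matched pairs.
gaps = []
for base, out in treated:
    match = nearest_control(base, controls)
    gaps.append(out - match[1])

effect = sum(gaps) / len(gaps)
print(round(effect, 2))  # prints 6.25 for this invented data
```

The matching step is what substitutes for random assignment: it tries to construct a comparison group that looks like the treated group at baseline, which is exactly why such designs earn only “moderate” rather than “strong” status in the hierarchy.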
Let’s look at this carefully. At the top of the hierarchy is an experimental design with a randomly assigned control group; then come quasi-experimental designs that statistically create a comparison group; and then come good ideas. There is literally no place for non-experimental causal inference.
What’s wrong with this?
Firstly, at least on the information currently available, there seems to be little attention given to what Rohrbach and colleagues have called “Type II translation” – that is, building knowledge about how to move from an intervention that has been found to be efficacious in a controlled setting to one that is effective when scaled up.
Secondly, the funding rules exclude serious impact evaluations of systemic interventions or highly customized practices, where control groups or comparison groups are impractical. That could exclude all interventions that are not focused on standardized, individualized treatments. To put all of these interventions in the third category is to ignore any credible evidence they might have and to judge them only on the basis of “good ideas”.
(As an aside – to demonstrate that we can build credible evidence of the effectiveness of an intervention without random assignment – consider the public health messages about sleeping positions for babies. In response to the high number of SIDS deaths, and based on then-unproven theories, parents were advised to put babies to sleep on their backs and to avoid overheating and cigarette smoke. For ethical reasons, the advice was given to all parents, not to a randomly selected treatment group. The incidence of SIDS has since fallen in a way that most people are confident in attributing to the changes in parenting practices. A time series design, in conjunction with evidence of actual change in behavior and some investigation of possible alternative explanations, was sufficient to endorse the intervention.)
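The logic of that time series design can be sketched in a few lines. The yearly incidence figures below are invented for illustration, not real SIDS data; the point is the pattern an interrupted time series looks for – a stable pre-intervention level followed by a drop after the advice changed.

```python
# Toy interrupted time series, in the spirit of the SIDS example.
# Yearly incidence per 100,000 births (invented numbers); the
# back-sleeping advice begins at year index 5.

incidence = [2.1, 2.0, 2.2, 2.1, 2.0,   # before the campaign
             1.4, 1.1, 0.9, 0.8, 0.7]   # after the campaign
campaign_start = 5

before = incidence[:campaign_start]
after = incidence[campaign_start:]

def mean(xs):
    return sum(xs) / len(xs)

# A level drop at the intervention point, against a flat pre-trend,
# is the pattern that makes the causal attribution credible --
# together with evidence that parents actually changed behavior
# and a check of rival explanations.
drop = mean(before) - mean(after)
print(round(drop, 2))  # prints 1.1 for this invented series
```

A real analysis would model the pre-intervention trend and rule out co-occurring changes, but even this minimal before/after comparison shows why no randomized control group was needed to make the inference credible.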
Thirdly, it is not clear whether the requirements for evidence, which try to isolate the “real” effect of an intervention, adequately address the “causal packages” that are often needed to produce impacts. For example, if a new math innovation works only for certain types of students, or when used by a teacher with particular skills, or in a school where it is complemented by another program, will this be reported and used to inform appropriate scale-up?
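A toy calculation shows why the average effect can mislead scale-up decisions. The data below are invented: a hypothetical math innovation that works only when the teacher has had the accompanying training.

```python
# Invented data: score gains for a hypothetical math innovation,
# split by whether the teacher was trained to use it.

records = [  # (teacher_trained, score_gain)
    (True, 8), (True, 10), (True, 9),
    (False, 0), (False, 1), (False, -1),
]

def mean_gain(rows):
    gains = [gain for _, gain in rows]
    return sum(gains) / len(gains)

overall = mean_gain(records)
trained = mean_gain([r for r in records if r[0]])
untrained = mean_gain([r for r in records if not r[0]])

# The headline average hides the causal package: the innovation
# plus a trained teacher works; the innovation alone does nothing.
print(round(overall, 1), round(trained, 1), round(untrained, 1))
# prints: 4.5 9.0 0.0
```

Reporting only the 4.5-point average would justify scaling the innovation everywhere; reporting the differential effects shows that scaling it without the training component would likely produce nothing.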
So what would I want to see in the funding guidelines? I would want to RAISE the bar on the nature and type of evidence needed.
For the top-tier ‘scale up’ grants, I would require that studies have investigated not just the average effect but differential effects for different groups, and that they offer recommendations about when and where the intervention is likely to be effective. And when proposals are funded, I would require them to include serious follow-up and evidence-building to support Type II translation.
For the second-tier ‘validation’ grants, I would want to include projects that have credible non-experimental evidence of impact, and I would require the grants to build into their research designs attention to differential effects and to the contribution of contextual factors to impacts. This would include serious studies investigating the practices of effective teachers and whether these practices can be learned by others (for example, Doug Lemov’s taxonomy of effective teaching practices and Deborah Loewenberg Ball’s practice-based theory of Mathematical Knowledge for Teaching, both described in detail in Elizabeth Green’s recent article in the New York Times).
And for the third-tier ‘development’ grants, I would want the grants to build in a requirement that innovations be properly evaluated, drawing on the guidance provided by Burt Perrin in ‘How to – and How Not to – Evaluate Innovation’: paying careful attention to both successes and failures and learning from all of them; seeking to understand when, where, and how an innovation works or fails to work; and removing or reducing the powerful incentives to focus on the average effect or to overclaim results.
Some further reading on the issues in evaluating complicated and complex interventions can be found in my paper on using program theory and in Michael Patton’s forthcoming book on Developmental Evaluation, published by Guilford Press and due out in July (a presentation to the Canadian Evaluation Society in 2009 is currently available).