<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Genuine Evaluation &#187; Education</title>
	<atom:link href="http://genuineevaluation.com/tag/education/feed/" rel="self" type="application/rss+xml" />
	<link>http://genuineevaluation.com</link>
	<description>Patricia J Rogers and E Jane Davidson blog about real, genuine, authentic, practical evaluation</description>
	<lastBuildDate>Fri, 03 Feb 2012 19:49:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>What constitutes &#8220;evidence&#8221;? Implications for cutting-edge, tailored treatments, and small sub-populations</title>
		<link>http://genuineevaluation.com/what-constitutes-evidence-implications-for-cutting-edge-tailored-treatments-and-small-sub-populations/</link>
		<comments>http://genuineevaluation.com/what-constitutes-evidence-implications-for-cutting-edge-tailored-treatments-and-small-sub-populations/#comments</comments>
		<pubDate>Tue, 18 May 2010 00:31:26 +0000</pubDate>
		<dc:creator>Jane Davidson</dc:creator>
				<category><![CDATA[Causal inference]]></category>
		<category><![CDATA[Community programs]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Government programs]]></category>
		<category><![CDATA[Health]]></category>
		<category><![CDATA[Strategic policy evaluation]]></category>
		<category><![CDATA[BES]]></category>
		<category><![CDATA[cutting-edge initiatives]]></category>
		<category><![CDATA[medicine]]></category>
		<category><![CDATA[RCTs]]></category>
		<category><![CDATA[small sub-populations]]></category>
		<category><![CDATA[tailored initiatives]]></category>
		<category><![CDATA[what works]]></category>
		<category><![CDATA[what works for whom?]]></category>
		<category><![CDATA[WWC]]></category>

		<guid isPermaLink="false">http://genuineevaluation.com/?p=919</guid>
		<description><![CDATA[In the medical profession in particular, there are some very rigid beliefs about what constitutes good enough "evidence of effectiveness" to justify offering, recommending, allowing patients to try, or even just not vehemently opposing a particular type of treatment for a patient. 

There are some glimmers of hope in other sectors (e.g. in the Best Evidence Synthesis work here in New Zealand). But there are still three areas where there are very serious challenges in building a credible evidence base given the kinds of constraints and realities surrounding them. They are: (1) cutting-edge treatments;  (2) treatments that are by their very nature tailored/individualized rather than standardized across patients or populations; and (3) learning what works for small sub-populations <a href="http://genuineevaluation.com/what-constitutes-evidence-implications-for-cutting-edge-tailored-treatments-and-small-sub-populations/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>Building on an earlier <a href="http://genuineevaluation.com/long-term-effects-what-to-do-with-them-and-without-them/" target="_blank">discussion Michael Scriven started about long-term effects (what to do with them and without them)</a>, I&#8217;m interested in people&#8217;s thoughts on a related issue.</p>
<p>In the medical profession in particular, there are some very rigid beliefs about what constitutes good enough &#8220;evidence of effectiveness&#8221; to justify offering, recommending, allowing patients to try, or even just not vehemently opposing a particular type of treatment for a patient. [There are obviously some parallels in other sectors, such as education, social services, international development, criminal justice, etc, but let's start with some medical examples for now.]</p>
<p>There are some glimmers of hope in other sectors (e.g. in the Best Evidence Synthesis work here in New Zealand). But there are still three areas where there are very serious challenges in building a credible evidence base given the kinds of constraints and realities surrounding them. They are: (1) cutting-edge treatments;  (2) treatments that are <em>by their very nature </em>tailored/individualized rather than standardized across patients or populations; and (3) learning what works for small sub-populations.</p>
<h4>1. Cutting-edge treatments</h4>
<p>Advancements are being made in medical practice all the time, and many of these are initially developed by clinicians (doctors, specialists, surgeons) trying a new approach on a limited number of patients, e.g. when the standard treatments are either not working, or when there&#8217;s a plausible idea about how to improve benefits for patients.</p>
<p>In order for a new idea to be trialled on a larger scale, it must be picked up by individuals with a research/evaluation agenda, rather than just an ongoing medical practice. From there, there&#8217;s a very long and slow process: writing a grant, getting it funded, conducting the evaluation, writing it up, submitting it to a peer-reviewed journal, and going through the entire review process, before it is finally published and considered actual &#8220;evidence&#8221;. On top of this, top journals exhibit a strong preference for RCTs over other types of designs.</p>
<p>In a presentation entitled <a href="http://www.bioethics.nih.gov/slides04/truog.ppt" target="_blank">Ethical Conflicts in Randomized Controlled Trials</a>, Dr. Robert Truog (Harvard professor of anaesthesia, pediatrics, and medical ethics, and chief of the Division of Critical Care Medicine at Boston Children&#8217;s Hospital) lists <strong>eight approaches to learning about what works in medicine</strong>, in ascending order of confidence:</p>
<blockquote>
<ol>
<li>Anecdotal Case Reports</li>
<li>Case Series without Controls</li>
<li>Case Series with Literature Controls</li>
<li>Case Series with Historical Controls</li>
<li>Databases</li>
<li>Case / Control Observational Studies</li>
<li>Randomized Controlled Trials</li>
<li>Meta-analyses</li>
</ol>
</blockquote>
<p>Truog argues that RCTs are not the only way to learn, even in the medical profession: <em>&#8220;Phase I and II trials, which precede RCTs, often provide strong evidence for effectiveness.&#8221;</em></p>
<p><strong>When should we think about alternatives to the RCT?</strong> Truog lists four conditions:</p>
<ol>
<li>When therapies are potentially life-saving</li>
<li>When evaluating rapidly developing technologies (improvements in both experimental and control treatments may make the results of an RCT obsolete by the time it is published)</li>
<li>When RCTs are not the most efficient way to acquire knowledge</li>
<li>When the non-randomized data [are] compelling</li>
</ol>
<p>Cutting-edge treatments often meet several of the above conditions, and the reality is that formal RCTs will always lag well behind the technology. Because of the timeframes involved, the results of RCTs are often &#8220;old news&#8221; by the time they appear in print. In addition, the rigid use of RCTs often raises ethical dilemmas. As Robert Truog asks &#8230;</p>
<blockquote><p>&#8220;Who wants to be the last patient enrolled in the control arm of a positive randomized controlled trial?&#8221;</p></blockquote>
<p>The same question applies equally to an RCT of an educational, community health, international development, or business development intervention.</p>
<h4>2. Tailored/individualized and adaptive treatments</h4>
<p>In the medical and health professions, as in many other arenas, there are certain treatments (or programs/initiatives) that <em>by their very nature</em> must be completely tailored to the individual (or to the community, or to the organization) and/or that must be responsive to changing needs and need to be adapted over time.</p>
<p>One medical example of this is acupuncture and the use of Chinese herbs. Even among individuals with the same general Western diagnosis (e.g. depression, back pain, infertility), and even with the same basic underlying medical cause for that diagnosis (e.g. endometriosis, polycystic ovaries, diminished ovarian reserve), the Chinese medicine diagnosis of the underlying imbalances may differ substantially. A competent acupuncturist will proceed with a highly individualized treatment based on each person&#8217;s specific (Western and Eastern) diagnosis, and will reassess at each session and tweak the treatment accordingly.</p>
<p>Clearly, this individualization and constant tweaking of treatment are at odds with the usual approach to RCTs, which is to standardize treatment and have each practitioner deliver it in exactly the same way. [There are some exceptions to this problem, e.g. <a href="http://infertility-acupuncture.info/infertility-acupuncture/ivf/" target="_blank">some RCTs have been conducted to evaluate specific acupuncture treatments before and after IVF transfer</a>, with statistically and practically significant effects documented. In fertility treatment, however, this covers just one very specific short-term application, not the kinds of longer-term treatments that are also commonly used by couples experiencing infertility.]</p>
<p>An additional complication for evaluating acupuncture treatment is that diagnosis requires skilled professional judgment and (given that treatment cannot be simplistically standardized) treatment efficacy is highly dependent on the competence of the practitioner. A large-scale RCT would need to use several practitioners whose competence may vary widely, and this cause of variance could easily wash out effects.</p>
<p>This challenge is not limited to healthcare and medicine. Think about organizational development or community development initiatives. We have all heard countless examples of programs that really only worked amazingly well because of the passion of one or two highly committed people at key locations, or that needed to be adapted locally to respond to changing needs and aspirations (or because they were initially not well enough understood). If an intervention can&#8217;t be standardized across multiple locations, it doesn&#8217;t fit the mold for an RCT very well.</p>
<h4>3. What works for small sub-populations?</h4>
<p>A third major challenge in working out &#8220;what works for whom&#8221; in medicine is that some patient subgroups have very specific combinations of factors that may lend themselves to particular kinds of treatments, but these populations are too small to support an RCT, or any other quantitative design, with sufficient statistical power to meet the usual requirements for publication. Or the &#8220;target audience&#8221; for the findings is considered too narrow.</p>
<p>A good example is looking at the effectiveness of IVF treatment. It&#8217;s very easy to find a substantial sample of women in their 30s with, say, blocked fallopian tubes or endometriosis &#8211; they often have insurance coverage for infertility or are eligible for publicly funded treatment, so plenty of them are trying various IVF protocols (large N), and there is quite good knowledge about what works for them.</p>
<p>But suppose we wanted to understand what works for women over 40, or (even harder) over 42, who have specific diagnoses? First, the numbers are naturally lower for this group because most couples have completed their families by this age. For those still trying, the woman&#8217;s age and/or her specific diagnoses often mean that she is not eligible for insurance coverage or publicly funded treatment. So, there are far fewer trying IVF, and even fewer again for the specific diagnoses that are likely to make one ineligible for insurance or publicly funded treatment.</p>
<p>The reality is that some specific sub-populations will never be large enough in numbers to allow the use of RCTs to learn what works. But at the same time, certain clinicians will refuse to allow the patient to try treatment approaches that have not been supported by what they consider to be &#8220;solid&#8221; clinical trials.</p>
<p>Meanwhile, there are certain clinicians around the world who are known to be at the top of their fields in dealing with specific types of cases (such as women over 40). However, only some of them publish their findings, and their work is often sidelined by mainstream medicine as &#8220;fringe&#8221; &#8211; and the limited sample sizes and only semi-standardized treatment protocols trigger further snorts of derision about the quality of their &#8220;evidence&#8221;.</p>
<p>The same is again true in education, community health, international development, business, and just about any other field one can name.</p>
<h4>Where does this leave us &#8211; and where to next?</h4>
<p>Right now, in medicine (and to varying degrees elsewhere), it&#8217;s only a small exaggeration to say:</p>
<ul>
<li>If you are seeking a &#8220;tried and true&#8221; (as supported by RCTs, or by other studies published in peer-reviewed journals) approach, you will only have access to &#8220;old&#8221; treatments and initiatives &#8211; and (in the case of RCT evidence) only those that can be completely standardized.</li>
<li>If you&#8217;re after something cutting-edge or that needs to be tailored or adapted mid-stream, you have to pin your hopes on anecdotal evidence (and hope your physician or funder will support you).</li>
<li>If you&#8217;re a member of a relatively large or typical subgroup, your treatment can be informed by evidence from RCTs and other published studies with a decent sample size.</li>
<li>But if you&#8217;re in a very small minority sub-population, all we have is &#8220;anecdotal case studies&#8221; and the whole exercise is basically a crap-shoot.</li>
</ul>
<p>Here in Aotearoa New Zealand, we have seen some <strong>very high quality government-funded work integrating a range of qualitative, quantitative and mixed method evidence about what works in education</strong> &#8211; the <a href="http://www.educationcounts.govt.nz/themes/BES" target="_blank">Iterative Best Evidence Synthesis (BES)</a>. A short quote from the <a href="http://www.educationcounts.govt.nz/__data/assets/pdf_file/0016/6640/BES-Development-Guidelines-27-07-04.pdf" target="_blank">Guidelines for Generating a Best Evidence Synthesis Iteration</a> explains how evidence is selected for inclusion:</p>
<blockquote><p>The [New Zealand] Ministry of Education is using the term ‘best’ within the best evidence synthesis programme to describe a <em>body of evidence</em> that provides credible evidence, and explanations for, influences that have made, and can make a bigger difference to desirable learner outcomes for diverse learners simultaneously. The criterion for selection of evidence for a best evidence synthesis is that the research provides evidence about impacts on learner outcomes. &#8230;</p>
<p>This criterion for selection of evidence means that research from a wide range of methodological designs (including for example, action research studies, case studies, microgenetic studies of classroom processes, ethnographic-outcome focused studies, quasi-experimental research, multiple regression studies, longitudinal studies and experimental research) can make valued contributions to a best evidence synthesis. The point of synthesis is that a cumulative body of research, carefully interrogated, provides more explanatory power than findings from any one research study or design type. (p. 33)</p></blockquote>
<p>This is in stark contrast to the U.S.-based <a href="http://ies.ed.gov/ncee/wwc/references/idocviewer/Doc.aspx?docId=19&amp;tocId=4" target="_blank">What Works Clearinghouse (WWC) evidence standards</a>:</p>
<blockquote><p>The WWC reviews each study that passes eligibility screens to determine whether the study provides strong evidence (<em>Meets Evidence Standards</em>), weaker evidence (<em>Meets Evidence Standards with Reservations</em>), or insufficient evidence (<em>Does Not Meet Evidence Standards</em>) for an intervention’s effectiveness. Currently, only well-designed and well-implemented randomized controlled trials (RCTs) are considered strong evidence, while quasi-experimental designs (QEDs) with equating may only meet standards with reservations; evidence standards for regression discontinuity and single-case designs are under development.</p></blockquote>
<p>As a humorous side note, Michael Scriven recently (on EVALTALK) nicknamed the WWC the &#8220;WWQNC, standing for What Works for Quantitative Nerds Clearinghouse (pronounced &#8216;WONKS&#8217;)&#8221;.</p>
<p>While it&#8217;s very heartening to see some more enlightened evidence synthesis work such as NZ&#8217;s BES,<strong> I am still not sure we yet have good evidence accumulation and synthesis solutions for:</strong></p>
<ol>
<li>cutting-edge treatments where the technology and thinking are changing faster than RCTs (or even other large-scale long-term evaluation designs) can usefully inform</li>
<li>individualized, tailored, and adapt-as-you-go initiatives</li>
<li>small sub-populations that need to know what&#8217;s going to work for them</li>
</ol>
<p><strong>Are there ways, in medicine, to accumulate knowledge directly from clinicians and aggregate it to get approximate answers to these &#8220;what works for whom and under what conditions&#8221; questions?</strong> [I recently had a discussion with a medical academic who insisted it definitely was NOT possible!]</p>
<p><strong>Are there ways in which outcome data and other learnings from localized small-scale initiatives can be meaningfully aggregated?</strong> I have been working on several projects that attempt to do just this (one in special education, one in primary school literacy, and one evaluating a nationwide strategy designed to help Māori (NZ indigenous) students enjoy education success <em>as Māori</em>), but would be interested to hear how others have gone about it.</p>
<p>For more on RCTs, see also my short JMDE (2006) editorial: <a href="http://survey.ate.wmich.edu/jmde/index.php/jmde_1/article/view/35/45" target="_blank">The RCTs-Only Doctrine: Brakes on the Acquisition of Knowledge?</a></p>
]]></content:encoded>
			<wfw:commentRss>http://genuineevaluation.com/what-constitutes-evidence-implications-for-cutting-edge-tailored-treatments-and-small-sub-populations/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Investing In Innovation &#8211; a need to apply what we know about evidence-based policy</title>
		<link>http://genuineevaluation.com/investing-in-innovation-a-need-to-apply-what-we-know-about-evidence-based-polic/</link>
		<comments>http://genuineevaluation.com/investing-in-innovation-a-need-to-apply-what-we-know-about-evidence-based-polic/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 00:09:10 +0000</pubDate>
		<dc:creator>Patricia Rogers</dc:creator>
				<category><![CDATA[Causal inference]]></category>
		<category><![CDATA[Causal inference strategies]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[causal packages]]></category>
		<category><![CDATA[complex]]></category>
		<category><![CDATA[complicated]]></category>
		<category><![CDATA[evidence-based policy]]></category>
		<category><![CDATA[innovation]]></category>
		<category><![CDATA[Investing In Innovation]]></category>
		<category><![CDATA[non-experimental]]></category>
		<category><![CDATA[RCTs]]></category>
		<category><![CDATA[scale-up]]></category>

		<guid isPermaLink="false">http://genuineevaluation.com/?p=538</guid>
		<description><![CDATA[The new funding rules for the US Department of Education's $650 million Investing in Innovation fund appear to be based on an out-of-date model of evidence-based policy and hierarchy of evidence. Recent developments in our understanding of evidence-based policy would suggest changes are needed to the selection criteria and to how successful proposals will be evaluated. <a href="http://genuineevaluation.com/investing-in-innovation-a-need-to-apply-what-we-know-about-evidence-based-polic/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>The new funding rules for the US Department of Education&#8217;s $650 million <a href="http://www2.ed.gov/programs/innovation/index.html">Investing in Innovation</a> fund, as reported in <a href="http://www.edweek.org/ew/articles/2010/03/08/25i3.h29.html?tkn=VVSFF%2FS9zGQlITenjPWrC90PGmGBPnZo1j5p&amp;cmp=clp-edweek">Education Week</a>, show it is still using an out-of-date model of evidence-based policy based on a problematic hierarchy of evidence.</p>
<p>The purpose of the funding is described as follows:</p>
<blockquote><p><strong>Program Description:</strong> The Investing in Innovation Fund, established under section 14007 of the American Recovery and Reinvestment Act of 2009 (ARRA), provides funding to support (1) local educational agencies (LEAs), and (2) nonprofit organizations in partnership with (a) one or more LEAs or (b) a consortium of schools. The purpose of this program is to provide competitive grants to applicants with a record of improving student achievement and attainment in order <strong>to expand the implementation of, and investment in, innovative practices that are demonstrated to have an impact </strong>on improving student achievement or student growth, closing achievement gaps, decreasing dropout rates, increasing high school graduation rates, or increasing college enrollment and completion rates.</p>
<p>These grants will (1) allow eligible entities to expand and develop innovative practices that can serve as models of best practices, (2) allow eligible entities to work in partnership with the private sector and the philanthropic community, and (3) identify and document best practices that can be shared and taken to scale based on demonstrated success.</p></blockquote>
<p>This already sounds like the &#8220;leaky pipeline&#8221; model of evidence-based policy, where we find out &#8220;what works&#8221; and then scale it up:</p>
<blockquote><p>The largest, or “scale up,” grants—worth up to $50 million each—will require “strong” evidence, such as program evaluations that used random assignment of students.</p>
<p>The second-tier, “validation” grants of up to $30 million each will go to proposals that show “moderate” evidence, such as those that use sophisticated statistical techniques to try to measure the true effects of a program.</p>
<p>The final-tier, “development” grants are wild cards to a degree; they are $5 million awards to proposals that are each based on a “reasonable” hypothesis or theory. The department made one change to the timing of that third tier of grants: No longer will applicants need to get prescreened before submitting their applications; their applications will be due at the same time as all the other i3 proposals.</p>
<p>“The overall design of the competition tries to account for the importance of evidence at each stage of innovation,” Mr. Shelton said.</p>
<p>For each tier, the level of evidence required is an all-or-nothing eligibility requirement; an applicant that doesn’t have the research to back up a proposal for that particular tier should not bother applying.</p></blockquote>
<p>Let&#8217;s look at this carefully: at the top of the hierarchy is experimental design with a randomly assigned control group; then come quasi-experimental designs that statistically create a comparison group; and then come good ideas. There is literally no place for non-experimental causal inference.</p>
<p>What&#8217;s wrong with this?</p>
<p>Firstly, at least on the information currently available, there seems to be little attention given to the need for what <a href="http://ehp.sagepub.com/cgi/content/abstract/29/3/302">Rohrbach and colleagues</a> have called &#8220;Type II translation&#8221;, that is, building knowledge about how to move from an intervention that has been found to be efficacious in a controlled setting to one which is effective when scaled up.</p>
<p>Secondly, the funding rules exclude serious impact evaluations of systemic interventions or highly customized practices where control groups or comparison groups are impractical. That could exclude all interventions that are not standardized treatments delivered to individuals. To put all these interventions in the third category only is to ignore any credible evidence they might have and to judge them only on the basis of &#8220;good ideas&#8221;.</p>
<p>(As an aside, to demonstrate that we can build credible evidence of the effectiveness of an intervention without a control group, consider the public health messages about sleeping positions for babies. In response to the high number of SIDS deaths, and based on some unproven theories, parents were advised to put babies to sleep on their backs and to avoid overheating and exposure to cigarette smoke. For ethical reasons, the advice was given to all parents, not to a randomly selected treatment group. The incidence of SIDS has since fallen in a way that most people confidently attribute to the changes in parenting practices. A time series design, in conjunction with evidence of actual change in behavior and some investigation of possible alternative explanations, was sufficient to endorse the intervention.)</p>
<p>Thirdly, it is not clear whether the requirements for evidence, which try to find the &#8220;real&#8221; effect of an intervention, adequately address the &#8220;causal packages&#8221; that are often needed to produce impacts. For example, if a new math innovation only works for certain types of students, or when used by a teacher with particular skills, or in a school where it is complemented by another program, will this be reported and used to inform appropriate scale-up?</p>
<p>So what would I want to see in the funding guidelines?  I would want to RAISE the bar on the nature and type of evidence needed.</p>
<p>For the top tier, &#8216;scale up&#8217; grants, I would require that studies have investigated not just the average effect but differential effects for different groups, and that they make recommendations about when and where the intervention is likely to be effective &#8211; AND that, when the proposals are funded, they include serious follow-up that builds evidence to support Type II translation.</p>
<p>For the second tier, &#8216;validation&#8217; grants, I would want to see included projects that have credible non-experimental evidence of impact, and would require the grants to include in their research designs attention to differential effects and the contribution of contextual factors to impacts. This would include serious studies investigating the practices of effective teachers and whether these can be learned by others (for example <a href="http://uncommonschools.org/usi/aboutUs/taxonomy.php">Doug Lemov&#8217;s taxonomies of effective teaching practices</a> and <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.132.7284&amp;rep=rep1&amp;type=pdf#page=13">Deborah Loewenberg Ball&#8217;s Practice-Based Theory of Mathematical Knowledge for Teaching</a>, both described in detail in <a href="http://www.nytimes.com/2010/03/07/magazine/07Teachers-t.html">Elizabeth Green&#8217;s recent article in the New York Times</a>).</p>
<p>And for the third tier, &#8216;development&#8217; grants, I would want to see built into the grants the requirement that they properly evaluate innovations, drawing on the guidance provided by <a href="http://mande.co.uk/docs/perrin.htm">Burt Perrin</a> in &#8216;<a href="http://evi.sagepub.com/cgi/content/abstract/8/1/13">How to &#8211; and how not to &#8211; evaluate innovation</a>&#8217;: paying careful attention to the successes and the failures and learning from all of them, seeking to understand when, where and how it works or fails to work, and removing or reducing the powerful incentives to focus on the average effect or overclaim results.</p>
<p><a href="http://www.guilford.com/cgi-bin/cartscript.cgi?page=pr/patton.htm&amp;dir=research/res_eval&amp;cart_id=646718.8067"><img class="alignright" title="mqp" src="http://ecx.images-amazon.com/images/I/51ip7%2BxQztL._SL500_AA300_.jpg" alt="" width="300" height="300" /></a>Some further reading on the issues in evaluating complicated and complex interventions can be found in <a href="http://evi.sagepub.com/cgi/content/abstract/14/1/29">my paper on using program theory</a> and in Michael Patton&#8217;s forthcoming book <a href="http://www.guilford.com/cgi-bin/cartscript.cgi?page=pr/patton.htm&amp;dir=research/res_eval&amp;cart_id=452493.3180">Developmental Evaluation</a>, due out from Guilford Press in July (there&#8217;s <a href="http://cjpe.ca/distribution/20090601_quinn_patton_michael_a.pdf">a presentation to the Canadian Evaluation Society</a> from 2009 currently available).</p>
]]></content:encoded>
			<wfw:commentRss>http://genuineevaluation.com/investing-in-innovation-a-need-to-apply-what-we-know-about-evidence-based-polic/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The media and evaluation reporting &#8211; clueless or unscrupulous?</title>
		<link>http://genuineevaluation.com/the-media-and-evaluation-reporting-clueless-or-unscrupulous/</link>
		<comments>http://genuineevaluation.com/the-media-and-evaluation-reporting-clueless-or-unscrupulous/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 23:04:53 +0000</pubDate>
		<dc:creator>Jane Davidson</dc:creator>
				<category><![CDATA[Appropriate reporting]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[accuracy]]></category>
		<category><![CDATA[critical thinking]]></category>
		<category><![CDATA[evaluation concepts]]></category>
		<category><![CDATA[evaluation reporting]]></category>
		<category><![CDATA[grading vs. ranking]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[NZ National Standards]]></category>

		<guid isPermaLink="false">http://genuineevaluation.com/?p=128</guid>
		<description><![CDATA[Most lay people can grasp the difference between grading/rating and ranking, so what's wrong with the media? Following on from Patricia Rogers' recent posts about the misreporting of evaluation findings, this post looks at an example from the New Zealand media (reporting on the new National Standards for literacy and numeracy) of leading the public astray with a complete lack of understanding of this very fundamental evaluation concept. Jane also ponders the reasons why the mainstream media in particular gets this kind of thing wrong so often ...  <a href="http://genuineevaluation.com/the-media-and-evaluation-reporting-clueless-or-unscrupulous/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>Patricia Rogers, in her recent post entitled <a href="http://genuineevaluation.com/does-the-recent-evaluation-show-that-head-start-doesnt-work/" target="_blank">Does the recent evaluation show that Head Start doesn&#8217;t work?</a> asks the question:</p>
<blockquote><p>Why do these reports summarize the findings inaccurately? Is it deliberate misrepresentation or error? Is it just too hard to include variations in the results in a brief summary, or are the reporters not sufficiently skilled? Or do the reporters judge that the results are not broad enough across the domains, or not large enough?</p></blockquote>
<p>I&#8217;d like to add to this question/discussion with a few observations about the media and its reporting of not just evaluation findings, but even simple evaluative concepts.</p>
<p>Here in New Zealand, the government has moved to introduce National Standards for literacy and numeracy in primary schools. These are basically descriptions of what children should be able to do/read/write/understand in order to adequately access the rest of the curriculum at that level. Teachers are to use a range of appropriate assessment tools (as they presumably do already) to gauge where each child is at, and &#8211; this is the new piece &#8211; are required to clearly report to parents/families whether their child is performing at, above, below, or well below the National Standards in literacy and numeracy.</p>
<p>Just last week, New Zealand&#8217;s most-watched news program (TVOne 6pm news) ran an item that began by telling us how terribly confused parents were about the new National Standards &#8211; implying, of course, that they were going to enlighten everyone and help them understand the basics. Right&#8230; take a look at this 3-minute news item: <a href="http://tvnz.co.nz/national-news/controversial-new-national-standards-introduced-3347374/video.xhtml" target="_blank">Controversial new national standards introduced</a>.</p>
<p>The National Standards are a <strong>criterion-referenced</strong> approach, i.e., performance (in literacy and numeracy) is evaluated against &#8211; as the title implies &#8211; <strong>&#8220;standards&#8221;</strong> (i.e., where the child NEEDS to be).</p>
<p>How does TVOne news explain them? As a<strong> norm-referenced</strong> approach. The reporter states that the Standards will tell parents whether their child is at, above, or below the <strong>&#8220;national average&#8221;</strong> (i.e., where the child is relative to other peers). WRONG!!</p>
<p>The same news program has repeatedly said that children are going to be <em>&#8220;ranked&#8221;</em> &#8211; illustrating exactly the same total lack of understanding of a very, very fundamental evaluation concept.</p>
<p><strong>Most lay people can grasp the difference between grading/rating and ranking, so what&#8217;s wrong with the media?</strong></p>
<p>My own observations lead me to believe that reporters and those controlling the quality of mainstream news media reports are, for the most part, woefully lacking in critical thinking ability, have a completely inadequate grasp of even the most basic evaluation concepts, and seem to have no sense of responsibility about getting a sound understanding before going to air and &#8216;educating&#8217; the public.</p>
<p>The reporting of actual evaluation findings is a bit more complicated, as Patricia explains in <a href="http://genuineevaluation.com/does-the-recent-evaluation-show-that-head-start-doesnt-work/">her post</a>. My hunch is that much of the blatantly inaccurate reporting we see is a combination of:</p>
<ul>
<li>ignorance of key evaluative concepts,</li>
<li>a lack of critical thinking ability,</li>
<li>laziness (e.g. just reporting what others have said instead of actually reading the original sources),</li>
<li>cherry-picking the newsworthy (read: ratings-boosting) snippets at the expense of a more complete representation, and</li>
<li>pressures to dumb down the news into tabloid-sized snippets that the public don&#8217;t have to work too hard to take in.</li>
</ul>
<p>Of course, I&#8217;m mostly referring to TV here; the print media often do a much better job (and thank goodness someone does), but IMHO the bulk of it is still well short of good enough, <em>genuine</em> reporting. And, of course, I&#8217;m not watching media from all over the world (mostly New Zealand, but we get some Australian, U.S. and English channels here) &#8211; how do your local media outlets compare?</p>
<p>I suppose the big question for me is, what can we DO about this? Here are a few ideas that have crossed my mind, but I&#8217;d very much like to hear others&#8217; views:</p>
<ul>
<li>Teach critical thinking in schools and in higher education.</li>
<li>Teach some basic evaluative thinking too &#8211; it&#8217;s a life skill; it can save you a fortune as you go through life making decisions about what to purchase, invest in, etc.</li>
<li>Help evaluators turn the nub of their findings into easy-to-grasp sound bites that the media could pick up and use as they are &#8211; maybe all evaluation reports should come with a set of press releases, some very short, some a little longer?</li>
<li>As citizens and consumers of the news, keep criticizing the dumbed down news that is fed to the masses.</li>
<li>Perhaps professional evaluation associations should consider commenting publicly on the misrepresentation of evaluative concepts and evaluation findings.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://genuineevaluation.com/the-media-and-evaluation-reporting-clueless-or-unscrupulous/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Does the recent evaluation show that Head Start doesn&#8217;t work?</title>
		<link>http://genuineevaluation.com/does-the-recent-evaluation-show-that-head-start-doesnt-work/</link>
		<comments>http://genuineevaluation.com/does-the-recent-evaluation-show-that-head-start-doesnt-work/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 11:46:50 +0000</pubDate>
		<dc:creator>Patricia Rogers</dc:creator>
				<category><![CDATA[Appropriate reporting]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[enduring results]]></category>
		<category><![CDATA[HeadStart]]></category>

		<guid isPermaLink="false">http://genuineevaluation.com/?p=123</guid>
		<description><![CDATA[Another Head Start evaluation, another controversy about whether the results show it works or not.  In her comment on our post on the NY School Milk Study Susan Wolf drew our attention to some important differences between the recent evaluation &#8230; <a href="http://genuineevaluation.com/does-the-recent-evaluation-show-that-head-start-doesnt-work/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>Another Head Start evaluation, another controversy about whether the results show it works or not. In her comment on our post on the <a href="http://genuineevaluation.com/misreporting-evaluation-findings-example-1/comment-page-1/#comment-15">NY School Milk Study</a>, Susan Wolf drew our attention to some important differences between the recent evaluation report on Head Start and how it was represented in an email from the Brookings Institution.</p>
<p>The January 21, 2010 post on the Brookings Institution <a href="http://www.brookings.edu/opinions/2010/0121_head_start_whitehurst.aspx">UpFront blog</a> summarized the findings succinctly:</p>
<blockquote><p>The study demonstrated that children’s attendance in Head Start has no demonstrable impact on their academic, socio-emotional, or health status at the end of first grade. That’s right. If you were a mother who lost the lottery, couldn’t get your child into Head Start, and had to care for her at home, she was no worse off at the end of first grade than she would have been had she gotten into Head Start.</p></blockquote>
<p>The <a href="http://www.acf.hhs.gov/programs/opre/hs/impact_study/reports/impact_study/executive_summary_final.pdf">executive summary of the evaluation report</a>, while acknowledging that many of the results achieved during children&#8217;s participation in the program were no longer evident by the end of Grade 1, identified the following results:</p>
<blockquote><p>Cognitive Outcomes: Head Start group children did significantly better on the PPVT (a vocabulary measure) for 4-year-olds and on the Woodcock-Johnson III test of Oral Comprehension for the 3-year-olds.</p>
<p>Social-Emotional Outcomes. By the end of 1st grade, there was some evidence that the 3-year-old cohort had closer and more positive relationships with their parents.</p>
<p>Health Outcomes. For the 4-year-old cohort, there was an impact on child health insurance coverage at the end of kindergarten and 1st grade, and an impact on child health status in kindergarten.</p>
<p>Parenting Outcomes. For the 3-year-old cohort, there were positive favorable impacts on use of time-out and authoritarian parenting at the end of 1st grade and on spanking and time out in kindergarten.</p></blockquote>
<p>The Brookings summary has been widely picked up, including:</p>
<p><a href="http://blog.heritage.org/2010/01/15/head-start-a-150-billion-failure/">The Foundry</a>, the blog of the Heritage Foundation, which stated:</p>
<blockquote><p>Unfortunately, a new (<a href="http://www.foxnews.com/opinion/2009/12/29/dan-lips-heritage-preschool-head-start-politics/">long overdue</a>) report published by the Department of Health and Human [Services] found that the $150 billion that taxpayers have “invested” in Head Start since 1965 is yielding zero lasting benefits for participating children. According to the <a href="http://www.acf.hhs.gov/programs/opre/hs/impact_study/reports/impact_study/executive_summary_final.pdf">Head Start Impact Study</a>: “the benefits of access to Head Start at age four are largely absent by 1st grade for the program population as a whole.” The Heritage Foundation reviews the findings of the new evaluation in a <a href="http://blog.heritage.org/wp-content/uploads/head-start-paper.pdf">forthcoming Backgrounder report</a> concluding: “Head Start has little to no effect on cognitive, socio-emotional, health, and parenting outcomes of children participating in the program.”</p></blockquote>
<p>Why do these reports summarize the findings inaccurately? Is it deliberate misrepresentation or error? Is it just too hard to include variations in the results in a brief summary, or are the reporters not sufficiently skilled? Or do the reporters judge that the results are not broad enough across the domains, or not large enough? If the other gains are not sustained into Grade 1, does this reflect inadequacies of Head Start or of Grade 1? Does Head Start need to be a &#8216;silver bullet&#8217; to be successful?</p>
<p>Given the long history of controversial Head Start evaluations, I&#8217;m sure we&#8217;ll be hearing a lot more about this one. I hope that discussion also pays attention to the differential results, since the evaluation clearly identified groups for whom Head Start worked particularly well, and those for whom it appeared to be harmful.</p>
]]></content:encoded>
			<wfw:commentRss>http://genuineevaluation.com/does-the-recent-evaluation-show-that-head-start-doesnt-work/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

