<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Genuine Evaluation &#187; Education</title>
	<atom:link href="http://genuineevaluation.com/category/context/education/feed/" rel="self" type="application/rss+xml" />
	<link>http://genuineevaluation.com</link>
	<description>Patricia J Rogers and E Jane Davidson blog about real, genuine, authentic, practical evaluation</description>
	<lastBuildDate>Fri, 03 Feb 2012 19:49:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>Pushing sand uphill with a pointy stick? &#8216;No value-free&#8217; in higher ed evaluation</title>
		<link>http://genuineevaluation.com/pushing-sand-uphill-with-a-pointy-stick-no-value-free-in-higher-ed-evaluation/</link>
		<comments>http://genuineevaluation.com/pushing-sand-uphill-with-a-pointy-stick-no-value-free-in-higher-ed-evaluation/#comments</comments>
		<pubDate>Wed, 08 Dec 2010 20:19:56 +0000</pubDate>
		<dc:creator>Jane Davidson</dc:creator>
				<category><![CDATA[Education]]></category>
		<category><![CDATA[Values-based]]></category>
		<category><![CDATA[higher education]]></category>
		<category><![CDATA[self-assessment]]></category>
		<category><![CDATA[value-free]]></category>
		<category><![CDATA[valuephobia]]></category>
		<category><![CDATA[values]]></category>

		<guid isPermaLink="false">http://genuineevaluation.com/?p=2236</guid>
		<description><![CDATA[There&#8217;s a unique and extremely challenging barrier to singing the &#8216;no value-free&#8217; parts of the genuine evaluation song in a higher education (a.k.a. tertiary education) setting. And that&#8217;s what Michael Scriven calls the value-free doctrine. Last week I delivered the &#8230; <a href="http://genuineevaluation.com/pushing-sand-uphill-with-a-pointy-stick-no-value-free-in-higher-ed-evaluation/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>There&#8217;s a unique and extremely challenging barrier to singing the &#8216;no value-free&#8217; parts of the<a href="http://genuineevaluation.com/genuine-evaluation-has-a-jingle-and-a-badge/" target="_blank"> genuine evaluation song</a> in a higher education (a.k.a. tertiary education) setting.</p>
<p>And that&#8217;s what Michael Scriven calls <strong>the value-free doctrine</strong>.</p>
<p>Last week I delivered the <a href="http://realevaluation.com/actionable-self-assessment-evaluation-for-teos-dec-2010/" target="_blank">opening keynote</a> at the<a href="http://www.otagopolytechnic.ac.nz/about/events/self-assessment-for-quality-conference.html"> Self Assessment for Quality conference</a> for tertiary (=higher) education  organizations working to implement New Zealand&#8217;s  new <a href="http://comm.eval.org/EVAL/EVAL/Resources/ViewDocument/Default.aspx?DocumentKey=69f9f626-4a43-4e0b-b65c-2afe6d81e4d0" target="_blank">evaluative approach to quality assurance</a>.</p>
<p>This means conducting self-assessment that asks and answers  questions about the quality of their offerings and services and the  value of their outcomes for learners and other key stakeholders (such as  employers, communities and iwi).</p>
<p>The theme of the conference was <em>Self-Assessment for Quality: How do you know good when you see it?</em></p>
<p><strong>One of the great challenges of implementing this in academic organizations is that <em>the very idea of doing truly evaluative self-assessment runs counter to the fundamental beliefs and values that form part of the culture of those institutions</em>.</strong></p>
<p>It&#8217;s not just part of the way that most academics think about evaluating programs, courses, and other services or offerings; it&#8217;s a way of thinking and a belief/value system that is embedded deep in their very own research portfolios, across almost all disciplines.</p>
<p>Last month in San Antonio, a participant in my AEA pre-conference workshop said to me, &#8220;WOW, this is completely different from what the professors in my master&#8217;s program told me; they said we should never use words like &#8216;effective&#8217;, &#8216;good&#8217;, or &#8216;valuable&#8217;.&#8221;</p>
<p><strong>Yep, the value-free doctrine is still alive and well, people!</strong></p>
<p>There is a pervasive belief in traditional Western academia that &#8216;values&#8217; have no place in the analysis of data or evidence. High quality research should be objective, dispassionate, and just present the data &#8211; not interpret it using &#8216;value-laden&#8217; terms like &#8216;excellent&#8217;, &#8216;high quality&#8217;, &#8216;effective&#8217;, or &#8216;ineffective&#8217;.</p>
<p><a href="http://office.microsoft.com/en-us/images/results.aspx?qu=scientist#ai:MP900178908|mt:2|"><img class="alignright" title="scientist" src="http://officeimg.vo.msecnd.net/en-us/images/MH900178908.jpg" alt="" width="227" height="227" /></a>Anyone who asks a value-free &#8216;evaluator&#8217; whether a particular outcome was actually good/valuable/worthwhile is likely to be greeted with snorts of derision. Obviously the person asking has no idea what &#8216;real science&#8217; is all about.</p>
<p>Values should be avoided like the plague &#8211; even with rubber gloves and a mask on!</p>
<p>We present the [descriptive] facts.</p>
<p>You (the decision makers) work out what&#8217;s good and what&#8217;s not &#8211; and we&#8217;ll steer well clear while you do. Don&#8217;t expect any help from us with &#8220;that stuff&#8221;!</p>
<p>As Scriven points out, this <strong>&#8216;valuephobia&#8217;</strong> is actually pretty rich coming from these individuals. These exact same valuephobes then trot back to their desks, where they apply clear definitions of &#8216;quality&#8217; and &#8216;value&#8217; to the evaluation of student work; of faculty teaching, research, and service; of papers submitted for publication &#8230;</p>
<p>Evaluative conclusions are by no means alien to academics, but for some reason, as soon as there&#8217;s anything resembling formal data collection, that inclination goes out the window.</p>
<p>Just what job placement rate, speed of obtaining positions, and quality of positions constitute really excellent employment outcomes for a Bachelor of Nursing degree delivered in a specific economic climate and job market? What would mediocre look like? Or clearly not good enough?</p>
<p><em><strong>Just why are these questions not worth answering?</strong></em></p>
<p>Even the decision not to go there is a value claim about the question itself!</p>
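<p>As a rough illustration of what answering those questions could look like, here is a minimal Python sketch of an evaluative rubric for graduate employment outcomes. Everything in it is an assumption for illustration: the three dimensions come from the questions above, but the class name, the rating labels, and every threshold are hypothetical placeholders that a real self-assessment would need to set with stakeholders for a particular programme, economic climate, and job market.</p>

```python
# Hypothetical evaluative rubric for graduate employment outcomes.
# The thresholds below are invented placeholders, NOT real standards; in
# practice they would be set with stakeholders for a specific programme,
# economic climate, and job market.
from dataclasses import dataclass


@dataclass
class EmploymentOutcomes:
    placement_rate: float        # share of graduates in relevant jobs (0-1)
    median_months_to_job: float  # speed of obtaining positions
    share_in_good_roles: float   # share of placements judged high quality (0-1)


# Each rating lists: minimum placement rate, maximum months to a job, and
# minimum share in good roles (all hypothetical), from most to least demanding.
RUBRIC = [
    ("excellent", 0.90, 3.0, 0.70),
    ("good", 0.80, 6.0, 0.50),
    ("adequate", 0.65, 9.0, 0.30),
]


def rate(outcomes: EmploymentOutcomes) -> str:
    """Return the highest rating whose criteria are all met."""
    for label, min_rate, max_months, min_good in RUBRIC:
        if (outcomes.placement_rate >= min_rate
                and max_months >= outcomes.median_months_to_job
                and outcomes.share_in_good_roles >= min_good):
            return label
    # Falls below every defined level.
    return "not good enough"
```

<p>For example, <code>rate(EmploymentOutcomes(0.85, 4.0, 0.6))</code> returns <code>"good"</code>. Requiring every criterion to be met is only one design choice; a real rubric might weigh the dimensions differently or rely on holistic judgment instead.</p>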
<h4>Related posts and references:</h4>
<ul>
<li><a href="http://realevaluation.com/actionable-self-assessment-evaluation-for-teos-dec-2010/">Actionable self-assessment and evaluation</a> (powerpoint handout of Jane&#8217;s presentation to the Self-Assessment for Quality conference)</li>
<li>Scriven&#8217;s <a href="http://books.google.co.nz/books?id=koL0Fs_ZSvQC" target="_blank">Evaluation Thesaurus</a> &#8211; a must-have classic for every evaluator&#8217;s bookshelf! See our Shelfari bookshelf on the left for what we thought of it</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://genuineevaluation.com/pushing-sand-uphill-with-a-pointy-stick-no-value-free-in-higher-ed-evaluation/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Punished for productivity &#8211; poor use of an average in performance evaluation</title>
		<link>http://genuineevaluation.com/punished-for-productivity-poor-use-of-an-average-in-performance-evaluation/</link>
		<comments>http://genuineevaluation.com/punished-for-productivity-poor-use-of-an-average-in-performance-evaluation/#comments</comments>
		<pubDate>Wed, 27 Oct 2010 23:17:59 +0000</pubDate>
		<dc:creator>Patricia Rogers</dc:creator>
				<category><![CDATA[Appropriate inference]]></category>
		<category><![CDATA[Appropriate measurement]]></category>
		<category><![CDATA[Appropriate reporting]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Evaluative rubrics]]></category>
		<category><![CDATA[Government programs]]></category>
		<category><![CDATA[Synthesis of findings]]></category>
		<category><![CDATA[average]]></category>
		<category><![CDATA[goaldisplacement]]></category>
		<category><![CDATA[mean]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[perfofrmanceindicators]]></category>

		<guid isPermaLink="false">http://genuineevaluation.com/?p=2057</guid>
		<description><![CDATA[Developing good performance indicators is not easy.  The history of their use is littered with examples of how they can produce a distorted picture of performance and provide dysfunctional incentives.  Burt Perrin&#8217;s report to the OECD (Organization for Economic Co-operation &#8230; <a href="http://genuineevaluation.com/punished-for-productivity-poor-use-of-an-average-in-performance-evaluation/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<div class="wp-caption alignright" style="width: 306px"><img src="http://static.tvguide.com/MediaBin/Galleries/Editorial/090202/Smartest_Shows_TV/crops/01smartest-shows-30-rock3.jpg" alt="" width="296" height="284" /><p class="wp-caption-text">Jessica Miglio/NBC</p></div>
<p>Developing good performance indicators is not easy.  The history of their use is littered with examples of how they can produce a distorted picture of performance and provide dysfunctional incentives.  Burt Perrin&#8217;s report to the OECD (Organization for Economic Co-operation and Development) <a href="http://www.oecd.org/dataoecd/4/10/2497163.pdf">Implementing the vision &#8211; addressing challenges to results-focused management and budgeting</a> summarizes a number of these and should be required reading for anyone developing performance information systems or results-based management systems.</p>
<p>Even if there is no formal fieldwork to pilot-test performance indicators, there should at least be a &#8216;thought experiment&#8217; to check:</p>
<ol>
<li>Is it possible to score well on the performance indicator &#8211; but to actually be performing poorly?</li>
<li>Is it possible to score badly on the performance indicator &#8211; but to actually be performing well?</li>
</ol>
<p>A recent example from Australia shows the perils of not adequately testing performance indicators. <a href="http://www.arc.gov.au/era/default.htm">ERA</a> (Excellence in Research for Australia), a new system to rate the quality of academic researchers, has rated some highly productive researchers low because it uses an average of all their published work. A recent article in <em>The Australian</em> outlined an example from the Australian National University:</p>
<blockquote><p>Andrew Cockburn, director of the [Australian National University&#8217;s] college of medicine, biology and environment, said that when compiling data for the ERA, a high-achieving geneticist&#8217;s output became a drag on the results of the college. This was because of the proportion of &#8220;building block&#8221; work published in journals that were not of the highest ranking. Professor Cockburn said the anomalous result would not have occurred had ERA focused on the academic&#8217;s best work, rather than requiring all journal articles to be submitted for the period under review.</p></blockquote>
<blockquote><p>He said the British system, in which academics submitted their best four or six papers, was preferable. &#8220;If she had been able to submit six papers published in <em>Nature</em> and <em>Science</em>, they would have delivered a higher ranking.</p><p>&#8220;Although her work was within my college the most outstanding, in the ERA metric it dragged the average college performance down. It shows the way these metrics are counted can have unfortunate consequences.&#8221;</p></blockquote>
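<p>The averaging problem can be made concrete with a small hypothetical calculation in Python. The tier scores (A* = 4 down to C = 1) and both publication records are invented for illustration and are not ERA&#8217;s actual scoring method; the sketch simply compares averaging every paper with averaging only each researcher&#8217;s best <em>k</em> papers, as in the British approach described above.</p>

```python
# Hypothetical illustration of the averaging problem described above.
# Journal tiers are scored A* = 4, A = 3, B = 2, C = 1 (an assumed scale for
# illustration only; this is not ERA's actual scoring method).
from statistics import mean

TIER_SCORE = {"A*": 4, "A": 3, "B": 2, "C": 1}


def mean_of_all(tiers):
    """Score a researcher by averaging every paper's tier score."""
    return mean(TIER_SCORE[t] for t in tiers)


def mean_of_best(tiers, k):
    """Score a researcher by averaging only their k highest-tier papers."""
    best = sorted((TIER_SCORE[t] for t in tiers), reverse=True)[:k]
    return mean(best)


# Hypothetical researcher X: six top-tier papers plus many "building block" papers.
prolific = ["A*"] * 6 + ["B"] * 14
# Hypothetical researcher Y: four solid papers and nothing else.
selective = ["A"] * 4
```

<p>Averaging all papers scores the researcher with six top-tier papers at 2.6 against 3 for the researcher with four solid papers; averaging only the best four reverses the order (4 against 3). This is the second question in the thought experiment: it is possible to score badly on the indicator while actually performing well.</p>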
<p>[Full disclosure: as an Australian academic, I am not just a disinterested observer of this new system. Like everyone working in evaluation or applied research generally, I would probably have been better off under the <a href="http://www.dest.gov.au/NR/rdonlyres/AF74E4A9-C7DD-48A4-8D94-847FF35C6B97/7845/RQFPreferredModelPaper.pdf">Research Quality Framework</a> that the previous government had been developing, which included assessment of the impact of research, not just publication and citation. On the other hand, under the new system the AEA serial <a href="http://www.eval.org/Publications/NDE.asp">New Directions for Evaluation</a>, which previously didn&#8217;t count at all because it did not fit the traditional form of a journal, received the only A* (top) rating for evaluation journals, suddenly making my publication record look a whole lot better &#8211; from zero to hero overnight with no effort on my part!]</p>
]]></content:encoded>
			<wfw:commentRss>http://genuineevaluation.com/punished-for-productivity-poor-use-of-an-average-in-performance-evaluation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Oxford admissions essay: &#8220;simple, yet devilish&#8221; &#8230; An evaluation aptitude test?</title>
		<link>http://genuineevaluation.com/oxford-admissions-essay-simple-yet-devilish-an-evaluation-aptitude-test/</link>
		<comments>http://genuineevaluation.com/oxford-admissions-essay-simple-yet-devilish-an-evaluation-aptitude-test/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 04:03:18 +0000</pubDate>
		<dc:creator>Jane Davidson</dc:creator>
				<category><![CDATA[Education]]></category>
		<category><![CDATA[Personnel evaluation]]></category>
		<category><![CDATA[predictive validity]]></category>
		<category><![CDATA[student assessment]]></category>
		<category><![CDATA[UK]]></category>

		<guid isPermaLink="false">http://genuineevaluation.com/?p=1235</guid>
		<description><![CDATA[Many thanks to Michael Quinn Patton for sending us through this gem (from the New York Times) about a rather interesting essay exam for selecting graduate students into All Souls College in Oxford, England. <a href="http://genuineevaluation.com/oxford-admissions-essay-simple-yet-devilish-an-evaluation-aptitude-test/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>Many thanks to Michael Quinn Patton for sending us through this gem (from the New York Times) about a rather interesting essay exam for selecting graduate students into All Souls College in Oxford.</p>
<blockquote>
<h4><a href="http://www.nytimes.com/2010/05/28/world/europe/28oxford.html?hp" target="_blank">Oxford Tradition Comes to This: ‘Death’  (Expound)</a></h4>
<p>OXFORD, England — The exam was simple yet devilish, consisting of a  single noun (“water,” for instance, or “bias”) that applicants had three  hours somehow to spin into a coherent essay. An admissions requirement  for All Souls College here, it was meant to test intellectual agility,  but sometimes seemed to test only the ability to sound brilliant while  saying not much of anything.</p>
<p>“An exercise in showmanship to avoid answering the question,” is the way  the historian Robin Briggs describes his essay on “innocence” in 1964 &#8230;</p>
<p>“Brilliant fun,” a past applicant named Matthew Edward Harris wrote in  The Daily Telegraph recently, recalling his 2007 essay, on “harmony.”</p></blockquote>
<p><strong>But did it work?</strong> Apparently not &#8211; and that&#8217;s why the one-word essay test has now been scrapped after being used annually since 1932.</p>
<blockquote><p>“For a number of years, the one-word essay question had not proved to be  a very valuable way of providing insight into the merits of the  candidates,” said Sir John Vickers, the warden, or head, of the college. &#8230;</p>
<p>“Many candidates, including some of the best, seemed at a loss when  confronted with this exercise,” said Mr. Briggs, a longtime teacher of  modern history at Oxford.</p>
<p style="text-align: right;">Click to <a href="http://www.nytimes.com/2010/05/28/world/europe/28oxford.html?hp" target="_blank">view the full NY Times article</a></p>
</blockquote>
<p>There are numerous instances across the world where <strong>ineffective personnel evaluation and student assessment practices have persisted despite being poor predictors of future performance</strong>. Another that came across my desk just this morning was a survey of UK personnel selection practices showing that it is still the selection tools with the lowest predictive validity &#8211; such as informal, unstructured interviews and CVs &#8211; that are the most widely used. [Zibarras, L. D., &amp; Woods, S. A. (2010). A survey of UK selection practices across different organization sizes and industry sectors. <em>Journal of Occupational and Organizational Psychology</em>, 83(2), 499&#8211;511.]</p>
<p>Most people working in program and policy evaluation will agree that<strong> there are several intangible aptitudes and competencies that a good evaluator needs</strong> but that aren&#8217;t easily captured in the usual selection processes for evaluation graduate programs or for professional positions. So, perhaps we should have a light-hearted test run for the one-word essay exam as applied to our own profession &#8230;</p>
<h4>Today&#8217;s essay question: &#8216;Genuine&#8217; (Expound)</h4>
<p>You may reply in the &#8216;comments&#8217; section, below! Let&#8217;s see if we can get one entry from each continent or geographic region around the world (please state where you are beaming in from).</p>
]]></content:encoded>
			<wfw:commentRss>http://genuineevaluation.com/oxford-admissions-essay-simple-yet-devilish-an-evaluation-aptitude-test/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>What constitutes &#8220;evidence&#8221;? Implications for cutting-edge, tailored treatments, and small sub-populations</title>
		<link>http://genuineevaluation.com/what-constitutes-evidence-implications-for-cutting-edge-tailored-treatments-and-small-sub-populations/</link>
		<comments>http://genuineevaluation.com/what-constitutes-evidence-implications-for-cutting-edge-tailored-treatments-and-small-sub-populations/#comments</comments>
		<pubDate>Tue, 18 May 2010 00:31:26 +0000</pubDate>
		<dc:creator>Jane Davidson</dc:creator>
				<category><![CDATA[Causal inference]]></category>
		<category><![CDATA[Community programs]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Government programs]]></category>
		<category><![CDATA[Health]]></category>
		<category><![CDATA[Strategic policy evaluation]]></category>
		<category><![CDATA[BES]]></category>
		<category><![CDATA[cutting-edge initiatives]]></category>
		<category><![CDATA[medicine]]></category>
		<category><![CDATA[RCTs]]></category>
		<category><![CDATA[small sub-populations]]></category>
		<category><![CDATA[tailored initiatives]]></category>
		<category><![CDATA[what works]]></category>
		<category><![CDATA[what works for whom?]]></category>
		<category><![CDATA[WWC]]></category>

		<guid isPermaLink="false">http://genuineevaluation.com/?p=919</guid>
		<description><![CDATA[In the medical profession in particular, there are some very rigid beliefs about what constitutes good enough "evidence of effectiveness" to justify offering, recommending, allowing patients to try, or even just not vehemently opposing a particular type of treatment for a patient. 

There are some glimmers of hope in other sectors (e.g. in the Best Evidence Synthesis work here in New Zealand). But there are still three areas where there are very serious challenges in building a credible evidence base given the kinds of constraints and realities surrounding them. They are: (1) cutting-edge treatments;  (2) treatments that are by their very nature tailored/individualized rather than standardized across patients or populations; and (3) learning what works for small sub-populations <a href="http://genuineevaluation.com/what-constitutes-evidence-implications-for-cutting-edge-tailored-treatments-and-small-sub-populations/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>Building on an earlier <a href="http://genuineevaluation.com/long-term-effects-what-to-do-with-them-and-without-them/" target="_blank">discussion Michael Scriven started about long-term effects (what to do with them and without them)</a>, I&#8217;m interested in people&#8217;s thoughts on a related issue.</p>
<p>In the medical profession in particular, there are some very rigid beliefs about what constitutes good enough &#8220;evidence of effectiveness&#8221; to justify offering, recommending, allowing patients to try, or even just not vehemently opposing a particular type of treatment for a patient. [There are obviously some parallels in other sectors, such as education, social services, international development, criminal justice, etc, but let's start with some medical examples for now.]</p>
<p>There are some glimmers of hope in other sectors (e.g. in the Best Evidence Synthesis work here in New Zealand). But there are still three areas where there are very serious challenges in building a credible evidence base given the kinds of constraints and realities surrounding them. They are: (1) cutting-edge treatments;  (2) treatments that are <em>by their very nature </em>tailored/individualized rather than standardized across patients or populations; and (3) learning what works for small sub-populations.</p>
<h4>1. Cutting-edge treatments</h4>
<p>Advancements are being made in medical practice all the time, and many of these are initially developed by clinicians (doctors, specialists, surgeons) trying a new approach on a limited number of patients, e.g. when the standard treatments are either not working, or when there&#8217;s a plausible idea about how to improve benefits for patients.</p>
<p>For a new idea to be trialled on a larger scale, it must be picked up by people with a research or evaluation agenda, not just an ongoing medical practice. From there, the process is long and slow: writing a grant proposal, getting it funded, conducting the evaluation, writing it up, submitting it to a peer-reviewed journal, and going through the entire review process before it is finally published and considered actual &#8220;evidence&#8221;. On top of this, top journals exhibit a strong preference for RCTs over other types of designs.</p>
<p>In a presentation entitled <a href="http://www.bioethics.nih.gov/slides04/truog.ppt" target="_blank">Ethical Conflicts in Randomized Controlled Trials</a>, Dr. Robert Truog, Harvard professor of anaesthesia, pediatrics, and medical ethics and chief of the Division of Critical Care Medicine at Boston Children&#8217;s Hospital, lists <strong>eight approaches to learning about what works in medicine</strong>, in ascending order of confidence:</p>
<blockquote>
<ol>
<li>Anecdotal Case Reports</li>
<li>Case Series without Controls</li>
<li>Case Series with Literature Controls</li>
<li>Case Series with Historical Controls</li>
<li>Databases</li>
<li>Case / Control Observational Studies</li>
<li>Randomized Controlled Trials</li>
<li>Meta-analyses</li>
</ol>
</blockquote>
<p>Truog argues that RCTs are not the only way to learn, even in the medical profession: <em>&#8220;Phase I and    II trials, which precede RCTs, often provide strong evidence for    effectiveness.&#8221;</em></p>
<p><strong>When should we think about alternatives to the RCT?</strong> Truog lists four conditions:</p>
<ol>
<li>When therapies are potentially life-saving</li>
<li>When evaluating rapidly developing technologies (improvements in both experimental and control treatments may make the results of an RCT obsolete by the time it is published)</li>
<li>When RCTs are not the most efficient way to acquire knowledge</li>
<li>When the non-randomized data [are] compelling</li>
</ol>
<p>Cutting-edge treatments often meet several of the above conditions, and the reality is that formal RCTs are always going to lag well behind the technology. Because of the timeframes involved, the results of RCTs are often &#8220;old news&#8221; by the time they appear in print. In addition, there are often ethical dilemmas in the rigid use of RCTs. As Robert Truog asks &#8230;</p>
<blockquote><p>&#8220;Who  wants to be the last patient enrolled in the control  arm of a positive  randomized controlled trial?&#8221;</p></blockquote>
<p>The same is true for an RCT of an educational, community health, international development, or business development intervention.</p>
<h4>2. Tailored/individualized and adaptive treatments</h4>
<p>In the medical and health professions, as in many other arenas, there are certain treatments (or programs/initiatives) that <em>by their very nature</em> must be completely tailored to the individual (or to the community, or to the organization) and/or that must be responsive to changing needs and need to be adapted over time.</p>
<p>One medical example of this is acupuncture and the use of Chinese herbs. For individuals with the same general Western diagnosis (e.g. depression, back pain, infertility), and even with the same basic underlying medical cause for that diagnosis (e.g. endometriosis, polycystic ovaries, diminished ovarian reserve), the Chinese medicine diagnosis of the underlying imbalances may differ substantially. A competent acupuncturist will proceed with a highly individualized treatment based on each person&#8217;s specific (Western and Eastern) diagnosis, reassess at each session, and tweak the treatment accordingly.</p>
<p>Clearly, this individualization and constant tweaking of treatment are at odds with the usual approach to RCTs, which is to standardize treatment and have each practitioner deliver it in exactly the same way. [There are some exceptions to this problem, e.g. <a href="http://infertility-acupuncture.info/infertility-acupuncture/ivf/" target="_blank">some RCTs have been conducted to evaluate specific acupuncture treatments before and after IVF transfer</a>, with statistically and practically significant effects documented. However, these cover just one very specific short-term application in fertility treatment, not the kinds of longer-term treatments that are also commonly used by couples experiencing infertility.]</p>
<p>An additional complication for evaluating acupuncture treatment is that diagnosis requires skilled professional judgment and (given that treatment cannot be simplistically standardized) treatment efficacy is highly dependent on the competence of the practitioner. A large-scale RCT would need to use several practitioners whose competence may vary widely, and this cause of variance could easily wash out effects.</p>
<p>This challenge is not limited to healthcare and medicine. Think about organizational development or community development initiatives. We have all heard countless examples of programs that really only worked amazingly well because of the passion of one or two highly committed people at key locations, or that needed to be adapted locally to respond to changing needs and aspirations (or because they were initially not well enough understood). If an intervention can&#8217;t be standardized across multiple locations, it doesn&#8217;t fit the mold very well for an RCT.</p>
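<p>To make the point about practitioner competence washing out effects more concrete, the Python sketch below approximates the power of a simple two-arm trial. It rests on simplifying assumptions: a continuous outcome, a two-sided z-test, and practitioner skill modelled as extra independent noise on each patient&#8217;s outcome (ignoring the clustering of patients within practitioners). The function name, effect size, and standard deviations are hypothetical.</p>

```python
# Rough sketch under stated assumptions: approximate power of a two-arm trial
# analysed with a two-sided z-test on a continuous outcome. Practitioner skill
# is modelled as extra, independent noise on each patient's outcome, which
# ignores the clustering of patients within practitioners.
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(effect, sd_patient, sd_practitioner, n_per_arm, alpha=0.05):
    """Approximate power to detect a true mean difference `effect`."""
    # Patient and practitioner variation add up in the outcome variance.
    total_sd = sqrt(sd_patient ** 2 + sd_practitioner ** 2)
    d = effect / total_sd                  # standardized effect size
    z_crit = Z.inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = 0.05
    return Z.cdf(d * sqrt(n_per_arm / 2) - z_crit)
```

<p>With a true difference of 0.5 patient-level standard deviations and 64 patients per arm, <code>approx_power(0.5, 1.0, 0.0, 64)</code> is roughly 0.81; adding practitioner variation as large as the patient variation (<code>approx_power(0.5, 1.0, 1.0, 64)</code>) drops it to roughly 0.52, so the same real benefit is much more likely to go undetected.</p>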
<h4>3. What works for small subpopulations?</h4>
<p>A third major challenge in working out &#8220;what works for whom&#8221; in medicine is that some patient subgroups have very specific combinations of factors that may lend themselves to particular kinds of treatments, but these populations are too small in number to even develop an RCT or any other quantitative design with sufficient statistical power to meet the usual requirements for publication. Or, the &#8220;target audience&#8221; for the findings is considered too narrow.</p>
<p>A good example is looking at the effectiveness of IVF treatment. It&#8217;s very easy to find a substantial sample size of women in their 30s with, say, blocked fallopian tubes or endometriosis &#8211; they often have insurance coverage for infertility or are eligible for publicly funded treatment, so plenty of them are trying various IVF protocols (large N), and there is quite good knowledge about what works for them.</p>
<p>But suppose we wanted to understand what works for women over 40, or (even harder) over 42, who have specific diagnoses? First, the numbers are naturally lower for this group because most couples have completed their families by this age. For those still trying, the woman&#8217;s age and/or her specific diagnoses often mean that she is not eligible for insurance coverage or publicly funded treatment. So, there are far fewer trying IVF, and even fewer again for the specific diagnoses that are likely to make one ineligible for insurance or publicly funded treatment.</p>
<p>The reality is that some specific sub-populations will never be large enough to allow the use of RCTs to learn what works. Yet some clinicians will refuse to let patients try treatment approaches that have not been supported by what they consider to be &#8220;solid&#8221; clinical trials.</p>
<p>At the same time, there are clinicians around the world who are known as leaders in their fields for specific types of case (such as women over 40). However, only some of them publish their findings, and their work is often sidelined by mainstream medicine as &#8220;fringe&#8221; &#8211; with the limited sample sizes and only semi-standardized treatment protocols triggering further snorts of derision about the quality of their &#8220;evidence&#8221;.</p>
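<p>To make the statistical power point concrete, here is a minimal Python sketch of the standard normal-approximation formulas for comparing two success rates in a two-arm trial. The rates used (10% vs 20%), the 5% significance level and the 80% target power are hypothetical values chosen purely for illustration, not figures from any IVF study, and the function names are illustrative only.</p>

```python
from math import ceil, sqrt
from statistics import NormalDist

# Normal-approximation sample size and power for comparing two proportions
# (two-sided test, equal numbers per arm). All rates below are hypothetical.


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Patients needed in each arm to detect p1 vs p2 with the given power."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    z_beta = z(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    spread = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil(spread ** 2 / (p1 - p2) ** 2)


def power_for_n(p1, p2, n, alpha=0.05):
    """Approximate power when only n patients per arm can be recruited."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)                # SE if no difference
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)         # SE under p1 vs p2
    return nd.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


# Hypothetical example: success rates of 10% vs 20%.
needed = n_per_arm(0.10, 0.20)
small_trial = power_for_n(0.10, 0.20, n=30)
print(f"Needed per arm: {needed}; power with 30 per arm: {small_trial:.0%}")
```

<p>Under these assumptions, a trial needs roughly 200 patients per arm, while one that can only recruit 30 per arm has less than a one-in-four chance of detecting a real difference of that size.</p>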
<p>The same is again true in education, community health, international development, business, and just about any other field one can name.</p>
<h4>Where does this leave us &#8211; and where to next?</h4>
<p>Right now, in medicine (and to varying degrees elsewhere), it&#8217;s only a small exaggeration to say:</p>
<ul>
<li>If you are seeking a &#8220;tried and true&#8221; (as supported by RCTs, or by other studies published in peer-reviewed journals) approach, you will only have access to &#8220;old&#8221; treatments and initiatives &#8211; and (in the case of RCT evidence) only those that can be completely standardized.</li>
<li>If you&#8217;re after something cutting-edge or that needs to be tailored or adapted mid-stream, you have to pin your hopes on anecdotal evidence (and hope your physician or funder will support you).</li>
<li>If you&#8217;re a member of a relatively large or typical subgroup, your treatment can be informed by evidence from RCTs and other published studies with a decent sample size.</li>
<li>But if you&#8217;re in a very small minority sub-population, all you have is &#8220;anecdotal case studies&#8221; and the whole exercise is basically a crap-shoot.</li>
</ul>
<p>Here in Aotearoa New Zealand, we have seen some <strong>very high quality government-funded work integrating a range of qualitative, quantitative and mixed method evidence about what works in education</strong> &#8211; the <a href="http://www.educationcounts.govt.nz/themes/BES" target="_blank">Iterative Best Evidence Synthesis (BES)</a>. A short quote from the <a href="http://www.educationcounts.govt.nz/__data/assets/pdf_file/0016/6640/BES-Development-Guidelines-27-07-04.pdf" target="_blank">Guidelines for Generating a Best Evidence Synthesis Iteration</a> explains how evidence is selected for inclusion:</p>
<blockquote><p>The [New Zealand] Ministry of Education is using the term ‘best’ within the best evidence synthesis programme to describe a <em>body of evidence</em> that provides credible evidence, and explanations for, influences that have made, and can make a bigger difference to desirable learner outcomes for diverse learners simultaneously. The criterion for selection of evidence for a best evidence synthesis is that the research provides evidence about impacts on learner outcomes. &#8230;</p>
<p>This criterion for selection of evidence means that research from a wide range of methodological designs (including for example, action research studies, case studies, microgenetic studies of classroom processes, ethnographic-outcome focused studies, quasi-experimental research, multiple regression studies, longitudinal studies and experimental research) can make valued contributions to a best evidence synthesis. The point of synthesis is that a cumulative body of research, carefully interrogated, provides more explanatory power than findings from any one research study or design type. (p. 33)</p></blockquote>
<p>This is in stark contrast to the U.S.-based <a href="http://ies.ed.gov/ncee/wwc/references/idocviewer/Doc.aspx?docId=19&amp;tocId=4" target="_blank">What Works Clearinghouse (WWC) evidence standards</a>:</p>
<blockquote><p>The WWC reviews each study that passes eligibility screens to determine whether the study provides strong evidence (<em>Meets Evidence Standards</em>), weaker evidence (<em>Meets Evidence Standards with Reservations</em>), or insufficient evidence (<em>Does Not Meet Evidence Standards</em>) for an intervention’s effectiveness. Currently, only well-designed and well-implemented randomized controlled trials (RCTs) are considered strong evidence, while quasi-experimental designs (QEDs) with equating may only meet standards with reservations; evidence standards for regression discontinuity and single-case designs are under development.</p></blockquote>
<p>As a humorous side note, Michael Scriven recently (on EVALTALK) nicknamed the WWC the &#8220;WWQNC, standing for What Works for Quantitative Nerds Clearinghouse (pronounced &#8216;WONKS&#8217;)&#8221;.</p>
<p>While it&#8217;s very heartening to see more enlightened evidence synthesis work such as NZ&#8217;s BES, <strong>I am not sure we yet have good evidence accumulation and synthesis solutions for:</strong></p>
<ol>
<li>cutting-edge treatments where the technology and thinking are changing faster than RCTs (or even other large-scale, long-term evaluation designs) can keep up with</li>
<li>individualized, tailored, and adapt-as-you-go initiatives</li>
<li>small sub-populations that need to know what&#8217;s going to work for them</li>
</ol>
<p><strong>Are there ways, in medicine, to accumulate knowledge directly from clinicians and aggregate it to get approximate answers to these &#8220;what works for whom and under what conditions&#8221; questions?</strong> [I recently had a discussion with a medical academic who insisted it definitely was NOT possible!]</p>
<p><strong>Are there ways in which outcome data and other learnings from localized small-scale initiatives can be meaningfully aggregated?</strong> I have been working on several projects that attempt to do just this (one in special education, one in primary school literacy, and one evaluating a nationwide strategy designed to help Māori (NZ indigenous) students enjoy education success <em>as Māori</em>), but would be interested to hear how others have approached the same problem.</p>
<p>For more on RCTs, see also my short JMDE (2006) editorial: <a href="http://survey.ate.wmich.edu/jmde/index.php/jmde_1/article/view/35/45" target="_blank">The RCTs-Only Doctrine: Brakes on the Acquisition of Knowledge?</a></p>
]]></content:encoded>
			<wfw:commentRss>http://genuineevaluation.com/what-constitutes-evidence-implications-for-cutting-edge-tailored-treatments-and-small-sub-populations/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Bad faith survey</title>
		<link>http://genuineevaluation.com/bad-faith-survey/</link>
		<comments>http://genuineevaluation.com/bad-faith-survey/#comments</comments>
		<pubDate>Tue, 11 May 2010 05:16:38 +0000</pubDate>
		<dc:creator>Patricia Rogers</dc:creator>
				<category><![CDATA[Appropriate reporting]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Learning from failure]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[audit]]></category>
		<category><![CDATA[GFC]]></category>
		<category><![CDATA[sampling]]></category>

		<guid isPermaLink="false">http://genuineevaluation.com/?p=1006</guid>
		<description><![CDATA[Over-sampling of particular population strata, and subsequent reweighting of the responses to match the population, might be appropriate sometimes, but not when it involves gathering and then discarding data about a politically contentious and high cost program. 

 <a href="http://genuineevaluation.com/bad-faith-survey/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>There have been disconcerting reports about a survey undertaken for the Australian Auditor-General&#8217;s <a href="http://www.anao.gov.au/uploads/documents/2009-10_Audit_Report_33.pdf">inquiry into the $A16.2 billion school building program</a> (a major element in the economic stimulus strategy in response to the Global Financial Crisis).</p>
<p>A survey of all school principals was undertaken and then largely discarded &#8211; and a smaller sample survey was subsequently undertaken, which oversampled private schools that were more supportive of the scheme. The survey company states that the reported results were reweighted to reflect the population (no evidence of this is provided, and some commentators, such as <a href="http://blogs.news.com.au/heraldsun/andrewbolt/index.php/heraldsun/comments/principals_may_be_angrier_at_gillards_waste_than_you_were_told/">Andrew Bolt</a> in the Herald Sun newspaper, seem either not to have understood this or to have chosen to ignore it). Even so, the question remains why anyone, particularly a government agency, would ask busy officials to complete a questionnaire and then ignore their responses, apart from listing some &#8220;themes&#8221; from their qualitative answers.</p>
<h2>An odd sampling strategy</h2>
<p><a href="http://www.theage.com.au/national/school-backlash-toned-down-20100507-ujtd.html">The Melbourne Age newspaper</a> reported:</p>
<blockquote><p>The Auditor-General originally asked 7951 primary school principals receiving money for new halls, libraries, gymnasiums or other buildings to volunteer their thoughts about the program last September. His office designed the survey and methodology, but a private company, Orima Research, conducted the research.  &#8230; As many as 3100 primary school principals responded to the survey request about the federal program, 75 per cent of which were public schools. Many of the comments were far from flattering.</p>
<p>But instead of using this raw data alone, Orima contacted more principals over a two to three-week period in October last year to ensure a &#8220;broader representation&#8221; of participating schools rather than the &#8220;passionate outbursts&#8221;.</p>
<p>A select sample of 620 principals was chosen from the augmented list to ensure a statistically robust response rate and representative result. Government schools made up a smaller percentage in this select sample &#8211; just 40 per cent, even though they comprised 71 per cent of all program recipients.</p></blockquote>
<p>A stratified random sample of 922 schools was selected for intensive follow-up, and only these schools&#8217; replies were reported numerically.</p>
<blockquote><p><strong>Table A 1</strong></p>
<p><strong>School principal survey response rates by sector, ANAO audit sample versus schools not actively followed up</strong></p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="91" valign="top">Sector</td>
<td width="65" valign="top">
<p align="right">Number of schools</p>
</td>
<td width="62" valign="top">
<p align="right">Schools in audit sample</p>
</td>
<td width="80" valign="top">
<p align="right">Completed audit survey</p>
</td>
<td width="72" valign="top">
<p align="right">Response rate</p>
</td>
<td width="62" valign="top">
<p align="right">Schools not in audit sample</p>
</td>
<td width="80" valign="top">
<p align="right">Completed full survey</p>
</td>
<td width="72" valign="top">
<p align="right">Response rate</p>
</td>
</tr>
<tr>
<td width="91" valign="top">Government</td>
<td width="65" valign="top">
<p align="right">5659</p>
</td>
<td width="62" valign="top">
<p align="right">415</p>
</td>
<td width="80" valign="top">
<p align="right">258</p>
</td>
<td width="72" valign="top">
<p align="right">62%</p>
</td>
<td width="62" valign="top">
<p align="right">5244</p>
</td>
<td width="80" valign="top">
<p align="right">2361</p>
</td>
<td width="72" valign="top">
<p align="right">45%</p>
</td>
</tr>
<tr>
<td width="91" valign="top">Catholic</td>
<td width="65" valign="top">
<p align="right">1332</p>
</td>
<td width="62" valign="top">
<p align="right">255</p>
</td>
<td width="80" valign="top">
<p align="right">167</p>
</td>
<td width="72" valign="top">
<p align="right">65%</p>
</td>
<td width="62" valign="top">
<p align="right">1077</p>
</td>
<td width="80" valign="top">
<p align="right">457</p>
</td>
<td width="72" valign="top">
<p align="right">42%</p>
</td>
</tr>
<tr>
<td width="91" valign="top">Independent</td>
<td width="65" valign="top">
<p align="right">892</p>
</td>
<td width="62" valign="top">
<p align="right">255</p>
</td>
<td width="80" valign="top">
<p align="right">158</p>
</td>
<td width="72" valign="top">
<p align="right">62%</p>
</td>
<td width="62" valign="top">
<p align="right">637</p>
</td>
<td width="80" valign="top">
<p align="right">338</p>
</td>
<td width="72" valign="top">
<p align="right">53%</p>
</td>
</tr>
<tr>
<td width="91" valign="top">Joint</td>
<td width="65" valign="top">
<p align="right">68</p>
</td>
<td width="62" valign="top">
<p align="right">60</p>
</td>
<td width="80" valign="top">
<p align="right">39</p>
</td>
<td width="72" valign="top">
<p align="right">65%</p>
</td>
<td width="62" valign="top">
<p align="right">8</p>
</td>
<td width="80" valign="top">
<p align="right">3</p>
</td>
<td width="72" valign="top">
<p align="right">38%</p>
</td>
</tr>
<tr>
<td width="91" valign="top">Total</td>
<td width="65" valign="top">
<p align="right">7951</p>
</td>
<td width="62" valign="top">
<p align="right">985</p>
</td>
<td width="80" valign="top">
<p align="right">622</p>
</td>
<td width="72" valign="top">
<p align="right">63%</p>
</td>
<td width="62" valign="top">
<p align="right">6966</p>
</td>
<td width="80" valign="top">
<p align="right">3159</p>
</td>
<td width="72" valign="top"></td>
</tr>
</tbody>
</table>
</blockquote>
<p>These figures get even more interesting when used to calculate the proportion of schools in each sector included in the audit sample:</p>
<blockquote>
<table border="0" cellspacing="0" cellpadding="0" width="367">
<tbody>
<tr>
<td width="103" valign="top">Sector</td>
<td width="64" valign="top">
<p align="right">Number of schools</p>
</td>
<td width="64" valign="top">
<p align="right">Schools in audit sample</p>
</td>
<td width="64" valign="top">
<p align="right">Schools not in audit sample</p>
</td>
<td width="72" valign="top">
<p align="right">% of schools in audit sample</p>
</td>
</tr>
<tr>
<td width="103" valign="top">Government</td>
<td width="64" valign="top">
<p align="right">5659</p>
</td>
<td width="64" valign="top">
<p align="right">415</p>
</td>
<td width="64" valign="top">
<p align="right">5244</p>
</td>
<td width="72" valign="bottom">
<p align="right">7%</p>
</td>
</tr>
<tr>
<td width="103" valign="top">Catholic</td>
<td width="64" valign="top">
<p align="right">1332</p>
</td>
<td width="64" valign="top">
<p align="right">255</p>
</td>
<td width="64" valign="top">
<p align="right">1077</p>
</td>
<td width="72" valign="bottom">
<p align="right">19%</p>
</td>
</tr>
<tr>
<td width="103" valign="top">Independent</td>
<td width="64" valign="top">
<p align="right">892</p>
</td>
<td width="64" valign="top">
<p align="right">255</p>
</td>
<td width="64" valign="top">
<p align="right">637</p>
</td>
<td width="72" valign="bottom">
<p align="right">29%</p>
</td>
</tr>
<tr>
<td width="103" valign="top">Joint</td>
<td width="64" valign="top">
<p align="right">68</p>
</td>
<td width="64" valign="top">
<p align="right">60</p>
</td>
<td width="64" valign="top">
<p align="right">8</p>
</td>
<td width="72" valign="bottom">
<p align="right">88%</p>
</td>
</tr>
<tr>
<td width="103" valign="top">Total</td>
<td width="64" valign="top">
<p align="right">7951</p>
</td>
<td width="64" valign="top">
<p align="right">985</p>
</td>
<td width="64" valign="top">
<p align="right">6966</p>
</td>
<td width="72" valign="bottom">
<p align="right">12%</p>
</td>
</tr>
</tbody>
</table>
</blockquote>
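<p>The percentages in the second table are simply each sector&#8217;s audit-sample count divided by its total number of schools. A minimal Python sketch of that arithmetic, using the counts from Table A1 (the dictionary and function names are illustrative assumptions):</p>

```python
# Number of schools and audit-sample counts per sector, from Table A1.
POPULATION = {"Government": 5659, "Catholic": 1332, "Independent": 892, "Joint": 68}
AUDIT_SAMPLE = {"Government": 415, "Catholic": 255, "Independent": 255, "Joint": 60}


def sampling_fractions(population, sample):
    """Proportion of each sector's schools drawn into the audit sample."""
    return {s: sample[s] / population[s] for s in population}


fractions = sampling_fractions(POPULATION, AUDIT_SAMPLE)
for sector, fraction in fractions.items():
    # Design weight = 1 / sampling fraction: schools represented per sampled school.
    print(f"{sector:<12} {fraction:6.1%}  weight {1 / fraction:5.1f}")

overall = sum(AUDIT_SAMPLE.values()) / sum(POPULATION.values())
print(f"{'Total':<12} {overall:6.1%}")
```

<p>The inverse of each fraction is the design weight: each sampled government school stands in for roughly 14 government schools, while each sampled independent school stands in for only about 3.5.</p>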
<p>Oversampling of small, heterogeneous sub-populations is a valid approach in many circumstances. In this case, however, the sampling fractions vary considerably (from 7% of government schools to 29% of independent schools and 88% of joint schools), and it is hard to argue that government schools can be considered heterogeneous.</p>
<p>This weighting of the responses was particularly problematic in view of the differential pattern of responses:</p>
<blockquote><p>Elsewhere, the report had noted that a majority of independent schools that had taken the design, tendering and implementation into their own hands were more satisfied on questions of value for money. Public schools, the majority of which accepted a &#8220;cookie cutter&#8221; choice of building, were far from happy. In fact, 65 per cent of the mainly government school principals said they could not agree that they got value for money with the building templates.</p></blockquote>
<p>So the effect of this additional sampling of independent schools could have been to increase the proportion of principals who were satisfied &#8211; a serious skewing of the actual situation, although The Age reported that:</p>
<blockquote><p>In an email  response, Orima Research said the sample size of government schools had  been reweighted to be statistically proportional.</p></blockquote>
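<p>The report does not describe how the responses were reweighted, so the following is an assumption rather than Orima&#8217;s actual method: a minimal Python sketch of simple post-stratification by sector, one common way of making a disproportionate sample &#8220;statistically proportional&#8221;. It uses the completed audit surveys from Table A1; the function names are illustrative.</p>

```python
# Schools per sector and completed audit-sample surveys, from Table A1.
POPULATION = {"Government": 5659, "Catholic": 1332, "Independent": 892, "Joint": 68}
RESPONDENTS = {"Government": 258, "Catholic": 167, "Independent": 158, "Joint": 39}


def poststrat_weights(population, respondents):
    """Weight given to each responding school: sector size / sector respondents."""
    return {s: population[s] / respondents[s] for s in population}


def shares(counts):
    """Each sector's share of the (possibly weighted) total."""
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def poststratified_estimate(sector_estimates, population):
    """Combine per-sector estimates (e.g. the proportion of principals agreeing
    they got value for money) using population shares, not respondent shares."""
    pop_shares = shares(population)
    return sum(pop_shares[s] * sector_estimates[s] for s in population)


weights = poststrat_weights(POPULATION, RESPONDENTS)
weighted = {s: RESPONDENTS[s] * weights[s] for s in RESPONDENTS}
print(f"Government share of respondents: {shares(RESPONDENTS)['Government']:.0%}")
print(f"Government share after weighting: {shares(weighted)['Government']:.0%}")
```

<p>With these weights, government schools go from about 41% of respondents to about 71% of the weighted total, matching their share of program recipients. Weighting of this kind corrects the sector mix only; it cannot correct for differences between principals who did and did not respond within a sector.</p>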
<h2>Ignoring responses to a survey</h2>
<p>A bigger problem, however, is the process of collecting data from literally thousands of schools and then not reporting it, although the report does mention that the survey was sent to all schools, perhaps for cathartic purposes:</p>
<blockquote><p>The survey was sent to all 7951 primary school principals in Australia. This provided an opportunity for every principal to give their views.</p></blockquote>
<p>It was indeed sent to all 7,951 school principals, and 3,159 replied, but <strong>these replies were barely used</strong>: replies to the closed-ended questions were not included in the analysis. The report makes this clear:</p>
<blockquote><p>The responses from these schools to the open-ended questions in the ANAO survey were used to identify common themes.</p></blockquote>
<p><a href="http://www.abc.net.au/news/stories/2010/05/10/2894452.htm">ABC News</a> reported the comments of a statistical expert:</p>
<blockquote><p>Ray Chambers, a professor at the Centre for Statistical and Survey Methodology, examined the data from the report for ABC Online and said the method was &#8220;unusual&#8221;.</p>
<p>&#8220;It&#8217;s uncommon to ignore data from that many,&#8221; Professor Chambers said. &#8220;You wouldn&#8217;t go to the effort of collecting the information and not use it.  To particularly ignore the information, that is something they might have decided to do if they had been very worried that it showed a bias. Personally I would have thought they should have published the results with caveats &#8211; that would have prevented a lot of these questions about the process.&#8221;</p>
<p>The auditor-general&#8217;s office would not officially comment on the survey, saying the report speaks for itself.</p></blockquote>
<p>Anne Connolly has now done a follow-up story, after seeking explanations from the Audit Office about the relative merits of stratified sampling compared with whole-population surveys with poor response rates. While acknowledging that the census survey might have drawn more responses from dissatisfied principals, she asks the pointed question <a href="http://www.abc.net.au/news/stories/2010/05/10/2895442.htm?site=thedrum">&#8220;Why weren&#8217;t they heard?&#8221;</a></p>
<p>Asking busy people to complete a survey and then not using their responses is a poor approach to evaluation. For a government agency to do so is unacceptable.</p>
]]></content:encoded>
			<wfw:commentRss>http://genuineevaluation.com/bad-faith-survey/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

