Ben Goldacre’s recent paper ‘Building Evidence into Education’ has attracted a good deal of attention and debate.
In it he argues that more experimental research should be undertaken in education, specifically randomized controlled trials (RCTs). By implication he criticises educational researchers for not doing this already, and indeed at points in the paper he states that some researchers actively resist such developments (though he provides no examples).
Let’s be clear: much of what Goldacre states makes good sense. He reviews the strengths of RCTs; he argues that they should be used more widely in educational research; he notes the complementary strengths of qualitative research; and he argues that much more time and money should be devoted to producing and disseminating high quality research in education – creating an ‘information architecture’ as he puts it.
Why then do I find the paper so frustrating? First, because Goldacre does not acknowledge that educational researchers have been debating these issues for many years (along with most other social scientists). Second, because the paper ends, as all these sorts of interventions tend to do, with a disciplinary ‘land grab’ for resources. He concludes ‘We need academics with quantitative research skills from outside academic education departments – economists, demographers, and more, to come in and share their skills…’. Oh yes, the economists, thank goodness for the economists, who have been so successful in modelling and developing our economy recently. Their RCTs have really helped with that.
In his own words, Goldacre’s paper is a ‘call to arms’. He sets up a rhetorical binary between educational research(ers) – ignorant, incompetent, uninterested in what might improve education – and proponents of randomized controlled trials – knowledgeable, skilled, only looking to identify what’s in the best interests of children. While ostensibly criticising politicians (Mr. Gove?) for foisting too many untried and untested schemes on education, he plays to the same trope of positioning educators as the ‘enemies of promise’.
Large parts of the paper draw on examples from Medicine. But given the ostensible focus on Education, might it not have been more useful to look at how these issues have been addressed in educational research over the years? The last time the issue was raised in the UK was probably David Hargreaves’s speech to the TTA in 1996 (Hargreaves 1996). This led (albeit indirectly) to a large programme of research initiated under the Teaching and Learning Research Programme. The programme featured many mixed methods research designs, including some experimental designs (Torrance 2008). It was led by Andrew Pollard (cf. Pollard 2007), one of the National Curriculum expert panel so recently ignored by Mr. Gove.
However, debate in the field long predates this most recent manifestation. Campbell and Stanley’s (1963) classic contribution on ‘Experimental and Quasi-Experimental Designs’ reviews the problems and possibilities of developing RCTs in education in far more detail than Goldacre, noting especially “the intransigence of the environment…that is the experimenter’s lack of complete control”. In a text advocating experimental design, they nevertheless review threats to internal and external validity at great length and highlight the difficulties of running RCTs properly and effectively. In turn they acknowledge McCall’s (1923) ‘How to Experiment in Education’ and note that there have been regular periods of RCT advocacy and RCT disillusionment in educational research as the clear-cut results that RCTs promise have been unforthcoming.
And here’s the rub. The answers to questions of public policy and educational evaluation are often not very clear (nor indeed are the questions sometimes). More circumspect proponents of experimental methods than Goldacre acknowledge that in order for a causal relationship to be established, even within the narrow terms of an RCT, very specific questions have to be asked. In a collection of papers produced from a conference specifically convened to promote “Randomized Trials in Education Research”, Judith Gueron (2002) argues that while “random assignment . . . offers unique power in answering the ‘Does it make a difference?’ question” (p. 15), it is also the case that “[t]he key in large-scale projects is to answer a few questions well” (p. 40). In the same volume Thomas Cook and Monique Payne (2002) agree that
most randomized experiments test the influence of only a small subset of potential causes of an outcome, and often only one. . . . even at their most comprehensive, experiments can responsibly test only a modest number of the possible interactions between treatments. So, experiments are best when a causal question involves few variables [and] is sharply focused. (p. 152)
Thus RCTs can be very good at answering very specific questions. What they cannot do is produce the questions in the first place: that depends on much prior, often qualitative, investigation, not to mention value judgments about what is significant in the qualitative data and what is the nature of the problem to be addressed by a particular program intervention. Nor can RCTs provide an explanation of why something has happened. That will depend on much prior investigation and, if possible, parallel qualitative investigation of the phenomenon under study, to inform a developing analysis of what the researchers think may be happening.
Much of Goldacre’s paper is devoted to what RCTs have achieved in medicine. There is little acknowledgement of the differences between medical and educational research. There is virtually no reference to the long history of RCTs in education (i.e. the actual evidence in this field) and how often they result in ‘no significant difference’ being reported between control and experimental groups, even when problems of design and conduct of RCTs have been overcome (or, perhaps, because they haven’t). Goldacre states that ‘there have been huge numbers of trials in education in other countries, such as the US’ (again, by implication, castigating educational researchers in the UK), but says nothing about the lack of definitive results. In fact recent findings from the United States have been disappointing. Writing in Education Week, Viadero (2009) reports: ‘Like a steady drip from a leaky faucet, the experimental studies being released this school year by the federal Institute of Education Sciences are mostly producing the same results: “No effects,” “No effects,” “No effects”.’
We should not be surprised. It was precisely the confounding problems of diverse implementation and interaction effects that produced so many “no significant difference” results in the 1960s in the context of the first wave of early childhood intervention and curriculum evaluation studies. Reflections on such results prompted the development and use of qualitative methods in evaluation studies in the 1970s and 1980s. Of course it might still be argued that it is important to know that something doesn’t work. It can also be argued that this is how knowledge advances in science – especially the natural sciences – the accumulation of many negative results before something significant appears to emerge. But Goldacre’s paper makes no reference to such complications – it simply assumes that RCTs will prove what does work, in a very straightforward manner.
Furthermore, the paper assumes that educational researchers are ignorant of RCTs, but as we have seen, this is not the case. Quite the reverse: educational researchers know all too well the pitfalls as well as the possibilities of RCTs, and are appropriately cautious about what they can achieve. While it might still be argued that undertaking more RCTs will benefit education, it cannot be argued, as Goldacre does in his opening paragraph, that this will provide ‘better evidence about what works best’. RCTs simply don’t provide that level of certainty.
Nor are even positive results easily generalised and disseminated to other contexts. Without a reasonable understanding of why particular outcomes have occurred, along with identifying the range of unintended consequences that will almost inevitably accompany an innovation, it is very difficult to generalise outcomes and implement the innovation with any degree of success elsewhere. A good example of this is provided by California’s attempt to implement smaller class sizes off the back of the apparent success of the Tennessee “STAR” evaluation. The Tennessee experiment compared the effects of smaller class size on student achievement, but worked with a sample of schools. California attempted statewide implementation, producing more problems than it solved by creating teacher shortages, especially in poorer neighbourhoods in the state. There simply weren’t enough well-qualified teachers available to reduce class size statewide, and those that were tended to move to schools in richer neighbourhoods when more jobs in such schools became available (see Grissmer, Subotnik, & Orland, 2009).
RCTs might provide more evidence, different evidence, and, if properly funded and undertaken in the context of parallel, large scale, longitudinal, qualitative studies, ‘better’ evidence of what works and why, for different groups in different contexts. We certainly need more and better research. But ultimately this must be understood as providing a better resource for collaborative decision-making between researchers, teachers, students, parents and local authorities or clusters of schools. It cannot define what ‘works best’. There is no such thing in social action, across time, place and differing circumstances. To pretend otherwise is to assert the primacy of one particular research method over the provision of a wide range of different sorts of evidence to inform debate.
Replacing a system currently at the mercy of political whim with a system driven by a narrow version of science isn’t going to improve matters. Let’s produce better evidence by all means, but we have to be appropriately modest about what research can achieve, and research has to develop in tandem with developing better forms of community engagement with our schools.
Prof. Harry Torrance
Campbell, D., & Stanley, J. (1963). ‘Experimental and Quasi-experimental Designs for Research on Teaching’ in N. Gage (Ed.), Handbook of Research on Teaching. Boston: Houghton Mifflin.
Cook, T., & Payne, M. (2002). ‘Objecting to the objections to using random assignment in educational research’ in F. Mosteller & R. Boruch (Eds.), Evidence matters: Randomized trials in education research (pp. 150–178). Washington, DC: Brookings Institution Press.
Goldacre, B. (2013). Building Evidence into Education. London: Department for Education. Available at https://www.gov.uk/government/news/building-evidence-into-education
Grissmer, D., Subotnik, R., & Orland, M. (2009). A guide to incorporating multiple methods in randomized controlled trials to assess intervention effects. Available at http://www.apa.org/ed/schools/cpse/activities/mixed-methods.aspx
Gueron, J. (2002). ‘The politics of random assignment: Implementing studies and affecting policy’ in F. Mosteller & R. Boruch (Eds.), Evidence matters: Randomized trials in education research (pp. 15–49). Washington, DC: Brookings Institution Press.
Hargreaves, D. (1996). Teaching as a research-based profession. Teacher Training Agency 1996 Annual Lecture. London: Teacher Training Agency.
McCall, W. (1923). How to Experiment in Education. New York: Macmillan.
Pollard, A. (2007). The UK’s Teaching and Learning Research Programme: findings and significance. British Educational Research Journal, 33(5), 639–646.
Torrance, H. (2008). Overview of ESRC research in education: A consultancy commissioned by ESRC: Final report. Available at http://www.sfre.ac.uk/uk/
Viadero, D. (2009, April 1). “No effects” studies raising eyebrows. Education Week. Available at http://www.projectcriss.com/newslinks/Research/MPR_EdWk–NoEffectsArticle.pdf