Testing for Reliability
The principle that scientists (and scholars generally) can build on past results means that past results ought to be replicable. Brownill et al. (2016) write:
“This replication by different labs and different researchers enables scientific consensus to emerge because the scientific community becomes more confident that subsequent research examining the same question will not refute the findings.”
And McMillan (2017) writes in his editorial “Replication Studies”:
“Replication studies are important as they essentially perform a check on work in order to verify the previous findings and to make sure, for example, they are not specific to one set of data or circumstance.”
Increasingly, replication is also seen as a way to test for data falsification, on the presumption that unreliable results will not be replicable; but as with most forms of testing, it offers no simple answer.
How Does Replication Work?
The ability to replicate results means that those doing the replication need exact information about how the original experiment was carried out. In physics and chemistry this means precise descriptions in lab books and in articles, and the same machines using the same calibration. In the social sciences, it can be much harder to reproduce the exact conditions, since they depend on human reactions and a variable environment. One well-known case comes from a study by Cornell social psychologist Daryl Bem, who did a word recognition test:
“[Bem] published his findings in the Journal of Personality and Social Psychology (JPSP) along with eight other experiments providing evidence for what he refers to as “psi”, or psychic effects. There is, needless to say, no shortage of scientists sceptical about his claims. Three research teams independently tried to replicate the effect Bem had reported and, when they could not, they faced serious obstacles to publishing their results.” (Yong, 2012)
The fact that the other research teams could not replicate the experiment successfully did not suggest to anyone that the data were fake (presumably the students could attest to that), but the failure did cast doubt on the apparent “psychic effects”. Since an exact replication using those Cornell students in that class with all the same social conditions was not possible, the question arises: how close to the original must a replication be to validate an original experiment?
Dennis and Valacich (2014) talk about “three fundamental categories” of replication:
- “Exact Replications: These articles are exact copies of the original article in terms of method and context. All measures, treatments, statistical analyses, etc. are identical to those of the original study…
- Methodological Replications: These articles use exactly the same methods as the original study (i.e., measures, treatments, statistics etc.) but are conducted in a different context. …
- Conceptual Replications: These articles test exactly the same research questions or hypotheses, but use different measures, treatments, analyses and/or context….”
Since the Cornell students were not available for the replications, the replications presumably come under the “methodological” category, or perhaps even the “conceptual”. Dennis and Valacich (2014) comment: “Conceptual replications are the strongest form of replication because they ensure that there is nothing idiosyncratic about the wording of items, the execution of treatments, or the culture of the original context that would limit the research conclusions.”
In any case, these replication types represent a significant contribution to knowledge, either by confirming the earlier results or by casting doubt on them. Why then did the research teams have trouble publishing their results?
Most journals do not encourage replications. A study that strikes readers as new and exciting and generates attention is a plus, whereas a study that appears to cover old ground, even if it has scholarly value, is less likely to get through the peer review process. Lucy Goodchild van Hilten (2015) writes:
“Publication bias affects the body of scientific knowledge in different ways, including skewing it towards statistically significant or “positive” results. This means that the results of thousands of experiments that fail to confirm the efficacy of a treatment or vaccine – including the outcomes of clinical trials – fail to see the light of day.”
This may be changing and the degree to which it is true depends in part on the academic discipline. David McMillan (2017) writes:
“Cogent Economics & Finance recognises the importance of replication studies. As an indicator of this importance, we now welcome research papers that focus on replication and whose ultimate acceptance depends on the accuracy and thoroughness of the work rather than seeking a ‘new’ result.”
If other journals follow this trend, there could be significantly more testing of scholarly results. Nonetheless a problem remains. Aside from the savings in design time, replicating results costs almost as much as the original experiment, and if the results turn out to be exactly the same, the replication is unlikely to be published. Some fields solve the problem with a repeat-and-extend approach, in which replication is tied to new features that explicitly build on the replicated results. Much depends on the culture of the discipline.
For all of its problems, replication remains one of the most effective and reliable tools for uncovering flaws and fake data, and should be used more widely.
Bem (2015) did a further “meta-analysis of 90 [replication] experiments from 33 laboratories in 14 countries …” which he claims supports his hypothesis. He published this meta-analysis in an open-access journal for the life sciences that charges $1000 for an article of this length, and Bem explicitly declared that he had no grant support. If nothing else, this is a sign of how difficult it is to continue the discourse in standard academic venues.
Ms. Melanie Rügenhagen (MA) suggested the topic and assisted with the research. Prof. Dr. Joan Luft provided research content.
Bem, D., P. E. Tressoldi, T. Rabeyron, and M. Duggan. 2015. “Feeling the Future: A Meta-Analysis of 90 Experiments on the Anomalous Anticipation of Random Future Events.” F1000Research 4:1188. Available online.
Brownill, Sue, Alan R. Dennis, Samuel Binny, Barney Tan, Joseph Valacich, and Edgar A. Whitley. 2016. “Replication Research: Opportunities, Experiences and Challenges.” In Thirty Seventh International Conference on Information Systems. Dublin, Ireland. Available online.
Dennis, Alan R., and Joseph S. Valacich. 2014. “A Replication Manifesto.” AIS Transactions on Replication Research 1 (1): 1–5.
Goodchild van Hilten, Lucy. 2015. “Why It’s Time to Publish Research ‘Failures.’” Elsevier Connect. Available online.
McMillan, David. 2017. “Replication Studies.” Cogent Economics & Finance. Available online.
Yong, Ed. 2012. “Replication Studies: Bad Copy.” Nature 485 (7398): 298–300. Available online.