An Introduction to the Column
By Michael Seadle, published on 11 April 2018
What is Information Integrity?
Information integrity is fundamentally about what makes information true or false, both at the scholarly level (research integrity) and in public and policy discourse. Reports about false information appear almost daily. A recent example involves the BBC, which has long been a model for the integrity of its reporting (Sweney, 2018). This column will focus mainly on the scholarly aspects of information integrity, but it will not ignore the effect of integrity problems on policy matters (public health issues, for example).
The topic includes a broad range of problems, including data falsification, image manipulation, and plagiarism. Plagiarism is perhaps the most prominent of these, but it is primarily an ethical and legal issue. It generally does not undermine the scholarship that builds on it, because the plagiarized results are not necessarily false. This column will discuss all aspects of information integrity, but will focus especially on data problems. No general-purpose tools exist for detecting them, though a few disciplines (such as psychology) are working on such tools.
A core concept in my book on “Quantifying Research Integrity” (Seadle, 2017) is the greyscale approach: integrity issues rarely separate neatly into simple black and white, guilty or innocent, categories. Many scholarly works have imperfections, and problematic works may still contain valid information. From the viewpoint of a university or a publisher, formal decision-making processes involving punishments and retractions may make black-and-white decisions about integrity problems preferable, but such black-and-white decisions can themselves be an integrity issue, since an overly simplistic label is at least partly untrue.
Scholarly literature contains a wealth of examples of integrity problems reaching far back in history. Today there are tools for investigating plagiarism and for examining some kinds of image manipulation. Data falsification presents more of a challenge because of its variety and complexity. Simple cases such as that of Diederik Stapel, who admitted manufacturing his results (Bhattacharjee, 2013), are rarer than cases of scholars who make poor choices about data or its interpretation. Unintentional error is also an information integrity problem, even if it is not falsification.
Selecting problematic research may have lasting effects on political discourse as well as on scholarship. The evidence for climate change appears to be overwhelming. Even so, studies by a small number of skeptics have given oil and coal lobbies in the US a tool for opposing effective measures to reduce greenhouse gases in the atmosphere. Natural science builds on the ability to reproduce results, and when many scientists produce the same results based on a wide range of measures, the conclusions are normally accepted as valid. Lay persons unfamiliar with the scholarly literature sometimes select flawed studies that confirm their own personal preferences.
Other, more historical, examples of selection bias can be found in claims about the inferiority of people in the US who were not of northern European descent: not merely those from Africa, but also those from Italy, Ireland, and eastern Europe. Such claims were popular among the right wing in many European countries in the Nazi era, and are still popular among some groups today. Their roots reach back to Christoph Meiners (Grundriß der Geschichte der Menschheit, 1785) in the 18th century, and they extend to works as modern as “The Bell Curve” by Richard Herrnstein and Charles Murray (1994). These studies did not fake their data and used scientific methods that seemed appropriate at the time. They were, however, selective about what evidence they included, and today it is widely accepted that the exclusions skewed results in a particular direction.
Selection bias may have social and cultural origins that can change over time. For those who believe in the inerrancy of Holy Scripture, the data confirming evolution is invalid. A scholar of research integrity needs in some sense to be an historian, in order to understand the research in time and place, and to be an ethnographer, in order to understand integrity violations across cultures and disciplines. No one should imagine that integrity research involves simple labels.
The Research Integrity Literature
This column will focus on discussing papers about research integrity and will look at specific cases, whose complexity gives opportunities to apply a greyscale analysis. There are many good sources of information, not least Retraction Watch (Oransky & Marcus, 2018), which provides an excellent news feed and classifies retractions by type and field. Retractions may represent only part of the problem, simply because discovering problems is hard and because false positives may distract from more important issues. The ability to reproduce results is a classic hallmark of good science. There is good evidence, however, that results in behavioral and social science studies are harder to reproduce than natural-science results, for the simple reason that social circumstances change.
The goal of this column is scholarly, not investigative. It does not actively seek out new cases where research integrity may have been violated, but seeks to examine existing cases in order to apply a greyscale understanding of what happened and what the consequences are. As Principal Investigator for the research integrity part of the HEADT Centre, I will be the primary columnist, but others will likely contribute as well, including Dr. Thorsten Beck, who specializes in image manipulation.
Bhattacharjee, Yudhijit. 2013. “The Mind of a Con Man.” New York Times, April 26, 2013. Available online.
Seadle, Michael. 2017. Quantifying Research Integrity. Morgan Claypool: Synthesis Lectures on Information Concepts, Retrieval, and Services. Available online.
Sweney, Mark. 2018. “No Title.” New York Times, April 4, 2018. Available online.
Oransky, Ivan, and Adam Marcus. 2018. “Retraction Watch.” 2018. Available online.
Honest Error: a Look at the Literature
By Michael Seadle, published on 19 April 2018
Problems with data are arguably the most serious issue for information integrity in the research world, because they undermine the ability of scholars to build on past results. These problems come in many variations:
- people who make up fake data;
- people who manipulate data to get specific results;
- people who leave out data or sources.

Each of these represents some form of misconduct when done deliberately. Nonetheless, not everyone involved acts with malicious intent. Ordinary negligence plays a role too. The results remain unreliable and irreproducible, but the persons involved may be innocent of intentional wrongdoing. This column looks at the scholarly literature on “honest” errors.
Resnik and Stewart (2012) explain that recognizing honest error is important but hard:
“It is important to distinguish between misconduct and honest error or a difference of scientific opinion to prevent unnecessary and time-consuming misconduct proceedings, protect scientists from harm, and avoid deterring researchers from using novel methods or proposing controversial hypotheses. … the line between misconduct and honest error or a scientific dispute is often unclear.”
Precisely what constitutes honest error may depend on personal judgment. An older study by Nath et al. (2006) in the Medical Journal of Australia looked at “[a]ll retractions of English language publications indexed in MEDLINE between 1982 and 2002…”. In that study, “[t]wo reviewers categorised the reasons for retraction of each article…”. Nath and colleagues concluded that:
“Of the 395 articles retracted between 1982 and 2002, 107 (27.1%) were retracted because of scientific misconduct, 244 (61.8%) because of unintentional errors, and 44 (11.1%) could not be categorised.”
This share of unintentional errors is surprisingly high. It is possible that misconduct has increased significantly over time (see below for more recent numbers). The more likely lesson, however, is that how the classification is made matters. It is hard to know how accurate classifications of misconduct are when the presumption of innocence is not always strictly observed after an accusation has been made.
Estimates of Size
Later studies do not confirm the Nath estimate of the number of unintentional errors. An article by Arturo Casadevall and colleagues (2014) reports that:
“Analysis of the retraction notices for 423 articles indexed in PubMed revealed that the most common causes of error-related retraction are laboratory errors, analytical errors, and irreproducible results. … The database used for this study includes 2047 English language articles identified as retracted articles in PubMed as of May 3, 2012…”
This suggests that the cause of just under 12% of the PubMed retractions is essentially ordinary human error. A different study, by Moylan and Kowalczuk (2016), looks at the BioMed Central journals and finds a similar percentage:
“Honest error accounted for 17 retractions (13%) of which 10 articles (7%) were published in error. … A total of 13 articles (10%) of retractions were due to problems with the data. Often these issues occurred through honest error in how the data were handled, for example … although in some cases it is difficult to determine whether honest error or misconduct was the cause.”
Daniele Fanelli (2016) offers a somewhat higher percentage of honest error:
“However, retractions reliably ascribed to honest error account for less than 20% of the total, and are often a source of dispute among authors and a legal headache for journal editors. The recalcitrance of scientists asked to retract work is not surprising. Even when they are honest and proactive, they have much to lose: a paper, their time and perhaps their reputation. Much reluctance to retract errors would be avoided if we could easily distinguish between ‘good’ and ‘bad’ retractions.”
In this context, “good” retractions are generally those in which the authors recognize their own mistake and ask for the paper to be withdrawn. Fanelli (2016) makes the further argument that:
“Self-retractions should be considered legitimate publications that scientists would treat as evidence of integrity. Self-retractions from prestigious journals would be valued more highly, because they imply that a higher sacrifice was paid for the common good.”
As he notes, this could be open to abuse. Some abuse might well be tolerable, though, in the interest of giving researchers an incentive to withdraw flawed results before they mislead other scholars. Given present publication pressure and the effect of public opinion, researchers may be unwilling to admit honest errors for fear of being thought guilty of misconduct. It may be hard to escape censure whichever choice they make.
One measure that can help define honest error is the degree to which errors confirm the desired conclusions. This is not to say that every error in favor of the authors’ arguments is dishonest, but errors that weaken the conclusion are more likely unintentional. There is, of course, a human tendency to believe confirming results and to doubt disruptive ones. One part of research training that may need more emphasis is a healthy skepticism toward desired results. Another measure has to do with the frequency of error. Everyone makes some errors. When authors repeatedly make errors, it may be reasonable to expect the errors to be distributed randomly, with some for and some against the conclusions. A pattern that consistently favors the desired conclusion may imply more bias than honesty.
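The idea of comparing errors for and against the conclusions can be made concrete with a simple sign test. The sketch below is an illustration only, not a method taken from the literature cited here. It rests on a hypothetical assumption: an honest author's errors are independent and each is equally likely to favor or to weaken the conclusion. On that basis it asks how probable it would be for at least k of n errors to favor the conclusion by chance. The function name `prob_at_least` and the tallies are invented for the example.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    Read here as: the chance that k or more of n independent errors would
    favor the conclusion if each error had probability p of doing so.
    The default p = 0.5 is an assumption, not an empirical value.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical tallies (errors favoring the conclusion, total errors);
# these are not data from any real case.
for favoring, total in [(6, 10), (10, 10), (18, 20)]:
    chance = prob_at_least(favoring, total)
    print(f"{favoring} of {total} errors favor the conclusion: P = {chance:.4f}")
```

The three tallies give probabilities of about 0.377, 0.001 and 0.0002. A small probability only flags a pattern worth a closer look. In keeping with the greyscale approach, it is not by itself evidence of misconduct, and the 50/50 assumption may not hold in every field.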
Those judging integrity should not forget that honest errors exist, and that people under career or social pressure may be more error prone without particular ill intent.
Casadevall, Arturo, R. Grant Steen, and Ferric C. Fang. 2014. “Sources of Error in the Retracted Scientific Literature.” FASEB Journal 28 (9): 3847–55. Available online.
Fanelli, Daniele. 2016. “Set up a ‘self-Retraction’ System for Honest Errors.” Nature. Available online.
Moylan, Elizabeth C., and Maria K. Kowalczuk. 2016. “Why Articles Are Retracted: A Retrospective Cross-Sectional Study of Retraction Notices at BioMed Central.” BMJ Open 6 (11). Available online.
Nath, Sara B., Steven C. Marcus, and Benjamin G. Druss. 2006. “Retractions in the Research Literature: Misconduct or Mistakes?” Medical Journal of Australia. Available online.
Resnik, David B., and C. Neal Stewart. 2012. “Misconduct versus Honest Error and Scientific Disagreement.” Accountability in Research. Available online.
Data Falsification: Lessons from a Case
By Michael Seadle, published on 25 April 2018
Data falsification cases usually take time to discover, and they generally require someone motivated enough to look for problems. In theory, falsification should be found in the course of peer review, and sometimes it is, but journals do not routinely make public the detailed results of peer review. Data falsification can also be hard to prove with certainty. This column will look at a case from social psychology that arose in the wake of the Diederik Stapel retractions. Stapel admitted his guilt, and his name is now routinely part of discussions about data falsification. The 2014 case under discussion here is somewhat different, because the author of the retracted papers still insists on his innocence. Since the person’s name is irrelevant to the scholarly discussion, this column will refer to him only as JF. Anyone who really wants to learn his name need only look at the references.
The issue in the JF case involves datasets whose results are statistically too perfect. An unnamed whistleblower did an analysis:
“The chances of this happening were one in 508,000,000,000,000,000,000, he claimed.” (Kolfschooten, 2014)
The whistleblower is apparently known to the university and to the National Board for Research Integrity (LOWI) in the Netherlands (Kolfschooten, 2014). Maintaining the whistleblower’s anonymity seems legitimate as long as due process is followed and the accused has a reasonable chance to respond. Just how much opportunity JF had to respond is unclear from published sources. He implied that the opportunity was limited in an open letter to Retraction Watch (Amarcus41, 2014):
“The rapid publication of the results of the LOWI and UvA [University of Amsterdam] case happened quite unexpectedly, the negative evaluation came unexpectedly, too. Note that we were all sworn to secrecy by the LOWI, so please understand that I have to write this letter in zero time. Because the LOWI, from my point of view, did not receive much more information than was available for the preliminary, UvA-evaluation, and because I never did something even vaguely related to questionable research practices, I expected a verdict of not guilty. … I do feel like the victim of an incredible witch hunt directed at psychologists after the Stapel-affair.”
JF appears not to have kept the original data, only his summary of the results, which is a lesson to other scholars not to be too ready to clean their files in case the original data are needed. Investigators also raised suspicions about the data in the thesis of one of JF’s doctoral students. The doctoral student was declared innocent of wrongdoing, because the data came from JF. For JF the trouble did not stop:
“A panel of statistical experts from UvA that embarked on a second, more comprehensive investigation found ‘strong evidence for low veracity’ of the results in all three papers, as well as in five others.” (Kolfschooten, 2016)
JF also agreed to further retractions “… as part of a settlement with the German Society for Psychology (DGPs)” (Palus, 2016). The weight of opinion has been so strongly against JF that he left the academic world for private practice (Stern, 2017).
In a sense the case is closed, but questions remain. Accusations of fraud tend to come in groups. Perhaps an initial case inspires people to look more carefully, and perhaps opinion shifts away from a presumption of innocence. After the Stapel case, Uri Simonsohn built a statistical tool to detect the possibility of certain kinds of fraud in which the data patterns were too perfect to be believed (Enserink, 2012). There is no evidence that this tool was involved in JF’s case, but the principle appears to be the same: the data were simply too perfect, not merely once, but in paper after paper. Of course, high-quality data are what scholars need to get publications, and the push to get perfect data is strong.
One should not forget how complex the creation of a research data set is. Experienced researchers learn how to get good results without necessarily faking or directly manipulating the data. Selecting participants is an art in a world where genuine random selection is often impossible. A highly successful scholar might unconsciously seek out just the right subjects without obvious tampering. Such a scholar might also learn to ask exactly the right questions in exactly the right way to elicit exactly the right responses, without further manipulation. Perhaps this seems implausible, but highly successful researchers must be doing something different, or they would not be so atypical.
In any particular case, repeated perfect results must seem improbable, but factors other than outright fraud may still play a role. In the case of JF, the investigation seems never to have considered such alternative explanations.
One of the lessons of this case for researchers young and old is to keep all experimental data for a longer period. The lack of original data was a factor that counted strongly against JF.
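The point that the data were too perfect “not merely once, but in paper after paper” is a question of combining evidence across studies. One standard way to combine independent per-study probabilities is Fisher's method, sketched below in plain Python. This is an illustration only. We do not know which calculation the whistleblower or Simonsohn's tool used, and the function name is ours. The method assumes that the studies are independent and that each per-study probability is valid.

```python
import math


def fisher_combined_p(p_values: list[float]) -> float:
    """Combine independent p-values with Fisher's method.

    Under the null hypothesis, X = -2 * sum(ln p_i) follows a chi-squared
    distribution with 2k degrees of freedom. For even degrees of freedom
    the upper tail has a closed form:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # x / 2
    term = total = 1.0                           # the i = 0 term
    for i in range(1, k):
        term *= half_x / i                       # (x/2)**i / i!
        total += term
    return math.exp(-half_x) * total


# Hypothetical per-study probabilities that results would fit the
# prediction this closely by chance; not figures from the JF case.
print(fisher_combined_p([0.2, 0.15, 0.3]))
print(fisher_combined_p([0.01, 0.02, 0.005, 0.01]))
```

The combined figure shrinks quickly as studies are added, so it can look overwhelming. As the following paragraphs argue, however, a tiny probability under a chance model does not by itself rule out explanations other than fraud.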
Amarcus41. 2014. “Social Psychologist Förster Denies Misconduct, Calls Charge ‘Terrible Misjudgment.’” Retraction Watch. 2014. Available online.
Enserink, Martin. 2012. “Fraud-Detection Tool Could Shake up Psychology.” Science 337 (6090). American Association for the Advancement of Science: 21–22. Available online.
Kolfschooten, Frank van. 2014. “Scientific Integrity. Fresh Misconduct Charges Hit Dutch Social Psychology.” Science (New York, N.Y.) 344 (6184). American Association for the Advancement of Science: 566–67. Available online.
Kolfschooten, Frank van. 2016. “No Tenure for German Social Psychologist Accused of Data Manipulation.” Science, July. Available online.
Palus, Shannon. 2016. “Psychologist Jens Förster Earns Second and Third Retractions as Part of Settlement.” Retraction Watch. 2016. Available online.
Stern, Victoria. 2017. “Psychologist under Fire Leaves University to Start Private Practice – Retraction Watch.” Retraction Watch. 2017-12-12. Available online.
Is exposure enough? The aftermath of article retraction
By Michael Seadle, published on 2 May 2018
Justice is often slow. Articles with integrity problems can stay in print without any warning label for years. Chen et al. (2013) wrote:
“We found that it takes about 2 years, on average, to retract an article and another 2 years to see a substantial decrease of citations to the retracted article.”
Two years may even underestimate the time to retraction, since an accusation often triggers formal investigations at universities and at journals before either institution is ready to take action. Once an accusation becomes public, the press typically pushes for swift action. University authorities typically want to make the problem go away, with little concern for the presumption of innocence that is part of democratic justice systems. One of the constant themes of this column is that integrity problems are sometimes more complex than the accusations imply. Nonetheless, two years is a long time, during which ideas can become firmly established.
From a journal perspective, the commercial value of an article declines sharply two years after publication. Value over time varies greatly by field, however: humanities articles generally have a longer half-life than articles in the natural sciences or medicine. Most researchers in most fields will have read an article within two years if it is at all relevant to their work. This means that an article retracted two years after publication has already exhausted a significant part of its commercial value and is intellectually present in the minds of the scholarly community. Two more years for citations to decrease is hardly surprising, since scholars who have read a paper are unlikely to go back and read it again. More likely they work from the digital or paper copy they already have when writing their own new articles.
Authors may also ignore a retraction for a variety of reasons that may depend on the reason for the retraction. As Madlock-Brown and Eichmann (2015) wrote:
“There are many reasons articles may be retracted, some more problematic than others.”
A work that was retracted for plagiarism, for example, may still contain worthwhile information, despite the ethical and copyright violations. Readers may also discount retractions for procedural or peer review issues. Self-citation plays a role too.
“18% of authors self-cite retracted work post retraction with only 10% of those authors also citing the retraction notice.” (Madlock-Brown & Eichmann, 2015)
What exactly authors are citing from their own retracted paper may matter. It is not quite fair to assume that everything in a paper is contaminated because of a retraction. The degree to which an integrity violation in one part of a paper affects others may depend on the field. A humanities paper may, for example, draw multiple conclusions, only one of which the retraction affects. The assumption that everything in a retracted paper is flawed is part of the black-or-white thinking that currently pervades the integrity literature.
The interesting question is whether the flawed portions of a retracted work, especially faked or manipulated data, persist in the minds of scholars after the integrity violation has been discovered and established beyond reasonable doubt. Greitemeyer (2014) writes:
“… numerous studies have shown that corrections do not work as intended, in that individuals are influenced in their later judgments by misinformation even after correction. For instance, Loftus (1979) found that after witnessing an event, exposure to misleading information makes a person often report something that was only suggested. This phenomenon has been labeled the misinformation effect…”
In some ways this is not surprising. If the original article made a clear and cogent argument that seemed on the face of it to be reasonable, a memory of and even a belief in the argument may persist.
“Once a belief is formed, people generate explanations that fit the evidence. These explanations continue to imply that the belief is correct even after exposure to evidence that invalidates the evidence once used to support one’s belief.” (Greitemeyer, 2014)
An interesting example can be found in the retracted study by Diederik Stapel in which travelers were asked to choose a seat next to a Dutch-African or a Dutch-Caucasian person (Stapel & Lindenberg, 2011). The data may have been fake, but the conclusion felt so plausible that it remained in the minds of many. Indeed, this very reference to a retracted work illustrates why such citations take place.
The good news is that researchers who are accused and then exonerated may not suffer long-term damage to their reputations. Greitemeyer and Sagioglou (2015) write:
“The present research suggests that people do abandon their attitude toward an accused researcher after learning that the researcher has been exonerated. In both studies, participants in the exoneration condition had a more favorable attitude toward the researcher than participants in the uncorrected accusation condition. Moreover, in the exoneration condition, participants’ post-exoneration attitude was more favorable than their pre-exoneration attitude.”
This should be a comforting thought to those who are exonerated, but those cases seem to be rare. Interestingly enough Greitemeyer and Sagioglou (2015) begin with the example discussed in last week’s column, and note: “…it is important to keep in mind that the LOWI concluded that it cannot be determined whether Förster had manipulated the data.” Thus far he has not been exonerated and may well have given up hope. For others it may offer a grain of comfort after a time of stress.
Ms. Vera Hillebrand (MA) suggested the topic and the title. She also provided most of the references.
Chen, Chaomei, Zhigang Hu, Jared Milbank, and Timothy Schultz. 2013. “A Visual Analytic Study of Retracted Articles in Scientific Literature.” Journal of the American Society for Information Science and Technology 64 (2): 234–53. Available online.
Greitemeyer, Tobias. 2014. “Article Retracted, but the Message Lives On.” Psychonomic Bulletin & Review 21 (2): 557–61. Available online.
Greitemeyer, Tobias, and Christina Sagioglou. 2015. “Does Exonerating an Accused Researcher Restore the Researcher’s Credibility?” PLoS One 10 (5). Available online.
Madlock-Brown, C. R., and D. Eichmann. 2015. “The (Lack of) Impact of Retraction on Citation Networks.” Science and Engineering Ethics 21 (1): 127–37. Available online.
Stapel, Diederik A, and Siegwart Lindenberg. 2011. “Coping with Chaos: How Disordered Contexts Promote Stereotyping and Discrimination.” Science 332 (6026): 251–253. Available online.
Guilt and Innocence in Plagiarism
By Michael Seadle, published on 16 May 2018
Burden of Proof
Jochen Zenthöfer wrote an article in the Frankfurter Allgemeine newspaper on 18 April 2018 in which he expressed concern about the number of plagiarism cases under consideration at German universities. As he notes, the cases come largely from the VroniPlag Wiki. His article is the focus of this column.
There is an assumption in most western legal systems that a person is innocent until proven guilty, but, as Zenthöfer (2018) notes, this principle derives from criminal law:
“Als Grund führt die HU das Prinzip der Unschuldsvermutung an, das freilich nur im – hier nicht einschlägigen – Strafrecht gilt.” [The reason given by the HU is the principle of the presumption of innocence, which only applies in criminal law, and is not relevant here. – my translation]
The author seems to imply that the presumption of innocence ought to be ignored in a process that could destroy a career and strip a person of the means of livelihood. When the accuser is an official body, such as a commission of a university, that has done a careful analysis and presents a well-founded conclusion, it may be reasonable to put the burden of proof of innocence on the accused, but when a self-constituted group such as the VroniPlag Wiki makes an accusation, the universities involved have an obligation to investigate thoroughly and carefully to see whether the accusation is legitimate.
An investigation into plagiarism needs appropriate standards. In discussing “Policies and Initiatives Aimed at Addressing Research Misconduct in High-Income Countries,” Resnik and Master (2013) refer to the COPE guidelines (COPE, 2018), which define plagiarism as occurring:
“When somebody presents the work of others (data, words or theories) as if they were his/her own and without proper acknowledgment.” (COPE, 2018)
While this definition is comprehensive, it gives no explicit measure for determining what actually constitutes plagiarism. Mere text overlap is insufficient. A short factual statement such as “Berlin is the capital of Germany” gets thousands of hits in a Google search and could not reasonably be called plagiarism. The phrase in the guidelines about “proper acknowledgment” is equally unspecific. Citation styles vary, and so do expectations about exactly how and where in the text to put the reference.
Paraphrasing is not the same as plagiarism. Lee (2015) explains rules for paraphrasing in the American Psychological Association Style Blog:
“A paraphrase restates someone else’s words in a new way. For example, you might put a sentence into your own words, or you might summarize what another author or set of authors found. When you include a paraphrase in a paper, you are required to include only the author and date in the citation.”
This definition leaves latitude for understanding what “in your own words” means, which does not necessarily imply avoiding all the original words and phrases. When paraphrasing, it is almost impossible to avoid content-carrying words or phrases that have a particular meaning. Nonetheless there are plagiarism hunters who see plagiarism in every overlap.
When evaluating a work for plagiarism, it is important to have rational metrics:
- Copying a complete paragraph word for word (without quotation marks) is plagiarism.
- Copying a complete long sentence word for word suggests plagiarism.
- A majority of the words in a paragraph or sentence may match words in the same order in another text. This could be deliberate plagiarism that the author tried to obscure. It might also reflect a good verbal memory, or a logic that dictated the word order and word choice.

Absolute uniqueness of language is not necessarily the hallmark of good scholarship.
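One way to put a number on “a majority of the words … in the same order” is the longest common subsequence of words. This is the largest set of words that appear in both passages in the same order, though not necessarily next to each other. The sketch below reports that count as a share of the candidate sentence, so a value above 0.5 corresponds to a majority. The function names, the simple word tokenization and the example sentences are assumptions made for illustration.

```python
import re


def words(text: str) -> list[str]:
    """Lower-case word tokens (Unicode-aware, so umlauts survive)."""
    return re.findall(r"\w+", text.lower())


def in_order_share(candidate: str, source: str) -> float:
    """Share of the candidate's words that also occur in the source in the
    same order (longest common subsequence of words / candidate length)."""
    a, b = words(candidate), words(source)
    if not a:
        return 0.0
    # Standard dynamic-programming LCS, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(a)


# Invented example: a close paraphrase that keeps much of the word order.
source = "The results show that the new method reduces the error rate in most cases."
candidate = "Our results show clearly that the method reduces error in most of the cases."
print(f"{in_order_share(candidate, source):.2f}")
```

A high score does not distinguish deliberate disguise from good verbal memory or a logical word order. It only makes the overlap measurable.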
Services like iThenticate are careful to report the percentage of matching text only in terms of the number of words in the whole work. VroniPlag counts plagiarism in terms of how many pages have hits (“Anzahl Seiten mit Funden”, literally “number of pages with findings”). This means that even a page with a mere nine matching words (a set of four words and a set of five words) adds to the page count (see VroniPlag). This exaggerates the impression of the problem to a degree that could be considered misrepresentation in any scholarly work.
In my book “Quantifying Research Integrity” (December 2016), I suggest a grey-scale measure for plagiarism cases in which the number of contiguous matching words is measured within a particular unit, such as a paragraph or a sentence. One can disagree with the exact numbers, but using transparent metrics as a standard matters. Exactly where copying occurs matters too. Word overlap is less surprising in a literature review than in the conclusions, and facts and standard phrases of an academic discipline need to be deducted.
Ultimately, decisions about plagiarism depend on the distinction between negligence and gross negligence. The former implies sloppiness, while the latter represents actual misconduct. For volunteers who spend their free time hunting plagiarism, the activity may take on a game-like quality. That quality may push them toward judgments that increase the number of hits without regard to the distinction between negligence and gross negligence.
Certainly plagiarism is an ethical and copyright problem, but its actual long-term harm to modern scholarship may be modest. The real harm is to the personal integrity of the plagiarist. Integrity matters, and instances of plagiarism certainly need to be caught. The current focus on hunting plagiarism, however, may be a distraction from the more important task of identifying falsified or manipulated data. False data undermine the foundations of scholarship (especially in the natural sciences) in ways that plagiarism does not.
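The difference between the two ways of counting is easy to see with made-up numbers. The sketch below assumes a hypothetical 200-page thesis of 350 words per page. In it, 60 pages each contain one nine-word match of the kind described above. The functions are simplified illustrations, not the actual algorithms of iThenticate or VroniPlag.

```python
def word_share(flagged_per_page: list[int], words_per_page: list[int]) -> float:
    """Flagged words as a share of all words in the work."""
    return sum(flagged_per_page) / sum(words_per_page)


def page_share(flagged_per_page: list[int], words_per_page: list[int]) -> float:
    """Share of pages with at least one flagged word ("pages with hits")."""
    pages_with_hits = sum(1 for flagged in flagged_per_page if flagged > 0)
    return pages_with_hits / len(words_per_page)


# Hypothetical thesis: 200 pages of 350 words; 60 pages each contain one
# four-word and one five-word match (9 words), the rest contain none.
PAGES = 200
words_per_page = [350] * PAGES
flagged_per_page = [9] * 60 + [0] * (PAGES - 60)

print(f"word share: {word_share(flagged_per_page, words_per_page):.2%}")  # word share: 0.77%
print(f"page share: {page_share(flagged_per_page, words_per_page):.2%}")  # page share: 30.00%
```

The same nine-word matches amount to well under 1% of the text but 30% of the pages.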
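A minimal sketch of a grey-scale measure of this kind follows. It works as follows:
- takes the sentence as the unit;
- finds the longest run of contiguous words shared with a source sentence;
- deducts phrases that the user lists as standard in the discipline;
- reports the run as a share of the remaining words.

The cut-off values in `band` are hypothetical placeholders chosen for illustration. They are not the numbers proposed in the book, and all function names are invented.

```python
import re
from collections.abc import Sequence
from typing import Optional


def tokens(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def mask_phrases(words: list[str], phrases: Sequence[str]) -> list[Optional[str]]:
    """Replace each occurrence of a standard phrase with None, so that it can
    neither count as copying nor link two copied runs together."""
    masked: list[Optional[str]] = list(words)
    for phrase in phrases:
        p = tokens(phrase)
        if not p:
            continue
        for i in range(len(words) - len(p) + 1):
            if words[i:i + len(p)] == p:
                masked[i:i + len(p)] = [None] * len(p)
    return masked


def longest_shared_run(candidate: list[Optional[str]], source: list[str]) -> int:
    """Length of the longest run of consecutive words found in both lists."""
    best = 0
    prev = [0] * (len(source) + 1)
    for x in candidate:
        curr = [0] * (len(source) + 1)
        if x is not None:
            for j, y in enumerate(source, start=1):
                if x == y:
                    curr[j] = prev[j - 1] + 1
                    best = max(best, curr[j])
        prev = curr
    return best


def grey_score(candidate: str, source: str, standard_phrases: Sequence[str] = ()) -> float:
    """Longest shared run as a share of the candidate's countable words."""
    cand = mask_phrases(tokens(candidate), standard_phrases)
    countable = sum(1 for w in cand if w is not None)
    if countable == 0:
        return 0.0
    return longest_shared_run(cand, tokens(source)) / countable


def band(score: float) -> str:
    # Hypothetical cut-offs for illustration only, not the book's values.
    if score >= 0.8:
        return "dark grey"
    if score >= 0.5:
        return "mid grey"
    if score >= 0.2:
        return "light grey"
    return "white"


fact = "Berlin is the capital of Germany"
print(band(grey_score(fact + " and its largest city.", fact + ".")))          # mid grey
print(band(grey_score(fact + " and its largest city.", fact + ".", [fact])))  # white
```

The example shows why deduction matters. Without it, the shared factual clause makes up 6 of 10 words. Once the clause is treated as a standard phrase, nothing copied remains. Weighting by section (literature review versus conclusions) could be layered on top, but is not attempted here.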
The popularity of plagiarism hunting grows in part from tools that make it easy to compare texts word for word. Some British universities distinguish between actual plagiarism and the appearance of plagiarism by requiring students to submit their own papers to a plagiarism checker like Turnitin. King’s College London even allows students to submit their works multiple times (see “Submitting Assessments Online“). While this is a measure to prevent plagiarism, it serves also as a recognition that inadvertent copying is common and does not necessarily involve fraudulent intent.
Ms. Melanie Rügenhagen (MA) assisted with the research.
COPE (Committee on Publication Ethics). 2018. “Plagiarism.” 2018. Available online.
Lee, Chelsea. 2015. “When and How to Include Page Numbers in APA Style Citations.” American Psychological Association: APA Style Blog. 2015. Available online.
Resnik, David B., and Zubin Master. 2013. “Policies and Initiatives Aimed at Addressing Research Misconduct in High-Income Countries.” PLoS Medicine 10 (3). Public Library of Science: e1001406. Available online.
Seadle, Michael. 2016. Quantifying Research Integrity. Morgan & Claypool: Synthesis Lectures on Information Concepts, Retrieval, and Services. Available online.
Zenthöfer, Jochen. 2018. “Wie Universitäten Auf Plagiate in Doktorarbeiten Reagieren: Auch Mit Diebstahl Kann Man Es Weit Bringen.” Frankfurter Allgemeine, April 18, 2018. Available online.
By Michael Seadle, published on 23 May 2018
Testing for Reliability
The principle that scientists (and scholars generally) can build on past results means that past results ought to be replicable. Brownill et al. (2016) write:
“This replication by different labs and different researchers enables scientific consensus to emerge because the scientific community becomes more confident that subsequent research examining the same question will not refute the findings.”
McMillan (2017) writes in his editorial “Replication Studies”:
“Replication studies are important as they essentially perform a check on work in order to verify the previous findings and to make sure, for example, they are not specific to one set of data or circumstance.”
Increasingly, replication is also seen as a way to test for data falsification, on the presumption that unreliable results will not be replicable; but as with most forms of testing, it offers no simple answer.
How does Replication Work?
The ability to replicate results means that those doing the replication need exact information about how the original experiment was carried out. In physics and chemistry this means precise descriptions in lab books and in articles, and the same machines using the same calibration. In the social sciences, it can be much harder to reproduce the exact conditions, since they depend on human reactions and a variable environment. One well-known case comes from a study by Cornell social psychologist Daryl Bem, who did a word recognition test:
“[Bem] published his findings in the Journal of Personality and Social Psychology (JPSP) along with eight other experiments providing evidence for what he refers to as ‘psi’, or psychic effects. There is, needless to say, no shortage of scientists sceptical about his claims. Three research teams independently tried to replicate the effect Bem had reported and, when they could not, they faced serious obstacles to publishing their results.” (Yong, 2012)
The fact that the other research teams could not replicate the experiment successfully did not suggest to anyone that the data were fake (presumably the students could attest to that), but the failure did cast doubt on the apparent “psychic effects”. Since an exact replication using those Cornell students in that class with all the same social conditions was not possible, the question arises: how close to the original must a replication be to validate an original experiment?
Dennis and Valacich (2014) talk about “three fundamental categories” of replication:
- “Exact Replications: These articles are exact copies of the original article in terms of method and context. All measures, treatments statistical analyses, etc. are identical to those of the original study…
- Methodological Replications: These articles use exactly the same methods as the original study (i.e., measures, treatments, statistics etc.) but are conducted in a different context. …
- Conceptual Replications: These articles test exactly the same research questions or hypotheses, but use different measures, treatments, analyses and/or context….”
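The three categories can be read as a simple decision rule about which elements of the original study a replication keeps. The Python sketch below encodes that reading. The three boolean parameters and the function name are an assumed simplification of Dennis and Valacich's descriptions, not their own formalisation, and real studies will often sit between the categories.

```python
from enum import Enum
from typing import Optional

class Replication(Enum):
    EXACT = "exact"
    METHODOLOGICAL = "methodological"
    CONCEPTUAL = "conceptual"

def classify(same_question: bool, same_methods: bool, same_context: bool) -> Optional[Replication]:
    """Hypothetical rule: assign a study to one of the three categories based on
    what it shares with the original. same_methods stands in for
    'measures, treatments, statistical analyses, etc.'"""
    if not same_question:
        return None  # a different research question is not a replication in any of the three senses
    if same_methods and same_context:
        return Replication.EXACT
    if same_methods:
        return Replication.METHODOLOGICAL  # same methods, different context
    return Replication.CONCEPTUAL  # same question, different methods and/or context
```

Under this reading, repeating Bem's procedures with a different group of students, classify(True, True, False), yields Replication.METHODOLOGICAL. Any change to the measures or analyses yields Replication.CONCEPTUAL.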
Since the Cornell students were not available for the replications, the replications presumably come under the “methodological” category, or perhaps even the “conceptual”. Dennis and Valacich (2014) comment: “Conceptual replications are the strongest form of replication because they ensure that there is nothing idiosyncratic about the wording of items, the execution of treatments, or the culture of the original context that would limit the research conclusions.”
In any case, these replication types represent a significant contribution to knowledge by confirming or casting doubt on the earlier results. Why, then, did the research teams have trouble publishing their results?
Most journals do not encourage replications. A study that strikes readers as new and exciting and generates attention is a plus, whereas a study that appears to cover old ground, even if it has scholarly value, is less likely to get through the peer review process. Lucy Goodchild van Hilten (2015) writes:
“Publication bias affects the body of scientific knowledge in different ways, including skewing it towards statistically significant or “positive” results. This means that the results of thousands of experiments that fail to confirm the efficacy of a treatment or vaccine – including the outcomes of clinical trials – fail to see the light of day.”
This may be changing, and the degree to which it is true depends in part on the academic discipline. David McMillan (2017) writes:
“Cogent Economics & Finance recognises the importance of replication studies. As an indicator of this importance, we now welcome research papers that focus on replication and whose ultimate acceptance depends on the accuracy and thoroughness of the work rather than seeking a ‘new’ result.”
If other journals follow this trend, there could be significantly more testing of scholarly results. Nonetheless, a problem remains. Apart from the design time, replicating results costs almost as much as doing the original experiment, and if the results turn out to be exactly the same, the replication is unlikely to be published. Some fields solve the problem with a repeat-and-extend approach, in which replication is tied to new features that explicitly build on the replicated results. Much depends on the culture of the discipline.
For all of its problems, replication remains one of the most effective and reliable tools for uncovering flaws and fake data, and should be used more widely.
Bem et al. (2015) conducted a further “meta-analysis of 90 [replication] experiments from 33 laboratories in 14 countries …”, which Bem claims supports his hypothesis. The authors published this meta-analysis in an open-access journal for the life sciences that charges $1000 for an article of this length, and explicitly declared that they had no grant support. If nothing else, this is a sign of how difficult it is to continue the discourse in standard academic venues.
Ms. Melanie Rügenhagen (MA) suggested the topic and assisted with the research. Prof. Dr. Joan Luft provided research content.
Bem D, Tressoldi PE, Rabeyron T and Duggan M. 2015. “Feeling the Future: A Meta-Analysis of 90 Experiments on the Anomalous Anticipation of Random Future Events.” F1000Research 4:1188. Available online.
Brownill, Sue, Dennis, Alan R., Binny, Samuel, Tan, Barney, Valacich, Joseph and Whitley, Edgar A. 2016. “Replication Research: Opportunities, Experiences and Challenges.” In Thirty Seventh International Conference on Information Systems. Dublin, Ireland. Available online.
Goodchild van Hilten, Lucy. 2015. “Why It’s Time to Publish Research ‘Failures.’” Elsevier Connect. Available online.
McMillan, David. 2017. “Replication Studies.” Cogent Economics and Finance, 2017. Available online.
Yong, Ed. 2012. “Replication Studies: Bad Copy.” Nature 485 (7398): 298–300. Available online.
Replication in Qualitative Research
By Melanie Rügenhagen and Michael Seadle, published on 13 June 2018
Replication is difficult to apply to qualitative studies insofar as it means recreating the exact conditions of the original study, which is often impossible in the real world. The key question then becomes: “how close to the original must a replication be to validate an original experiment?” (Seadle, 2018)
This question is particularly important because of the widespread belief that only quantitative research is replicable. Leppink (2017) writes:
“Unfortunately, the heuristic of equating a qualitative–quantitative distinction with that of a multiple–single truths distinction is closely linked with the popular belief that replication research has relevance for quantitative research only. In fact, the usefulness of replication research has not rarely been narrowed down even further to repeating randomised controlled experiments.” (Leppink, 2017)
Dennis and Valacich (2014) suggest three categories for replication studies, only one of which is “exact” (see the column from 23 May 2018). The conceptual and methodological categories are both relevant to qualitative research, because the participants and the context can vary as long as the replication tests the inherent goals and concepts, as well as the methodological framework, of the original. In other words, successful qualitative replications can confirm the hypotheses at a higher level of generalisation, even when the specific contexts change. What matters is that the concepts and outcomes remain constant. As Polit and Beck (2010) write:
“If concepts, relationships, patterns, and successful interventions can be confirmed in multiple contexts, varied times, and with different types of people, confidence in their validity and applicability will be strengthened.” (Polit & Beck, 2010)
These authors support the use of replication in qualitative research, and argue that replication is the best way to confirm the results of a study:
“Knowledge does not come simply by testing a new theory, using a new instrument, or inventing a new construct (or, worse, giving an inventive label to an old construct). Knowledge grows through confirmation. Many theses and dissertations would likely have a bigger impact on nursing practice if they were replications that yielded systematic, confirmatory evidence—or if they revealed restrictions on generalized conclusions.” (Polit & Beck, 2010)
How can one ensure that the evidence is systematic? Leppink (2017) suggests that researchers in all kinds of studies have to decide when they no longer need more data to answer their research question, and he calls this point saturation.
It is important to remember that qualitative research normally does not generalise about results beyond the community involved in the samples, which sets a very limited and specific context for the research question. At some point researchers need to decide when their question is answered, stop their inquiries, and come to a conclusion. Leppink (2017) writes:
“If saturation was achieved, one might expect that a replication of the study with a very similar group of participants would result in very similar findings. If the replication study leads to substantially different findings, this would provide evidence against the saturation assumption made by the researchers in the initial study.”
Saturation means that the answer to a research question is complete, which makes it a core element of the “systematic, confirmatory evidence” (Polit & Beck, 2010) needed to assess validity. Checking whether a replication reproduces the findings of a supposedly saturated study can also help to measure the degree to which that study may be flawed or even intentionally manipulated.
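One common way to operationalise saturation, assumed here rather than taken from Leppink (2017), is to stop collecting data once a fixed number of consecutive interviews add no new codes. The Python sketch below implements that hypothetical stopping rule. It also includes a simple Jaccard overlap between the code sets of an original study and a replication, as one rough way to ask whether a replication produced “very similar findings”. The window size of three and both function names are illustrative assumptions.

```python
from typing import Iterable, Optional, Set

def saturation_point(coded_interviews: Iterable[Set[str]], window: int = 3) -> Optional[int]:
    """Hypothetical stopping rule: return the number of the last interview that
    added a new code, once `window` consecutive interviews have added none.
    Return None if saturation (in this sense) was never reached."""
    seen: Set[str] = set()
    quiet_run = 0  # consecutive interviews that contributed no new code
    for n, codes in enumerate(coded_interviews, start=1):
        new_codes = set(codes) - seen
        seen |= new_codes
        quiet_run = 0 if new_codes else quiet_run + 1
        if quiet_run >= window:
            return n - window  # the last interview that still added something new
    return None

def code_overlap(original: Set[str], replication: Set[str]) -> float:
    """Jaccard overlap of two code sets: 1.0 means identical codes, 0.0 means none shared."""
    union = original | replication
    if not union:
        return 1.0  # two empty code sets are trivially identical
    return len(original & replication) / len(union)
```

For example, saturation_point([{"cost", "access"}, {"access", "trust"}, {"cost"}, {"trust"}, {"access"}]) returns 2, because interviews three to five add nothing new. code_overlap({"cost", "access", "trust"}, {"access", "trust", "time"}) returns 0.5. A low overlap between an original study and a close replication would, in Leppink's terms, count as evidence against the original saturation assumption.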
Nonetheless there are barriers. While a range of studies based on the same concepts and methodology can lead to insights about whether a phenomenon is true, not knowing exactly how the original researchers conducted their studies may make replication impossible (Leppink, 2017). This makes describing the methodology particularly important.
None of this is easy. Replication studies remain a stepchild in the world of academic publishing. Gleditsch and Janz (2016) write about efforts to encourage replication in their own research area (international relations):
“Nevertheless, progress has been slow, and many journals still have no policy on replication or fail to follow up in practice.”
The problem is simple. There is no fame to be gained in showing that someone else’s ideas and conclusions are in fact correct, and it is hardly surprising that ambitious researchers avoid doing replications, especially for qualitative research, where the risk of failing is high and succeeding only makes readers think that the original study was done well.
Gleditsch, Nils Petter, and Nicole Janz. 2016. “Replication in International Relations.” International Studies Perspectives, ekv003. Available online.
Leppink, Jimmie. 2017. “Revisiting the Quantitative–Qualitative-Mixed Methods Labels: Research Questions, Developments, and the Need for Replication.” Journal of Taibah University Medical Sciences 12 (2). Elsevier B.V.: 97–101. Available online.
Polit, Denise F., and Cheryl Tatano Beck. 2010. “Generalization in Quantitative and Qualitative Research: Myths and Strategies.” International Journal of Nursing Studies 47 (11): 1451–58. Available online.
Seadle, Michael. 2018. “Replication Testing.” Column on Information Integrity 2/2018. Published on 23 May 2018. Available online.