Information integrity is fundamentally about what makes information true or false, both at the scholarly level (research integrity) and for public and policy discourse. There are reports about false information almost daily. A recent example involves the BBC, which has long been a model for the integrity of its reporting (Sweney, 2018). This column will focus mainly on the scholarly aspects of information integrity, but the effect of integrity problems on policy matters (public health issues, for example) will not be ignored.
The topic covers a broad range of problems, including data falsification, image manipulation, and plagiarism. While plagiarism is perhaps the most prominent of these, it is primarily an ethical and legal issue and generally does not undermine scholarship that builds on it, because the plagiarized results are not necessarily false. This column will discuss all aspects of information integrity, but will focus especially on data problems, since no generalized tools exist for detecting them, though a few disciplines (such as psychology) are working on such tools.
A core concept in my book “Quantifying Research Integrity” (Seadle, 2017) is the greyscale approach: integrity issues rarely separate neatly into simple black and white, guilty or innocent, categories. Many scholarly works have imperfections, and problematic works may still contain valid information. From the viewpoint of a university or a publisher, formal decision-making processes involving punishments and retractions may make black-and-white decisions about integrity problems preferable, but such decisions can themselves be an integrity issue, since an overly simplistic label is at least partly untrue.
Scholarly literature contains a wealth of examples of integrity problems going well back in historical time. Today there are tools for investigating plagiarism and for examining some kinds of image manipulation. Data falsification presents more of a challenge because of its variety and complexity. Simple cases such as that of Diederik Stapel, who admitted manufacturing his results, are rarer than cases in which scholars make poor choices about data or its interpretation (Bhattacharjee, 2013). Unintentional error is also an information problem, even if it is not falsification.
Selecting problematic research may have lasting effects on political discourse as well as on scholarship. While the evidence for climate change appears to be overwhelming, studies by a small number of skeptics have given oil and coal lobbies in the US a tool for opposing effective measures to reduce greenhouse gases in the atmosphere. Natural science builds on the ability to reproduce results, and when many scientists produce the same results based on a wide range of measures, the conclusions are normally accepted as valid. Lay persons unfamiliar with the scholarly literature sometimes select flawed studies that confirm their own preferences.
Other, more historical examples of selection bias can be found in claims about the inferiority of people in the US who were not of northern European descent: not merely those from Africa, but also those from Italy, Ireland, and eastern Europe. Such claims were popular among the right wing in many European countries in the Nazi era, and are still popular among some groups today. Their basis reaches back to the 18th century, to Christoph Meiners (Grundriß der Geschichte der Menschheit, 1785), and extends to works as recent as “The Bell Curve” by Richard Herrnstein and Charles Murray (1994). These studies did not fake their data and used scientific methods that seemed appropriate at the time, but they were selective about what evidence they included, and today it is widely accepted that the exclusions skewed results in a particular direction.
Selection bias may have social and cultural origins that can change over time. For those who believe in the inerrancy of Holy Scripture, the data confirming evolution is invalid. A scholar of research integrity needs in some sense to be an historian, in order to understand the research in time and place, and to be an ethnographer, in order to understand integrity violations across cultures and disciplines. No one should imagine that integrity research involves simple labels.
This column will discuss papers about research integrity and will look at specific cases whose complexity gives opportunities to apply a greyscale analysis. There are many good sources of information, not the least of which is Retraction Watch (Oransky and Marcus, 2018), which provides an excellent news feed and classifies cases of retractions by type and field. Retractions may represent only part of the problem, simply because discovering problems is hard and because false positives may distract from more important issues. The ability to reproduce results is a classic hallmark of good science, but there is good evidence that results in behavioral and social science studies are harder to reproduce than natural-science results for the simple reason that social circumstances change.
The goal of this column is scholarly, not investigative. It does not actively seek out new cases where research integrity may have been violated, but seeks to examine existing cases in order to apply a greyscale understanding of what happened and what the consequences are. As Principal Investigator for the research integrity part of the HEADT Centre, I will be the primary columnist, but others will likely contribute as well, including Dr. Thorsten Beck, who specializes in image manipulation.
Bhattacharjee, Yudhijit. 2013. “The Mind of a Con Man.” New York Times, April 26, 2013. Available online.
Seadle, Michael. 2017. Quantifying Research Integrity. Morgan & Claypool: Synthesis Lectures on Information Concepts, Retrieval, and Services. Available online.
Sweney, Mark. 2018. “No Title.” New York Times, April 4, 2018. Available online.
Oransky, Ivan, and Adam Marcus. 2018. “Retraction Watch.” Available online.