Burden of Proof
Jochen Zenthöfer wrote an article in the Frankfurter Allgemeine newspaper on 18 April 2018 in which he expresses concern about the number of plagiarism cases under consideration at German universities. As he notes, the cases come largely from the VroniPlag Wiki. His article is the focus of this column.
There is an assumption in most western legal systems that a person is innocent until proven guilty, but, as Zenthöfer (2018) notes, this principle derives from criminal law:
“Als Grund führt die HU das Prinzip der Unschuldsvermutung an, das freilich nur im – hier nicht einschlägigen – Strafrecht gilt.” [The reason given by the HU is the principle of the presumption of innocence, which only applies in criminal law, and is not relevant here. – my translation]
The author seems to imply that the presumption of innocence ought to be ignored in a process that could destroy a career and strip a person of the means of livelihood. When the accuser is an official body, such as a commission of a university, that has done a careful analysis and presents a well-founded conclusion, it may be reasonable to put the burden of proof of innocence on the accused, but when a self-constituted group such as the VroniPlag Wiki makes an accusation, the universities involved have an obligation to investigate thoroughly and carefully to see whether the accusation is legitimate.
In conducting an investigation into plagiarism, appropriate standards need to be considered. In talking about “Policies and Initiatives Aimed at Addressing Research Misconduct in High-Income Countries,” Resnik and Master (2013) refer to the COPE guidelines (2018), which define plagiarism as occurring:
“When somebody presents the work of others (data, words or theories) as if they were his/her own and without proper acknowledgment.” (COPE, 2018)
While this definition is comprehensive, it gives no explicit measure to determine what actually constitutes plagiarism. Mere text overlap is insufficient. A short factual statement, such as “Berlin is the capital of Germany,” gets thousands of hits in a Google search and could not reasonably be called plagiarism. The phrase in the guidelines about “proper acknowledgment” is equally unspecific, not merely because citation styles vary, but because expectations vary about exactly how and where in the text to put the reference.
Paraphrasing is not the same as plagiarism. Lee (2015) explains rules for paraphrasing in the American Psychological Association Style Blog:
“A paraphrase restates someone else’s words in a new way. For example, you might put a sentence into your own words, or you might summarize what another author or set of authors found. When you include a paraphrase in a paper, you are required to include only the author and date in the citation.”
This definition leaves latitude for understanding what “in your own words” means, which does not necessarily imply avoiding all the original words and phrases. When paraphrasing, it is almost impossible to avoid content-carrying words or phrases that have a particular meaning. Nonetheless, there are plagiarism hunters who see plagiarism in every overlap.
When evaluating a work for plagiarism, it is important to have rational metrics. Copying a complete paragraph word for word (without quotes) is plagiarism. Copying a complete long sentence word for word suggests plagiarism. A case in which a majority of the words in a paragraph or sentence match words in the same order in another text could be deliberate plagiarism that the author tried to obscure, or it might reflect a good verbal memory, or there might simply be a logic to the order and the word choice. Absolute uniqueness of language is not necessarily the hallmark of good scholarship.
Companies like iThenticate are careful to talk about the percentage of plagiarism only in terms of the number of words in the whole work. VroniPlag instead counts plagiarism in terms of how many pages have hits (“Anzahl Seiten mit Funden”), which means that even a page with a mere nine matching words (one set of four words and one set of five words) adds to the page count (see VroniPlag). This exaggerates the impression of the problem to a point that could be considered misrepresentation in any scholarly work.
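The difference between the two ways of counting can be illustrated with a small sketch. The page sizes and match counts below are invented for illustration and do not come from any real case; the names `Page`, `word_share`, and `page_share` are hypothetical and are not taken from iThenticate or VroniPlag.

```python
from dataclasses import dataclass


@dataclass
class Page:
    total_words: int    # all words on the page
    matched_words: int  # words that overlap with another text


def word_share(pages):
    """Share of matched words across the whole work."""
    total = sum(p.total_words for p in pages)
    matched = sum(p.matched_words for p in pages)
    return matched / total if total else 0.0


def page_share(pages):
    """Share of pages that contain at least one match, however small."""
    if not pages:
        return 0.0
    flagged = sum(1 for p in pages if p.matched_words > 0)
    return flagged / len(pages)


# Invented example: two pages with only nine matching words each,
# one clean page, and one page with substantial overlap.
pages = [Page(300, 9), Page(300, 0), Page(300, 9), Page(300, 150)]

print(f"word-based share: {word_share(pages):.0%}")  # 14%
print(f"page-based share: {page_share(pages):.0%}")  # 75%
```

In this made-up work, the same findings read as 14 percent when counted by words and as 75 percent when counted by pages with hits, because a page with nine matching words weighs as much as a page that is half copied.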
In my book on “Quantifying Research Integrity” (December 2016), I suggest a grey-scale measure for plagiarism cases in which the number of contiguous words is measured within a particular unit, such as a paragraph or sentence. One can disagree with the exact numbers, but using transparent metrics as a standard matters. Exactly where copying occurs matters too. It is less surprising to have word overlap in a literature review than in conclusions, and facts and standard phrases in an academic discipline need to be deducted.
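As a rough sketch of how a transparent measure of this kind might look in practice, the following example finds the longest contiguous run of shared words between one unit (here a sentence) and a source text, and maps it to a grey level. The thresholds and level names are assumptions made up for this illustration; they are not the numbers from the book, and the function names `longest_shared_run` and `grey_level` are hypothetical.

```python
from difflib import SequenceMatcher


def longest_shared_run(unit, source):
    """Length, in words, of the longest contiguous run shared by both texts."""
    a = unit.lower().split()
    b = source.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size


def grey_level(run_length, unit_length):
    """Map the copied share of a unit to a grey level.

    The cut-off values are illustrative assumptions only.
    """
    if unit_length == 0:
        return "white"
    ratio = run_length / unit_length
    if ratio >= 0.9:
        return "black"
    if ratio >= 0.5:
        return "dark grey"
    if ratio >= 0.2:
        return "light grey"
    return "white"


sentence = "the results clearly show that the method works well in practice"
source = "our results clearly show that the method fails"

run = longest_shared_run(sentence, source)
print(run, grey_level(run, len(sentence.split())))  # 6 dark grey
```

The sketch leaves out the deductions discussed above: facts and standard phrases of a discipline would have to be removed before counting, and the location of the unit (literature review or conclusions) would have to be weighed separately.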
Ultimately, decisions about plagiarism depend on the distinction between negligence and gross negligence. The former implies sloppiness, while the latter represents actual misconduct. Hunting for plagiarism may have a game-like quality for those who spend their free time doing it, and that quality can push the volunteers toward judgments that increase the number of hits without regard to the distinction between negligence and gross negligence.
Certainly plagiarism is an ethical and copyright problem, but its actual long-term harm to modern scholarship may be modest. The real harm comes to the personal integrity of the person doing the plagiarism. Integrity matters, and instances of plagiarism certainly need to be caught, but the current focus on hunting plagiarism may actually be a distraction from the more important task of identifying problems with falsified or manipulated data. False data undermines the foundations of scholarship (especially the natural sciences) in ways that plagiarism does not.
The popularity of plagiarism hunting grows in part from tools that make it easy to compare texts word for word. Some British universities distinguish between actual plagiarism and the appearance of plagiarism by requiring students to submit their own papers to a plagiarism checker like Turnitin. King’s College London even allows students to submit their works multiple times (see “Submitting Assessments Online”). While this is a measure to prevent plagiarism, it also serves as a recognition that inadvertent copying is common and does not necessarily involve fraudulent intent.
Ms. Melanie Rügenhagen (MA) assisted with the research.
COPE (Committee on Publication Ethics). 2018. “Plagiarism.” Available online.
Lee, Chelsea. 2015. “When and How to Include Page Numbers in APA Style Citations.” American Psychological Association: APA Style Blog. Available online.
Resnik, David B., and Zubin Master. 2013. “Policies and Initiatives Aimed at Addressing Research Misconduct in High-Income Countries.” PLoS Medicine 10 (3). Public Library of Science: e1001406. Available online.
Seadle, Michael. 2016. Quantifying Research Integrity. Morgan & Claypool: Synthesis Lectures on Information Concepts, Retrieval, and Services. Available online.
Zenthöfer, Jochen. 2018. “Wie Universitäten auf Plagiate in Doktorarbeiten reagieren: Auch mit Diebstahl kann man es weit bringen.” Frankfurter Allgemeine, April 18, 2018. Available online.