Institutions typically treat research integrity violations as black and white, right or wrong. As a result, the wide range of grayscale nuances that separates accident, carelessness, and bad practice from deliberate fraud and malpractice often gets lost. This lecture looks at how to quantify the grayscale range in three kinds of research integrity violations: plagiarism, data falsification, and image manipulation.
Quantification works best with plagiarism, because the essential one-to-one matching algorithms are well known and established tools exist for detecting when matches occur. Questions remain, however, about how many matching words, of what kind, in what location, and in which discipline constitute reasonable suspicion of fraudulent intent. Different disciplines take different perspectives on quantity and location. Quantification is harder with data falsification, because the original data are often not available and because experimental replication remains surprisingly difficult. The same is true of image manipulation, where tools exist for detecting certain kinds of manipulation, but where those tools are also easily defeated.
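To make the idea of one-to-one matching concrete, here is a minimal Python sketch (the function name is ours) of the word n-gram overlap that text-matching tools build on. Real detectors add normalization, stemming, position tracking, and the discipline-specific thresholds discussed above:

```python
def ngram_matches(text_a, text_b, n=5):
    """Return the set of word n-grams that two texts share.

    A toy illustration of one-to-one matching: any shared n-gram is a
    candidate match, but how many matches, of what kind, and where they
    occur is a separate judgment call.
    """
    def ngrams(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    return ngrams(text_a) & ngrams(text_b)
```

A single shared five-word sequence is rarely suspicious on its own; tools report the quantity and location of matches and leave the integrity judgment to a human reader.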
This lecture looks at how to prevent violations of research integrity from a pragmatic viewpoint, and at what steps institutions and publishers can take to discourage problems beyond the usual ethical admonitions. There are no simple answers, but two measures can help: the systematic use of detection tools and the requirement to submit original data and images. These alone do not suffice, but they represent a start.
The scholarly community needs a better awareness of the complexity of research integrity decisions. Only an open and widespread international discussion can bring about a consensus on where the boundary lines are and when grayscale problems shade into black. One goal of this work is to move that discussion forward.
Over the last decade, inappropriate image manipulation has become a serious concern in many sectors of society, such as the news, politics, and entertainment. Digital image editing programs are now very powerful and are constantly changing how we produce and understand images. In academia, images play a very important role, and after a number of fraud incidents image manipulation has gained more and more attention.
Given the number of scientific papers that contain problematic images (without necessarily reflecting fraudulent intent) and the fact that many retractions happen because of the inappropriate use of images, there is definitely a need to take effective measures against inappropriate manipulation. But this is far easier said than done, since it is often not trivial to distinguish acceptable image editing from inappropriate manipulation.
This post is the first in a series of blog posts that deal with the simple question of how image manipulations can be detected. The field of research concerned with detecting image manipulation is called ‘image forensics’. Forensic experts analyze whether there is evidence that makes an image suspicious, and they gather all the clues that can possibly be found in order to make informed judgments about the appropriateness of the image. This can include aspects like compression, metadata, or lighting. The analysis can be carried out through mere observation or by applying suitable algorithms. Forensic analysis plays a practical role for insurance companies, in crime investigations, and in all conceivable fields in which images possess evidential value.
There are a number of free online resources available on the web that promise to support image analysis. Collections of forensic tools are available at https://29a.ch/photo-forensics ; http://fotoforensics.com/ ; or http://www.getghiro.org/ , to name only a few.
THE ERROR LEVEL ANALYSIS TOOL
One tool that all of these collections include is “Error Level Analysis” (hereinafter: ELA). Jonas Wagner, the developer of “Forensically” explains the tool on his web site as follows:
“This tool compares the original image to a recompressed version. This can make manipulated regions stand out in various ways. For example they can be darker or brighter than similar regions which have not been manipulated.“ (https://29a.ch/photo-forensics/#help)
The tool is designed to identify areas within an image that are at a different compression level. When manipulations have been carried out on a JPEG image (e.g. elements added or removed), the ELA tool is expected to identify and mark the manipulated regions, since resaving the image puts the original content and the added elements at different compression levels.
Let us now see how this works out in practice:
Below you see the unaltered (but downsized) version of a random snapshot I took last summer near the river Elbe in Saxony, Germany, with a Sony Alpha 6000 Camera:
This is the ELA analysis result of the digital image using the free online resource “Forensically”:
The image appears consistently dark, with only a few regions standing out slightly because of the original lighting. The edges of objects appear a little lighter than the rest of the image, and in some areas they show a slight violet tinge, while the sun appears as a uniform black stain. I then opened the original image in Adobe Photoshop and added and changed a number of features in the picture. For example, I included a PNG of the moon and duplicated it on the upper left side of the image. Moreover, I inserted a swarm of birds, removed some disturbing stains from the glass in the foreground with the Photoshop Eraser tool and, last but not least, copied the flower from the milk can and duplicated it (see images below).
List of manipulations:
1_Added and duplicated moon on different saturation levels:
2_Added a swarm of birds:
3_Removed stains (you can clearly see the round marks of the Eraser tool):
4_Copied and pasted flowers on the milk can:
This is what the resulting image looks like:
Note: I intentionally did not alter any of the global settings, such as contrast or brightness, across the entire image, since that could affect the ELA results.
After carrying out these rather basic manipulations, I saved the image from Photoshop as a JPEG and uploaded it to “Forensically” to analyze it with the ELA tool (the tool only allows the analysis of JPEG and TIFF images). This is what the result of the ELA analysis looks like:
Here are some details:
Moon area: Clearly visible. It is noteworthy that the ELA tool highlights all five objects uniformly and does not reflect their different saturation levels.
Added birds: The highlighting is clearly recognizable.
Flower area: The edges appear almost like those of unaltered objects in the photo. The highlighting of the copied and pasted objects is not clearly distinguishable from other structures in the image. In other words, had I not known about the manipulation, I would not have recognized it.
Removed stains on the glass: The area appears almost identical to the unaltered version. Traces of the Eraser tool are not recognizable in the ELA result.
This short experiment showed some of the strengths and weaknesses of the ELA tool. The tool clearly identified elements that were introduced into the picture after a single resave. (Any further resave would decrease the quality of the JPEG and consequently influence the ELA result.) ELA did not reveal other manipulations, such as the copying of the flower element or the removal of the stains on the glass, which definitely limits the tool's usefulness. However, since ELA at least allowed me to identify some of the manipulations, it can be recommended as one possible tool to start with when analyzing images. Still, the user should be aware that regions which stand out do not necessarily imply manipulation. Jonas Wagner points out in the help section of “Forensically”: “The results of this tool can be misleading (…)”.
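The remark about further resaves can be checked directly: repeated JPEG recompression converges, so the raw ELA signal shrinks with every additional save. A quick sketch (Pillow assumed; the function name is ours):

```python
import io

from PIL import Image, ImageChops, ImageStat


def mean_error_after_resaves(img, n_resaves, quality=95):
    """Resave an image as JPEG n times, then measure the mean absolute
    difference introduced by one further recompression (the raw,
    unamplified ELA signal)."""
    for _ in range(n_resaves):
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    # One more recompression to measure the remaining error.
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    once_more = Image.open(buf).convert("RGB")
    diff = ImageChops.difference(img, once_more)
    return sum(ImageStat.Stat(diff).mean) / 3
```

In practice the error after many resaves is much smaller than after the first one, which is why ELA works best on images that have been saved only once or twice since the manipulation.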
Another aspect that must be mentioned is that it takes a good bit of experience before you get meaningful results. The levels are not self-explanatory, and interpreting ELA results definitely requires some visual training (as well as reading through tutorials). One interesting insight I gained from working with this tool is that whenever a JPEG has been processed in Photoshop, it acquires a characteristic “rainbow effect”, which can be observed more easily with the levels slightly altered, as in the example below (JPEG Quality 90, Error Scale 53, Magnifier Enhancement: None, Opacity at 0.64).
The characteristic rainbow effect that reveals an image has been processed in Photoshop (or another Adobe product):
In sum, Error Level Analysis opens the user's mind to a more systematic evaluation of what is visible and what can be hidden in a picture. It helps reveal some of the hidden features, but it is definitely not a tool that can stand on its own or that produces all-inclusive forensic results for the uninitiated user.
More tools will be evaluated soon – please visit HEADT.EU for upcoming posts.
HEADT Centre 2017
Prof. Michael Seadle gave two lectures via Skype on 24 November 2016 about the research integrity work of the HEADT Centre to students in the Scientific Writing in English Course at the National Library of Technology/Czech Technical University in Prague.
Today images play a predominant role in public communication – in advertisements, the broadcasting industry, and on the Internet. Images massively influence how events are perceived – they attract attention and shape worldviews. At the same time, images are often intentionally altered to serve a given purpose. In scholarship, as in other fields of society, images are highly valued as a key currency in an economy of attention. Digital image editing programs make it relatively easy to produce images or to enhance their visual qualities, and thereby to create images that are cleaned up or beautified.
Given such operations, how do scholars actually conceive of image manipulation? How do biologists, computer scientists, art historians, or designers judge their liberty when altering images? Where do they draw the line between appropriate image editing and fraudulent image manipulation? These are the questions raised in the book “Shaping Images – Scholarly Perspectives on Image Manipulation”, published by De Gruyter on 12 September 2016. The book includes the perspectives of scholars with different disciplinary backgrounds – many of whom are associated with the Cluster of Excellence Image Knowledge Gestaltung at Humboldt-Universität zu Berlin – while other participants have a background in the museum world.
Many of the scholars represented in this volume agree that image manipulation must remain transparent at all times in order to avoid inappropriate data falsification. The strategies in dealing with image manipulation and the levels of liberty scholars claim for themselves are compared and the question is raised whether the integrity of images can be preserved in times in which digital image editing programs blur the boundaries between what is possible and what is acceptable.
The team working on Research Integrity at the HEADT Centre carried out research at the Annual Meeting of the Association for Computational Linguistics (ACL) in Berlin, August 7-9. Scholars and other visitors at the conference site had the opportunity to participate in an interactive online survey focused on the decision-making processes involved in evaluating textual similarities and/or plagiarism cases. We will present the results of the survey on our website soon.