Finding papers produced by paper mills has become a major headache for many of the world’s largest publishers over the past year, and they’re largely playing catch-up since sleuths began identifying them a few years ago. But there may be a new way: Earlier this month, Adam Day, a data scientist at SAGE Publishing, posted a preprint on arXiv that used a variety of methods to search for duplication in peer review comments, based on the likelihood that paper mills “create fake referee accounts and use them to submit fake peer-review reports.” We asked Day several questions about the approach.
Papers from paper mills are a cancer in the body of scientific literature: they do real damage, can be dangerous, and can tarnish the reputation of a title. Retraction Watch interviewed Adam Day about identifying fabricated peer reviews and papers.
Adam Day (AD): This all started when an eagle-eyed editor at SAGE Publishing noticed that two different referees had left identical comments on two different peer reviews. That seemed like a sure sign that someone was attempting to game our peer-review system, and it gave us the idea to survey our peer-review comments for more cases like this.
Initially, we treated the problem as being much like a plagiarism search. Just as when we search for plagiarism, we have a big collection of documents and we are looking for duplication of text in documents written by different authors. Most publishers are familiar with plagiarism-detection tools like iThenticate. However, after researching a long list of plagiarism-detection tools, we found none that were ideally suited to the task. We didn’t want to reinvent the wheel, but with no suitable tool available, we built and tested some simple search methods. These are easy to implement, so we hope that the preprint helps others to perform the same searches.
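To make the idea of searching for duplicated text across different authors concrete, here is a minimal sketch in Python. It is not the method from Day's preprint, which the interview does not spell out. It assumes one common approach to near-duplicate detection: break each comment into overlapping five-word sequences ("shingles") and compare every pair of comments with Jaccard similarity. The function names, the `(review_id, referee_id, comment_text)` record layout, the sample comments, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations


def shingles(text, k=5):
    """Return the set of k-word shingles in a lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        # Short comments become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_duplicate_reviews(reviews, k=5, threshold=0.5):
    """Return (id_a, id_b, score) for pairs of reviews by different referees
    whose shingle overlap meets the threshold.

    `reviews` is a list of (review_id, referee_id, comment_text) tuples.
    This layout is an assumption for the example, not a real system's schema.
    """
    indexed = [(rid, ref, shingles(text, k)) for rid, ref, text in reviews]
    hits = []
    for (id_a, ref_a, sh_a), (id_b, ref_b, sh_b) in combinations(indexed, 2):
        if ref_a == ref_b:
            continue  # one referee reusing their own wording is not flagged here
        score = jaccard(sh_a, sh_b)
        if score >= threshold:
            hits.append((id_a, id_b, score))
    return hits


# Invented sample data: r1, r2 and r4 share identical text; r1 and r4 share a referee.
reviews = [
    ("r1", "refA", "The manuscript is well written and the methods are sound and clearly described."),
    ("r2", "refB", "The manuscript is well written and the methods are sound and clearly described."),
    ("r3", "refC", "Please expand the discussion of limitations and add a power analysis."),
    ("r4", "refA", "The manuscript is well written and the methods are sound and clearly described."),
]
duplicates = find_duplicate_reviews(reviews)
for id_a, id_b, score in duplicates:
    print(f"{id_a} and {id_b} overlap with score {score:.2f}")
```

Comparing every pair is quadratic in the number of comments, which is workable for a small sample but would likely need an index (for example, a lookup from shingle to the reviews containing it) on a publisher-scale collection.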