Introduction

This open access paper, published in September 2023, provides an analysis that research institutions should share with their research communities. ChatGPT, even the paid version (ChatGPT-4), hallucinates references. Even with the real references it finds, it makes too many errors in referencing style. As the paper observes, LLMs such as ChatGPT are language systems that do not genuinely understand the questions they are asked, the material they find, or the outputs they produce. Even researchers who use such systems for inspiration should not let them play any direct role in the final paper or output. If they do, there is a good chance that what is produced will be fake, and incorrectly formatted fake at that.

ChatGPT-3 (Chat Generative Pre-trained Transformer 3), released to the public in November 2022 by OpenAI, uses elements of artificial intelligence—including natural language processing (NLP), machine learning, and deep learning—to produce or alter texts in ways that mimic human writing or speech. Among other things, ChatGPT can respond to specific or open-ended questions; engage in conversation; summarize, translate, or edit text provided by the user or included in its information base; and generate original text based on the user’s instructions [1]. Its ability to generate short reports and papers has led to concerns regarding educational and academic integrity [2,3,4,5,6,7,8,9].

It is important to realize, however, that ChatGPT is fundamentally not an information-processing tool, but a language-processing tool. It mimics the texts—not necessarily the substantive content—found in its information base [10,11]. ChatGPT has also been known to “hallucinate”—to provide factually incorrect responses—although OpenAI reports that this is less of a problem with ChatGPT-4, released in March 2023, than with earlier versions of the software [12,13,14,15].

This study investigates one particular type of hallucination: fabricated bibliographic citations that do not correspond to actual scholarly works. For 84 documents generated by GPT-3.5 and GPT-4, we determine the percentage of the 636 cited works that are fabricated rather than real, along with the percentage of the works (articles, chapters, books and websites) for which the larger publication or organization (journal, book, publisher) is also fabricated. For the citations that correspond to real works, we also assess the prevalence of various citation errors (e.g., incorrect author names, article titles, dates, journal titles, volume numbers, issue numbers, page numbers, and publisher/organization names). Finally, we briefly investigate citation formatting errors and the characteristics of the real and fabricated hyperlinks included in ChatGPT citations.
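For readers who want to screen ChatGPT-generated citations themselves, the sketch below shows one possible automated check: querying the public Crossref REST API (https://api.crossref.org/works/{doi}) to see whether a cited DOI resolves to a registered work, and retrieving its metadata for comparison against the generated citation. This is an illustration only, not the verification procedure used in the paper; the helper function names are hypothetical, though the Crossref endpoint is real. The test DOI is the paper under discussion, a known-real work.

```python
# Illustrative sketch: check whether a cited DOI resolves to a registered
# work via the public Crossref REST API. Not the verification method used
# by Walters & Wilder; the helper names here are hypothetical.
import requests

CROSSREF = "https://api.crossref.org/works/"

def doi_exists(doi: str) -> bool:
    """Crossref returns HTTP 200 for registered DOIs, 404 otherwise."""
    return requests.get(CROSSREF + doi, timeout=10).status_code == 200

def citation_metadata(doi: str) -> dict:
    """Fetch fields worth comparing against a generated citation."""
    resp = requests.get(CROSSREF + doi, timeout=10)
    resp.raise_for_status()
    msg = resp.json()["message"]
    return {
        "title": (msg.get("title") or [""])[0],
        "journal": (msg.get("container-title") or [""])[0],
        "volume": msg.get("volume"),
        "pages": msg.get("page"),
    }

if __name__ == "__main__":
    doi = "10.1038/s41598-023-41032-5"  # Walters & Wilder (2023)
    if doi_exists(doi):
        print(citation_metadata(doi))
    else:
        print("DOI not registered; the citation may be fabricated.")
```

Note that a successful lookup only confirms the DOI exists; as the study's error analysis shows, even citations to real works often contain incorrect authors, titles, dates, or page numbers, so the returned metadata still needs to be compared against the generated citation field by field.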

Walters, W. H., & Wilder, E. I. (2023). Fabrication and errors in the bibliographic citations generated by ChatGPT. Scientific Reports, 13, 14045. https://doi.org/10.1038/s41598-023-41032-5
Publisher (Open Access): https://www.nature.com/articles/s41598-023-41032-5