Abstract
Although chatbots such as ChatGPT can facilitate cost-effective text generation and editing, factually incorrect responses (hallucinations) limit their utility. This study evaluates one particular type of hallucination: fabricated bibliographic citations that do not represent actual scholarly works. We used ChatGPT-3.5 and ChatGPT-4 to produce short literature reviews on 42 multidisciplinary topics, compiling data on the 636 bibliographic citations (references) found in the 84 papers. We then searched multiple databases and websites to determine the prevalence of fabricated citations, to identify errors in the citations to non-fabricated papers, and to evaluate adherence to APA citation format. Within this set of documents, 55% of the GPT-3.5 citations but just 18% of the GPT-4 citations are fabricated. Likewise, 43% of the real (non-fabricated) GPT-3.5 citations but just 24% of the real GPT-4 citations include substantive citation errors. Although GPT-4 is a major improvement over GPT-3.5, problems remain.
Introduction
This open access paper, published in September 2023, provides an analysis that research institutions should share with their research communities. ChatGPT, even in its paid version (ChatGPT-4), hallucinates references. Even among the real references it produces, it makes frequent errors in referencing style. As this paper observes, LLMs such as ChatGPT are language systems that do not genuinely understand the questions they are asked, the material they draw on, or the outputs they produce. Even if researchers use such systems for inspiration, they should not let them play any direct role in the final paper or output. If they do, there is a good chance that what is produced will be fake, and incorrectly formatted fake at that.
It is important to realize, however, that ChatGPT is fundamentally not an information-processing tool, but a language-processing tool. It mimics the texts, not necessarily the substantive content, found in its information base [10,11]. ChatGPT has also been known to "hallucinate", that is, to provide factually incorrect responses, although OpenAI reports that this is less of a problem with ChatGPT-4, released in March 2023, than with earlier versions of the software [12,13,14,15].
This study investigates one particular type of hallucination: fabricated bibliographic citations that do not correspond to actual scholarly works. For 84 documents generated by GPT-3.5 and GPT-4, we determine the percentage of the 636 cited works that are fabricated rather than real, along with the percentage of the works (articles, chapters, books and websites) for which the larger publication or organization (journal, book, publisher) is also fabricated. For the citations that correspond to real works, we also assess the prevalence of various citation errors (e.g., incorrect author names, article titles, dates, journal titles, volume numbers, issue numbers, page numbers, and publisher/organization names). Finally, we briefly investigate citation formatting errors and the characteristics of the real and fabricated hyperlinks included in ChatGPT citations.
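The authors verified citations by searching databases and websites by hand; the sketch below is not their method, only a minimal illustration of how one step of such verification could be automated. Assuming Python with the `requests` library, it queries the public Crossref REST API to test whether a cited DOI corresponds to a registered work and whether the cited title matches the registered record; the function names and the title-matching rule are illustrative choices, not anything prescribed by the paper.

```python
# Minimal sketch (not the authors' method): checking a cited DOI against
# the public Crossref REST API. A DOI that Crossref cannot resolve is a
# strong signal that the citation may be fabricated.
import requests

CROSSREF = "https://api.crossref.org/works/"

def doi_exists(doi: str) -> bool:
    """Return True if Crossref resolves the DOI to a registered work."""
    resp = requests.get(CROSSREF + doi, timeout=10)
    return resp.status_code == 200

def title_matches(doi: str, cited_title: str) -> bool:
    """Compare a cited title against Crossref's record (case-insensitive)."""
    resp = requests.get(CROSSREF + doi, timeout=10)
    if resp.status_code != 200:
        return False  # DOI not registered: possible fabricated citation
    titles = resp.json()["message"].get("title", [])
    return any(cited_title.strip().lower() == t.strip().lower() for t in titles)

if __name__ == "__main__":
    # Sanity check using this paper's own DOI, cited below.
    print(doi_exists("10.1038/s41598-023-41032-5"))  # expected: True
```

A real verification pipeline would also need to handle citations without DOIs, near-miss titles, and hybrid cases the paper documents, such as real articles attributed to the wrong journal or authors.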
Walters, W. H., & Wilder, E. I. (2023). Fabrication and errors in the bibliographic citations generated by ChatGPT. Scientific Reports, 13, 14045. https://doi.org/10.1038/s41598-023-41032-5
Publisher (Open Access): https://www.nature.com/articles/s41598-023-41032-5