Artificial intelligence (AI) natural language processing (NLP) systems, such as OpenAI’s generative pre-trained transformer (GPT) model (https://openai.com) or Meta’s Galactica (https://galactica.org/), may soon be widely used in many forms of writing, including scientific and scholarly publications (Heaven 2022). While computer programs (such as Microsoft Word and Grammarly) have incorporated automated text-editing features (such as checking for spelling and grammar) for many years, these programs are not designed to create content. However, new and emerging NLP systems are, which raises important issues for research ethics and research integrity.
A researcher who uses an artificial intelligence (AI) or natural language processing (NLP) system to produce a research output and then claims to have written it commits a troubling and insidious form of research misconduct. Even worse, it can be difficult to detect. Institutions, publishers, learned societies and research funding bodies have a key role in developing policies, guidance material and professional development material. This open access editorial introduces the issues and what is at stake. AHRECS has published a foundation document that could be used for institutional guidance materials. It is available to our patrons at https://www.ahrecs.vip. It is licensed under Creative Commons 3.0, which enables our subscribers to use it to create their own documents, provided they attribute AHRECS as the original source of the material.
Recent advances in computational speed and capacity and the development of machine-learning (ML) algorithms, such as neural networks, have led to tremendous breakthroughs in NLP (Mitchell 2020). Today’s NLP systems use ML to produce and refine statistical models (with billions of parameters) for processing and generating natural language. NLP systems are trained on huge databases (45 terabytes or more) of text available on the internet or other sources. Initial training (or supervised learning) involves giving the system the text and then “rewarding” it for giving correct outputs, as determined by human trainers. Over time, NLP systems will reduce their percentage of erroneous outputs and will learn from the data (Mitchell 2020). While NLP systems continue to learn as they receive and process data beyond their initial training data, they do not “know” the meaning or truth-value of the text they receive, process, and generate. Their function is simply to generate understandable (i.e., grammatically correct) and appropriate (i.e., highly probable) text outputs in response to text inputs.
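To make the idea of "highly probable" output concrete, the following is a minimal Python sketch of a toy word-pair (bigram) model. It is an illustration only and not how GPT or Galactica work: those systems use neural networks with billions of parameters, whereas this sketch simply counts which word most often follows another. The corpus and the function names (train_bigram_model, most_probable_next, generate) are hypothetical and chosen for this example. The point it illustrates is the one made above: the program produces plausible-looking text purely from frequencies, with no representation of meaning or truth.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, how often each other word follows it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def most_probable_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(model, start, max_words=5):
    """Extend `start` by repeatedly picking the most probable next word."""
    output = [start.lower()]
    for _ in range(max_words - 1):
        nxt = most_probable_next(model, output[-1])
        if nxt is None:
            break  # the model has never seen this word followed by anything
        output.append(nxt)
    return " ".join(output)


# Hypothetical toy corpus; real systems train on terabytes of text.
corpus = "the cat sat on the mat . the cat sat on the sofa . the cat slept ."
model = train_bigram_model(corpus)
print(generate(model, "the", max_words=4))
```

The sketch always picks the single most frequent follower, so it will happily generate a fluent but false sentence if that is what the counts favour. Production systems sample from a learned probability distribution instead, but the same limitation applies.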
Hosseini, M., Rasmussen, LM. & Resnik, DB. (2023) Using AI to write scholarly publications. Accountability in Research, DOI: 10.1080/08989621.2023.2168535
Publisher (Open Access): https://www.tandfonline.com/doi/full/10.1080/08989621.2023.2168535