Publishers of big journals ramp up efforts to ensure more transparency over what material has been fed into the likes of ChatGPT
Academic publishers have called for more protections and greater transparency over the way artificial intelligence chatbots are trained, amid a string of lawsuits seeking to protect copyrighted material.
There is considerable commentary from creatives, academics and other content creators expressing concern that their work and intellectual property have been used to train artificial intelligence systems, such as large language models (LLMs) like ChatGPT. These concerns aren’t helped when the creators of those systems will not divulge the training data or sample material used to train them, and the systems do not acknowledge the sources they have drawn on, much less attribute them. This Times Higher Education story reports calls from academic publishers in response to this issue.
Data “is going to prove to be the moat that companies protect themselves with against the onslaught of generative AI, especially large language models”, predicted Toby Walsh, Scientia professor of artificial intelligence at UNSW Sydney.
“I can’t imagine the publishers are going to watch as their intellectual property is ingested unpaid.”
Thomas Lancaster, a senior teaching fellow in computing at Imperial College London, agreed. “There are academic publishers out there who are very protective of their copyright, so I’m sure some are actively trying to work out what content is included in the GPT-4 archive,” he said.
“I wouldn’t be surprised if we see academic lawsuits in the future, but I suspect a lot will depend on any precedents that come through from the current claims.”