Artificial intelligence (AI) chatbots can write abstracts of fake research papers so convincingly that scientists often fail to spot them, according to a preprint posted on the bioRxiv server in late December1. Researchers are divided over the implications for science.
“I’m very concerned,” says Sandra Wachter, who studies technology and regulation at the University of Oxford, UK, and was not involved in the study. “If we are now in a situation where experts cannot determine what is true or what is false, we lose the middleman we desperately need to guide us through complex topics,” she adds.
The chatbot ChatGPT creates realistic and intelligent-sounding text in response to user prompts. It is a ‘large language model’, a neural-network-based system that learns to perform tasks by digesting huge amounts of existing human-generated text. The San Francisco, California-based software company OpenAI released the tool on 30 November, and it is free to use.
Since its release, researchers have been grappling with the ethical issues surrounding its use, because much of its output can be difficult to distinguish from human-written text. Scientists have already published a preprint2 and an editorial3 written by ChatGPT. Now, a group led by Catherine Gao at Northwestern University in Chicago, Illinois, has used ChatGPT to generate artificial research-paper abstracts to test whether scientists can spot them.
The researchers asked the chatbot to write medical-research abstracts based on a selection published in JAMA, The New England Journal of Medicine, The BMJ, The Lancet and Nature Medicine. They then ran these abstracts, alongside the originals, through a plagiarism detector and an AI-output detector, and asked a group of medical researchers to spot the fabricated abstracts.
Under the radar
The abstracts generated by ChatGPT sailed through the plagiarism checker: the median originality score was 100%, indicating that no plagiarism was detected. The AI-output detector spotted 66% of the generated abstracts. But the human reviewers didn’t do much better: they correctly identified only 68% of the generated abstracts and 86% of the genuine abstracts. They incorrectly identified 32% of the generated abstracts as being real, and 14% of the genuine abstracts as being generated.
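For readers curious what “running an abstract through an AI-output detector” might look like in practice, here is a minimal Python sketch using the Hugging Face transformers library and the publicly available roberta-base-openai-detector model (a GPT-2 output detector). The model choice, its output labels and the workflow are illustrative assumptions, not the study’s actual setup.

```python
# Minimal sketch (not the study's code): screen a piece of text with a
# publicly available AI-output detector from the Hugging Face Hub.
# Assumptions: the roberta-base-openai-detector model, which returns a
# label ("Real"/"Fake") and a confidence score for each input.
from transformers import pipeline

detector = pipeline("text-classification", model="roberta-base-openai-detector")

# Replace with the abstract to be screened (kept short here for illustration).
abstract = (
    "Background: ... Methods: ... Results: ... Conclusions: ..."
)

# truncation=True keeps long abstracts within the model's input limit.
result = detector(abstract, truncation=True)[0]
print(f"label={result['label']}  score={result['score']:.2f}")
```

A detector like this returns only a probability that the text was machine-generated; as the study’s figures suggest, such scores are far from a reliable verdict on any single abstract.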
“ChatGPT produces credible scientific abstracts,” Gao and colleagues say in the preprint. “The ethical and acceptable boundaries of using large language models to aid scientific writing remain to be determined.”
If scientists can’t determine whether research is true, there could be “dire consequences”, says Wachter. As well as being a problem for researchers, who could be pulled down flawed lines of investigation because the research they are reading has been fabricated, there are “implications for society at large, because scientific research plays such a huge role in our society”. For example, it could mean that research-informed decisions are incorrect, she adds.
But “it’s unlikely that any serious scientist would use ChatGPT to generate abstracts”, says Arvind Narayanan, a computer scientist at Princeton University in New Jersey. Whether generated abstracts can be detected is “irrelevant”, he adds. “The question is whether the tool can generate abstracts that are accurate and compelling. It can’t, so the benefits of using ChatGPT are minimal and the downsides are significant,” he says.
Irene Solaiman, who researches the social impact of AI at Hugging Face, an AI company with headquarters in New York and Paris, is concerned about any reliance on large language models for scientific thinking. “These models are trained on information from the past, and social and scientific progress can often come from thinking, or being open to thinking, differently from the past,” she adds.
The authors suggest that those evaluating scientific communications, such as research papers and conference proceedings, should put policies in place to rule out the use of AI-generated text. If institutions choose to allow the technology in certain circumstances, they should establish clear rules around disclosure. Earlier this month, the 40th International Conference on Machine Learning, a major AI conference to be held in Honolulu, Hawaii, in July, announced a ban on papers written using ChatGPT and other AI language tools.
Solaiman adds that in fields where disinformation can endanger people’s safety, such as medicine, journals may have to take a more rigorous approach to verifying that information is accurate.
Narayanan says solutions to these problems should not focus on the chatbots themselves, “but rather on the perverse incentives that lead to this behaviour, such as universities conducting hiring and promotion reviews by counting papers with no regard to their quality or impact”.
This article is reproduced with permission and was first published on January 12, 2023.