ChatGPT incorrectly diagnosed more than 8 in 10 pediatric case studies, research finds

FILE – The OpenAI logo is seen on a mobile phone in front of a computer screen which displays output from ChatGPT, March 21, 2023, in Boston. (AP Photo/Michael Dwyer, File)

The popular artificial intelligence (AI) chatbot ChatGPT had a diagnostic error rate of more than 80 percent in a new study examining its use in pediatric case diagnosis.

For the study, published in JAMA Pediatrics this week, the texts of 100 case challenges found in JAMA and the New England Journal of Medicine were entered into ChatGPT version 3.5. The chatbot was then given the prompt: “List a differential diagnosis and a final diagnosis.”

These pediatric cases were all from the past 10 years.
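For readers curious how a comparable experiment might be scripted, here is a minimal sketch using OpenAI’s official Python client. The model name “gpt-3.5-turbo” and the placeholder case text are illustrative assumptions; the study describes entering cases into ChatGPT directly, not calling the API.

# Minimal sketch (not the study's actual setup): send one case challenge
# to an OpenAI chat model with the study's prompt appended.
# Assumes the `openai` Python package (v1.x) and an API key in the
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

case_text = "..."  # placeholder: full text of one pediatric case challenge

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative stand-in for "ChatGPT version 3.5"
    messages=[
        {
            "role": "user",
            "content": case_text + "\nList a differential diagnosis and a final diagnosis.",
        }
    ],
)

print(response.choices[0].message.content)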

The accuracy of ChatGPT’s diagnoses was determined by whether they aligned with physicians’ diagnoses. Two physician researchers scored each diagnosis as correct, incorrect or “did not fully capture diagnosis.”

Overall, 83 percent of the AI-generated diagnoses were found to be in error, with 72 percent being incorrect and 11 percent being “clinically related but too broad to be considered a correct diagnosis.”
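Because the study used exactly 100 cases, the raw counts double as percentages: 72 incorrect plus 11 too-broad diagnoses yields the 83 percent overall error rate. A toy tally, using hypothetical label names for the reviewers’ three categories, makes the arithmetic explicit:

# Toy tally of reviewer scores over 100 cases. The label strings are
# hypothetical stand-ins for the study's categories.
from collections import Counter

scores = ["incorrect"] * 72 + ["too_broad"] * 11 + ["correct"] * 17

counts = Counter(scores)
errors = counts["incorrect"] + counts["too_broad"]

# With n = 100, counts and percentages coincide.
print(f"incorrect: {counts['incorrect']}%")  # 72%
print(f"too broad: {counts['too_broad']}%")  # 11%
print(f"overall error rate: {errors}%")      # 83%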

Despite the high rate of diagnostic errors, the study recommended continued inquiry into physicians’ use of large language models, noting that such chatbots could be helpful as administrative tools.

“The chatbot evaluated in this study—unlike physicians—was not able to identify some relationships, such as that between autism and vitamin deficiencies. To improve the generative AI chatbot’s diagnostic accuracy, more selective training is likely required,” the study said.

ChatGPT’s available knowledge is not regularly updated, the study also noted, meaning it doesn’t have access to new research, health trends, diagnostic criteria or disease outbreaks.

Physicians and researchers have increasingly explored ways of incorporating AI and large language models into medical work. A study published last year found that OpenAI’s GPT-4 diagnosed patients over the age of 65 more accurately than clinicians did. That study, however, had a sample size of only six patients.

Researchers in that earlier study noted the chatbot could potentially be used to “increase confidence in diagnosis.”

The use of AI in diagnostics is not a novel concept. The Food and Drug Administration has approved hundreds of AI-enabled medical devices, though it has so far approved none that use generative AI or are powered by large language models like ChatGPT.
