- The authors believe the inconsistency is likely due to the randomness built into ChatGPT4, the current version of the software.
- ChatGPT could be good at helping you think through a problem, but it’s not good at giving the answer.
Despite ChatGPT's reported ability to pass medical exams, it would be unwise to rely on it for some health assessments, such as deciding whether a patient with chest pain needs to be hospitalised.
In a study published in the journal PLOS ONE, which involved thousands of simulated cases of patients with chest pain, ChatGPT reached inconsistent conclusions and returned different heart risk levels for the exact same patient data.
The generative AI system also failed to match the traditional methods physicians use to judge a patient's cardiac risk.
Dr. Thomas Heston, a researcher with Washington State University's Elson S. Floyd College of Medicine and the study's lead author, said ChatGPT was not acting consistently.
A lot of variation
“Given the exact same data, ChatGPT would give a score of low risk, then next time an intermediate risk, and occasionally, it would go as far as giving a high risk.”
The authors believe the problem is likely due to the level of randomness built into the current version of the software, ChatGPT4, which helps it vary its responses to simulate natural language. This same randomness, however, does not work well for healthcare uses that require a single, consistent answer, Heston said.
“We found there was a lot of variation, and that variation in approach can be dangerous. It can be a useful tool, but I think the technology is going a lot faster than our understanding of it, so it’s critically important that we do a lot of research, especially in these high-stakes clinical situations.”
Despite the negative findings of this study, Heston sees great potential for generative AI in health care – with further development.
“ChatGPT could be excellent at creating a differential diagnosis and that’s probably one of its greatest strengths,” said Heston.
“If you don’t quite know what’s going on with a patient, you could ask it to give the top five diagnoses and the reasoning behind each one. So it could be good at helping you think through a problem, but it’s not good at giving the answer.”