Artificial intelligence has a serious problem with hallucination, and it’s only getting worse.

IT leaders expect new technologies to keep improving, but that isn't currently the case with AI models: since ChatGPT grabbed the world's attention after its official release by OpenAI in November 2022, the accuracy of many reasoning AI models has decreased.

AI models have, as expected, improved their math skills. But they are failing when it comes to providing accurate general information.

The newest OpenAI system, o3, hallucinates 33% of the time on OpenAI's PersonQA benchmark, which involves answering questions about public figures.

The 33% failure rate is more than twice that of the previous reasoning model, o1. Another new OpenAI model, o4-mini, performed even worse, with a 48% error rate.

Another OpenAI benchmark, SimpleQA, poses general-knowledge questions. The error rates for o3 and o4-mini on this test were 51% and 79%, respectively. That's significantly worse than the 44% error rate of the previous system, o1.

As for GPT-4.5, the newest upgrade to ChatGPT, its hallucination rate on the SimpleQA test is 37%. That means only about 6 out of 10 answers are correct. Not very impressive, is it?
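To see what those percentages mean in practice, here is a quick sketch in Python that converts the SimpleQA error rates quoted above into the share of answers that come back correct (the model names and rates are the figures cited in this article):

```python
# SimpleQA hallucination (error) rates quoted in the article, as fractions.
error_rates = {"o1": 0.44, "o3": 0.51, "o4-mini": 0.79, "GPT-4.5": 0.37}

# Accuracy is simply the complement of the error rate.
accuracy = {model: round(1 - rate, 2) for model, rate in error_rates.items()}

for model, acc in accuracy.items():
    print(f"{model}: {acc:.0%} of answers correct")
```

Even the best of these figures means roughly one wrong answer for every two or three correct ones, which is why unreviewed AI output is risky in a business setting.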

Exactly why reasoning models are so often wrong isn’t clearly understood. One problem is that AI models are trained on data, and the quality of their output is dependent, in part, on the quality of the data that they ingest. Another problem is that AI models are often programmed to provide an answer even if they don’t possess the necessary information, so when they don’t have the required information, they are essentially guessing when they churn out an answer.

Every Problem Has a Solution

Given the frequency of mistakes by AI models, IT leaders should consider these two corrective approaches:

Educate employees about AI models and the widespread problem of hallucinations. One of the most helpful ways to do this is to share and discuss newsworthy information about hallucinations, such as this New York Times article, "A.I. Is Getting More Powerful, but Its Hallucinations Are Getting Worse."

Many employees are expected to use AI models, but not enough receive any AI training. Every IT leader needs to train their employees about using AI models, and one of the facets of that training is to ensure that they review all AI-generated outputs and verify the accuracy of the information.

This is a pressing workplace issue because U.S. employees' use of AI tools is continually increasing. The percentage of U.S. workers who say they use AI in their role a few times a year is now 40%, up from 21% two years ago. And 8% of U.S. workers use AI daily.

Educate employees on AI so they use it responsibly, or these hallucinations may well become nightmares that come back to haunt you and your organization.
