Washington State University professor Mesut Cicek and his research team repeatedly tested ChatGPT by giving it hypotheses taken from scientific papers. The goal was to see if the AI could correctly determine whether each claim was supported by research or not — in other words, whether it was true or false.

In total, the team evaluated more than 700 hypotheses and asked the same question 10 times for each one to measure consistency.

Accuracy Results and Limits of AI Performance

When the experiment was first conducted in 2024, ChatGPT answered correctly 76.5% of the time. In a follow-up test in 2025, accuracy rose slightly to 80%. However, once the researchers adjusted for random guessing, the results looked far less impressive: the AI's chance-corrected score was only about 60%, roughly a low D on a grading scale rather than a mark of strong reliability.

The system had the most difficulty identifying false statements, correctly labeling them only 16.4% of the time. It also showed notable inconsistency. Even when given the exact same prompt 10 times, ChatGPT produced consistent answers only about 73% of the time.
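One way to picture the consistency figure is to score each hypothesis by whether all 10 repeated runs agreed. This is a hypothetical sketch, not the paper's actual metric (which isn't specified here); the hypothesis labels and answer sequences are invented for illustration.

```python
def is_consistent(answers: list[str]) -> bool:
    """A question counts as consistent only if every repeated run agrees."""
    return len(set(answers)) == 1

# Invented examples mirroring the behaviors described in the article.
runs = {
    "H1": ["true"] * 10,             # stable across all 10 prompts
    "H2": ["true", "false"] * 5,     # flip-flops: five true, five false
    "H3": ["false"] * 9 + ["true"],  # a single deviation breaks consistency
}

rate = sum(is_consistent(a) for a in runs.values()) / len(runs)
print(f"{rate:.0%} of these questions answered consistently")
```

Under this strict all-10-must-match rule, even one deviating answer counts a question as inconsistent, which is why a 73% consistency rate is notable.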

Inconsistent Answers Raise Concerns

“We’re not just talking about accuracy, we’re talking about inconsistency, because if you ask the same question again and again, you come up with different answers,” said Cicek, an associate professor in the Department of Marketing and International Business in WSU’s Carson College of Business and lead author of the new publication.

“We used 10 prompts with the same exact question. Everything was identical. It would answer true. Next, it says it’s false. It’s true, it’s false, false, true. There were several cases where there were five true, five false.”

AI Fluency vs. Real Understanding

The findings, published in the Rutgers Business Review, highlight the importance of using caution when relying on AI for important decisions, especially those that require nuanced or complex reasoning. While generative AI can produce smooth, convincing language, it does not yet demonstrate a comparable level of conceptual understanding.

According to Cicek, these results suggest that artificial general intelligence capable of truly “thinking” may still be further away than many expect.

“Current AI tools don’t understand the world the way we do — they don’t have a ‘brain,’” Cicek said. “They just memorize, and they can give you some insight, but they don’t understand what they’re talking about.”

Study Design and Methods

Cicek worked with co-authors Sevincgul Ulu of Southern Illinois University, Can Uslay of Rutgers University, and Kate Karniouchina of Northeastern University.

The team used 719 hypotheses from scientific studies published in business journals since 2021. These types of questions often involve nuance, with multiple factors influencing whether a hypothesis is supported. Reducing such complexity to a simple true or false judgment requires careful reasoning.

The researchers tested the free version of ChatGPT-3.5 in 2024 and the updated ChatGPT-5 mini in 2025. Overall, performance remained similar across both versions. After adjusting for random chance, which gives a 50% probability of a correct answer, the AI’s effectiveness was only about 60% above chance in both years.
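The chance adjustment described above can be illustrated with a simple rescaling: on a binary true/false task, guessing alone yields 50% accuracy, so raw accuracy can be mapped onto a scale where 0 means guessing and 1 means perfect. This particular formula is an assumption for illustration; the paper may use a different correction.

```python
def chance_corrected(accuracy: float, baseline: float = 0.5) -> float:
    """Rescale raw accuracy so 0.0 equals random guessing and 1.0 is perfect.

    For a two-option task the guessing baseline is 0.5.
    """
    return (accuracy - baseline) / (1.0 - baseline)

# Raw accuracies reported in the article.
print(f"2024: {chance_corrected(0.765):.1%} above chance")
print(f"2025: {chance_corrected(0.80):.1%} above chance")
```

Applied to the reported numbers, 76.5% raw accuracy corresponds to roughly 53% above chance and 80% to 60%, which matches the "about 60% above chance" figure cited for the later run.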

Key Weakness in AI Reasoning

The results point to a fundamental limitation of large language model AI systems. Although they can generate fluent and persuasive responses, they often struggle to reason through complicated questions. This can lead to answers that sound convincing but are actually incorrect, Cicek said.

Why Experts Urge Caution With AI

Based on these findings, the researchers recommend that business leaders verify AI-generated information and approach it with skepticism. They also emphasize the need for training to better understand what AI systems can and cannot do effectively.

Although this study focused specifically on ChatGPT, Cicek noted that similar experiments with other AI tools have produced comparable outcomes. The work also builds on earlier research pointing to caution around AI hype. A 2024 national survey found that consumers were less likely to purchase products when they were marketed with a focus on AI.

“Always be skeptical,” he said. “I’m not against AI. I’m using it. But you need to be very careful.”


