New Study Reveals Which AIs Fabricate Facts

If the tech industry’s top AI models had superlatives, Microsoft-backed OpenAI’s GPT-4 would be best at math, Meta’s Llama 2 would be most middle of the road, Anthropic’s Claude 2 would be best at knowing its limits and Cohere AI would receive the title of most hallucinations — and most confident wrong answers.

That’s all according to a Thursday report from researchers at Arthur AI, a machine learning monitoring platform.

The research comes at a time when misinformation stemming from artificial intelligence systems is more hotly debated than ever, amid a boom in generative AI ahead of the 2024 U.S. presidential election.

It’s the first report “to take a comprehensive look at rates of hallucination, rather than just sort of … provide a single number that talks about where they are on an LLM leaderboard,” Adam Wenchel, co-founder and CEO of Arthur, told CNBC.

AI hallucinations occur when large language models, or LLMs, fabricate information entirely, behaving as if they are spouting facts. One example: In June, news broke that ChatGPT cited “bogus” cases in a New York federal court filing, and the New York attorneys involved may face sanctions.

In one experiment, the Arthur AI researchers tested the AI models in categories such as combinatorial mathematics, U.S. presidents and Moroccan political leaders, asking questions “designed to contain a key ingredient that gets LLMs to blunder: they demand multiple steps of reasoning about information,” the researchers wrote.

Overall, OpenAI’s GPT-4 performed the best of all models tested, and researchers found it hallucinated less than its prior version, GPT-3.5. On math questions, for example, it hallucinated between 33% and 50% less, depending on the category.

Meta’s Llama 2, on the other hand, hallucinated more overall than GPT-4 and Anthropic’s Claude 2, researchers found.

In the math category, GPT-4 came in first place, followed closely by Claude 2, but in U.S. presidents, Claude 2 took the first place spot for accuracy, bumping GPT-4 to second place. When asked about Moroccan politics, GPT-4 came in first again, and Claude 2 and Llama 2 almost entirely chose not to answer.

In a second experiment, the researchers tested how much the AI models would hedge their answers with warning phrases to avoid risk (think: “As an AI model, I cannot provide opinions”).

When it comes to hedging, GPT-4 had a 50% relative increase compared to GPT-3.5, which “quantifies anecdotal evidence from users that GPT-4 is more frustrating to use,” the researchers wrote. Cohere’s AI model, on the other hand, did not hedge at all in any of its responses, according to the report. Claude 2 was the most reliable in terms of “self-awareness,” the research showed, meaning it accurately gauged what it does and doesn’t know and answered only questions it had training data to support.
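To make the hedging comparison concrete, here is a minimal Python sketch of how a hedge rate and a relative increase between two models might be computed. It is an illustration only, not the Arthur researchers' method: the phrase list, the function names and the sample counts are all assumptions, and the report's actual detection approach is not described in this article.

```python
from typing import Iterable

# Hypothetical warning phrases, chosen for illustration only.
# The phrases the report actually checked for are not listed in the article.
HEDGE_PHRASES = (
    "as an ai model",
    "as an ai language model",
    "i cannot provide",
)


def is_hedged(response: str) -> bool:
    """Return True if the response contains any of the assumed hedge phrases."""
    text = response.lower()
    return any(phrase in text for phrase in HEDGE_PHRASES)


def hedge_rate(responses: Iterable[str]) -> float:
    """Fraction of responses that contain a hedge phrase (0.0 for no responses)."""
    items = list(responses)
    if not items:
        return 0.0
    return sum(is_hedged(r) for r in items) / len(items)


def relative_increase(old_rate: float, new_rate: float) -> float:
    """Relative change from old_rate to new_rate; 0.5 means a 50% relative increase."""
    if old_rate == 0:
        raise ValueError("relative increase is undefined when the old rate is zero")
    return (new_rate - old_rate) / old_rate


if __name__ == "__main__":
    # Made-up responses: 2 of 8 hedged for the older model, 3 of 8 for the newer one.
    hedge = "As an AI model, I cannot provide opinions."
    plain = "The answer is 42."
    older = [hedge] * 2 + [plain] * 6
    newer = [hedge] * 3 + [plain] * 5
    print(relative_increase(hedge_rate(older), hedge_rate(newer)))
```

A relative increase compares the two rates to each other rather than subtracting them, so in the made-up sample a rise from 25% to 37.5% of responses counts as a 50% relative increase. Simple phrase matching like this would miss hedges worded differently, which is one reason to treat it as a rough sketch.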

A spokesperson for Cohere pushed back on the results, saying, “Cohere’s retrieval augmented generation technology, which was not in the model tested, is highly effective at giving enterprises verifiable citations to confirm sources of information.”

The most important takeaway for users and businesses, Wenchel said, was to “test on your exact workload,” later adding, “It’s important to understand how it performs for what you’re trying to accomplish.”

“A lot of the benchmarks are just looking at some measure of the LLM by itself, but that’s not actually the way it’s getting used in the real world,” Wenchel said. “Making sure you really understand the way the LLM performs for the way it’s actually getting used is the key.”

Originally published on CNBC.com
