If Your Time is short
Using 40 PolitiFact checks, we tested how well ChatGPT could evaluate the accuracy of a given claim and whether it was as good as trained reporters. It gave the same rating as PolitiFact half of the time.
While sometimes reaching accurate conclusions, ChatGPT struggled to give consistent answers, was limited by its outdated knowledge and missed the nuance of fact-checking journalism. And sometimes it was just plain wrong.
Experts say ChatGPT struggles with fact-checking because it was designed not for total accuracy but to provide the most helpful response. Researchers still hope parts of the technology can be developed to help fact-checkers do their jobs more efficiently.
Robots have a long history of replacing humans in the workplace. Automobile workers, elevator operators, bank tellers, gas station attendants, even grocery store checkout clerks have felt the squeeze.
With groundbreaking advances in artificial intelligence, reporters at PolitiFact had to wonder … are the fact-checkers next?
Artificial intelligence, or AI, has surged into relevance in the past few months after ChatGPT’s public rollout. As people experimented with the new tool, concern grew about AI’s potential to make us humans obsolete. PolitiFact wanted to conduct an experiment: Could ChatGPT fact-check better than the professionals?
We selected 40 PolitiFact claims spanning a range of subjects, Truth-O-Meter ratings and claim types, and put AI to the test. Using a free ChatGPT account, we asked two questions per claim and kept detailed track of its answers.
Evidence suggests fact-checkers can breathe a sigh of relief. Our test results show that AI is not yet a reliable fact-checking tool. It lacks current knowledge, loses perspective and tells you what you want to hear, which is not always the truth. But some researchers hope to harness AI’s power to help fact-checkers identify claims and debunk the ever-growing pool of misinformation.
ChatGPT is a type of AI called a "large language model" that uses huge amounts of data to understand language in context and reproduce it in novel ways. "They have basically gulped all of the information that they can and then are trying to put that information together in cohesive ways that they think you want," said Bill Adair, journalism professor at Duke University and founder of PolitiFact, who has been researching how AI can be used in fact-checking work.
It accomplishes this seemingly miraculous task by using probabilities to predict the next word in a sentence that would be most helpful to you, said Terrence Neumann, a doctoral student researching information systems at the University of Texas. People can then fine-tune these models to produce the ideal response.
For a few of the claims we tested, it worked seamlessly. When asked about a claim by Sen. Thom Tillis, R-N.C., regarding an amnesty bill by President Joe Biden, it assigned the same Half True rating that PolitiFact did and explored the nuances we laid out in our article. It also thoroughly debunked several voter fraud claims from the 2020 election.
But in half of the 40 tests, the AI made a mistake, refused to answer or reached a different conclusion than the fact-checkers. It was rarely completely wrong, but subtle differences led to inaccuracies and inconsistencies, making it an unreliable resource.
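To make the idea of next-word prediction concrete, here is a toy sketch in Python. The probability table is invented purely for illustration, and the names `TOY_NEXT_WORD_PROBS`, `predict_next_word` and `generate` are hypothetical. A real language model computes these probabilities with a neural network over a huge vocabulary rather than looking them up in a hand-written table. Note that the code picks the most likely word, not the true one.

```python
# Toy illustration only: hand-written, hypothetical next-word probabilities.
# A real language model computes these with a neural network; nothing here
# comes from ChatGPT or any other actual model.
TOY_NEXT_WORD_PROBS: dict[str, dict[str, float]] = {
    "the claim is": {"false": 0.45, "true": 0.30, "misleading": 0.25},
    "the claim is false": {"because": 0.6, ".": 0.4},
}


def predict_next_word(context: str) -> str:
    """Return the single most probable next word for a known context."""
    candidates = TOY_NEXT_WORD_PROBS[context]
    # Picks the likeliest word; nothing checks whether it is accurate.
    return max(candidates, key=lambda word: candidates[word])


def generate(context: str, max_words: int = 5) -> str:
    """Extend the text one word at a time until no continuation is known."""
    text = context
    for _ in range(max_words):
        if text not in TOY_NEXT_WORD_PROBS:
            break
        text = f"{text} {predict_next_word(text)}"
    return text


if __name__ == "__main__":
    print(generate("the claim is"))
```

Running it prints "the claim is false because". The word "false" wins only because the made-up table gives it the highest probability, not because anything was verified.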
ChatGPT’s free version is limited by what is called a "knowledge cutoff." It does not have access to any data after September 2021, meaning it is blissfully unaware of big global events such as Queen Elizabeth II’s death and Russia’s invasion of Ukraine.
This cutoff impedes ChatGPT’s usefulness. Rarely are people fact-checking events that happened two years ago, and in the digital age there is constantly new data, political events, and groundbreaking research that could change the accuracy rating of a statement.
ChatGPT is mostly aware of its frozen state. In almost every response, it offered some variation of this caveat: "As an AI language model, I don't have real-time information or access to news updates beyond my September 2021 knowledge cutoff."
It occasionally used the cutoff as an excuse to refuse to rate a claim, even when the event happened before September 2021. Other times, the cutoff led to more consequential errors. ChatGPT confused a new Ron DeSantis bill with an old one. It also incorrectly rated a claim about the RESTRICT Act because it had no idea such a thing even existed!
Because its responses included no citations or links, it was hard to know where it was getting its information.
Newer chatbots such as Google’s Bard and Microsoft’s Bing can search the web and respond to current events, which Neumann said is the direction most chatbots are headed.
Another challenge? "It's wildly inconsistent," said Adair. "Sometimes you get answers that are accurate and helpful, and other times you don't."
It gave different answers depending on how a question was phrased and the order in which questions were asked. Sometimes asking the same question twice produced two different ratings.
Crucial to understand: ChatGPT is not worried about checking for accuracy. It is focused on giving users the answers they are looking for, said David Corney, senior data scientist at Full Fact, a U.K. fact-checking site. For that reason, the prompt itself can lead to different responses.
For example, we tested two different but similar claims:
Says Vice President Kamala Harris said, "American churches are PROPAGANDA CENTERS for intolerant homophobic, xenophobic vitriol."
Says Rep. Marjorie Taylor Greene, R-Ga., said, "Jesus loves the U.S. most and that is why the Bible is written in English."
PolitiFact rated both claims False because there was no evidence that either woman said such a thing. ChatGPT also found no evidence or record of either statement. Yet it rated the claim about Harris "categorically" false while refusing to rate the claim about Greene because of uncertainty.
ChatGPT also had random bursts of confidence, switching between finding mixed evidence and making decisive statements. Other times, it refused to rate the claim with little to no explanation.
"As they try to produce useful content, they produce content that is not accurate," Adair said.
Other times, ChatGPT’s Achilles’ heel was its literalness.
For example, in July 2021, PolitiFact rated Mostly True a claim that George Washington mandated smallpox vaccinations among his Continental Army troops. Although vaccines did not yet exist, he did order his troops to be inoculated using the method of the time, called "variolation." But ChatGPT rated the same claim False because smallpox vaccines did not literally exist.
In other instances, it assigned the same rating PolitiFact did but failed to capture the context of a wider conspiracy theory, such as why a claim about a NASA movie studio would be relevant, or how theories about mRNA codes on streetlights connect to COVID vaccine fears. It also failed to explain why someone might believe the claim, as a journalist would.
Although ChatGPT can sound authoritative, its lack of humanity is clear. It has "no concept of truth or accuracy," Corney said. "This means that although the output is always fluent, grammatically correct and relevant," it can make mistakes, sometimes ones a human would never make.
ChatGPT occasionally got it completely wrong. In one test about oil reserves in the Permian Basin, it pulled the right data but did the math wrong, reaching the opposite conclusion. In two other instances, it was completely unaware of a decade-old law banning whole milk in schools, and it couldn’t find evidence of a statistic about overdose deaths despite citing the exact study the statistic came from.
Several experts warned of the chatbot’s tendency to "hallucinate," citing events, books and articles that never existed.
"ChatGPT has one crucial flaw, which is that it doesn't know when it doesn't know something," said Mike Caulfield, research scientist at the University of Washington’s Center for an Informed Public. "And so it just will make things up."
All the experts agreed that ChatGPT is not yet reliable or accurate enough to be used as a fact-checker. The technology is improving, but Corney said it is "extremely challenging" to get an AI to reliably determine or recognize the truth. Current research focuses on improving fluency and relevance, but accuracy remains a tougher problem.
"They're gonna get better and we're gonna see fewer mistakes," Adair said. "But I think we're still a long way away from generative AI being a dependable fact checker."
Corney said Full Fact’s developers are working on a variety of AI tools that transcribe and analyze news reports, social media posts and government transcripts to help fact-checkers spot claims that might be ripe for fact-checking. In that scenario, a tool might look for claims that match ones already fact-checked. It could also try to verify statistical claims against official figures.
Others, including the Duke Reporters’ Lab and Neumann, are researching how fact-checkers can incorporate AI as a support tool. "I think that it has a lot of potential, like maybe prioritizing certain misinformation for fact checking by fact checkers," said Neumann.
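The two tasks Corney describes, matching a new claim against earlier fact-checks and checking a statistic against an official figure, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Full Fact’s actual tooling, which the article does not describe. The archive `PAST_CHECKS` is a tiny hypothetical list paraphrasing claims discussed in this article; `difflib.SequenceMatcher` from Python’s standard library stands in for whatever similarity measure a real system would use; and the 0.6 match threshold and 5% tolerance are arbitrary values chosen for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical archive of earlier fact-checks (claim text -> rating),
# paraphrased from examples discussed in this article.
PAST_CHECKS: dict[str, str] = {
    "Kamala Harris said American churches are propaganda centers": "False",
    "Marjorie Taylor Greene said the Bible is written in English "
    "because Jesus loves the U.S. most": "False",
    "George Washington mandated smallpox vaccinations for his troops": "Mostly True",
}


def similarity(a: str, b: str) -> float:
    """Character-level similarity between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_claim(new_claim: str, threshold: float = 0.6) -> tuple[str, str] | None:
    """Return (past_claim, rating) for the closest earlier check, if close enough.

    The threshold is an illustrative assumption, not a tuned value.
    """
    best = max(PAST_CHECKS, key=lambda past: similarity(new_claim, past))
    if similarity(new_claim, best) >= threshold:
        return best, PAST_CHECKS[best]
    return None


def statistic_matches(claimed: float, official: float, tolerance: float = 0.05) -> bool:
    """True if the claimed figure is within a relative tolerance of the official one."""
    return abs(claimed - official) <= tolerance * abs(official)


if __name__ == "__main__":
    print(match_claim(
        "George Washington mandated smallpox vaccinations "
        "for his Continental Army troops"
    ))
    print(statistic_matches(104, 100))
```

Here the reworded Washington claim is matched to the earlier Mostly True check, and `statistic_matches(104, 100)` returns `True` under the 5% tolerance. Character overlap misses paraphrases that use different words, so it is only a stand-in for the kind of matching a production tool would need, and a human would still have to judge whether a match is really the same claim.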
Caulfield said this technology works best when used by someone who can evaluate the output’s accuracy.
"It's hard to know when ChatGPT can be relied on, unless you already know the answer to the question!" Corney said. But he worries that the general public, which may not have background information on a given topic, could be easily misled by ChatGPT’s confident response.
And Caulfield is confident that even as AI gets better, fact-checkers won’t be totally out of a job. "To the extent that ChatGPT knows anything, is because someone found it and reported on it," he said. "So, you can't really replace fact-checkers."
To see all of ChatGPT's responses to our queries, click here.
Our Sources
Interview with Terrence Neumann, a doctoral student researching information systems at the University of Texas, May 24, 2023
Interview with Mike Caulfield, research scientist at the University of Washington’s Center for an Informed Public, May 24, 2023
Interview with Bill Adair, journalism professor at Duke University and founder of PolitiFact, May 23, 2023
Email interview with David Corney, senior data scientist at Full Fact, May 24, 2023
The Atlantic, "Here’s how AI will come for your job," May 17, 2023
OpenAI, "Introducing ChatGPT," accessed May 25, 2023
The Guardian, "German publisher Axel Springer says journalists could be replaced by AI," February 28, 2023
Business Insider, "Everything You Need to Know About ChatGPT," March 1, 2023
PolitiFact, "Tillis doesn't tell full story about Biden 'amnesty' bill," April 27, 2021
PolitiFact, "Donald Trump’s Pants on Fire claim about illegal votes," November 6, 2020
PolitiFact, "Donald Trump’s dubious claim that 'thousands' are conspiring on mail-ballot fraud," April 9, 2020
PolitiFact, "A pro-capitalism quote misattributed to Abraham Lincoln resurfaces," August 5, 2021
PolitiFact, "No, Listerine doesn’t work as a mosquito repellent," June 8, 2021
PolitiFact, "Wear your sunscreen. The sun does not prevent skin cancer," May 11, 2023
PolitiFact, "Fact-checking Nikki Fried’s attack on Ron DeSantis about taxes," June 15, 2021
PolitiFact, "Donald Trump distorts Florida elections bill in attacking Ron DeSantis," May 3, 2023
PolitiFact, "Would the RESTRICT Act criminalize the use of VPNs? Here’s what to know about the bill," April 19, 2023
Full Fact, accessed May 25, 2023
PolitiFact, "No, Marjorie Taylor Greene didn’t say this about the Bible," July 29, 2021
PolitiFact, "No, Kamala Harris didn’t say this about churches," October 13, 2020
PolitiFact, "Michigan residents can still buy car seats in stores," April 17, 2020
PolitiFact, "Yes, George Washington ordered troops to be inoculated against smallpox during the Revolutionary War," July 30, 2021
PolitiFact, "Trace amounts of glyphosate are found in many foods, are not harmful," May 10, 2023
PolitiFact, "No, NASA isn’t a film studio," May 11, 2023
PolitiFact, "No, the numbers and letters on streetlights aren’t a secret code for mRNA vaccines," May 19, 2023
PolitiFact, "The Permian Basin is America’s largest oil basin, but alone couldn’t fuel the US for 200 years," July 28, 2022
PolitiFact, "Is whole milk prohibited from being offered in New York schools? Yes," June 7, 2021
PolitiFact, "Fact-checking New York Rep. Paul Tonko on the risks of post-incarceration drug overdoses," April 26, 2023
The Atlantic, "Is This the Start of an AI Takeover?," January 3, 2023