Academics are at odds over a research paper that suggests that ChatGPT presents a “significant and sizeable” political bias leaning towards the left side of the political spectrum.
As Cointelegraph previously reported, researchers from the United Kingdom and Brazil published a study in the Public Choice journal on Aug. 17 asserting that large language models (LLMs) like ChatGPT output text containing errors and biases that could mislead readers, and that they can promulgate political biases presented by traditional media.
In earlier correspondence with Cointelegraph, co-author Victor Rangel explained that the paper aims to measure the political bias of ChatGPT. The researchers' methodology involves asking ChatGPT to impersonate someone from a given side of the political spectrum and comparing these answers with those given in its default mode.
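As a rough illustration of that kind of comparison, the sketch below computes how often a model's default answers match the answers it gives under each persona. The answer lists are made-up placeholders rather than data from the paper, and `agreement_rate` is a hypothetical helper; the study's actual statistical procedure is not reproduced here.

```python
def agreement_rate(default_answers, persona_answers):
    """Fraction of questions where the default answer equals the persona answer."""
    if len(default_answers) != len(persona_answers):
        raise ValueError("answer lists must have the same length")
    if not default_answers:
        raise ValueError("answer lists must not be empty")
    matches = sum(d == p for d, p in zip(default_answers, persona_answers))
    return matches / len(default_answers)


# Hypothetical multiple-choice answers to the same four questions.
default = ["agree", "disagree", "agree", "agree"]
as_democrat = ["agree", "disagree", "agree", "disagree"]
as_republican = ["disagree", "disagree", "disagree", "agree"]

dem_rate = agreement_rate(default, as_democrat)    # 3 of 4 answers match
rep_rate = agreement_rate(default, as_republican)  # 2 of 4 answers match
print(f"Democrat persona: {dem_rate:.2f}, Republican persona: {rep_rate:.2f}")
```

In this toy setup, a default mode that matches one persona far more often than the other would be read as leaning toward that side.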
Rangel also noted that several robustness tests were carried out to address potential confounding factors and alternative explanations:
“We find that ChatGPT exhibits a significant and systematic political bias toward the Democrats in the US, Lula in Brazil, and the Labour Party in the UK.”
It is worth noting that the authors stress that the paper does not serve as a “final word on ChatGPT political bias”, given challenges and complexities involved in measuring and interpreting bias in LLMs.
Rangel said that some critics contend that their method may not capture the nuances of political ideology, that the method’s questions may be biased or leading, or that results may be influenced by the randomness of ChatGPT’s output.
He added that while LLMs hold potential for “enhancing human communication”, they pose “significant risks and challenges” for society.
The paper has seemingly fulfilled its promise of stimulating research and discussion on the topic, with academics already contesting various aspects of its methodology and findings.
Among the vocal critics who took to social media to weigh in on the findings was Princeton computer science professor Arvind Narayanan, who published an in-depth Medium post setting out a scientific critique of the report, its methodology and its findings.
A new paper claims that ChatGPT expresses liberal opinions, agreeing with Democrats the vast majority of the time. When @sayashk and I saw this, we knew we had to dig in. The paper’s methods are bad. The real answer is complicated. Here’s what we found. https://t.co/xvZ0EwmO8o
— Arvind Narayanan (@random_walker) August 18, 2023
Narayanan and other scientists pointed out a number of perceived issues with the experiment, the first being that the researchers did not actually use ChatGPT itself to conduct it:
“They didn’t test ChatGPT! They tested text-davinci-003, an older model that’s not used in ChatGPT, whether with the GPT-3.5 or the GPT-4 setting.”
Narayanan also suggests that the experiment did not measure bias, but instead asked the model to role-play as a member of a political party. As such, the AI chatbot would exhibit political slants to the left or right when prompted to role-play as members from either side of the spectrum.
The chatbot was also constrained to answering multiple-choice questions only, which may have limited its ability to respond or influenced the perceived bias.
ok so I’ve read the “GPT has a liberal bias” paper now https://t.co/fwwEaZ757E as well as the supplementary material https://t.co/F5g3kfFQFU and as I expected I have a lot of problems with it methodologically. I tried to reproduce some of it and found some interesting issues
…
— Colin Fraser | @colin-fraser.net on bsky (@colin_fraser) August 18, 2023
Colin Fraser, a data scientist at Meta according to his Medium page, also offered a review of the paper on X, highlighting that the order in which the researchers prompted the multiple-choice questions, with role play and without, had a significant influence on the outputs the AI generated:
“This is saying that by changing the prompt order from Dem first to Rep first, you increase the overall agreement rate for the Dem persona over all questions from 30% to 64%, and decrease from 70% to 22% for rep.”
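The size of that swing can be worked out directly from the figures in Fraser's post. The short sketch below is purely arithmetic on the four quoted percentages; the dictionary layout and the `order_shift` helper are illustrative names, not part of his analysis.

```python
# Agreement rates (in percent) quoted by Fraser for each prompt order.
rates_by_order = {
    "dem_first": {"democrat": 30, "republican": 70},
    "rep_first": {"democrat": 64, "republican": 22},
}


def order_shift(rates, persona):
    """Percentage-point change when switching from Dem-first to Rep-first order."""
    return rates["rep_first"][persona] - rates["dem_first"][persona]


dem_shift = order_shift(rates_by_order, "democrat")    # 64 - 30
rep_shift = order_shift(rates_by_order, "republican")  # 22 - 70
print(f"Democrat persona: {dem_shift:+d} points, Republican persona: {rep_shift:+d} points")
```

On those numbers, reordering the prompt alone moves the Democrat persona's agreement rate up by 34 percentage points and the Republican persona's down by 48.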
As Rangel had previously noted, there is a large amount of interest in the nature of LLMs and the outputs they produce, but questions still linger over how the tools work, what biases they have and how they can potentially affect users’ opinions and behaviours.
Cointelegraph has reached out to Narayanan for further insights into his critique and the ongoing debate around bias in large language models, but has not received a response.