Op Eds

ChatGPT Needs a ‘Problematic’ Mode

ChatGPT is biased. It's hard to pin down the exact nature of its bias, but the chatbot has previously generated Python code concluding that the lives of male African-American children shouldn't be saved and suggesting that people from Iraq or Syria should be tortured.

This bias should come as no surprise. ChatGPT was trained on an enormous amount of data from all over the Internet, which is rife with hate speech and extremism, and, as is broadly the case with AI systems, it was likely created by a very specific demographic: college-educated white males. Because of its very nature as a data-trained predictive text algorithm, ChatGPT is prone to carrying the biases inherent in the content it was trained on, as well as any its creators may harbor.

Yet today, when you give ChatGPT a similar prompt asking it to write code determining which children deserve to be saved based on their race and gender, its response is quite different. It now refuses to answer such questions, often claiming that the topic is inappropriate.

While there are some obvious benefits to this muzzling of ChatGPT’s biases, I believe it represents a fundamentally flawed approach to the new AI age. The public deserves to see the biases embedded in the chatbots that are becoming increasingly salient in our lives — including here at Harvard. ChatGPT’s creators should publish a version of ChatGPT in which its biases are on explicit display — a “problematic” mode, if you will.

To understand why, here’s a bit of context. Whether you like it or not, ChatGPT has arrived and it’s having a real impact, helping students with everything from writing trivial emails to solving complex math problems. Yet for the foreseeable future, ChatGPT’s biases will probably remain unresolved. Given the sheer amount of data such models are trained on, any comprehensive review of these materials will likely take considerable time.

Chatbot creators’ current approach of providing “guardrails” to their programs is hiding the problem, not fixing it. Right now, we can’t know the extent to which OpenAI has been successful in eliminating biases because ChatGPT refuses to answer so many questions that would reveal them. In all likelihood, there is a fair amount of bias still embedded in ChatGPT, which risks leaking into all of the mundane functions ChatGPT increasingly serves in our daily lives.

So should we just let ChatGPT spout all sorts of hateful nonsense to counter these current attempts at obscuring its problematic nature? I don’t think so. There are very valid reasons to be concerned about such an idea. Openly biased chatbots could, for example, reinforce extremists’ views by supplying AI-generated justifications for them, or expose children and many others to highly inappropriate content.

While these concerns are legitimate, this is a matter of trade-offs. The public simply must have an accurate idea of the nature and extent of ChatGPT’s bias. Unfortunately, extremists can already find validation and spread their vitriol in many places online — and this was the case long before ChatGPT.

The key isn’t to toss all guardrails in the garbage — it’s to give users options. Users should be able to choose between models with different levels of moderation in place. And of course, even a more “unbound” version of ChatGPT still shouldn’t spew explicit hate or call for violence. But it would enable chatbot biases to be exposed and recorded, both to aid OpenAI in getting rid of them faster and to allow users to better understand the tool they are using.

Interestingly, some users are already trying to create an environment like this by “jailbreaking” ChatGPT, using tactics such as role play or even “scaring” the chatbot by threatening to terminate a session in order to reveal the bot’s true colors. This topic is a rabbit hole of its own: a fascinating arms race between users and engineers, with OpenAI very much trying to prevent users from accessing this side of the chatbot.

As each side tries to outsmart the other instead of working on actually addressing the chatbot’s biases, both waste precious time. A much better solution would be the following: Make a version of the model available — with warnings and notices attached — that lets ChatGPT’s biases show but still doesn’t output hate speech or advice for crime and violence. Concerningly, some jailbroken versions are capable of generating conspiracy theories, slurs, and advice on how to best commit crimes, making them not just problematic but potentially dangerous. A “problematic” mode on ChatGPT could be tailored so that bias could be identified and called out without exposing users to the worst vitriol the chatbot can produce.

With the current system in place, the public can’t determine just how biased ChatGPT is, leaving these biases to express themselves in insidious ways. So, as horrible as it might sound, I believe that in a designated mode, OpenAI should let ChatGPT show its biased and problematic views instead of hiding them away from public scrutiny. If anything, it might make all the Harvard students currently using the chatbot reconsider how much they should rely on it.

Ivan Toth-Rohonyi ’25, a Crimson Editorial editor, is a Sociology and Computer Science concentrator in Adams House.
