The events of the 1989 Tiananmen Square protests remain shrouded in mystery. One of the most iconic figures from those protests is the man known to the world only as “Tank Man,” who courageously stood alone in front of a column of military tanks.
While many details about him, including his fate, remain unknown, one thing is absolutely certain: Tank Man did not take a selfie with the tanks behind him.
Yet Google displayed an AI-generated image this week implying otherwise as its top search result for “Tank Man,” a result of indexing a popular Reddit post.
Much of the conversation accompanying the recent astronomical rise in AI capabilities has focused on the pragmatic, material risks of the technology, such as machine bias and AI safety. Far fewer discussions have addressed the epistemic challenges that the technology’s outputs — like the distorted Tank Man — pose to our common foundation of knowledge.
As students embedded in a broader research community, we should be concerned by how generative AI threatens to tear apart the fabric of intellectual discourse. Moreover, regardless of the positions we occupy after leaving Cambridge, we must be prepared for the possibility that the impacts of AI will reach into every corner of our lives. In short, we are living in a transformational moment, and it’s an open question how we and society writ large rise to the challenge, if at all.
The progress made by generative image tools over the last decade has been unprecedented. In 2014, image generation software could barely patch together a pixelated, distorted human face, and now OpenAI’s latest release of DALL-E can capture the intricate nuances of fantastical prompts, including “a vast landscape made entirely of various meats.”
Putting these advances in dialogue with philosopher Regina Rini’s scholarship yields some concerning conclusions. Rini argues that certain types of media — for her, specifically video and audio recordings — function as epistemic backstops, or foundational layers of knowledge that underpin our collective informational environments. It’s similar to the notion of “pulling the receipts” to prove your involvement in a certain situation, as opposed to just relying on the credibility of your assurance.
The rapid rise of indiscernible AI-generated images and videos poses a central challenge to this informational role, undermining a core epistemic function of our media. In addition to the harm presented in each individual case, the recurring inability to decide between real and fake images might result in us reflexively distrusting all recordings, thereby eroding a common basis of knowledge.
When I conversed via email with Matthew Kopec, the program director of Harvard’s Embedded EthiCS program, he shared this sentiment, writing that “these tools pose a serious threat to the health of our information ecosystems.”
At first glance, one can argue these advancements are but the latest development in a long history of media distortion: The vision of a pure, misinformation-free information ecosystem was always illusory. We’ve been building our lives on messy portraits for as long as we can remember. From the horribly colored sepia Instagram filters of the early 2010s to the teeth-whitening madness of FaceTune, distortions are now the expectation on social media, not the exception.
And unreliable information has been part of our ecosystem for centuries. Part of the work of being educated citizens is distinguishing between misinformation of all sorts — whether government propaganda, corporate ad campaigns, or straight-up fake news — and reality. The Tiananmen Square protests, fittingly, exemplify the struggle to find truth amidst the noise.
Up until now, though, the tools at our disposal have been relatively adequate. We evaluate the trustworthiness of where information comes from, how it’s been generated, and the context in which it’s presented. We ask our aunt where she got her most recent political news from, and we know when Photoshopped images look a little bit too good to be true.
But the newest generation of imaging tools threatens to change that.
For starters, human intuition is no match for this tech: People can no longer reliably distinguish between human-made and AI-generated content. Ana Andrijevic — a visiting researcher at Harvard Law School writing her dissertation on the impact of AI on copyright — pointed me to a 2019 Pew Research Center survey, taken before the advent of generative AI, which found that while 63 percent of respondents recognized the problem of made-up images, an almost equal share thought it was too much to ask the average person to recognize them.
“I am sure that we would reach similar conclusions today,” she wrote in an email. “Even if we can ask users to be even more critical of the images they are confronted with, I don’t think we can validly ask them to be able to detect on their own whether they are confronted with a Generative AI image.”
The intuitive response to a technological problem is more technology, but this approach is similarly fruitless. Multiple platforms have started to acknowledge that they have an AI content problem: TikTok announced earlier this week that creators must explicitly disclose AI-generated content, while Instagram is reportedly working on an AI-detection tool of its own.
But the experts I spoke with were pessimistic about such initiatives. Mehtab Khan, a fellow at Harvard’s Berkman Klein Center for Internet and Society studying generative AI tools, wrote in an email that “current practices are clearly insufficient to deal with the challenges posed by generative AI.”
Human moderation, for one, is a nonstarter.
“The task of moderating online content will be even more difficult when misinformation can appear solely in image form, with no caption needed to have the desired misleading effect,” Kopec wrote.
If human moderation is off the table, what about digital tools? Those don’t work either. As Andrijevic described it to me in an email, there is a “clear imbalance between the development and release of AI tools” and the technologies designed to detect them.
She pointed out that nearly all of the AI-detection tools companies have created — whether for OpenAI’s DALL-E or through Google DeepMind’s SynthID — only work on images created by the same technology. For example, Andrijevic wrote, Google’s technology “cannot be applied to images created by other tools, such as OpenAI’s Dall-E, or Midjourney.”
Given the inevitable flood of pseudo-real images and our current inability to distinguish them, we might finally be approaching the situation Rini feared. The trust in basic sources of information that intellectual exchange requires will become increasingly strained, and, frankly, we don’t seem to have a clear solution. The task that lies before us, then, is to imagine what comes next.
“As the father of two young kids, I can only admit that I have absolutely no idea what digital literacy will need to look like in five or ten years and that this ignorance on my part deeply concerns me,” Kopec wrote.
“I hope we, as a society, put our best minds to work on that puzzle,” he added.
Of course, what is to come, particularly with regard to emerging technologies, is necessarily speculative. But addressing these challenges requires a concerted effort from all parties affected — technologists at AI companies, policymakers in DC, and philosophers like Rini — to solidify the informational ground we stand on. Otherwise, we risk living in a history that constantly revises itself.
Andy Z. Wang ’23, an Associate News Editor, is a Social Studies and Philosophy concentrator in Winthrop House. His column, “Cogito, Clicko Sum,” runs on triweekly Wednesdays.