‘Definitely Unethical’: Datamatch Leak First Exposed by UCLA Students in Private Discord

By Jo B. Lemann and Neil H. Shah, Crimson Staff Writers March 4, 2024

Updated March 4, 2024, at 11:35 a.m.

One week before Harvard student Sungjoo Yoon ’27 revealed Datamatch data vulnerabilities via an anonymous website, a group of UCLA students first discovered that they could access and scrape the site’s user data, information they exposed in a private Discord server.

While Yoon said he did not peruse the Datamatch information and expressed a commitment to securing user data on his site, the UCLA students didn’t abide by the same set of standards. They had also discovered some vulnerabilities beyond Yoon’s findings, such as exploits that allowed them to manipulate Datamatch’s built-in conversation starters and direct messages.

One student used a script to gather the profile data of at least 16,000 users who submitted a profile picture. The student compiled a file containing public profile information, including names, schools, dorms, and class years, along with three details that could have been set to private: Rice Purity Test scores, Zodiac signs, and MBTI personality types.

Some members of the group targeted specific users and made crude and offensive remarks about their profiles, including Rice Purity Test scores, in a private Discord server.

Stanford freshman Nazar D. Khan — who was in the server with the UCLA undergraduates — acted as the students’ liaison with Yoon, who was not a member of the server. Yoon reached out to Khan after learning about the security vulnerabilities through Instagram, purporting to be a journalist and expressing a desire to reveal Datamatch’s privacy flaws through a news outlet.

Khan wrote in a follow-up statement after this article’s publication that his role was “fundamentally that of a mediator between Sungjoo Yoon and a group of UCLA students who had first identified vulnerabilities within the Datamatch system.”

“The actions taken by Yoon and the offensive behavior exhibited by the UCLA students directly oppose my values and ethics,” Khan added.

Yoon exposed the information anonymously without first contacting Datamatch through official channels or providing the organization a chance to address the security flaws. While Khan did not have access to Datamatch himself, he explained the vulnerabilities to Yoon and worked through them with him over a Zoom call.

The Crimson verified this information through interviews with Yoon, one of the UCLA students, and Khan. A transcript of the students’ Discord server was also shared with The Crimson on the condition that messages would not be attributed to users by name. This transcript contained an early version of the file with the compiled Datamatch user profile data.

Yoon, who mentioned an anonymous tip on his website, confirmed that Khan had reached out to him, though Yoon said he had also been told about the vulnerabilities by a separate “member of the computing community.”

“We are deeply saddened and disappointed to learn more about how the students who originally took advantage of the security breach decided to use that information,” Datamatch Co-President Lily E. Liu ’25 wrote in a statement to The Crimson.

“As this year’s Datamatch is finished and the website is closed, it should not be possible for additional actors to acquire data,” Liu added. “If loopholes remain, we will quickly close them the moment we find them or are notified of them.”

‘The Cybersecurity Equivalent of Freeballing Jorts’

In the Discord server, the group first discussed their discovery of Datamatch’s data vulnerability on Feb. 18, and began to search for the Rice Purity scores of specific individuals — many of whom had set the information to private.

The Rice Purity Test, which was created for undergraduates at Rice University to measure their maturity level throughout college, asks 100 questions about sexual activity, drug use, and other illicit activities. The more activities students report having participated in, the lower their score.

The students began mocking users’ Rice Purity scores and shamed those with lower Rice Purity scores. They specifically targeted users with Islamic names, commenting on whether their Rice Purity scores made them “halal” or “haram.”

“I thought datamatch was cool initially but I changed my mind after seeing all those sinners,” one student wrote in the server.

Some messages in the server included highly offensive language directed at the users whose profiles they searched for — including sexist, racist, and ableist remarks.

The group also specifically sought out the Rice Purity scores of members of Datamatch’s team, mocking them for leaving their own data vulnerable.

“how the hell does this guy get laid,” one member wrote about a Datamatch developer.

The chat also contained sexually explicit discussions of certain students whose photos and Rice Purity scores were shared among the group.

Another member wrote that they would “love to do an inside job” on a female member of the Datamatch team.

The students in the server collectively expressed surprise at what they perceived as Datamatch’s failure to secure the information that they had gathered.

“this is the cybersecurity equivalent of freeballing jorts,” a student wrote.

The group also discovered that they were able to spoof the “conversation starter” messages that users can send on the Datamatch interface. They shared screenshots of sexually explicit parody starter messages they created and sent them to one another. They also demonstrated how they were able to change the timestamp on these messages.

Khan, the Stanford student, wrote in his follow-up statement that he finds “the actions of those members of the group who delved into personal data and made derogatory remarks abhorrent.”

“I condemn such actions in the strongest terms,” he added.

In her statement, Liu, the Datamatch co-president, acknowledged the message spoofing but said that the Datamatch team had “determined after careful inspection” that users were only able to do this in chats they were participants in.

“While this is something we will fix in future versions, it only affects the two users in the modified chat and provides a similar effect to editing a message in the chat (such as on iMessage),” she added.

Last week, the organization sent an email to all users implicitly acknowledging Yoon’s site and apologizing for any private data that had been made public as a result of the data vulnerability.

“A user with a Datamatch account could gain access to the Rice Purity score, MBTI type, and Zodiac sign of any other user of Datamatch, regardless of whether or not these attributes were set to public or private by the user,” they wrote.

Liu, in response to a comment request for this article, wrote on Sunday that these were the only pieces of “incorrectly configured” information.

Beyond looking at specific users, the group’s members also discussed potential ways they could use the aggregated data. For instance, they discussed potentially analyzing the data they compiled to predict users’ Rice Purity scores based on their profiles.

“I want to analyze the data and match the rice purity score with face,” one member wrote.

While it is unclear if any of the members of this server still possess a copy of the user profile archive, it was shared among the members of the Discord server. At the time The Crimson’s copy of the transcript was generated, the full archive had already been deleted, but an earlier version was still accessible.

A member also joked about selling the file containing the scraped user profiles on the “dark web” or posting it on the anonymous internet forum 4chan.

“you'd actually probably find a buyer lol,” another member responded. “cuz think about it you now have data on so many ivy league kids.”

‘Writing a Criticism’

On Feb. 19, Khan posted an Instagram story containing a redacted screenshot of a Datamatch user profile — with the caption “Rip datamatch.me and the fact that literally send private data through their user search.”

Yoon — from his own Instagram account — replied to Khan’s story, describing himself as a “student journalist at Harvard currently writing a criticism of Datamatch and its presence on campus” and asking Khan to elaborate beyond the post.

Over a Zoom call, Yoon and Khan experimented with the Datamatch site. Khan worked with Yoon over the call to relay parts of what the UCLA group had found.

While their descriptions of the process behind the tip were in alignment, they disagreed over the manner in which Yoon had agreed to take the information public.

“Yoon posed as a journalist,” Khan wrote in a statement, adding that he expected Yoon to post a piece online rather than “outright posting the data online and the steps to take” to obtain it.

Yoon responded to Khan’s criticism by describing his site as “definitionally a work of activist journalism” — due to the “3 pages of written social commentary and data [he] aggregated that’s intended to raise awareness about data privacy.”

Per screenshots of Instagram messages obtained by The Crimson, Yoon had told Khan that he “knew some people on the [Datamatch] team” and that his goal was to “force them to tighten up ultimately.”

In their messages, Yoon also said he intended to reach out to The Crimson, though The Crimson was not contacted prior to the publication of his site. Yoon wrote that he intended to first try to publish the project through a newspaper before resorting to publishing using his own “Harvard Ethical Hacking Project” if the story was not picked up.

“Hoping for the former but you know how it is nowadays,” Yoon wrote.

In an interview, Yoon admitted to being unfamiliar with “The Crimson’s organizational structure,” but said he had floated the story to individuals who “were loosely associated with The Crimson.”

“I realized I don’t think it’s going to get as much attention about data privacy if I do that — if I submitted a guest opinion or told someone about it and they loosely reported on it — as if I did something that was a little more attention-grabbing, like I did,” he added.

The two students also offered conflicting information on whether or not Yoon was supposed to credit others publicly when he released his site.

Khan wrote in a statement that “he promised to give full credit in his article,” while Yoon wrote in a text that “it was agreed upon at two separate points in time that if I were to platform this independently, that everyone would be anonymized for a multitude of reasons.”

“I find it disingenuous that after this project gained traction, that these cowards who initially refused to attach their name are now trying to claim clout,” Yoon added.

Khan called Yoon’s characterization “a straight up lie” in response to a request for comment.

“That’s very funny considering we asked for full credit and he said that he’ll do that,” Khan wrote.

“To clarify I was ambivalent at first about attaching our names but we did in the end,” he added.

‘It’s Probably Not Legal What We Did’

In the Discord server, the members reflected on their conduct in compiling the user data, noting that scraping the data might be unethical or illegal.

The anonymous UCLA student, in an interview, admitted that the server’s members ultimately didn’t claim public credit for discovering the vulnerability because they understood their actions could have consequences.

“What we were doing was definitely unethical,” they said.

In the server, one student initially defended the group’s conduct — using the argument that the user data was only available to the server’s members.

“its only unethical if i’m like leaking the data,” the student wrote.

As they were in the process of scraping the user data, one member of the group wondered aloud if they should alert the Datamatch team. However, another user pushed back, saying that they should first download the data and then figure it out.

“I wonder if they’ll sue us or something,” the first user wrote.

“they can’t it’s literally their fault,” the other user responded. “you can sue them tho.”

But once they finished compiling the user data, the group began to more seriously consider the potential implications of their actions. One student called the information the group accessed “seriously creepy” and “dangerous.”

“I'm now thinking it’s probably not legal what we did,” another student wrote.

In particular, one student — the one who had scraped Datamatch for user profile data — decided to delete the completed file, saying “i’m not sure i want it to get out of hand.”

The students discussed at some length if what they’d done was illegal — debating whether their intentions of exposing Datamatch vulnerabilities would protect them if they went public with their discoveries or informed the Datamatch team.

“I would rather go down doing what I think is right than compromising my beliefs n doing what others think is right,” one student wrote.

In her statement, Liu wrote that Datamatch plans “to thoroughly re-evaluate the design of our website and our security infrastructure.”

“The best that we can do moving forward is ensure that knowledge of these vulnerabilities is well documented and heavily emphasized in future iterations of Datamatch,” Liu wrote.

“With that being said, we plan to reduce the amount of public user information that is available via the Search function and rate limit our API to make scraping data more difficult,” she added.

—Staff writer Jo B. Lemann can be reached at jo.lemann@thecrimson.com. Follow her on X @Jo_Lemann.

—Staff writer Neil H. Shah can be reached at neil.shah@thecrimson.com. Follow him on X @neilhshah15.
