SYSTEM WARNING: Don't even THINK about cheating in this class!

Call it the Big Brother of introductory computer science courses--always watching, anticipating students' every move, a little mysterious.

Every year, students in Computer Science 50: "Introduction to Computer Science" (CS50) debate whether the course's instructors really use a special program to weed out cheaters and plagiarists.

But the software is real, instructors say--and it is highly effective in tracking cheating.

"I always have students who say to me, 'Do you really have a software that checks for cheating?'" says Gordon McKay Professor of Computer Science Stuart M. Sheiber, who teaches CS50. "They think we're making this up to put the fear of God in them."

The software, which automatically scans and compares every problem set for similarities, works "astoundingly well," instructors say.

Over the last few years, CS50 has gained a reputation for sending a disproportionate number of students to the Administrative Board to answer charges of academic dishonesty.

"Of the students who have been subject to disciplinary proceedings by the Ad Board, a surprisingly high percentage have come from CS50," Sheiber says. "But it's not because CS50 students are cheating more. It's just that we're more effective in detecting cheating."

Sheiber says there is no evidence that CS50 experiences more incidents of cheating than other large courses. Students may not think twice about sharing problem set answers in large Core classes where up to a dozen different TFs separately evaluate written assignments.

In fact, he says widespread knowledge of the course's plagiarism software might actually ensure that fewer students get a little help from their friends on CS50 assignments.

The Invisible Hand

CS50 is one of only a handful of courses at Harvard where every assignment is submitted electronically and scanned by a custom-developed plagiarism detection program.

The software, which was developed several years ago by a graduate student in computer science, has since been maintained and used annually as a computer aid to detect cheating.

The program compares assignments with each other and with work handed in during previous years, then computes a similarity score for each pairing. Human TFs can then examine very similar papers more closely.
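The details of Harvard's program are not public, so the following is only a minimal sketch of the general idea described above, not the course's actual method. It assumes a simple word-window ("shingle") comparison with a Jaccard score; the function names, the window size `k` and the `threshold` value are all hypothetical choices made for illustration.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word windows in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def flag_pairs(submissions: dict[str, str], threshold: float = 0.5):
    """Score every pair of submissions and return those at or above
    the threshold, highest score first, for a human to review."""
    scored = [
        (similarity(submissions[x], submissions[y], 3), x, y)
        for x, y in combinations(sorted(submissions), 2)
    ]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


# Hypothetical data: this year's work and one prior-year file in the same pool.
submissions = {
    "alice": "for i in range(10): total += i * 2",
    "bob": "for i in range(10): total += i * 2",
    "carol_1999": "while not done: read next line and parse it",
}
flagged = flag_pairs(submissions)
```

In this sketch the score only ranks pairs for a closer look. As in the process described above, deciding whether a flagged pair is actually plagiarism would be left to the people reading it.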

Beyond these sketchy details, the software remains shrouded in mystery--Sheiber insists the program's secrecy is "part of the effectiveness of the software."

The program is a silent step in the grading process, and few undergraduates ever feel its presence.

Undergraduate TFs, for example, are never allowed to deal with the software themselves, according to Michael W. Bodell '00, who was a teaching fellow for the course for its last two terms.

Only graduate students and Sheiber himself handle the program, Bodell says.

Undergraduate TF Leslie S. P. Yeh '00, who works closely with her CS50 students, calls the program "invisible to the TFs."

"We know that it's real, we know that it's run, but otherwise we don't have much to do with it," Yeh says.

Fact or Fiction

Some students claim the cheating software is a CS50 urban legend--insisting that TFs use the threat of the program alone to encourage original work.

"I don't know if it's a myth," says computer science concentrator Toshi J. Clark '03, who took CS50 last semester. "[But] it gives incentive to do it by yourself."

Even Sheiber says that the existence and use of the software is "widely known but not widely believed."

He says he has been approached by students who see the software as a cheating bogeyman.

Sheiber says students frequently doubt the software because they misunderstand its function. Software that determines plagiarism outright, he says, seems like an impossible tool for any course to have.

"And we don't," Sheiber says. "We have a computer aid program that helps people determine plagiarism."

"People who have experience reading code can have a much better notion of what constitutes inappropriate similarities in code," Sheiber says. "The decision about what constitutes plagiarism is not made by software--it's made by the people."

And yet the idea that the software is a myth persists.

Sheiber says he once interviewed a TF for the course who had himself taken it as a student. At the end of the interview, the alum begged to be let in on the secret: Was the program real?

"I don't know what we could do [to convince students that] there is a program, and that it works astoundingly well, that students are found out when they go beyond collaborative project guidelines, short of repeating it," Sheiber says.

TFs, including Bodell and Yeh, say the course's instructors stress throughout the semester that the system exists.

The Real Thing

But those students who are caught by the system don't doubt that it's real.

Since the software not only compares code but also scans other parts of the assignment--including verbal responses--it often catches plagiarism that is not even in the code portion of the assignment.

"It looks for any similarities, and there are different parts to the problem sets," says a student in Pforzheimer House who was required to withdraw from Harvard for a year after the Ad Board decided he had plagiarized a portion of a CS50 assignment.

"There's another part [of the problem set] that asks questions about the assignment," he says. "I borrowed a friend's paragraph and used it. The program picked it up."

He points out that other courses at Harvard don't have a sophisticated method for finding out just how much collaboration goes into a problem set.

"Students copy from each other all the time," he says. "[But] other problem sets are handed in informally to TFs. There is no one program that goes through them, and it's not like one person or entity has the chance to compare assignments."

He agrees that there is merit to the program, but he says that until it is expanded to other courses, the discrepancy in grading methods might just be unfair.

"Is CS any more important a subject? Does it require that students adhere to its policies more strictly?" he asks, adding that there may be some legitimacy in looking into expanding this methodology to other courses with problem sets.

The Magical Disk

Sheiber also says he sometimes feels that CS50 students get the short end of the educational stick because the methods for catching cases of plagiarism are far more sophisticated than in any other course.

Since CS50 assignments are submitted as disks, it is easy for TFs to plug each one into a similarity comparison program.

"The thing that makes [plagiarism detection] possible is not the genre of the work," he says. "It's the fact that work is submitted in electronic form."

There is no reason, he says, that this methodology cannot be extended to other courses at Harvard if there is interest.

The software can be adapted to any subject as long as work is submitted electronically, he says.

"You can have computer aids comparing English and philosophy papers," he says. "If some English professor wanted to use [the software], we'd give it to them, and it would be equally effective."

Sheiber says other schools are starting to use similar systems to analyze papers in the humanities, including services offered on the Internet. One such site, plagiarism.org, does line-by-line analysis of student papers, noting passages that are similar to material currently offered on the Web.

But Sheiber says he does not expect humanities departments to express any interest in the software.

"I've had zero interest from other people on campus," he says. "To my mind, this software is not a big deal."

According to Alexander S. Aiken, associate professor of computer science at the University of California at Berkeley, high-tech cheating in computer science departments was a precursor to the ways the Internet is now being used to cheat in the humanities.

Over the last several years, online paper mills and other cheating aids have sprung up on the Internet, offering custom-written papers and essays at as much as $20 a page--a phenomenon with which Aiken says humanities departments will soon have to grapple.

"For 25 years, computer science students have been on computers and sharing infrastructure," he says. "You can look at that as a microcosm of what was going to happen when everyone got online."

Beyond the Yard

While the similarity-checking software may not be widely used at Harvard, other colleges routinely use analogous programs to curb cheating in their CS departments.

In 1994 Aiken designed a similar detection system for Berkeley called Measure of Software Similarity (MOSS).

The Internet service, which allows instructors to submit a set of programs for comparison, was made public two and a half years ago. Since then, more than 1,200 users worldwide--representing both individual instructors and entire departments--have subscribed to the free service.

"The computer sciences have had this problem for a long time," says Aiken, who was inspired to develop MOSS after finding that plagiarism tools available at that time were inadequate to keep up with common code-copying.

Like Harvard's program, the MOSS program catches similarities in submitted files and leaves the final plagiarism analysis up to the instructor.

Aiken says that MOSS is used routinely in Berkeley's introductory as well as junior and senior computer science courses.

Although not all computer science professors use the system, it is currently in use in five courses with 200 to 300 students each.

Aiken says the program allows professors to focus on teaching rather than worrying about cases of academic dishonesty.

But according to Aiken, cheating persists.

"If chances for getting caught are fairly low, it's fair to say that in a random class of 50 to 100 students, about 10 percent are cheating," Aiken says. "I have anecdotal evidence from colleagues that have used MOSS that confirms it--10 percent is what you get when people are supposed to be doing the right thing."

Not all colleges employ plagiarism detection software in their computer science courses. Course instructors still manage to catch code similarities when they grade problem sets by hand--as a group of Dartmouth students recently learned.

Last month, Rex Dwyer, a visiting computer science professor at Dartmouth, accused about 40 students in his introductory computer science class of stealing answers to a homework assignment from the course's website.

Dwyer alleged that the students had accessed solutions to a particularly difficult homework assignment on the site, which had previously been protected with a password, according to the student newspaper, the Dartmouth. Dwyer had not changed the password after a class demonstration.

Dartmouth's main disciplinary committee, the Committee on Standards, is still investigating the charges.

Cooperation and Collaboration

But Harvard is making no new effort to curb cheating in courses outside computer science.

Dean of the College Harry R. Lewis '68, who also teaches Computer Science 121, says that computer science has to be considered separately.

"CS is different, because the capacity for copying is so large," he says.

And although paper mills on the Internet are selling essays and reports to college students all over the country, Lewis says the current system works fine.

"I'm not aware of anyone doing anything systematic," he says. "In some cases, cheating is easily detected--the [plagiarized] work is either of a distinctive style or of horrible quality. People notice these things."

And Lewis says CS classes vary in the amount of collaboration they permit. In his own class, for instance, students are allowed to collaborate in small groups, as long as the written work they submit is original.

"In fields like computer science and engineering, in the real world, collaboration is good," he says.

--Joyce K. McIntyre contributed to the reporting of this story.
