HMS Researchers Develop AI Model to Detect Disease-Causing Gene Variants


Researchers at Harvard Medical School and Oxford University developed a new artificial intelligence model that can predict whether a gene variant is benign or disease-causing.

EVE, short for evolutionary model of variant effect, is an AI tool that has analyzed more than 36 million variants across 3,219 disease genes and classified more than 256,000 variants of unknown significance, according to the researchers’ paper, published Wednesday in the peer-reviewed scientific journal Nature.

Using machine learning, EVE learns patterns of genetic variation across nonhuman species to identify variants likely to cause disease, then extrapolates those results to the human genome.

The model assesses a variant’s likelihood of causing disease by examining how common that variant is across the related sequences of other species, according to Mafalda Dias, a postdoctoral researcher at HMS and one of the study’s co-first authors.


“The assumption behind the model is that, if a variant is very common in evolution, it is probably because it’s not pathogenic,” she said. “But if it never occurs, it’s very rare and probably there is something wrong with it, because evolution has gotten rid of it in a sense.”

Unlike existing AI models that rely on previously established human labels of gene mutation effects, EVE is unsupervised: it relies solely on deep learning methods to model complex genomic patterns, according to Jonathan Frazer, another postdoctoral researcher at HMS and co-first author of the study.

“It’s the ability of the deep learning methods to pick up on these patterns that allows us to be able to say if a variant is highly improbable under the model, and therefore it’s never been seen in evolution, and therefore it’s probably likely to be disease causing,” he said.
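To make the intuition in these quotes concrete, here is a deliberately simplified sketch in Python. It is not EVE: the published model uses deep learning to capture patterns across whole sequences, whereas this toy scores a single position on its own by comparing how often the mutant and the original residue appear there across species. The function name site_log_ratio, the pseudocount of 1 and the 10-species example column are hypothetical choices made only for illustration.

```python
import math
from collections import Counter

# The 20 standard amino acids; anything else (e.g. alignment gaps "-") is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_log_ratio(column, wild_type, mutant, pseudocount=1.0):
    """Hypothetical toy score, NOT EVE: log of how often `mutant` appears at one
    aligned position across species, relative to `wild_type`.

    A pseudocount keeps never-observed residues at a finite score. More negative
    values mean the mutant is rarer in evolution than the original residue,
    which, under the assumption quoted above, hints it may be harmful.
    """
    counts = Counter(res for res in column if res in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    p_mut = (counts[mutant] + pseudocount) / total
    p_wt = (counts[wild_type] + pseudocount) / total
    return math.log(p_mut / p_wt)


# Hypothetical residues observed at one position in 10 species.
column = "LLLLLLLLIV"
for mutant in "IP":
    # I (seen once in other species) scores higher than P (never seen).
    print(mutant, round(site_log_ratio(column, "L", mutant), 2))
```

Scoring each position independently is the simplest possible stand-in; the researchers describe EVE as using deep learning precisely so it can pick up on more complex patterns than a single-column count like this one.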

Frazer also said that the motivation for this study stemmed from global scientific efforts to sequence all life on Earth and understand evolution.

The long-term goal of EVE is to give clinical geneticists additional guidance when diagnosing patients with genetic diseases, according to Pascal Notin, another co-first author of the study and a graduate student at Oxford University.

“It is more of a tool that will help a physician make the correct diagnosis,” he said. “This is not what will drive the diagnosis itself.”

Frazer added that he believes the development of EVE and other genomic technologies is only “scratching the surface” of the potential clinical applications of data-driven tools.

“You can imagine the distant future where we can build models of the whole genome and a patient can come in and they can give their entire genome and we can interpret all the variants that are there and all of the interactions,” he said. “I suppose that’s the long-term dream.”

Dias also said that continued advances in machine learning can help researchers effectively analyze the vast and growing amount of genomic sequencing data.

“I really think that we are in a time where these cross paths between biological data and advances in machine learning can be really, really fruitful,” she said.

—Staff writer Ariel H. Kim can be reached at

—Staff writer Anjeli R. Macaranas can be reached at