Jonathan M. Bloom ’04 bounced between groups of biologists and computer scientists, leaping from conversation to conversation with an energy that could make you forget, for a moment, that he is a mathematician. He wore a pair of wire glasses with thick lenses that jutted out from his messy brown hair. His ideas seemed to come from his feet, traveling like a wave through his legs and torso before finally exploding into words.
The crowd of scientists grabbed breakfast sandwiches and sipped lattes made by the barista a few steps away. The space was large and open, framed by glass and metal walls that opened into a garden. The surrounding lobby was decorated with large abstract canvases that vaguely resembled red and orange cells in a petri dish, created by a former artist-in-residence. If you didn’t know better, you would think you were at Google or Facebook, not one of the most important genetics laboratories in the world.
Bloom is a member of the Broad Institute, a joint venture between Harvard, MIT, and the five Harvard-affiliated teaching hospitals that seeks to understand the links between genetics and disease. Like many people at the Broad (rhymes with code), Bloom has little formal training in biology — his undergraduate and graduate degrees are both in theoretical mathematics. Eric S. Lander, a legendary geneticist and the founding director of the Broad, is also a mathematician by training. In fact, Lander has not taken a biology course for a grade since his sophomore year of high school.
The Broad is often compared to tech startups, and for good reason. It took in $430.5 million in revenue in 2017 from a combination of federal grants, donations, and funds from industry. Though it is only 15 years old, the Institute’s 4,300 members have published thousands of high-impact scientific papers. The CRISPR gene editing tool, one of many projects undertaken by Institute scientists, is widely expected to win its inventors a Nobel Prize and to earn millions of dollars in patent licensing fees.
The Broad uses many of its extensive resources to generate massive biological datasets, which it shares for free online. Scientists from around the world use these data to find insights into genetic diseases, often making use of scientific computing programs like the Genome Analysis Toolkit, also published for free by the Broad. Because of these advances in biological and computational techniques, scientists are asking questions today they wouldn’t have imagined even a decade ago.
{shortcode-f69e6d409791c8b9f76648d428324297a486772d}
Bloom directs the Models, Inference and Algorithms Initiative at the Broad, a weekly seminar that covers breakthroughs in math and computer science and applies them to biology. The scientists outside the sunny lecture hall were waiting for that week’s talk to begin. When the doors opened, the room filled to capacity immediately. The crowd was a sea of laptop screens, most showing either the black Unix terminals used for remote computing or the programming Q&A site Stack Overflow. As they sat and chatted with lattes in hand, scientists ran code designed to find the genes associated with certain kinds of cancer.
The keys to treating many of the diseases that continue to resist modern medicine — cancer, mental illness, diabetes, obesity, cardiovascular disease — lie in understanding the unimaginably complicated interactions between genes, proteins, and cells in three-dimensional tissues. Even if you map out all the building blocks of human DNA, which was done for the first time in 2001, the body’s sheer complexity makes treating even well-understood diseases extremely challenging.
But biology today has entered a golden age, created in no small part by the researchers at the Broad. Innovative new biological tools like CRISPR and single-cell RNA sequencing have emerged at the same time as dramatic leaps forward in machine learning and artificial intelligence. Vast oceans of data, processed by clever algorithms, allow scientists to quickly identify promising treatments with unheard-of sophistication and speed. “The floodgates have opened, and now it’s up to us to do the absolute best that we can do,” said Aviv Regev, chair of the faculty at the Broad.
All of the Broad’s research is enabled by a single technology — genome sequencing — that many of the Institute’s key members were instrumental in creating. “In the computer revolution, you had a person or a small lab that could study natural languages, encryption, biology, and computer graphics all in the same lab, all in the same week, because there was this incredibly enabling core technology,” said George M. Church, a professor of genetics at Harvard Medical School and a senior associate member of the Broad. “The same thing happened with genetics.”
But new ideas and technologies are not passive things, sitting quietly on a lab shelf like a biological reagent. Ideas are alive. They create staggering wealth through patents, shape institutions, and change the fabric of what counts as science. They create a global community of researchers more connected than ever before.
Genetics has produced so much data that scientific thinking itself has changed. When mathematicians like Bloom explore massive biological datasets without knowing in advance what they are looking for, they are practicing what is sometimes called “hypothesis-free science.” It’s different from the traditional model of a single biology professor in a lab, surrounded by graduate students and staff, exploring one question at a time using a careful method governed by a hypothesis. The relationship between hypothesis-free and hypothesis-driven research at the Broad is both the heart of its success and the source of some of its controversy.
The story of the Broad is the story of biology in the 21st century. It’s a story of innovation enabled by dramatic advances in computational and biological tools, of a thrilling new age where scientists gain new insights into diseases like cancer. It’s a story of how those new tools transform the kinds of questions scientists can ask, revealing new horizons never before imagined. It is a story with millions of dollars at stake in funding and licenses, of organizations that so radically change the scale of science that the definition of science itself is fundamentally transformed.
{shortcode-b5886ec7471e0c49daefe404e8146aa69893c287}
Modern DNA sequencing, the technology at the heart of the Broad Institute, was born shortly after George Church flunked out of graduate school. Church, who stands a looming 6’5” with an enormous white beard, has always done things a little differently. Though he built his first computer when he was 10 years old, Church’s passion was always for biology. He blazed through college in two years, earning a degree in chemistry and zoology from Duke in 1974. He stayed at Duke for graduate school and threw himself into lab work, working 100-hour weeks on X-ray crystallography. Church’s research was groundbreaking — combining early computer science with biology to create the first 3D image of folded RNA — but he neglected his classes and was asked to leave the Ph.D. program in 1976. The expulsion letter is posted proudly on his website.
Few Fs have been more important in the history of science. Church’s research record was strong enough that in 1977 he enrolled without a hitch in Harvard’s biochemistry Ph.D. program. Church dreamed of seeing not just a single strand of RNA, but the entire list of DNA base pairs that encode the instructions for making a human being. In April 1984, Church published a five-page paper, simply called “Genomic Sequencing,” outlining a new technology that could make that dream a reality.
This technology took the scientific world by storm. In December 1984, a meeting in Alta, Utah, sponsored by the Department of Energy set in motion what would become the $3 billion Human Genome Project, the single largest biology research program in history. The DoE, which is responsible for the American nuclear program, was interested in understanding genetic mutations from radioactive fallout in the aftermath of the bombings of Hiroshima and Nagasaki. Traditional methods of studying DNA were not powerful enough to detect the anticipated mutations. But Church, then an unknown recent graduate, astounded the scientists with his ideas for genomic sequencing. From the ashes of one massive government-funded science program — the Manhattan Project — another, more hopeful one was born.
Some technologies are so much better than their predecessors that they do not merely improve a field — they create entirely new ones. Over the next few years, scientists became increasingly convinced that this new technology demanded a new kind of science. Thinking bigger than the DoE’s radiation project that first gathered them together, the scientists began to imagine sequencing the entire genome. Nothing that ambitious had ever been attempted in biology. To sequence the roughly three billion base pairs in the human genome, they would need billions of dollars and the combined power of genetics labs around the world. The Human Genome Project required a government.
The history of the Broad starts with the dream of the HGP. The sheer labor and interdisciplinary thinking required to sequence the genome brought together the best biologists, mathematicians, and computer scientists in the Boston area, who then collaborated with labs around the globe. The Broad would soon blossom from this already established scientific network.
If Church and the early geneticists were going to push biology to a scale radically larger than it had been before, they had to do it in a place radically different from academia. The DoE turned out to be the perfect place to do transformative genetics research. Interdisciplinary to the core, the DoE already employed computer scientists in the 1980s, including some working on genomics. Its engineers came from many fields and thought on the scale of atomic bombs and nuclear plants — a far cry from traditional academic biologists, who had never attempted a project so large and expensive.
“It was a very different way of thinking about biology,” said Lander. “The idea that, complementary to the way you would work on an individual protein, or an individual problem, that it might be possible to step back and look at the whole picture, to look at the whole genome simultaneously.” The HGP formally launched in 1990, jointly led by the DoE and the National Institutes of Health. A new era of science had begun.
Such radical growth necessitated radical change. The founding document of American science policy, a 1945 report written by MIT engineering dean Vannevar Bush, argues that “scientific progress on a broad front results from the free play of free intellects, working on subjects of their own choice, in the manner dictated by their curiosity for exploration of the unknown,” not from top-down management.
The HGP, like the Manhattan Project, had the opposite philosophy. It was not driven by curiosity. It was centralized, expensive, and driven by committees with a single-minded goal. This is why big science is sometimes associated with hypothesis-free science: It is done not to answer a question, but to build a product. Big science is the only way to build an atomic bomb, or sequence a genome for the first time. The hope of big science is that by investing heavily in a centralized effort to produce a fundamental dataset or technology, a global community of future scientists will be more empowered to pursue their own research programs.
In 1974, a teenage Eric Lander found himself on the roof of a building in East Germany, armed to the teeth with water balloons. He was a member of the first American team at the International Mathematical Olympiad, and he had made fast friends with the Soviets — they were the only other superpower present. Protected by his new friends’ political control over East Germany, Lander, his teammates, and their Soviet rivals joined together in an international effort to throw the balloons into oncoming traffic.
Soon after, Lander glided through Princeton’s math program, graduating as valedictorian. He earned his Ph.D. in math at Oxford, where he was a Rhodes Scholar, but faced a growing dissatisfaction with the relentless abstraction and solitary study of professional mathematics. “[Math] is kind of a monastic career, and I’m not a very good monk,” said Lander, bursting into laughter. “I very much wanted to do things that were more in the world, more connected to people.”
Despite not technically knowing anything about managerial economics, Lander landed a teaching position on the faculty of Harvard Business School in 1981. Though he enjoyed the subject, Lander soon became dissatisfied with that field too. Encouraged by his younger brother, a developmental neurobiologist, Lander began to study the brain, sitting in on lectures and moonlighting in a fruit fly lab.
He started working in a genetics lab at MIT, where his mathematical prowess made him an instant star. In 1986 he was appointed a fellow of the Whitehead Institute for Biomedical Research at MIT, where he started his own lab while maintaining his faculty appointment at the Business School. In 1987 Lander won a MacArthur “Genius” Grant for his genetics work. He jumped fields in 1990 to join the biology faculty at MIT, just as the HGP was about to begin.
Eric Lander was the HGP’s breakout star. Few others in genetics had his combination of managerial, mathematical, and biological talent. His lab at MIT was one of the first to join the HGP and proved to be wildly successful.
“He realized what kind of institution needed to be built and what kind of people he needed,” said Michael B. Eisen ’89, a computational biologist at Berkeley. “In some ways he was just born to build big research institutions.”
Lander’s diverse skills proved vital to the Human Genome Project when the government-funded enterprise faced an existential challenge from a breakaway scientist named J. Craig Venter. Venter, at first a collaborator on the HGP, founded his own company to compete with his former colleagues. His strategy was to use shotgun sequencing, a method that took advantage of rapidly advancing computers to dramatically speed up DNA sequencing at the cost of quality.
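The core idea behind shotgun sequencing is simple enough to sketch in a few lines of code. Below is a toy illustration — invented DNA fragments and a naive greedy merge, nothing like Celera’s actual software — of how a computer can rebuild a sequence from random overlapping pieces:

```python
# A toy sketch of the idea behind shotgun sequencing (not Celera's software):
# the genome is shredded into short random fragments, and a computer rebuilds
# it by repeatedly merging the two reads that overlap the most. Real
# assemblers are vastly more sophisticated; the sequences here are invented.
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: list[str]) -> str:
    reads = list(reads)
    while len(reads) > 1:
        # Find the pair of reads with the largest overlap and glue them together.
        best = max(((overlap(a, b), i, j)
                    for i, a in enumerate(reads)
                    for j, b in enumerate(reads) if i != j),
                   key=lambda t: t[0])
        k, i, j = best
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads[0]

fragments = ["GATTACAGGT", "CAGGTTTCAA", "TTCAACGGA"]
print(greedy_assemble(fragments))  # -> GATTACAGGTTTCAACGGA
```

Repetitive stretches of real DNA make this kind of merging ambiguous, which is where the trade-off in quality comes from.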
“We knew that Craig was just going to claim victory just by doing stuff at random, even though we knew it would not fold together into a high-quality, continuous draft,” said Lander. “But we knew that from a public relations point of view it would sound very convincing.” Venter could not be reached for comment.
If successful, Venter’s company would lock up the sequenced genome and sell access to the data, contrary to the HGP’s commitment to freely publish its sequences. This was an existential threat to the kind of open science that the Broad would go on to embody, where sharing data with a global community of scientists becomes a way of empowering new research. But Venter’s strategy, where computational power dramatically speeds up biology, predicted the change of tides in the coming decades.
Lander pushed to publish a draft of the genome to head off Venter, intending to circle back to finish the job at a higher quality. The plan worked: in 2001, the Human Genome Project and Venter’s company published their results simultaneously in Nature and Science. A business professor only a decade earlier, Lander had cemented his place as one of the most important figures in human genetics.
{shortcode-43ea079cde131b3f81622796b59cef3a1c19049c}
Eric Lander’s sixth-floor office is filled with light and looks over the metal and glass buildings of Kendall Square. It is surprisingly small — no bigger than a professor’s office — and filled with orange furniture. When he speaks, he can barely sit still, and the room fills with electric energy.
“The Genome Project taught us that to do certain things — not all things, but certain things — you had to be prepared to work at a larger scale, or in an interdisciplinary way,” said Lander. “Basically we didn’t just have to do experiments in our labs, we had to experiment with our labs. We had to experiment with the structures of academia.”
After the Human Genome Project came to a formal end in 2003, Lander wanted to take the lessons of the project and enshrine them in a new kind of institution. He met with senior leadership at Harvard and MIT to begin discussing what would become the Broad Institute, which was formally announced after philanthropists Eli and Edythe L. Broad gave a $100 million gift. But he wanted to do many kinds of science, not just industrial-style sequencing.
“To me, the Broad is just not big science,” said Lander. “It was once, and the Human Genome Project was big science. But the vast majority of what goes on here is by small teams that can draw on a much larger community and infrastructure.”
Though the Broad was founded on the research style of the HGP, research there today follows three approaches. Some of its scientists work to produce big datasets in the style of the Genome Project — for example, the Human Cell Atlas program seeks to build a detailed map of every cell in the human body. Others follow the more traditional path of the independent biologist studying individual questions at their lab bench. Still others, often trained in math or computer science, follow a genuinely new approach — working with the data produced by the Broad and other institutions to identify promising treatments or problems.
Hilary K. Finucane ’09, a Schmidt Fellow at the Broad Institute, follows the third approach and makes big datasets bigger. Finucane and her collaborators noticed that a database called ENCODE, funded by the government as a follow-up to the Genome Project, is good at understanding the functions of small pieces of DNA, but not the ways those pieces work together to create diseases like cancer. Other databases, created by a method called a genome-wide association study, are good at identifying which pieces of DNA are associated with diseases like cancer, but not why they are associated. By bringing the two kinds of data together, Finucane hopes to counter the weaknesses in each.
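As a crude illustration of what bringing the two kinds of data together can look like — a hypothetical sketch with made-up coordinates, not Finucane’s actual statistical method — one can ask whether disease-associated variants from a genome-wide association study land inside ENCODE-style functional regions more often than background variants do:

```python
# A hypothetical sketch of combining GWAS hits with functional annotations:
# do disease-associated variants fall inside annotated functional regions
# more often than background variants? All coordinates below are invented.
from bisect import bisect_right

# Functional regions on one chromosome as sorted (start, end) intervals.
functional_regions = [(1_000, 2_000), (5_000, 7_500), (12_000, 15_000)]
starts = [s for s, _ in functional_regions]

def in_functional_region(position):
    """Return True if a variant position falls inside any annotated interval."""
    i = bisect_right(starts, position) - 1
    return i >= 0 and position <= functional_regions[i][1]

gwas_hits = [1_500, 6_200, 9_000, 13_100]          # disease-associated variants
background = [300, 1_200, 4_000, 8_000, 20_000]    # matched control variants

hit_rate = sum(map(in_functional_region, gwas_hits)) / len(gwas_hits)
bg_rate = sum(map(in_functional_region, background)) / len(background)
print(f"GWAS hits in functional regions: {hit_rate:.0%}")
print(f"Background variants in functional regions: {bg_rate:.0%}")
print(f"Enrichment: {hit_rate / bg_rate:.1f}x")
```

Finucane’s real methods are far more sophisticated, but the underlying move is the same: the functional annotations suggest why an association might exist, and the association data says which pieces of DNA matter for disease.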
Just a few years ago, Finucane’s groundbreaking research would have been impossible. But she does not see new tools or equipment as the most enabling recent development.
“My very strong impression is it’s been less new technology and more a culture shift towards sharing data,” she says. The math simply would not work for a lone laboratory: A handful of scientists do not have the resources to track down people with a rare disease and take tens of thousands of samples. Perhaps they could take 100. But by teaming up and pooling data with teams around the world, a global community of scientists like Finucane suddenly has the data it needs to draw useful conclusions. Collaboration in this case is not just a buzzword — it is a mathematical necessity.
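A back-of-the-envelope calculation — my own sketch, using invented variant frequencies rather than any real study — shows why. It estimates the statistical power to detect a rare risk variant at genome-wide significance with 100 samples versus tens of thousands:

```python
# A minimal sketch (not Finucane's actual method) of why pooled samples are a
# mathematical necessity: approximate power of a two-proportion z-test to
# detect a rare risk variant at GWAS significance. Frequencies are invented.
from statistics import NormalDist

def detection_power(n_cases, n_controls, freq_controls=0.01, freq_cases=0.02,
                    alpha=5e-8):
    """Rough one-sided power approximation for a two-proportion z-test."""
    norm = NormalDist()
    p_pooled = (freq_cases * n_cases + freq_controls * n_controls) / (n_cases + n_controls)
    se_null = (p_pooled * (1 - p_pooled) * (1 / n_cases + 1 / n_controls)) ** 0.5
    se_alt = (freq_cases * (1 - freq_cases) / n_cases
              + freq_controls * (1 - freq_controls) / n_controls) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)   # genome-wide significance threshold
    effect = freq_cases - freq_controls
    # Probability the observed difference clears the threshold given the true effect.
    return 1 - norm.cdf((z_crit * se_null - effect) / se_alt)

print(f"100 samples per group:    power ≈ {detection_power(100, 100):.3f}")
print(f"50,000 samples per group: power ≈ {detection_power(50_000, 50_000):.3f}")
```

With a hundred samples per group, the power to make a convincing detection is effectively zero; pool tens of thousands across institutions and it approaches certainty.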
“We need to use computational approaches to be clever about understanding the patterns there,” said Institute Scientist Bridget K. Wagner ’95. “But you still also need the deep science, the people that are going to do the hypothesis testing.” Wagner works with the Broad’s library of over 400,000 compounds, where she is trying to find a chemical that increases beta cell count in the pancreas — in other words, a treatment or cure for diabetes. But testing that many compounds is impossible. Instead, Wagner’s team identifies a smaller set of representative samples from different chemical families. She runs tests on these representative chemicals, allowing her to identify and study the most promising potential drugs.
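One simple way to shrink a screen like that — a hypothetical sketch of the general strategy, not a description of Wagner’s actual pipeline — is to cluster compounds by their chemical fingerprints and test only the compound nearest the center of each cluster:

```python
# A hypothetical sketch of "representative sampling" from a compound library:
# cluster compounds by chemical fingerprint and screen only the compound
# nearest each cluster center. The fingerprints below are random stand-ins;
# a real pipeline would compute them from chemical structures.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
fingerprints = rng.random((20_000, 128))   # 20,000 compounds x 128 features

n_families = 200                           # number of chemical "families"
kmeans = MiniBatchKMeans(n_clusters=n_families, random_state=0, n_init=3)
labels = kmeans.fit_predict(fingerprints)

representatives = []
for k in range(n_families):
    members = np.flatnonzero(labels == k)
    if members.size == 0:                  # a cluster can occasionally be empty
        continue
    # Pick the member closest to the family's center as its representative.
    dists = np.linalg.norm(fingerprints[members] - kmeans.cluster_centers_[k], axis=1)
    representatives.append(members[np.argmin(dists)])

print(f"Screen {len(representatives)} representatives instead of {len(fingerprints):,} compounds")
```

The representatives then go into the wet lab, and only the families that show promise get a deeper look.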
On the surface, Finucane and Wagner could not be more different. Finucane’s “dry lab” research, done on computers rather than at lab benches, squeezes global data in search of cures for complex diseases. Wagner’s “wet lab” research involves hands-on processing of thousands of chemicals, managed by robotic arms in a giant freezer. But both share a common approach, in which a massive number of initial possibilities is narrowed down to a small, smart set of candidates. Newly available data and technologies have made this kind of science possible.
{shortcode-bfdd0485b8afea0d47bbd1e40a141885fca47591}
As I waited outside the second-floor lecture hall at the Broad, I struck up a conversation with a young biologist who dreamed of seeing without light. Light is a messy tool, he said, interacting with different parts of the cell before bouncing into an expensive and specialized piece of equipment. And no matter how good your microscope, it cannot read off the complex system of genes that move the cell into action. A small team of scientists at the Broad imagined cutting out the middleman: Maybe you could teach a computer how to see cells in a petri dish not with light, but with DNA.
Minutes after we met, he pulled out his laptop and excitedly described the red and green blobs floating on a black background. These were cells that once glowed red and green in real life, but the image I was looking at was no photograph. The computer that created the image had never heard of “color” or “biology.” Using sophisticated mathematics powered by a clever DNA marking system, the computer had learned what a cell is, and how to tell the red cells from the green cells. The technique drew both on the sequencing technology invented just decades earlier and on recent developments in machine learning.
While Venter’s idea of locking up genome data ultimately failed, his plan to use computers to power the shotgun method proved to be ahead of its time. Because of massive investment from the tech industry in tasks like image classification, computer scientists have rapidly turned machine learning into a state-of-the-art tool for doing biology. These new methods need an army of computational biologists, in part explaining why the Broad has been flooded by mathematicians.
Machine learning requires massive datasets to work well — this is why information giants Google and Facebook have led the field. It would never have succeeded in biology without the data produced by big science. After Church’s new sequencing technology led to a new kind of science, that new kind of science led to a new kind of math.
But for Bloom, the mathematician, the tools built by private tech companies and banks to predict ad performance and stocks may not be good enough to do biology. That’s because science is built on understanding, not clever algorithms alone. “We’re concerned with the nature of scientific truth, and we just don’t know what that truth looks like,” said Bloom.
“It’s not the case that we can just take these other methods and out-of-the-box apply them to our data and then, boom, biological insight,” said Finucane. “What’s really needed is for people who understand where the data is coming from, and then have the expertise and the creativity and the skill to develop new algorithms and statistical methods to analyze this type of data.”
Bloom’s hope is that a new generation of computer scientists will choose to join the Broad, not the Amazon or Google campuses located just a couple of blocks away. The massive data produced by a global community of scientists requires a massive labor force to understand it.
Feng Zhang ’04 was just 31 years old when he published one of the most important discoveries in a generation, earning the Broad Institute a small fortune in patent licensing fees. His research had changed paths two years earlier, shortly after he started his lab at the Broad, when he heard about little DNA fragments in bacterial cells called CRISPR.
These DNA fragments are like a library of the cell’s enemies, bits and pieces of viruses it killed and preserved like a gruesome collection of severed heads. But these aren’t just trophies from a successful battle with invaders. The cell actually uses these CRISPR fragments to guide a pair of scissors, called Cas9, to any virus foolish enough to invade twice, where the scissors slice up the virus DNA into tiny pieces.
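The targeting logic is mechanical enough to capture in a toy program. The sketch below — with invented sequences, and none of the real system’s messiness — scans a stretch of DNA for a spot where a 20-letter guide matches and is immediately followed by the “NGG” signal Cas9 requires, then reports where the scissors would fall:

```python
# A toy sketch (not Zhang's actual code) of the targeting logic described above:
# Cas9 cuts where a 20-letter guide sequence matches the DNA, provided the match
# is immediately followed by an "NGG" motif (the PAM). Sequences are invented.
# (Real Cas9 also scans the opposite strand and tolerates some mismatches.)
import re

def find_cas9_cut_sites(dna: str, guide: str) -> list[int]:
    """Return cut positions: Cas9 cleaves about 3 bases upstream of the PAM."""
    sites = []
    pattern = re.compile(re.escape(guide) + r"[ACGT]GG")  # guide + NGG PAM
    for match in pattern.finditer(dna):
        pam_start = match.start() + len(guide)
        sites.append(pam_start - 3)  # blunt cut ~3 bp upstream of the PAM
    return sites

viral_dna = "TTACGGATTCATTCGGCTACGATCGGTGGAAACCCTTT"
guide_rna = "ATTCATTCGGCTACGATCGG"  # 20-letter guide copied from the invader
print(find_cas9_cut_sites(viral_dna, guide_rna))  # -> [23]
```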
Zhang adapted CRISPR-Cas9 for genome editing in eukaryotic cells, including humans, giving scientists control over the bacterial scissors. The technology has created a revolution in biological research and drug development, making certain kinds of experiments cheap and easy to carry out.
But a few months before Zhang published his 2013 paper, scientists Jennifer A. Doudna and Emmanuelle M. Charpentier demonstrated how CRISPR-Cas9 could be transformed into a cheap and easy way to edit DNA on the fly.
However, they stopped short of showing that it worked in eukaryotic cells. Because Zhang’s work built on Doudna and Charpentier’s more fundamental insight, biologists on important prize committees have given the pair more accolades, including the $3 million Breakthrough Prize. Zhang was snubbed, seen by the committees more as a scientist who put the finishing touches on a nearly complete product than as a revolutionary.
This question of who deserves credit for CRISPR may seem like semantics. But credit in biology carries real money, and researchers who make important discoveries stand to earn millions of dollars from patents. Forbes argues that CRISPR is one of the most valuable pieces of intellectual property ever developed in a university lab. Hundreds of millions of dollars in licensing fees are at stake. A new kind of science, one that enables rapid development of new drugs, needs a new kind of law.
Berkeley, where Doudna works, sued the Broad after Zhang filed his patent for the CRISPR technology. Berkeley claimed that Doudna and Charpentier got there first. In September 2018, a federal appeals court ruled in favor of the Broad. Patent law in America does not work the way the Breakthrough Prize committee works. Because Zhang was the first to apply CRISPR to human cells, he holds the intellectual property for applying CRISPR to human cells.
The decision created a fortune for the scientists at the Broad. Zhang’s company Editas is currently the primary licensee of the Broad’s CRISPR patents; at the time of writing, Editas has a value of just over $1 billion. Ideas don’t just create new scientific tools — they create wealth.
Patents are a strange kind of compromise in modern science. When a researcher files a patent, they make their technology public; in return, they are guaranteed the right to the wealth created by their ideas. That openness is key to the success of modern biology research: When scientists share technology and data with an interdisciplinary community of researchers, new ideas become possible.
But money changes things. “There still is a really important need for fundamental, basic science research to go on that may not be patentable,” said Aaron S. Kesselheim ’96, a patent attorney and professor at Harvard Medical School. “But if more universities are focused on the science that could be patentable, that leaves more of a gap.” Patents allow openness, but may also change the kind of science that gets done.
In a shallow reading, Lander seems to have swapped places with his arch-rival Venter from the Human Genome Project race — jumping from the archetype of government-funded science into a mixed model funded by philanthropy, government grants, and licensing fees. The Broad's involvement with Editas and other private companies resembles Venter’s jump to the private sector.
But Venter was fundamentally different. He wanted to lock up information, selling the genome to companies who were willing to pay. It was a business model from an era before people understood the internet. The Broad’s approach is new, built on the idea that information should be shared for free. Computational biologists like Finucane have been able to make dramatic advances in understanding illness only because information is openly available to everyone. At the same time that patents have led to fights over money between institutions, they have also protected the openness that allows those institutions to do science in the first place.
Aviv Regev’s dreams make the Human Genome Project look small. She helped pioneer single-cell genomics, a technological breakthrough that performs the task of that entire project on a single human cell. While the HGP could not see differences across different kinds of cells, this tool has shown scientists for the first time the sheer genetic diversity in a single human being.
Regev co-chairs the Human Cell Atlas project, an international group that seeks to replicate the HGP for every cell in the human body, on top of her leadership role at the Broad Institute. She also runs the Cell Circuits Program at the Broad, which studies the complex interactions between genes and molecules in many different kinds of cells. With these two projects, Regev hopes to build a map of the human body that is unprecedented in its detail.
A map this big is necessary because humans are complicated. “By knowing what happens when you perturb one gene, and what happens when you perturb another gene, you can’t actually predict necessarily what will happen when you perturb both of them,” said Regev. “And in the past, we used to think that that makes it a very difficult problem.”
But new biological tools like CRISPR and powerful breakthroughs in computer science are turning the tide. With Regev’s maps, scientists may be able to understand the ways genes work together without trying the almost-infinite combinations of gene edits in their lab.
“It might be possible to sample the space of possibilities in a way that would let us predict what is happening with things that we’ve never done,” said Regev.
Regev’s dream relies on thousands of people around the world sharing their ideas and data. Not only does it need industrial-style sequencing, it requires armies of computer scientists building new kinds of algorithms and biologists transforming the newfound connections into powerful drugs. The Broad was built to bring these kinds of thinkers together.
“We have to be open and ethical and courageous with what we do, and we have to be very generous with each other,” said Regev. “Otherwise our community will not succeed.”
Correction: March 1, 2019
A previous version of this article incorrectly stated that Eric Lander is involved in Editas. In fact, he has no involvement with the company, and it is the Broad that is involved with Editas.
— Magazine writer Drew C. Pendergrass can be reached at drew.pendergrass@thecrimson.com. Follow him on Twitter @pendergrassdrew.