Evo 2 Can Design Entire Genomes

For more than a quarter-century, synthetic biologists have dreamt of designing new biological systems: gene circuits that behave like electrical ones, for example, or entirely original life forms. But nature rarely cooperates, and a slew of prior attempts suggest that even the brightest scientific minds cannot design biology as exquisitely as evolution.
Most bioengineers, therefore, tweak and repurpose what nature has already provided; they modify existing genomes and enzymes to create bespoke tools, tinkering here or there to drive biological progress forward.
AI models, though, can design biological systems in ways that humans cannot. Neural networks easily spot patterns across vast libraries of books and internet articles to learn — or at least imitate — the inner workings of language. And they can do the same for biology: AlphaFold, an AI model trained on more than 100,000 protein structures in the Protein Data Bank, can accurately predict the acrobatic folds of proteins, or even help humans design new ones.
Today, Arc Institute (a research nonprofit in Palo Alto, California) and NVIDIA launched a broader AI model for biology, called Evo 2, that can do much the same for entire genomes. It is, according to the preprint, one of “the largest-scale fully open language model[s] to date.” The release includes “open-source training code, inference code, model parameters, and the OpenGenome2 training data.”1
Evo 2 builds on its predecessor, Evo 1, a smaller model released last fall. But whereas Evo 1 focused solely on bacterial genomes, Evo 2 can “read” and “interpret” genetic sequences from all domains of life, including microbes, plants, and humans. The new model has 40 billion parameters and trained on 9.3 trillion nucleotides of genetic information scraped from 128,000 different organisms, enabling Evo 2 to make far broader predictions than any prior AI model for biology. (A smaller version, with 7 billion parameters and trained on 2.4 trillion tokens, is also available.)
Researchers today often spend months trying to figure out whether a genetic mutation causes disease, simply because laboratory experiments are slow. But Evo 2 can accurately predict pathogenic mutations in just a few seconds. The same model can also generate brand-new DNA sequences at the scale of yeast chromosomes or small bacterial genomes.
(The model is available as an API endpoint, and researchers can fine-tune it for free using NVIDIA’s BioNeMo framework. Click here to use the tool in your browser.)
AI Architectures
Before explaining how Evo 2 could make bioengineering more predictable, it helps to first understand how AI models actually “learn” the language of DNA.
Like ChatGPT, Evo 2 is a large language model. It’s built using a transformer-like architecture, a type of neural network that takes inputs and converts them into outputs by looking at an entire sequence, all at once, and then figuring out which features are most important.
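To make that concrete, here is a minimal sketch of the attention step at the heart of a transformer, written in Python with NumPy. The embedding and weight matrices are random stand-ins for what a real model would learn; the point is simply that every position in the sequence is compared against every other position, all at once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: embed a short DNA sequence, one vector per nucleotide.
vocab = {"A": 0, "C": 1, "G": 2, "T": 3}
seq = "ACGTTGCA"
d = 8                                        # embedding dimension (toy value)
embeddings = rng.normal(size=(4, d))
x = embeddings[[vocab[c] for c in seq]]      # shape: (len(seq), d)

# Random projections stand in for learned query/key/value weights.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

# Self-attention: compare every position with every other position,
# then mix the sequence according to those weights.
scores = Q @ K.T / np.sqrt(d)                # an L x L table of comparisons
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
output = weights @ V                         # each row blends the whole sequence

print(weights.shape)                         # (8, 8): L x L is the quadratic cost
```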
The same principles that underlie ChatGPT and written language also apply to biological data, such as genes and proteins. Much as swapping a letter in English can change the meaning of a word, swapping a nucleotide can change the meaning, or behavior, of a gene.
Many large language models for biology already exist, but prior models tend to be more narrowly focused than Evo 2. ESM is a protein language model that captures certain aspects of protein structure and function, for example, but is blind to DNA or RNA-level features that drive cell behavior.
Other tools, like DNABERT, focus on DNA but were trained on much smaller datasets drawn from slim slices of evolutionary space. Evo 1 trained on 300 billion nucleotides from single-celled organisms, such as bacteria, and is therefore limited in its predictions. That model can predict how a mutation changes gene expression, but only for prokaryotic organisms (not humans).
“[Evo 2] might not solve all questions in biology,” says corresponding author Brian Hie, “but at least it will be helpful to many more questions in biology than if you were to train a very task-specific model, like for protein structure prediction.”
{{signup}}
One of Evo 2’s most important features is its large context window. The model can hold sequences up to one million nucleotides, or eight times more than its predecessor, in its “working memory” at one time. This means that Evo 2 doesn’t only make predictions based on short genetic fragments, but can answer questions about entire genes, regulatory regions, and distant gene interactions.
This is an important, but potentially underrated, technological leap toward understanding how eukaryotic genomes work.
Consider this: A human genome folds up inside the nucleus as a densely-packed ball. Many experiments have shown that it is not only the sequence of bases in the genome, but also the physical arrangement of this “densely-packed ball,” that contributes to its function. Data captured using a method called Hi-C, for example, have revealed large chromosomal regions that preferentially contact one another. Certain cancers and developmental disorders arise from errors in these “touch points.”
Genomes, then, are not just linear instruction manuals, where each gene operates in isolation. Instead, they are dynamic systems wherein a gene’s behavior changes depending on its physical location and on regulatory elements that are often located hundreds of thousands of bases away. A gene involved in brain development, therefore, might only “switch on” if a distant regulatory sequence is also active.
Older AI models, with their shorter context windows and limited training datasets, often miss these long-range relationships entirely. But Evo 2 doesn’t.
Expanding the context window presented some serious technical difficulties, however. Longer DNA sequences need far more computation and memory to process due to a quirk in transformer architectures: as the length of a sequence increases, the computation required to generate an output grows quadratically. A DNA sequence with one million bases has a computational cost 100 times greater than a sequence one-tenth its size.
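That factor of 100 is just the square law at work, as a two-line check shows:

```python
# Attention compares every position with every other, so cost scales ~ L^2.
short, long = 100_000, 1_000_000             # bases
print((long ** 2) / (short ** 2))            # 100.0: 10x the length, 100x the cost
```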
The Evo 2 team solved this problem by upgrading Evo 1’s architecture into a new design, called StripedHyena 2, built to handle ultra-long genetic sequences at a lower computational cost. (StripedHyena, originally developed in collaboration with TogetherAI, had already significantly reduced the quadratic cost of transformers. Greg Brockman, co-founder of OpenAI, also spent part of a sabbatical at the Arc Institute working on this problem.)
Unlike standard transformers, which compute relationships between every possible pair of nucleotides, StripedHyena 2 mixes convolutional and attention operators to model both short- and long-range dependencies, which helps it recognize patterns like codons and introns. Rather than relying solely on self-attention, it combines short explicit, medium regularized, and long implicit convolutions in a gated, multi-hybrid architecture, improving computational efficiency.
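The full design is detailed in the preprint, but the efficiency trick behind Hyena-style long convolutions can be sketched in a few lines of NumPy. Applying a sequence-length filter via the fast Fourier transform costs O(L log L) rather than the O(L²) of pairwise attention. In this toy version, the filter is random (a real model learns it implicitly), and the circular convolution and sigmoid gate are simplifications of the actual operators.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 1_000_000                            # a megabase-scale toy signal
x = rng.normal(size=L)                   # stand-in for one channel of activations
k = rng.normal(size=L) / np.sqrt(L)      # stand-in for a learned long filter

# Long implicit convolution via FFT: O(L log L) instead of O(L^2).
y = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(k), n=L)

# Gating: an input-dependent multiplicative interaction, the "gated"
# part of a gated convolutional block.
gate = 1.0 / (1.0 + np.exp(-x))          # sigmoid of the input
out = gate * y
print(out.shape)                         # (1000000,), computed in under a second
```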
Evo 2 was then trained on 2,000 H100 GPUs from NVIDIA, about 150 times more compute than was used for AlphaFold. This is, by far, the most compute ever used to train an AI model for biology, and roughly twice the training FLOPs of ESM3.

Benchmarks
Of course, a computationally powerful model isn't worth much if it fails to make accurate predictions. Fortunately, Evo 2 seems to perform quite well across a broad range of predictive and generative tasks.
In one test, researchers assessed Evo 2’s ability to predict whether a genetic mutation in humans (such as changing an “A” to a “C”) might cause disease. The team focused on BRCA1, a gene whose mutations are known to increase the likelihood of breast cancer.
When the researchers “input” various BRCA1 mutations into Evo 2, the model accurately predicted harmful mutations more than 90 percent of the time. It did this despite never being trained on any BRCA1 variant data.
All of the model’s predictions were compared against a dataset (assembled in 2018 from actual laboratory experiments) documenting more than 3,000 mutations in the BRCA1 gene that are known to be either pathogenic (meaning they negatively affect the protein) or benign.
Evo 2 also made calls on “variants of unknown significance,” mutations that have been observed in patients but whose clinical effects have never been established, flagging which ones it suspects are pathogenic. If and when a new BRCA1 mutation is spotted in a patient, then, this model could plausibly help doctors figure out whether it’s likely to cause cancer.
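Zero-shot variant scoring of this kind usually boils down to a likelihood comparison: ask the language model how probable the reference sequence is, then how probable the mutated sequence is, and treat a large drop as a red flag. The sketch below substitutes a trivial first-order Markov chain for Evo 2 itself (the real model’s API is different), but the delta-log-likelihood logic is the same idea.

```python
import math
from collections import defaultdict

def fit_markov(reference: str):
    """Fit a toy first-order Markov model, a stand-in for a real DNA LM."""
    counts = defaultdict(lambda: defaultdict(float))
    for a, b in zip(reference, reference[1:]):
        counts[a][b] += 1.0
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def log_likelihood(model, seq: str) -> float:
    """Sum of log transition probabilities; unseen transitions get a small floor."""
    return sum(math.log(model.get(a, {}).get(b, 1e-6))
               for a, b in zip(seq, seq[1:]))

# The opening of the BRCA1 coding sequence, used here as a toy reference.
reference = "ATGGATTTATCTGCTCTTCGCGTTGAAGAAGTACAAAATGTCATTAATGCTATGCAG"
model = fit_markov(reference)

# Score a single-nucleotide variant by the drop in log-likelihood: the more
# "surprised" the model is by the change, the more suspect the variant.
pos, alt = 10, "G"
variant = reference[:pos] + alt + reference[pos + 1:]
delta = log_likelihood(model, variant) - log_likelihood(model, reference)
print(f"delta log-likelihood: {delta:.2f}")  # more negative = more suspect
```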
But the team didn’t stop there. Their next experiment used Evo 2 to design entire genome sequences (up to one million bases) from scratch. For context, the human genome has about three billion base pairs of DNA, but our cells also contain mitochondria — small organelles that generate ATP energy molecules by breaking down food — that carry their own genomes. A mitochondrion’s genome stretches about 16,500 base pairs in length and encodes 22 tRNAs, 13 proteins, and 2 ribosomal RNAs.
Arc researchers began by prompting Evo 2 with short snippets of real, human mitochondrial DNA and asking the model to extend the sequence into a full-length genome. Once Evo 2 had done that, researchers compared the AI-generated genomes to real mitochondrial sequences. The AI-generated genomes partly overlapped with natural sequences and encoded the same “core” genes as those found in real mitochondria — including the ribosomal RNAs and tRNAs.
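Prompting works the same way it does for a text model: the seed sequence goes in, and nucleotides are sampled one at a time from the model’s predicted distribution. In this sketch, `next_nucleotide_probs` is a hypothetical placeholder (here returning a uniform distribution so the demo is self-contained); a real deployment would call Evo 2’s actual inference API.

```python
import random

random.seed(0)

def next_nucleotide_probs(context: str) -> dict:
    """Hypothetical stand-in for a model call; Evo 2's real API differs.
    A uniform distribution keeps this demo self-contained."""
    return {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def extend(prompt: str, n_new: int) -> str:
    """Autoregressive generation: sample one base at a time, append, repeat."""
    seq = prompt
    for _ in range(n_new):
        probs = next_nucleotide_probs(seq)
        seq += random.choices(list(probs), weights=list(probs.values()))[0]
    return seq

print(extend("ATGGCGT", 50))  # a seed plus 50 sampled bases
```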
Evo 2 was also used to design a yeast chromosome and a small bacterial genome. In the latter example, however, the designed genome was missing some critical elements, according to Hie, and so would likely not function if synthesized and inserted into a real bacterium.
Since genes ultimately determine protein function, the team used AlphaFold 3 to predict the structures of the AI-generated mitochondrial proteins. The results showed that the designed proteins closely resembled their natural counterparts, with pLDDT scores ranging from 0.67 to 0.83.2
While researchers had proven that Evo 2 could generate small microbial or mitochondrial genomes encoding plausible proteins, it still remained to be seen whether the model could also design DNA sequences with specific “states” or “behaviors” inside of cells — a much more difficult task.
In the human body, every somatic cell carries the same genome, but different cell types activate or silence specific parts of that genome depending on their function. A neuron expresses genes involved in brain function while “shutting down” heart-related genes, whereas a heart cell does the opposite. This selective gene expression is controlled by chromatin accessibility; basically, there are regions of DNA that “open” or “close” depending on chemical modifications and protein interactions. This regulation ensures that each cell carries out its unique role, despite possessing the same genome.
For a third experiment, then, researchers used Evo 2 to design DNA sequences likely to adopt an “open” or “closed” state within human cells.
To do this, Arc scientists used two existing deep learning models, called Enformer and Borzoi, to predict whether a given DNA sequence was likely to be “open” or “closed” in various cell types. Evo 2 was then used to generate multiple DNA sequences, each of which was evaluated with these models to determine how well it matched the desired chromatin pattern. Finally, the researchers used a method called beam search, allowing Evo 2 to iteratively refine its sequences by keeping only the best matches at each step.
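The loop itself is simple: extend each candidate sequence, score every extension, keep the top handful, and repeat. The toy below uses GC content as a stand-in scorer (calling Enformer or Borzoi for real would require their model weights and far more machinery), and it enumerates single bases where the real pipeline samples chunks from Evo 2, but the beam-search skeleton is the same.

```python
import itertools

def accessibility_score(seq: str) -> float:
    """Stand-in for Enformer/Borzoi: here, simply the GC content."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def beam_search(prompt: str, steps: int, beam_width: int = 4) -> str:
    """Grow sequences base by base, keeping only the best-scoring beams."""
    beams = [prompt]
    for _ in range(steps):
        candidates = [seq + base
                      for seq, base in itertools.product(beams, "ACGT")]
        candidates.sort(key=accessibility_score, reverse=True)
        beams = candidates[:beam_width]   # prune to the top matches
    return beams[0]

designed = beam_search("ATG", steps=20)
print(designed, round(accessibility_score(designed), 2))
```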
The researchers tested whether increasing the number of sequences Evo 2 generated would improve its accuracy. And indeed, it did: Evo 2 seems to abide by the same inference-time scaling laws observed in AI as a whole. As researchers increased the number of sampled sequences, the model’s designs more accurately matched the “open” or “closed” chromatin state, with an AUROC greater than 0.9 (meaning that, about nine times out of ten, the model ranks a truly “open” sequence above a “closed” one). Arc Institute researchers are collaborating with DNA synthesis experts at the University of Washington to validate these AI-generated sequences in mouse cells, says Hie.
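AUROC has a concrete ranking interpretation, which a toy calculation makes plain: it is the fraction of positive-negative pairs in which the positive example gets the higher score. The numbers below are invented for illustration.

```python
import itertools

# Invented scores for illustration: higher should mean "open" chromatin.
open_scores = [0.9, 0.8, 0.75, 0.6]      # sequences that are truly open
closed_scores = [0.7, 0.4, 0.3, 0.2]     # sequences that are truly closed

pairs = list(itertools.product(open_scores, closed_scores))
auroc = sum(p > n for p, n in pairs) / len(pairs)
print(auroc)  # 0.9375: an open sequence outranks a closed one ~94% of the time
```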
These results suggest that Evo 2 could plausibly design DNA sequences with precise gene expression levels in different cell types. And that would be a big deal for medicine.
“If you have a gene therapy that you want to turn on only in neurons to avoid side effects, or only in liver cells, you could design a genetic element that is only accessible in those specific cells,” says co-author and computational biologist Hani Goodarzi. “This precise control could help develop more targeted treatments with fewer side effects.”
Yet, like any powerful tool, Evo 2 cuts both ways. The model is open-source, which means “bad actors” could misuse it to design harmful sequences, including bioweapons. Fortunately, Evo’s developers built a series of biosecurity measures to make that more difficult.
Specifically, they excluded viruses that infect eukaryotic hosts from the model’s training data and tested the model to ensure it would not respond meaningfully to pathogen-related queries. The researchers also implemented safeguards to prevent the model from generating or modifying sequences associated with known biological agents.
But beyond these practical concerns lies a deeper question: What, if anything, is Evo 2 actually “learning”? Does the model truly understand underlying features of its training data, or is it simply regurgitating what it’s already seen, like a stochastic parrot that generates plausible language (or DNA) without grasping its meaning?
To answer these questions, the Evo 2 team trained a specialized model, called a sparse autoencoder, that acts like a decoder for the model’s internal neuron-firing patterns. This sparse autoencoder takes high-dimensional data inside Evo 2 and breaks it into smaller, more interpretable pieces that a human can understand.
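In code, a sparse autoencoder is a small network of its own: it expands activations into an overcomplete set of features, reconstructs the original activations from them, and is trained with an L1 penalty so most features stay silent. The NumPy sketch below shows the forward pass and loss with random weights (not Arc’s trained SAE), just to fix the shape of the idea.

```python
import numpy as np

rng = np.random.default_rng(2)

d_model, d_hidden = 64, 512            # overcomplete: more features than dims
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))
b_enc = np.zeros(d_hidden)

def sae_forward(acts):
    """Encode activations into sparse features, then reconstruct them."""
    features = np.maximum(acts @ W_enc + b_enc, 0.0)   # ReLU silences most units
    reconstruction = features @ W_dec
    return features, reconstruction

acts = rng.normal(size=(32, d_model))  # a batch of internal model activations
features, recon = sae_forward(acts)

# Training minimizes reconstruction error plus an L1 sparsity penalty;
# the penalty is what nudges each feature toward a single concept.
l1_coeff = 1e-3
loss = np.mean((acts - recon) ** 2) + l1_coeff * np.abs(features).mean()
print(f"loss: {loss:.3f}, active features: {(features > 0).mean():.2%}")
```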
When this decoder was applied to a single neural layer of Evo 2, it revealed that the model had discovered fundamental biological concepts solely by training on DNA sequences. Evo 2 wasn’t just memorizing examples from its training data, but had developed internal representations that allowed it to predict everything from viral DNA signatures to protein secondary structures, like α-helices and β-sheets. Evo 2 accurately mapped the exon-intron architecture of a woolly mammoth genome, for example, even though it had never been exposed to the extinct mammal during its training.3
A publicly available mechanistic interpretability tool helps researchers figure out which parts of a DNA sequence the model focuses on during prompts.
“It feels like one’s child has left home.”
Bioengineers have long tried to design lifeforms with the same precision as classical engineering fields. But unlike a microprocessor, where circuits fire in predictable patterns, cells behave probabilistically; genes turn on and off in bursts, and some proteins shift functions entirely based on their surroundings.
These uncertainties partly explain why most efforts at genetic design — whether building a metabolic pathway or engineering a synthetic genome — require testing thousands of variations before finding one that works.
Despite these challenges, though, our ability to design biology is quickly improving. Just consider prime editing, a technology that allows scientists to insert, delete, or swap DNA bases. This tool was not discovered through brute-force screening, but was instead rationally designed by fusing a Cas9 nickase with a reverse transcriptase enzyme and then fine-tuning the tool until it worked reliably.
Evo 2 could be a meaningful step toward more predictable bioengineering breakthroughs, just like prime editing. We also expect its future uses will mirror, in some ways, the AI tool that overturned structural biology just a few years ago: AlphaFold.
When DeepMind “solved” the protein-folding problem at the 2020 CASP14 competition, many structural biologists felt blindsided. “From the perspective of a scientist who wants to see progress,” structural biologist Mohammed AlQuraishi told the Harvard Crimson, reaching this milestone was enormous. But his colleagues wondered whether AlphaFold could spell the end of crystallography, or even of structural biology as a whole. “It was bittersweet to think, suddenly, it’s done,” he said after writing a blog post about the 2020 CASP event titled “It feels like one’s child has left home.”
The distress felt by structural biologists, though, soon gave way to optimism. If individual protein structures could be solved quickly, they figured, then researchers would be liberated to ponder much deeper problems, like designing bespoke enzymes.
A similar arc may soon unfold across a much wider swath of biology. Scientists will begin using Evo 2 to design original DNA sequences and then test them in the laboratory (a sort of real-world validation that is mostly missing from this preprint). And although the model may not provide a final solution to the long-held dreams of bioengineers, it will at least give a glimpse of their future.
{{divider}}
Eryney Marrogi is a medical student at the University of Vermont, with experience in biological engineering from working on mosquitoes at Harvard, AAV at Dyno Therapeutics, and novel biosensors at Caltech. Find him on Substack.
Niko McCarty is a founding editor of Asimov Press.
Additional reporting by Alec Nielsen. Lead image by Ella Watkins-Dulaney.
Cite: Marrogi E. & McCarty N. “Evo 2 Can Design Entire Genomes.” Asimov Press (2025). DOI: https://doi.org/10.62211/45yp-23jh
This article was published on 19 February 2025.
{{divider}}
Footnotes
1. OpenGenome2 is “a new dataset compiled from curated, non-redundant nucleotide sequence data, totaling over 8.8 trillion nucleotides from bacteria, archaea, eukarya, and bacteriophage,” according to the preprint.
2. pLDDT stands for predicted Local Distance Difference Test. It is a confidence score that estimates how closely AlphaFold’s predicted structure is likely to agree with an experimental one. The score is conventionally reported on a 0-to-100 scale, with values above 90 indicating high accuracy for both backbone and side chains; the Evo 2 preprint reports it on a 0-to-1 scale, so 0.67 to 0.83 corresponds to 67 to 83.
3. Many organisms don’t encode their genes as a continuous stretch of DNA. Instead, genes are split into exons and introns; the introns are spliced out and the exons stitched together to build the mRNA strand that, finally, is translated into protein.
Always free. No ads. Richly storied.