We Call DNA a Language. Is It?

As our understanding of the genome evolves, the metaphors we use to describe it should, too.

Having the complete sequence of the human genome is similar to having all the pages of a manual needed to make the human body.
–National Institutes of Health’s Human Genome Project Completion Report, 2010

Fifty years before we solved it, long before its information began filling scientific headlines and pharmaceutical shelves, and even before we understood the basic fact that DNA is a double helix, the human genome was being framed as a type of language. In 1943, the physicist Erwin Schrödinger turned an eye toward biology. He was already famous for his work in quantum mechanics, including his eponymous cat, which sat alone in a box, both alive and dead, until you looked inside the box to see if it was one or the other. He was influential in biology as well, postulating that there must be a code-script contained in DNA bases connected by covalent bonds that was capable of transmitting hereditary information.

From the very beginning of our nearly 75-year love affair with this gooey substance, DNA was discussed as a language. And if DNA was the language, then the genome was the book—not just a container, but a manual. DNA began to be called the language of life, and many thought if we could read it, then we would understand what it means to be human. This analogy of DNA as a language and the genome as a book is so pervasive, we can hardly discuss genetics without it. Scientists talk and write about reading DNA, of DNA primers, of editing DNA, and of transcribing and transposing it. Language analogies have been used in genetics to describe a host of technical topics ranging from mutation to selection, neutral drift, recombination events, lateral gene transfer, and punctuated equilibrium. 

Yet for all the analogy’s ubiquity and apparent utility as an informal framework for genetic research, it is worth considering why it works and whether it sometimes sags under its own weight once we lard on all our unspoken aspirations for it. This investigation matters because, while we still lock our genome into the box of this 75-year-old analogy, our actual understanding of it has progressed profoundly. As the famous 20th-century philosopher of science Thomas Kuhn recognized, metaphor is part of theory. If our understanding of a thing like DNA changes, the metaphors we use to describe it should change as well, or else they can encumber further scientific discovery.

Whose line is it anyway?

If DNA is like a human language, then is it like English, with its countless exceptions and puzzling spelling? Or maybe it’s more like German, with its highly specific concepts? Is it more like Hindi, with its phonetic precision? Or for us organisms that reproduce sexually, maybe it’s more of a romance language. Maybe the genomes with which evolution flirts are more French livres or Italian libri. But even asking which human language DNA is like raises the larger question: What do we mean when we compare DNA to a language?

Take that question a step further. In languages that use the Roman script, there are 26 letters and a host of separate markers, such as the period and question mark, to punctuate sentences. In DNA, there are just four letters, and it is basically the sequences of those letters that constitute the punctuation scheme. 

When we say DNA base pairs are letters, does that imply genes are words? The average protein coded for by a gene is composed of around 500 amino acids. At three bases per amino acid, that makes the average gene more than a thousand base pairs long. Talk about a long average word! It is much longer than pneumonoultramicroscopicsilicovolcanoconiosis, the longest word in the English language, which names a lung disease also known as silicosis that results from chronic inhalation of silica dust.

If the DNA of an organism is like a book or manual, some organisms have concise editors. One puffer fish has a genome of just 340 million nucleotide letters. Yet swimming next to it is the marbled lungfish, with a genome almost 400 times as long. If you thought humans, all smart and sophisticated, would have a longer sequence of DNA than a marbled lungfish, you’d be quite wrong. At roughly 3 billion base pairs, humans have only a medium-length novel, hardly the mighty tome of the marbled lungfish. The Japanese flower Paris japonica, diminutive in its small white star-like petals, is even more long-winded than the lungfish. If you read its 150 billion base pair-long genome at one DNA base pair per second, you’d be reading for almost 5,000 years. So much to say in one little flower: fifty times the length of your typical human manual! Do we underestimate the complexity of this flower? Or are DNA and the genome in which it resides not exactly a manual but something else?

A man, a plan, a canal—Panama

Viruses meanwhile can be the most meticulous of editors, crafting meaning in tweets or haikus. They can even be palindromic—having meaning read front to back or back to front—like “A man, a plan, a canal Panama.” They are pithy writers.

Sometimes there are even strikeouts in the genomic text, like a struck-through sentence that says please don’t read this. This is what happens when a stretch of DNA becomes bound by methyl groups, a process central to epigenetic activity.

To be clear, for research biologists and geneticists, hearing terms such as DNA transcription likely doesn’t bring to mind monks copying the Bible any more than the average music listener would think of insects when hearing a reference to The Beatles. These networks of terms, originally inspired by analogies to language, have taken on specific meanings in biology. But for the public and scientists outside of genetics, the original meanings of these terms continue to influence how genetics and DNA are understood. Take the expression reading DNA. The Human Genome Project allowed us to sequence the genome, but that was a far cry from reading it. The initial sequencing work was more akin to receiving a text in a language you don’t understand: Sure, you can make out the letters, but the meaning is largely unintelligible. Countless research studies since then have allowed us to decipher some of the text.

Practically, equating sequencing the genome with reading the manual has resulted in harm. One way this has shown up is in bias and genetic discrimination. Courts have asked, for instance, whether there is a gene for violence, a question implying a simplistic relationship between single genes and complex behaviors. Even if some studies show an association between certain genes and violence, correlation does not equal causation, and the sheer volume of genes often results in statistical correlations with whatever outcome is being investigated. A more scientifically informed judiciary may avoid this type of armchair theorizing or, if it must speculate, instead wonder if there are epigenetic triggers of violent behavior or other adaptive developmental and learned behavioral responses, not to mention societal factors such as systemic inequality, racism, diseases of despair, or even gun laws that contribute to that violence. These questions would be profoundly different insofar as they would surface discussions of changeable social policies and not unchangeable genetics.

Similarly, in some instances the claims of genetic testing kits exceed known science. Being told one has the “gene for anxiety” or the “gene for depression” is a huge oversimplification, one that can be traced to our naive mental model of a genomic manual in which each attribute or behavior has its own section.

A more general problem with the use of the manual metaphor: It is now clear there are multiple inheritance systems that interact in evolution and organism development through vastly complicated regulatory feedbacks—particularly in complex organisms. The stark visual distinctions between nerve, muscle, and red blood cells are primarily due to developmental differences and epigenetic expression, not to differences in the genome. If the DNA manual were all that mattered, it would be impossible to account for cellular differentiation and specialization.

While these findings may give us pause in continuing with these language metaphors, we can creatively recompose them so long as we recognize them as metaphors and analogies and not biological inheritance and development itself. Maybe the genome is like the plain text of a document and other inheritance systems are the stylings. Epigenetic, cellular, and cultural inheritance systems then take this text and convert it to other forms, for instance an audiobook where intonations create works that supersede the original meanings of the plain text. These audio recordings become their own inheritance systems, not subject to the limitations in the original text file. 

Ultimately, the richness of the language analogy for DNA may be due to the fact that languages, like genomes and organisms, evolve, albeit in different ways. Linguistic researchers, in fact, have borrowed genetic and evolutionary tools to analyze how languages change over time, just as the field of biomimicry looks to nature to inspire new synthetic designs.

That these multiple inheritance systems seem to arise from one another in intricate organisms is part of the profundity of evolution that we continue to uncover. If evolution is a story and DNA a script, the full play and stage directions are still being discovered. 
