Junk DNA Isn’t Junk, and That Isn’t Really News

News that about 80 percent of our DNA is functional might surprise some, but won’t surprise geneticists

Micah Baldwin

Remember in high school or college, when you learned about all that DNA inside of you that was junk? The strings and strings of nonsense code that had no function? A recent blitz of papers from the ENCODE project has the world abuzz with news that would rip that idea apart.

But, like many things that stick around in textbooks long after science has moved on, the “junk DNA” idea that ENCODE disproved didn’t really need disproving in the first place. Even in 1972, scientists recognized that not knowing what certain DNA regions did didn’t make those regions junk.

The ENCODE project’s press release certainly made it sound exciting:

The hundreds of researchers working on the ENCODE project have revealed that much of what has been called ‘junk DNA’ in the human genome is actually a massive control panel with millions of switches regulating the activity of our genes. Without these switches, genes would not work – and mutations in these regions might lead to human disease. The new information delivered by ENCODE is so comprehensive and complex that it has given rise to a new publishing model in which electronic documents and datasets are interconnected.

And even The New York Times’ Gina Kolata bought the hype:

Now scientists have discovered a vital clue to unraveling these riddles. The human genome is packed with at least four million gene switches that reside in bits of DNA that once were dismissed as “junk” but that turn out to play critical roles in controlling how cells, organs and other tissues behave. The discovery, considered a major medical and scientific breakthrough, has enormous implications for human health because many complex diseases appear to be caused by tiny changes in hundreds of gene switches.

But blogger and Berkeley biologist Michael Eisen explains the trouble with both the press release and the press coverage thus far:

It is true that the paper describes millions of sequences bound by transcription factors or prone to digestion by DNase. And it is true that many bona fide regulatory sequences will have these properties. But as even the authors admit, only some fraction of these sequence will actually turn out to be involved in gene regulation. So it is simply false to claim that the papers have identified millions of switches.

Even Ewan Birney, the scientist who did the data analysis for the ENCODE project, tried to clear up the confusion. He explains on his blog that the studies’ claim that about 80 percent of the genome is “functional” simply means that about 80 percent of the human genome has biochemical activity. Birney writes:

This question hinges on the word “functional” so let’s try to tackle this first. Like many English language words, “functional” is a very useful but context-dependent word. Does a “functional element” in the genome mean something that changes a biochemical property of the cell (i.e., if the sequence was not here, the biochemistry would be different) or is it something that changes a phenotypically observable trait that affects the whole organism? At their limits (considering all the biochemical activities being a phenotype), these two definitions merge. Having spent a long time thinking about and discussing this, not a single definition of “functional” works for all conversations. We have to be precise about the context. Pragmatically, in ENCODE we define our criteria as “specific biochemical activity” – for example, an assay that identifies a series of bases. This is not the entire genome (so, for example, things like “having a phosphodiester bond” would not qualify). We then subset this into different classes of assay; in decreasing order of coverage these are: RNA, “broad” histone modifications, “narrow” histone modifications, DNaseI hypersensitive sites, Transcription Factor ChIP-seq peaks, DNaseI Footprints, Transcription Factor bound motifs, and finally Exons.

And even Birney isn’t actually surprised by the 80 percent number:

As I’ve pointed out in presentations, you shouldn’t be surprised by the 80% figure. After all, 60% of the genome with the new detailed manually reviewed (GenCode) annotation is either exonic or intronic, and a number of our assays (such as PolyA- RNA, and H3K36me3/H3K79me2) are expected to mark all active transcription. So seeing an additional 20% over this expected 60% is not so surprising.

That isn’t to say that ENCODE’s work isn’t interesting or valuable. Ed Yong at Not Exactly Rocket Science explains that while ENCODE might not be shattering our genomic world, it is still really important:

That the genome is complex will come as no surprise to scientists, but ENCODE does two fresh things: it catalogues the DNA elements for scientists to pore over; and it reveals just how many there are. “The genome is no longer an empty vastness – it is densely packed with peaks and wiggles of biochemical activity,” says Shyam Prabhakar from the Genome Institute of Singapore. “There are nuggets for everyone here. No matter which piece of the genome we happen to be studying in any particular project, we will benefit from looking up the corresponding ENCODE tracks.”

Interesting and important, yes. But is it shocking to find that a lot of our DNA has a function? No.
