That said, since the genome project, the thing that has changed the face of genomics has been revolutionary new sequencing technologies that finally came on the scene by about 2005.
How have those improvements changed the cost and the time it takes for sequencing?
The Human Genome Project took six to eight years of active sequencing and, for that active sequencing alone, they spent about a billion dollars to produce the first human genome sequence. The day the genome project ended, we asked our sequencing groups, “All right, if you were going to go sequence a second human genome, hypothetically, how long would it take and how much would it cost?” With a back-of-the-envelope calculation, they said, “Wow, if you gave us another 10 to 50 million dollars, we could probably do it in three to four months.”
But now, if you go to where we are today, you can sequence a human genome in about a day or two. By the end of this year, it will be about a day. And it will only cost about $3,000 to $5,000.
What were the major findings from the first genome and the ones that followed?
There are new findings that come every day. In the first 10 years of having the human genome sequence before us, I think we have accumulated, day by day, more and more information about how the human genome works. But we should recognize that even 10 years in, we are only at the early stages of interpreting that sequence. Decades from now we will still be interpreting, and reinterpreting, it.
Some of the earliest things that we learned, for example: We have many fewer genes than some people had predicted. When the genome project began, many people predicted that humans probably had 100,000 genes, and that we would have substantially more genes than other organisms, especially simpler organisms. It turns out that is not true. It turns out that we have a much lower gene number. In fact, we probably have more like 20,000 genes. And that is only a few thousand more than flies and worms. So our complexity is not in our gene number. Our complexity is elsewhere.
The other surprise came as we started sequencing other mammals, in particular the mouse genome, rat genome, dog genome and so forth, and by now we have sequenced 50, 60, 70 such genomes. You line up those genome sequences in a computer and you look to see where the sequences are very conserved; in other words, across tens of millions of years of evolutionary time, where have the sequences not changed at all? Highly, highly evolutionarily conserved sequences almost for sure point to functional sequences. These are things that life doesn’t want to change, and so it keeps them the same because they are doing some vital, fundamental function necessary for biology. Going into the genome project, we thought the majority of those most conserved regions that were functionally important were going to be in the genes, the parts of the genome that directly code for proteins. It turns out the majority of the most highly conserved and inevitably functional sequences are not in protein-coding regions; they are outside of genes.
So what are they doing? We don’t know what all of them do. But we know a lot of them are basically circuit switches, like dimmer switches for a light, that determine where and when and how much a gene gets turned on. It is much more complicated in humans than it is in lower organisms like flies and worms. So our biological complexity is not so much in our gene number. It is in the complex switches, like dimmer switches, that regulate where, when, and how much genes get turned on.
What do we have left to figure out?