But then, when it finally came time to make the libraries that were going to be used for sequencing the human genome by the Human Genome Project, the best person for making those libraries was a scientist who worked at Roswell Park Cancer Institute in Buffalo, New York. [The team] got informed consent from about 10 or 20 anonymous blood donors, and then picked one of those at random, and that was the person. About 60 percent of the human genome sequence generated by the Human Genome Project was from one blood donor in Buffalo, New York.
But, you know what, it doesn’t matter. If you go across the human genome sequence generated by the Human Genome Project, it is like a mosaic. You may go for a hundred thousand letters and it may be that one person, from Buffalo. It might end up being that you’ll go the next hundred thousand and it will be somebody else. And the next hundred thousand, somebody else. All it served as was a reference. And since all humans are 99.9 percent identical at the sequence level, that first sequence doesn’t have to be a real person. It can just be a hypothetical reference of a person.
Of all that information, why did you choose to focus on chromosome 7 [the human genome has 23 pairs of chromosomes]?
It was somewhat arbitrary. We wanted to pick a chromosome that wasn’t too big. We didn’t want to pick one that was too small. We knew there was going to be a lot of work, so we picked a middle-sized chromosome.
We didn’t want to pick one that had a lot of people working on it already. At that point, the most famous gene on chromosome 7 was the cystic fibrosis gene, and that was discovered in 1989. And we had actually isolated some of that region and were doing some studies in a pilot fashion.
The truth is, we picked it because it wasn’t too big, wasn’t too small and wasn’t too crowded. That was an arbitrary way to start; by the time the genome project ended, most of the studies were being done genome-wide.
How did the work change over the project’s lifetime?
The whole story of genomics is one of technology development. If you trace where the huge advances were made, every one of them was associated with surges in technology. Early in the genome project, the surge came in that we had better ways of isolating big pieces of DNA.
When we were sequencing smaller organism genomes—like Drosophila fruit flies—we basically industrialized the process of doing sequencing, making it more and more and more automated.
When the genome project began, the idea was, “Let’s sequence the genomes of flies and worms and yeast, all these smaller organisms, using the method of the day,” which was this method developed by Fred Sanger in 1977. The idea was they wouldn’t push the accelerator to start sequencing the human genome until a revolutionary new sequencing method became available. So there were a lot of efforts to develop new crazy ways of sequencing DNA.
When it came time, in around 1997 or 1998, to actually think about starting to sequence the human genome, everybody said, “Maybe we don’t need to wait for a revolutionary method, maybe we have incrementally improved the old-fashioned method well enough that it can be used,” and indeed that is what was decided.