Interview with Hamilton Smith and Richard Roberts by Sture Forsén and Nils Ringertz at the meeting of Nobel Laureates in Lindau, Germany, June 2000.
Hamilton Smith and Richard Roberts talk about the announcement of HUGO, the Human Genome Project; the number of genes in the human genome (5:16); how ‘Proteomics’ will help define the function of these genes (7:23); and the benefits and consequences of the HUGO project (17:45).
Three days ago, there was a major announcement, a major event in science, you could say, a joint announcement by groups in the United States and the United Kingdom about, I would say, a high-quality DNA sequence of the entire human genome and the significance of this event was underlined by the fact that two heads of state, Bill Clinton and Tony Blair, had joined the press conference. We are happy to have here among us two Nobel Laureates in Physiology or Medicine, Professor or Doctor Richard Roberts and Hamilton Smith who have been involved in various capacities in the human genome project. In fact, one of you was actually present at the White House during this press conference. Would you like to take it on from here?
Hamilton Smith: I’m actually an employee of Celera Genomics, which is a company in Rockville, Maryland that has been single-handedly determining the human genome sequence in what many have called a race with the rest of the world, with the public efforts which are carried on in the United States, England, Germany and Japan and I think some other countries. The White House ceremony was an attempt to get a cooperation between the public effort and the private effort, not a collaboration but a cooperation where we would agree to not say bad things about each other’s approaches or data or personalities and that we would try to do joint publications in a scientific journal, sometime near the end of the year 2000. Not papers that have joint authorship between the groups but separate publications where we present our accomplishments. I think it’s been a very good thing actually to do this, it takes the focus of the press coverage away from the warring parties and gets it back to what’s really important, namely getting the sequence and getting it out to the public.
The ceremony itself, President Clinton gave a short talk of about 4-5 minutes and then, by a TV hook-up in England, Tony Blair came on and said a few words in his best oratorical style and then President Clinton introduced briefly Francis Collins, who’s head of the public effort in the US and of course there were words said about the DoE part of the effort and there were nice things said all over the place. It was really a pretty good show and then Craig Venter was the last speaker. I was wondering when he went up to the microphones, whether there was anything left to be said but Craig, I think, handled this situation extremely well and got across how Celera fits into this accomplishment. So, we had a press conference afterwards. I think it was not only a truce between the public and private efforts but also an announcement that each of the groups had achieved their goals. The public human genome project announced that they had completed a working draft of the genome which represents about a 90% coverage of the genome but without a complete assembling of the genome and Celera announced that they had a 99% random coverage of the genome with an assembly of those sequences, so that genes are laid down linearly along the chromosomes.
Now that we have two independent groups who have come to an agreement about the general structure of the human genome, maybe you are able to answer the question, how many genes are there in the human genome. It’s a question you get when you lecture in this field and I’ve heard widely varying figures so, now if you get the question, what would your answer be?
Hamilton Smith: I think at this moment we still are in considerable doubt, but at Celera we’re also sequencing the mouse genome which will be complete sometime before the end of the year, to the extent that we can compare the mouse sequence to the human sequence. The protein coding regions of the mouse and human are sufficiently similar that they can be aligned, whereas the so-called junk DNA, the 97% of the genome which doesn’t code for …
Richard Roberts: You’re talking about this stuff I discovered, are you?
Hamilton Smith: Yes, something like that.
Richard Roberts: The junk.
Hamilton Smith: Yes, the junk.
Hamilton Smith: … is more dissimilar, so it should be possible to simply read the two genomes in six frames and with high probability overlap the protein coding regions and get a fairly precise count of how many genes are present and that’s really the strategy that we’re pursuing. We have the mouse sequence, or we will have the mouse sequence and we’re the only ones in the world that will have it, so I think Celera will have the most accurate annotation.
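The six-frame comparison Smith describes can be illustrated with a minimal sketch. It is not Celera's pipeline: the function names, the short peptide window `k`, and the toy sequences are all assumptions made for illustration, and a real comparison would use alignment rather than exact peptide matches. The sketch translates each DNA sequence in all six reading frames (three per strand) and reports peptide windows that appear in both.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Translations of the three forward and three reverse-strand frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [translate(s[offset:]) for s in (seq, rc) for offset in range(3)]


def shared_peptides(seq_a, seq_b, k=5):
    """Peptide windows of length k (no stop codon) found in both sequences."""
    def windows(seq):
        return {
            frame[i:i + k]
            for frame in six_frames(seq)
            for i in range(len(frame) - k + 1)
            if "*" not in frame[i:i + k]
        }
    return windows(seq_a) & windows(seq_b)


if __name__ == "__main__":
    # Toy sequences differing only at synonymous positions (hypothetical data).
    print(shared_peptides("ATGGCCAAA", "ATGGCTAAG", k=3))
```

The two toy sequences differ at the DNA level but encode the same peptide, which is the property the strategy relies on: coding regions stay similar after translation while non-coding sequence diverges.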
Because people are betting on the number and there is a need for an authoritative answer to this question from the gambling point of view and from the human centred views that we entertain. To drive these large projects, the human genome project, obviously there have been precursors, people know from physics now that there are enormous projects, big physics, and you now have big biology coming around, so once the human genome project has been completed, how will we proceed? I think we would agree that the most important thing now is to find out how the genome functions and we now enter then a type of research where we want to study the ultimate products, the proteins. Since there are many ways in which you can process the information from the DNA and the genome, at the RNA level, there are many more proteins to be found. This field of studying the proteins made in biological systems is called proteomics, so do you see any such big proteomics coming along and what can be done in terms of technology to speed up the analysis of the much larger number of proteins than the number of genes?
Hamilton Smith: I mean, I can say pretty much what we are planning at Celera. Celera I think is unique in its ability to do very large scale surveys. We don’t like really doing anything unless we can do 100,000 a day of whatever it is, so this is the way Craig Venter thinks and it allows us to do things that can’t be done easily in the university setting or in most other companies. What we’re planning of course is to move into proteomics using new instrumentation from PE Biosystems, mass spectrometers that can analyse tens of thousands of samples per day of protein. The big interest initially I think is to see the spectrum of proteins that are being made in specific tissues, normal or diseased tissues, cancer, whatever. The plan would be to separate these proteins from say a cancer tissue on two-dimensional gels and then each single protein spot on the gel would be analysed in the mass spectrometer, essentially hit by a laser and blown into fragments of the protein which would then be matched with the genomic sequence, and by computer one should be able to predict the protein peptide spectrum that you would get for each of the genes in the human genome. Let’s say there are 50,000, so that we would get a virtually instantaneous identification of the particular gene product for each of the spots and we could say then that a given tissue is expressing certain genes and we form a database of this kind of information and again we sell that to our subscribers. This is sort of a first step of something that we can easily see ahead that we could produce. I don’t know where we’re going beyond that but maybe Rich could.
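The matching step Smith outlines, predicting a peptide spectrum for each gene and comparing it with what the instrument sees, can be sketched roughly as follows. The interview does not say how the fragments are generated or scored, so several things here are assumptions: cleavage after lysine (K) or arginine (R) as a trypsin-like rule, with no missed cleavages and no proline exception; a simple count of matching masses as the score; and the protein names and sequences, which are hypothetical. The residue masses are standard monoisotopic values.

```python
import re

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per free peptide


def digest(protein):
    """Assumed rule: cut after every K or R (simplified trypsin-like digest)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", protein)


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def best_match(observed, proteins, tol=0.01):
    """Score each candidate by how many observed masses it can explain."""
    scores = {}
    for name, seq in proteins.items():
        predicted = [peptide_mass(p) for p in digest(seq)]
        scores[name] = sum(
            any(abs(m - p) <= tol for p in predicted) for m in observed
        )
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical predicted gene products and one simulated gel spot.
    candidates = {"geneA": "MAKGGR", "geneB": "SSKTTR"}
    spot = [peptide_mass("MAK"), peptide_mass("GGR")]
    print(best_match(spot, candidates))
```

In practice the candidate set would be every predicted protein in the genome, which is why the identification depends on having the genomic sequence first.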
Richard Roberts: The real problem with proteomics is that much of the technology that you need to relate function back to the gene is not in place at the moment, so one issue is well what is being expressed, how much is being expressed if we look at particular tissues, if we look at the brain, if we look at skin, if we look at liver, what kinds of proteins are taking place? At the moment we are able to look at what messenger RNAs are taking place, but we know also from other experiments that the amount of protein very often does not match the level of RNA, so we need both to be able to look at RNA and to look at the protein. One way to look at the protein is to use the kind of technology that Ham just talked about.
You made a point in one of the sessions at this meeting that once you study microbial genomes and there is a project called the minimal genome project which tries to define what’s the minimum number of genes. Wouldn’t such a system be the most suitable in finding out the function of various genes, because the human with this much larger genome and many genes is incredibly more complex?
Richard Roberts: Right. I think there are two separate issues, one issue relates to what is it that really makes life, what do you need, what is the minimum set of instructions that you need to make a living cell? And one way to do that is to take the very smallest cell that we know that is free living, as Clyde Hutchison is doing, and try to remove the genes that look as though they’re not necessary, get down to the minimum set, understand that in completion. Then one will at least know what is the minimum thing you need for life but that’s likely to be less than 500 genes and those 500 genes are a small set of what is present in a human cell. So, I think what will come from say the minimal genome project will be a working definition of kind of what is the minimum of life but there’s so much more to life than that. We need to know what is the precise biochemical function of each of the gene products, what reactions does it catalyse? That is something that for most proteins is not easy to do and we’re working ourselves at the moment on a bunch of proteins that are present in every organism for which the genome has been sequenced so far.
We think that this particular protein transfers methyl groups from S-Adenosyl methionine onto something, but we don’t know what and it’s not easy to find out, it’s not easy to prove that. But it is an interesting protein, it’s present in every genome but we don’t know what it does, no-one has stumbled upon it by genetics or by biochemistry yet, to know what it does and there are many such proteins. In the human genome, there are going to be thousands of these proteins for which we need to define function so what we need is to find high throughput ways, fast ways in which we can get clues to function. One high throughput way is to use computation to try to do that, so you look through the protein sequences, you try to find the little protein sequence motifs that in other proteins we know interact with ATP or they interact with DNA or they’re RNA binding proteins or whatever, so this can give you a clue, but you need to do the biochemistry afterwards to show that the clues that were given were correct or not and we don’t have good methods for doing that at the moment.
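The computational clue-finding Roberts mentions, looking for short sequence motifs known from other proteins, can be sketched as a pattern scan. The example below uses one well-known pattern, the ATP/GTP-binding P-loop (Walker A) consensus [AG]-x(4)-G-K-[ST]. The dictionary layout, function name and test sequence are assumptions for illustration, and, as Roberts notes, a hit is only a hint that still needs biochemical confirmation.

```python
import re

# Motif name -> regular expression over one-letter amino-acid codes.
# Only one widely documented consensus pattern is included here.
MOTIFS = {
    "ATP/GTP-binding P-loop (Walker A)": re.compile(r"[AG].{4}GK[ST]"),
}


def scan(protein):
    """Return (motif name, start index, matched text) for every hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start(), match.group()))
    return hits


if __name__ == "__main__":
    # Hypothetical protein fragment containing a P-loop-like stretch.
    for hit in scan("MKLGSPTAGKSTLL"):
        print(hit)
```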
Hamilton Smith: Another approach to the minimal genome is the synthetic approach which I think is intriguing, creating “life” in the laboratory. The idea there would be again working with Mycoplasma genitalium as sort of the basic tools or parts for it. Having some idea from the other studies as to what genes are essential, one could actually make a synthetic chromosome that contains the set of genes that have been identified as essential and then put that synthetic chromosome into a cell from which the natural chromosome has been removed and then see if you get something that will grow in the laboratory. Of course it probably won’t work the first time, maybe not the 100th time either, it could be similar to cloning Dolly, where in the beginning you had to do hundreds of them before you got a successful attempt. But it would be a spectacular event if one could create a new genome.
Richard Roberts: I guess, when you start to think about the real importance of microbiological research, within the context of the human genome, the methodology that will need to be developed to understand how something as simple as Mycoplasma genitalium works is going to be methodology that can be applied to the human genome. So if we can learn to do this thing properly and if we can learn to do it in a fast manner for the small bacterial genomes, where in principle everything is a lot easier, we should be able to apply that methodology to these much more complicated systems too and in the meantime of course, we will find out much about what is really important in order to make life. What is it about these proteins and these genes that really makes something living as opposed to just a collection of chemicals in a test tube.
If we look at the biomedical benefits of the DNA sequence of the human genome, I’m sure when Jim Watson went to congress, he had many ideas on what the benefits would be and tried to convince the congress and I noticed also that Bill Clinton mentioned cancer, the cure of cancer would be something that would be following after the sequencing. How do you see the immediate consequences in the biomedical field of our knowledge of the human genome sequence?
Hamilton Smith: I don’t think I can foresee all of the benefits or consequences, we’re going to have to work into it gradually, but it seems clear that it would facilitate much of the work that’s going on. A lot of work over the past few years has gone into hunting for genes in the genome and sub-cloning them and so on and so forth. This should short circuit all of that, I mean you should be able to in many cases find a gene or several duplications of the gene in the genome and proceed from that sort of jumpstart. I think an example would be, there are several groups of proteins that have demonstrated therapeutic benefits, for example the interferons and already we have an example by the genome sequencing of a new interferon which was previously not detected. With the whole genome you can look and often find members of a protein group that you didn’t know about, so these are new potential products. We could discover new epogen type proteins as well or growth factors that can stimulate certain tissues simply by analogy to ones that are already known.
Will it be possible to sequence a genome of individuals in the short time, in such a short time that it would be important in medical practice, in designing the therapy one is planning for a certain disease?
Hamilton Smith: Not with current technology, it’s too expensive. Eventually I think that we will need some sort of a physical method for single molecule sequencing. Once that arrives, we might be able to tackle the whole individual, but one of the big areas of effort now is large scale genotyping using various arrays of genes. My dream would be to be able to take a single drop of blood from an individual and within a few hours, determine 100,000 single nucleotide polymorphism mutations in that individual, I shouldn’t say mutations but differences in that person’s genome. In other words develop an immediate genotypical profile for an individual that could be used in judging what treatments would be best for that individual or what possible genetic diseases that person might encounter in life. I think that’s coming, probably in the near future.
Richard Roberts: I guess the real point that you’re getting at here is that one would like to take individuals and get some idea of their genetic makeup. One way to do that is to get the complete DNA sequence, but in fact you can get a lot of information without looking at the complete DNA sequence because as a result of these things called single nucleotide polymorphisms, we know that approximately every 300 bases or so, along the human genome, there is a region that varies, quite often from one individual to another. By just looking at those regions, in essence just sampling one three-hundredth of the genome instead of looking at the whole thing, one can actually tell a lot about the genetic propensity of various people.
For instance, we know that there are genes that if they go wrong, if they have some particular polymorphism, they have one sequence as opposed to another, that that leads to problems and the first classic example of this was sickle cell haemoglobin where we knew that a single base change in the DNA sequence for haemoglobin rendered the haemoglobin not quite so effective. This was a mutation that had been well kept within the human population in Africa because when you were heterozygous for this condition, when you had one sickle cell gene and one normal gene, you had resistance to malaria which, if you live in Africa, is quite an important thing to have. That has been maintained in the population, even though the selection no longer applies among blacks who have moved from Africa into the US or into England or into Western Europe and they maintain this mutation because evolution is slow, it takes a long time for it to get out of, and there are many diseases for which this kind of thing is true.
You mentioned the probability of finding an SNP /- – -/ that the human genomes that have been sequenced by, say, a single group and then the Celera group have been different and can you already now see evidence of SNPs if you compare your sequences with each other?
Richard Roberts: Yes.
They shouldn’t be entirely equivalent, I would imagine.
Richard Roberts: No, basically what’s happened so far is that the main sequence that’s come from Celera is from one individual. There are other individuals who are being sequenced at Celera at a level sufficient to identify single nucleotide polymorphisms. The public human genome project, they have sequenced many individuals, a much larger number than Celera have been dealing with and so there are single nucleotide polymorphisms that are apparent within their data already and there is in fact something called the SNP Consortium which is a group of labs who are specifically looking for single nucleotide polymorphisms. These have been funded by both government agencies and by commercial companies and this data is all being placed in the public domain, so the answer is we know of a lot of SNPs already but we don’t know enough to do a complete genotype on someone.
Hamilton Smith: Nor do we know which of those SNPs really are clinically relevant, I mean the large proportion of them are probably pretty neutral changes. We don’t know how many would be medically significant or genetically significant.
Richard Roberts: But this is what will come out of the next stage of the human genome project where one tries to assign function and identify the genes, because many of these we know that these are important for this disease or for that, we will probably find homologs of some of these genes and we don’t know whether they’re important yet, so that will need to be tested. In many ways, we’re really at the beginning of the human genome project, not at the end, so even though we have announced we’ve gotten through this first stage, it very much is a beginning. Biology has undergone this revolution in the last few years where it’s gone from being really an observational science, in which people have been looking at phenomena and trying to understand them and trying to figure out what was going on, to become a hard science like chemistry or physics where we can really now look at a complete genome and put some bounds on the problem. If we want to explain how a small bacterium works, we can say we’ve found there are 4,000 genes or 5,000 genes, we need to explain this organism in terms of these 5,000 genes, if we’re going to do a genetic experiment in which we change one of these genes and we know what to look for, we know we have to look and see what happens to all of the other genes, in order to begin to understand. I think we’re very much at the start of biology, which is a wonderfully exciting time for us, you know I mean this is the time I would love to be a graduate student again, this is the time to be a graduate because there are many discoveries to make, many more Nobel Prizes coming out of this field.
Hamilton Smith: In science, one tends to go from simple to complex and then hopefully back to simple again. We’re in the complex phase right now.
But now, if you are a graduate student today looking for what to do during a career in biology and you see these large enterprises doing all the sequencing, so all the sequences will be available, you can’t get the PhD out of sequencing anything as an individual.
Richard Roberts: We would hope not.
So your supervisor buys a licence maybe, to have the detailed sequence for a certain genome, so does that leave the individual graduate student then with trying to define the role of specific proteins or signalling systems?
Richard Roberts: That’s certainly one possibility. Basically what you have is this enormous textbook, except that instead of being a textbook with diagrams and clear headings telling you what everything is, you’ve got a textbook, it’s full of words but we don’t know what the headings are and we don’t know how to put in the diagrams to explain how this little bit relates to this little bit. This is for the graduate student to start to work out.
Hamilton Smith: Each gene with an unknown function is a PhD degree, if you can figure it out.