Deciphering the Genetic Code, 1958-1966

Before the genetic code could be deciphered, before scientists could understand the process by which deoxyribonucleic acid (DNA) directed the synthesis of proteins, they had to resolve a final mystery: as Francis Crick and other researchers insisted, there must be a messenger to transmit genetic information from the cell nucleus to the cytoplasm, a messenger that was almost certainly made of ribonucleic acid (RNA). But what was its exact nature? Scientists had found notable amounts of RNA at the ribosome, the site of protein synthesis in the cytoplasm, and had assumed that this RNA was the postulated messenger. Each ribosome, according to this assumption, synthesized just one protein.

However, the assumption that ribosomal RNA (rRNA) was the messenger conflicted with two other findings: first, that the main sections of rRNA occurred in only two lengths, whereas the polypeptide chains for which this RNA supposedly coded differed greatly in length; and second, that the relative amounts of the bases in rRNA were fairly constant, whereas their relative amounts in DNA varied widely from species to species. (The sequence of the bases in rRNA, as opposed to the relative amounts of its bases, would not be known for several more years.) Moreover, Arthur Pardee, François Jacob, and Jacques Monod in their famous "PaJaMo experiment" had produced evidence that protein synthesis commenced soon after the introduction of a gene into a cell and that it proceeded at a fast, steady rate. By contrast, the theory that ribosomal RNA was the messenger predicted that protein synthesis would start up gradually, as the newly introduced gene first had to produce the ribosomes at which protein synthesis was to occur.

If ribosomal RNA could not be the messenger, then what was? The question was resolved during a decisive meeting at King's College, Cambridge, on Good Friday, 1960, between Jacob, Sydney Brenner, Crick, and a handful of other researchers. A few years earlier, in 1956, two scientists working with a virus that infected a bacterium had found in the bacterium small amounts of a form of RNA that had the same base composition (the same relative proportions of the bases) as the DNA of the virus. Their finding and its significance had remained unexplained. During the meeting, Brenner had the sudden insight that this form of RNA must be the messenger, because it replicated the base composition of the virus, not that of the infected bacterium or of its ribosomes, at which virus-directed synthesis of proteins was unfolding. Messenger RNA (mRNA) had previously eluded detection because it was present only in small amounts, being needed only for short periods of time during protein synthesis. It was then degraded, and its components were used again in making a copy of another stretch of DNA. Brenner and the others concluded that the ribosome was just an inert reading head that could synthesize any type of protein while it traveled along the messenger RNA, reading off the bases in sequence.

With the basic concepts of genetic control of protein synthesis in place, what remained to be explained was how the genetic code worked, that is, how genetic information was transferred from DNA to messenger RNA to protein. In an article published in Nature on December 30, 1961, Crick, Brenner, and their team described how, by inducing successive mutations in a virus that attacks the bacterium Escherichia coli, they obtained evidence that the chemical code embodied in a gene consisted of groups of three bases that do not overlap, that is, do not share bases. The mutants studied were acridine mutants, meaning they had been exposed to the potent mutagen proflavine, a bright yellow dye derived from the coal tar chemical acridine. As Crick correctly surmised, acridines slip in and out between the bases of the viral DNA (the virus they studied, bacteriophage T4, has a genome of DNA, not RNA), leading to the insertion or deletion of a base on the complementary chain during gene replication. Such insertion or deletion of a base in the viral DNA led to a "phase shift": given that, according to the sequence hypothesis, the sequence of the bases was to be read in linear fashion, from a fixed starting point and in one direction, the addition or deletion of a base would throw the reading of the base sequence out of step (out of phase) from the point of mutation onward. Consequently, proteins encoded past the point of mutation were deformed and could not perform their usual functions; the virus the team worked with was rendered less infectious, as could be determined by observing the bacterial cultures on which it preyed in the Petri dish.
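The phase shift described above can be illustrated with a toy sketch. It is only an analogy, not the team's method: the "gene" is an English sentence of three-letter words, and the function names, the message, and the inserted letter "X" are all assumptions made for illustration.

```python
def read_in_triplets(sequence):
    """Read a sequence in non-overlapping groups of three from a fixed start.

    Any trailing one or two letters that do not fill a triplet are ignored.
    """
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def insert_base(sequence, position, base):
    """Return a copy of the sequence with one extra letter at the given position."""
    return sequence[:position] + base + sequence[position:]


# Hypothetical "gene": a sentence made of three-letter words.
message = "THEFATCATATETHEBIGRAT"

original_reading = read_in_triplets(message)
mutant_reading = read_in_triplets(insert_base(message, 4, "X"))

print(" ".join(original_reading))  # THE FAT CAT ATE THE BIG RAT
print(" ".join(mutant_reading))    # THE FXA TCA TAT ETH EBI GRA
```

Everything before the inserted letter is read as before, while every triplet from the point of insertion onward is garbled, which is the sense in which the reading is thrown out of phase.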

During a period of several weeks spent in the laboratory, Crick and his collaborator Leslie Barnett induced a second acridine mutation (a second addition or deletion of a base), leading to a second phase shift. When they cross-bred mutants yet again to produce one with a third base added or deleted (a laborious process), they noticed that the virus regained its infectiousness, meaning that the proteins synthesized from RNA past the point of the third mutation once again had a normal shape and function. Their inevitable conclusion was that if either three additions or three deletions of a base restored the correct reading of the base sequence, then the code for reading the base sequence must be a three-letter, non-overlapping code (meaning that adjacent triplets do not share a base). The same genetic experiments suggested that the genetic code is "degenerate," meaning that more than one triplet (which Seymour Benzer had named codons) can code for a particular amino acid (for example, one amino acid, leucine, may be specified by any of six different codons), while other codons marked the start or the termination of protein synthesis. The degenerate nature of the code explained how the 64 possible triplets could code for only twenty amino acids.
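The reasoning behind the three-letter conclusion can be sketched with a toy example. This is an analogy under stated assumptions, not a reconstruction of the experiment: the sentence, the inserted letters "X", "Y", and "Z", and the helper names are hypothetical. The sketch also counts the possible two-letter and three-letter combinations of four bases.

```python
from itertools import product


def read_in_triplets(sequence):
    """Read a sequence in non-overlapping groups of three from a fixed start."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def insert_base(sequence, position, base):
    """Return a copy of the sequence with one extra letter at the given position."""
    return sequence[:position] + base + sequence[position:]


# Hypothetical "gene": a sentence made of three-letter words.
message = "THEFATCATATETHEBIGRAT"

one_insertion = insert_base(message, 4, "X")
two_insertions = insert_base(one_insertion, 7, "Y")
three_insertions = insert_base(two_insertions, 10, "Z")

# Compare the last four triplets of each reading with the original.
original_tail = read_in_triplets(message)[-4:]
tail_after_two = read_in_triplets(two_insertions)[-4:]
tail_after_three = read_in_triplets(three_insertions)[-4:]

print(" ".join(read_in_triplets(two_insertions)))    # still out of phase
print(" ".join(read_in_triplets(three_insertions)))  # back in phase near the end

# Counting argument: four bases taken two or three at a time.
possible_doublets = len(list(product("ACGU", repeat=2)))
possible_triplets = len(list(product("ACGU", repeat=3)))
print(possible_doublets, possible_triplets)
```

After the third insertion only the short stretch between the first and last inserted letters remains garbled, and the rest of the message is read correctly again. The count shows why two-letter groups (16 combinations) would be too few for twenty amino acids, while three-letter groups (64 combinations) leave room for several codons per amino acid.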

Even as Crick and his collaborators were deciphering the code by genetic methods, Marshall Nirenberg and Heinrich Matthaei offered the first direct biochemical evidence that RNA sequences code for specific amino acids. When they crushed cells and dosed the resulting extract with poly-U, a synthetic stretch of RNA composed of only one kind of base, uracil (which in RNA is used in place of thymine), they observed that the RNA triggered the synthesis of a polypeptide chain composed of only one kind of amino acid, phenylalanine. Crick immediately recognized the importance of Nirenberg's finding that UUU coded for phenylalanine when it was first reported at the 1961 International Congress of Biochemistry in Moscow. He made sure that Nirenberg received a wide hearing by inviting him to repeat his seminar presentation at a large plenary meeting, where Crick presided.
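The logic of the poly-U result can be sketched as a lookup in a codon table. The sketch below is a minimal illustration, not a model of the cell-free system: the table is deliberately partial (only the phenylalanine and leucine codons of the standard code), and the names `PARTIAL_CODON_TABLE` and `translate` are assumptions chosen for this example.

```python
# Partial table: phenylalanine and leucine codons from the standard genetic code.
PARTIAL_CODON_TABLE = {
    "UUU": "Phe", "UUC": "Phe",
    "UUA": "Leu", "UUG": "Leu",
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
}


def translate(rna):
    """Translate an RNA string triplet by triplet from its first base.

    Codons missing from the partial table are reported as "?".
    """
    usable = len(rna) - len(rna) % 3
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return [PARTIAL_CODON_TABLE.get(codon, "?") for codon in codons]


# A synthetic message made only of uracil, as in the poly-U experiment.
poly_u = "U" * 15
product_chain = translate(poly_u)
print("-".join(product_chain))  # Phe-Phe-Phe-Phe-Phe

# Degeneracy: count how many codons in the table specify leucine.
leucine_codons = [c for c, aa in PARTIAL_CODON_TABLE.items() if aa == "Leu"]
print(len(leucine_codons))  # 6
```

Because every triplet in a poly-U message is UUU regardless of where reading begins, the only possible product is a chain of phenylalanine, which is why this experiment gave such an unambiguous first assignment.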

By 1966, the codons for all twenty amino acids as well as for the start and termination of protein synthesis had been identified, several in Crick's own laboratory. Genetic research has since shown that some of Crick's theories of the genetic code require qualification and refinement. Crick's Central Dogma allowed for the flow of genetic information from RNA to RNA, which indeed occurs in the case of certain RNA viruses such as the flu and polio viruses. Contrary to misconceptions shared even by some scientists, the Central Dogma does not posit only a linear, forward information transfer from DNA to RNA to protein. However, in his theory Crick did not foresee the flow of information from RNA to DNA, a process that is known as reverse transcription and that is used by RNA retroviruses, such as the Human Immunodeficiency Virus (HIV) and some tumor viruses. Moreover, sequencing of the human genome (the entire complement of genes in an individual) has shown that half of our genome is made of so-called retroelements, sections of DNA that act on other sections in a complex regulatory process. Only a small percentage of our genome consists of regions that code for proteins, i.e. the regions we call genes. Nevertheless, Crick's fundamental insight that the sequence of the bases in DNA forms a code that specifies the synthesis of proteins, and his subsequent work in deciphering the three-letter code, have endured and have become the basis of modern genetics and genomics.