Reprinted from the 27th Symposium of the Society for Developmental Biology
DEVELOPMENTAL BIOLOGY SUPPLEMENT 2, 1-20 (1968)
Copyright © 1968 by Academic Press Inc. Printed in U.S.A.

I. SELF-ASSEMBLY OF MACROMOLECULAR STRUCTURES

Spontaneous Formation of the Three-Dimensional Structure of Proteins

CHRISTIAN B. ANFINSEN
Laboratory of Chemical Biology, National Institute of Arthritis and Metabolic Diseases, National Institutes of Health, Bethesda, Maryland

INTRODUCTION

Our major consideration in this symposium will be the emergence of order during cellular differentiation and growth. The concept "emerging order" implies an organized, genetically complex process taking place over a reasonably extended stretch of time. In contrast, the restatement of linear genetic information in the form of three-dimensional protein structure results from a rapid and spontaneous interaction of amino acid side chains with each other, with the completed polypeptide backbone, and with the environment, without the need for additional genetic information (Anfinsen, 1967; Epstein et al., 1963). The achievement of this unique geometry might be visualized as a rather helter-skelter process. An almost infinite number of sets of interactions are possible as an extended polypeptide chain coils upon itself (Fig. 1). If folding involved sampling even a small fraction of these conformational states, the specific folding of the chain would clearly require considerable time. It is probable that the rapidity of folding is made possible through the formation of one or more "nucleation sites" by side chain interactions that would predispose, during subsequent interactions, to the tertiary structural characteristics of the native structure.
The only obvious driving force during this approach to the native conformation is the selection of progressively more stable conformations, with ultimate fixation of geometry in the form possessing the most favorable free energy of conformation, the native protein. Thus, unlike the complex predetermined pattern of successive changes occurring during differentiation, the cell must rely, in its first steps of development, on a relatively random process, albeit one guided by explicit information: the amino acid sequence of a polypeptide chain.

It has been suggested (Phillips, 1967), as an alternative mechanism, that a polypeptide chain may progressively assume a three-dimensional conformation similar or identical to that which it occupies in the completed protein molecule, as synthesis proceeds from the NH2-terminus toward the COOH-terminal end of the chain. However, the weight of evidence available at present, some of which I shall mention below, appears to be consistent with a process in which tertiary structure appears only upon completion of translation of the genetic quantum of information.

FIG. 1. Schematic drawing showing the conversion of an extended polypeptide chain to a native protein. During this oxidative process, sulfhydryl groups are paired to form disulfide bonds, and amino acid residues, widely separated in a linear sense, are brought into spatial proximity to form an active center.

With the exception of the synthesis of certain RNA molecules, the information in a chain is expressed in a form useful to a cell as linear "bursts" of polypeptide chains. Each chain represents the raw material for a function that is performed by the corresponding protein molecule.
Evolution in its simplest form has consisted of the continuous selection of organisms on the basis of the adequacy of the summation of their proteins to constitute a cell system favorable to self-reproduction under the current ecological situation. The sequences of the polypeptide chains that are synthesized are so constituted that they assume, in a spontaneous manner, unique geometric shapes that are endowed with the function in question.

Most of our information has come from the study of proteins that contain disulfide bonds as cross-links, and the reversibility of refolding has been tested by examining the re-formation of correct pairs of half-cystine residues, together with the restoration of biological activity and various physicochemical properties. The statistics of the situation are shown in Table 1, which lists the number of possible ways in which a given number of half-cystine residues can combine to form SS bonds upon oxidation.

TABLE 1
Number of bonds    Number of combinations
 1    1
 2    3
 3    15
 4    105
 5    945
 6    10395
 7    135135
 8    2027025
 9    34459425
10    654729075
11    13749310575
12    316234143225
13    7905853580625
14    213458046676875
15    6190283353629375
16    191898783962510625
17    6332659870762850625
18    221643095476699771875
19    8200794532637891559375
20    319830986772877770815625
21    13113070457687988603440625
22    563862029680583509947946875
23    25373791335626257947657609375
24    1192568192774434123539907640625
25    58435841445947272053455474390625

These numbers show, for example, that in the case of the γ-globulin molecules, the random chance of forming the correct 23 SS bonds from the available 46 half-cystine residues is roughly 1 in 2.5 × 10²⁸. In the case of pancreatic ribonuclease, which contains 8 half-cystine residues, 105 possible sets of 4 SS bonds can be made, only one of which is the native structure.
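Table 1 follows a simple closed form: 2n half-cystine residues can be paired into n disulfide bonds in (2n - 1)!! = 1 × 3 × 5 × ... × (2n - 1) ways, because the first residue has 2n - 1 possible partners, the next unpaired residue has 2n - 3, and so on. The Python sketch below is an illustration added here, not part of the original work; it assumes, as the table does, that every pairing is equally possible, recomputes the table, and checks the small cases by brute-force enumeration. The function names are hypothetical.

```python
from math import prod


def disulfide_pairings(n_bonds: int) -> int:
    """Ways to pair 2 * n_bonds half-cystines into n_bonds SS bonds: (2n - 1)!!."""
    # Product of the odd numbers 1, 3, 5, ..., 2n - 1.
    return prod(range(1, 2 * n_bonds, 2))


def enumerate_pairings(residues):
    """Yield every complete pairing of `residues` as a list of tuples (brute force)."""
    if not residues:
        yield []
        return
    first, rest = residues[0], residues[1:]
    for i, partner in enumerate(rest):
        # Pair the first residue with each candidate, then pair whatever remains.
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


# Check the closed form against brute-force enumeration for 1 to 5 bonds.
for n in range(1, 6):
    n_found = sum(1 for _ in enumerate_pairings(list(range(2 * n))))
    assert n_found == disulfide_pairings(n)

# Reproduce Table 1.
for n in range(1, 26):
    print(f"{n:2d}  {disulfide_pairings(n)}")

# Chance of hitting the single native set by purely random pairing.
print(f"RNase, 4 bonds: 1 in {disulfide_pairings(4)}")
print(f"gamma-globulin, 23 bonds: 1 in {disulfide_pairings(23):.2e}")
```

The 23-bond entry, about 2.5 × 10²⁸, is the γ-globulin case. Brute-force enumeration is only practical for small n, since the count grows faster than exponentially; the closed form is what makes the larger rows of the table easy to reproduce.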
Since much of the evidence for the spontaneity and uniqueness of polypeptide folding has been summarized earlier, I shall present here only a schematic picture. Figure 2 depicts the renaturation of what we have called a "scrambled" ribonuclease molecule. After complete reduction of the 4 disulfide bonds in the native protein, the reduced random chain was allowed to reoxidize under conditions leading to a random mixture of disulfide bonds (Haber and Anfinsen, 1962), shown diagrammatically in the upper portion of the figure. The thermodynamic instability of this scrambled mixture is demonstrated by the observation that exposure to conditions favoring disulfide interchange induced rapid rearrangement of the disulfide bonds, with the formation, in almost quantitative yield, of the native enzyme with its correct SS pairs.

FIG. 2. The spontaneous conversion of a randomly crosslinked protein derivative to the native form under conditions favoring disulfide interchange. Structural regions of the molecule that are involved in the active center are indicated by crosshatching.

By using an enzyme from microsomal membranes that we have recently isolated as a catalyst for the interchange process, the renaturation can be made to occur in vitro (Fuchs et al., 1967) at a rate quite consistent with the estimated time required for the synthesis of a ribonuclease molecule in vivo, namely about 2 minutes (Dintzis, 1961; Canfield and Anfinsen, 1963). This experimental result militates against the concept of obligatory progressive folding during the NH2-terminal to COOH-terminal synthesis of the chain, since the scrambled collection of isomers is devoid of the features of tertiary structure that one finds in the native enzyme.

We have recently carried out some pertinent experiments on the thermodynamic stability of the RNase derivative RNase-S (Kato and Anfinsen, unpublished results).
This material, prepared by the controlled cleavage of a single bond between residues 20 and 21 in bovine pancreatic ribonuclease by the enzyme subtilisin, may be separated into its two noncovalently bonded components, RNase-S-protein and RNase-S-peptide (Richards and Vithayathil, 1959). The former, containing all four disulfide bonds of the native protein, is inactive without the addition of the peptide moiety. To test whether the S-protein portion contained sufficient information to determine the specific folding that would lead to proper pairing of the eight half-cystine residues, samples were subjected to conditions of disulfide interchange under catalysis by the rearranging enzyme from microsomes mentioned above. This enzyme, after prereduction of its single essential SH group, will catalyze disulfide rearrangement without the need for added mercaptoethanol or other SH reagent. As summarized in Fig. 3, addition of the enzyme to S-protein solutions caused rapid loss of the capacity of the S-protein to be activated by addition of 1.3 equivalents of S-peptide. Peptide maps of pepsin digests indicated the presence of random SS pairing. [The residual activity may represent material that does not contain all four of the normal SS bonds of ribonuclease. The recent observations of Neumann et al. (1967) on the preparation of a fully active derivative of RNase containing only two intact disulfide bonds indicate that two of the native disulfide linkages in this protein are superfluous from the standpoint of in vitro activity. Consistent with this view is the observation that fully reduced S-protein, when allowed to oxidize in the absence of S-peptide, with complete conversion of its 8 SH groups to 4 SS bonds, yields low levels of active material (Kato, unpublished; Haber and Anfinsen, 1961).] Upon addition of S-peptide to the largely inactivated S-protein solution, the bulk of the activity was regenerated.
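The scrambled-ribonuclease and S-protein experiments rest on a thermodynamic argument: random oxidation spreads the chains over all 105 disulfide isomers of a 4-bond protein, whereas disulfide interchange lets the mixture relax toward equilibrium, where each isomer's population is weighted by exp(-ΔG/RT). The Python sketch below is a toy model, not data from these experiments. It assumes, hypothetically, that every non-native isomer lies the same free-energy gap above the native one and that T = 298 K; the gap values are placeholders chosen only to show the direction of the effect, and the function name is illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)
N_ISOMERS = 105    # disulfide isomers of a protein with 4 SS bonds (Table 1)


def native_fraction(gap_kcal: float, temp_k: float = 298.0) -> float:
    """Equilibrium fraction of the native isomer in a hypothetical two-level model.

    Assumes one native isomer at relative free energy 0 and N_ISOMERS - 1
    non-native isomers all at +gap_kcal; real isomers would differ from one another.
    """
    boltzmann_weight = math.exp(-gap_kcal / (R_KCAL * temp_k))
    return 1.0 / (1.0 + (N_ISOMERS - 1) * boltzmann_weight)


# No energetic preference: the scrambled, random-oxidation case.
print(f"{native_fraction(0.0):.4f}")  # 0.0095 (= 1/105)
# Interchange allowed to reach equilibrium, assuming a 5 kcal/mol gap.
print(f"{native_fraction(5.0):.3f}")
```

With no energetic preference the native pairing makes up about 1% of the mixture, as in the scrambled product; even a modest uniform gap makes it the dominant species once interchange is allowed. Real isomers would span a range of free energies, so the sketch only illustrates why an interchange catalyst can convert a random mixture almost quantitatively into one form.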
Similar conclusions may be drawn from parallel experiments in which the formation of intermolecular, disulfide-bonded aggregates of S-protein was studied by turbidity measurements in the presence and absence of S-peptide (Fig. 4). Once again, the information contained in the S-peptide portion of RNase-S was required to determine the native structure which, by inference, must represent the most thermodynamically stable form.

FIG. 3. Inactivation and disulfide interchange of native RNase-S-protein catalyzed by prereduced interchange enzyme (I. Kato and C. B. Anfinsen, unpublished results; Fuchs et al., 1967). The arrow indicates the time of addition of RNase-S-peptide (1.3 equivalents relative to S-protein) to the reaction mixture. RNase-S-peptide (1.3 equivalents) was added to aliquots taken prior to the time marked by the arrow, and the mixtures were assayed for RNase activity. (Horizontal axis: time, minutes.)

Experiments similar to those I have just described for ribonuclease and S-protein have been carried out on a wide variety of protein molecules, both large and small, and the phenomenon appears to be a general one (Anfinsen, 1967). Perhaps the most dramatic example is given by recent studies by Freedman and Sela (1966) on γ-globulins. Both Haber (1964) and Whitney and Tanford (1965) showed that the (Fab)₂ fragment of 7 S γG antibodies, produced by papain digestion, could be subjected to full reduction of SS bonds with subsequent restoration of significant levels of specific antibody activity upon reoxidation. Freedman and Sela were able to repeat such experiments using undegraded, native antibody molecules by the trick of massive polyalanylation of the ε-amino groups of purified rabbit anti-bovine serum albumin.
The addition of DL-polyalanyl side chains to proteins and polypeptides has been shown, in several instances, to confer much greater solubility on the products than is shown by the unpeptidylated material. The 23 disulfide bonds of the protein (whose immunological activity was unchanged by the peptidylation) could then be reduced without precipitation of the reduced heavy chain, which is otherwise insoluble; this product of reduction had been avoided in the earlier experiments by the use of papain fragments. The reduced forms of the soluble, polyalanylated light and heavy chains were reoxidized separately and finally recombined through oxidative formation of the interchain SS bonds to yield regenerated γ-globulin with over 50% of the initial antibody activity. The improbability of this process, unless completely determined by amino acid sequence, is certainly emphasized by the figures listed in Table 1.

FIG. 4. Disulfide interchange in S-protein as evidenced by turbidity formation. S-protein (1 mg/ml) was incubated in 10⁻[?] M β-mercaptoethanol, 0.1 M Tris buffer, pH 7.4: with interchange enzyme (7 µg/ml); without enzyme; and with enzyme (7 µg/ml) and 1.3 equivalents of S-peptide relative to S-protein. (Horizontal axis: time, minutes.)

For completeness I should mention that certain polypeptide systems can form native tertiary structures only in the presence of ligands, such as metal ions and prosthetic groups. In the case of Taka-amylase, for example, which contains 9 half-cystine residues, the final formation of the fourth SS bond and the preservation of the remaining single SH group depend upon the addition of calcium ions (Friedmann and Epstein, 1967). Similarly, the final native structure of myoglobin is achieved only when heme is added to the slightly "relaxed" apomyoglobin structure (Schechter and Epstein, 1968; Harrison and Blout, 1965).
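The Taka-amylase case adds a twist to Table 1: with 9 half-cystine residues, 4 disulfide bonds leave one SH group free, so the count must also choose which residues take part in bonds. A standard combinatorial generalization (not taken from the paper) is that m half-cystines can form k disulfide bonds in C(m, 2k) × (2k - 1)!! ways: pick the 2k residues that pair, then pair them. The Python sketch below, with a hypothetical function name, applies it to the numbers in the text.

```python
from math import comb, prod


def partial_pairings(n_cys: int, n_bonds: int) -> int:
    """Ways to form n_bonds SS bonds from n_cys half-cystines, the rest left as free SH.

    Choose the 2 * n_bonds residues that pair, then pair them: C(m, 2k) * (2k - 1)!!.
    """
    if 2 * n_bonds > n_cys:
        return 0  # not enough half-cystines for that many bonds
    return comb(n_cys, 2 * n_bonds) * prod(range(1, 2 * n_bonds, 2))


print(partial_pairings(9, 4))  # 945: 9 choices of the free SH x 105 pairings
print(partial_pairings(8, 4))  # 105, the ribonuclease entry of Table 1
```

Under the same equal-probability assumption as Table 1, the single free SH group multiplies the number of arrangements ninefold compared with a fully paired 8-residue protein.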
FUNCTION AND GEOMETRY

The increasing library of sequence data on functionally related proteins has made it extremely likely, simply on the basis of sequence homology, that many groups of these macromolecules have been derived from the same primordial ancestral protein molecule. Furthermore, the crystallographic information available on the heme proteins, myoglobin and the hemoglobins, indicates that three-dimensional structure has been preserved in the face of very large changes in the details of amino acid sequence. Thus, a particular spatial arrangement of the polypeptide chain has been "imprinted," and a variety of solutions to the geometric problem have evolved. Although natural selection obviously operates at the level of the organism, this principle of "conservation of geometry" at the protein level seems likely to be a central molecular mechanism in evolution. A stereochemical arrangement consistent with a particular kind of function, once established through chance mutation of a primordial gene, would become fixed in a line of organisms because of its selective advantage.

Because of such considerations, the problem of determining the nature of the forces that determine and stabilize three-dimensional structure is now a major concern of protein chemists. The role of hydrophobic side chains in the internal stabilization of protein structure in solution was examined theoretically by Walter Kauzmann in 1959 (Kauzmann, 1959). Recent crystallographic work has clearly confirmed the predominant location of such side chains within the interior of proteins, secluded from the aqueous environment.
The great importance of hydrophobic interaction in the determination of tertiary structure has become even more apparent from considerations by Perutz (1965) and his colleagues (Perutz et al., 1965), Epstein (1964), and others of the amino acid replacements that have occurred in certain groups of proteins during evolution and as the result of point mutations (for example, in the abnormal hemoglobins). Perutz and his associates point out that, in contrast to the extensive substitution of the less hydrophobic, externally situated amino acids in the large series of heme proteins that have been sequenced, a central "core" of nonpolar residues has either remained unchanged or has undergone extremely conservative replacement with residues of closely similar volume and polarity. One must infer that these invariant residues are a most important part of the "program" for tertiary structure. Epstein has presented statistics on the heme proteins, together with a number of examples of species variants, indicating that replacements generally involve substitution of one amino acid by another of similar polarity. A recent comparison of the sequence of rat pancreatic ribonuclease with the three-dimensional structure of bovine pancreatic ribonuclease-S, which I shall discuss in more detail below, offers a particularly compelling set of results in this connection.

We have obtained data in accord with these observations from a study of the influence of changes in the surface stereochemistry and net charge of the ribonuclease molecule on the ability of this protein to regain its native conformation after SS bond reduction and complete denaturation. As mentioned above in regard to γ-globulins, proteins may be reacted with N-carboxyamino acid anhydrides at neutral pH to yield derivatives containing polypeptidyl chains on the ε-amino groups of the majority of the lysine side chains.
Using the N-carboxyamino acid anhydride of DL-alanine, eight polyalanyl chains, each containing 5-7 residues of alanine, may be attached to pancreatic ribonuclease (Fig. 5) without loss of enzymatic activity. After reduction of the SS bonds of this derivative in 8 M urea and mercaptoethanol, removal of reagents, and exposure of the reduced, random chain to air, oxidation causes essentially complete regeneration of enzymatic activity and of the physical properties characteristic of the starting material. These experiments (Anfinsen et al., 1962; Cooke et al., 1963) indicate that, in spite of a large number of bulky polyalanine chains, the folding of the molecule and the formation of the native pairs of half-cystine residues can proceed normally. The interaction of hydrophobic residues to form the internal structure of the protein can thus proceed effectively in spite of the large change in external stereochemistry. Similar studies have been performed in which amino groups have been acylated or succinylated, replacing positively charged side chains with uncharged acylamino or negatively charged succinylamino groups, once again without destroying the capacity of the reduced derivatives to refold correctly (Epstein and Goldberger, 1963).

FIG. 5. Schematic representation of a fully active polyalanyl-ribonuclease (poly-DL-alanyl ribonuclease) molecule. The crosshatched circles indicate alanyl residues, attached in chains to ε-amino groups.

It is to be hoped that the complex computer programs now being employed in attempts to calculate the tertiary structure of proteins from the information encoded in amino acid sequences may eventually be simplified when we learn to detect and employ only those portions of the total information that are essential and sufficient.
Results such as those on polyalanyl-RNase would certainly suggest that much of the polypeptide structure destined to become external in the native protein may contribute very little to the thermodynamic forces involved in chain folding and stabilization.

Although our catalog of three-dimensional solutions is still quite limited, it would be surprising to find that the structures of the chemically closely related proteases chymotrypsin and trypsin, or of the large number of well-studied cytochromes c, are not extremely similar. The same situation might be expected for egg white lysozyme and the α-lactalbumin of milk, whose sequences are remarkably homologous. An interesting analysis of the sequence of rat pancreatic RNase (Beintema and Gruber, 1967) has recently been made by Wyckoff, Richards, and their colleagues (see Wyckoff, 1968) in relation to the three-dimensional structure of bovine RNase-S. The sequences of these two homologous proteins are shown in Fig. 6.

FIG. 6. A comparison of the amino acid sequences of rat (above) and bovine (below) pancreatic ribonucleases. The enclosed area contains the regions of identical sequence (Beintema and Gruber, 1967; Wyckoff, 1968).

When considered in the context of the bovine geometry, differences in sequence in the rat protein, often occurring in pairs and frequently far separated on the chain, make good sense in terms of structural stabilization. Many of these double replacements appear to permit the retention of interactions between neighboring lengths of the polypeptide chain that form stabilized structural features of the three-dimensional model.
For example, the substitutions of arginine and glutamic acid at positions 80 and 103, replacing the neutral serine-asparagine interaction in the bovine enzyme, may help maintain the stability of a loop in the structure, but now by an electrostatic interaction. Other replacements lead to conservation of polarity or of specific net charge in certain areas of the surface. Thus, replacement of the hydrophobically interacting methionine residue 79 in the bovine enzyme by leucine in the rat involves little change in volume but a definite change in shape. Since the former residue is partly exposed in a pit in the bottom of the three-dimensional model, the change in shape can be accommodated and actually makes room for the extra volume of isoleucine 57, which replaces valine 57 of the bovine protein.

Some of the double changes are less understandable when considered in the context of other experimental data. The pair of conformationally neighboring residues Lys-61 and Gln-74 in the bovine enzyme became Gly and Lys, respectively, in the rat protein. Local charge is preserved by this set of replacements, but an examination of the three-dimensional model does not suggest any more subtle reason for "conservatism," such as preservation of a stabilizing interaction or the avoidance of a "hole" in the structure. Nevertheless, our studies on polyalanylated RNase, referred to above, show clearly that the ε-amino group of lysine 61 may be modified by the addition of a chain of 5-8 alanyl residues without interference with either activity or the capacity of polyalanyl-RNase to refold correctly after complete reduction and denaturation. Such a modification, although preserving net charge, moves the ionized amino group about 26 Å from the position of the original ε-amino group.
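The charge bookkeeping in these double replacements can be made explicit. The Python sketch below is an illustration, not an analysis from the paper: it assigns each residue a nominal side-chain charge near pH 7 (+1 for Lys and Arg, -1 for Asp and Glu, 0 otherwise, with His treated as neutral; a deliberate simplification, not a pKa calculation) and compares the local net charge of the two residue pairs described above in the bovine and rat enzymes. The residue identities come from the text; the charge table and names are assumptions.

```python
# Nominal side-chain charges near pH 7 (simplified: His treated as neutral).
SIDE_CHAIN_CHARGE = {"Lys": 1, "Arg": 1, "Asp": -1, "Glu": -1}


def pair_charge(res_a: str, res_b: str) -> int:
    """Net nominal side-chain charge of two spatially neighboring residues."""
    return SIDE_CHAIN_CHARGE.get(res_a, 0) + SIDE_CHAIN_CHARGE.get(res_b, 0)


# (positions, bovine pair, rat pair) as described in the text.
DOUBLE_REPLACEMENTS = [
    ("80/103", ("Ser", "Asn"), ("Arg", "Glu")),
    ("61/74", ("Lys", "Gln"), ("Gly", "Lys")),
]

for positions, bovine, rat in DOUBLE_REPLACEMENTS:
    b, r = pair_charge(*bovine), pair_charge(*rat)
    print(f"{positions}: bovine {bovine} {b:+d}, rat {rat} {r:+d}, conserved={b == r}")
```

Both pairs conserve net charge, but only the 80/103 pair also suggests a new interaction (an electrostatic one replacing a neutral contact); as the text notes, charge bookkeeping alone does not explain why the 61/74 pair co-varied.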
Intracellular requirements of a more complex nature must underlie the genetic changes that lead to double replacements of this sort; it is clear that we have much to learn about the "design" of proteins in relation to function.

EFFECTS OF INTERRUPTION OR MODIFICATION OF GENETIC INFORMATION

Since function is a consequence of precise geometry, spontaneous and correct folding of a polypeptide chain might not occur after tampering with the integrity of the translated genetic information. It is of interest, therefore, to examine the adequacy of the information for folding in multichained proteins after various limited cleavages. Multichained proteins may be classified as follows:

1. Naturally occurring proteins containing more than one chain as a result of specific in vivo cleavage; this group includes, to my knowledge, only two examples: chymotrypsin and insulin.
2. Biologically active multichained molecules derived from single-chained proteins by deliberate experimental cleavage of peptide bonds with proteases. This group of man-made derivatives is very small: RNase-S (Richards and Vithayathil, 1959), RNase-E (Klee, 1965), RNase-T (Ooi et al., 1963) (Fig. 7), and nuclease-T, -S, and -C (see Figs. 8 and 9).
3. Naturally occurring multichained proteins formed by disulfide bonding of two or more separately synthesized chains: the immunologically active globulins.
4. Oligomeric proteins, made up of noncovalently aggregated single chains. This very large group includes a variety of intracellular proteins whose multimeric structures permit "allosteric" modifications due to ligand interaction.