THE STOCHASTIC METHOD AND THE STRUCTURE OF PROTEINS

By Linus Pauling

Mr. President, Ladies and Gentlemen:

It is a great honor for me to have been invited to speak at the opening session of the Thirteenth International Congress of Pure and Applied Chemistry, and I express my thanks to the officers of the Congress and of the International Union of Pure and Applied Chemistry for having extended the invitation to me, and to you for your courteous welcome. My subject today is the stochastic method and the structure of proteins.

Many scientists have been interested in the question of the way in which scientific discoveries are made. A popular idea is that scientists apply their powerful intellects in the straightforward, logical induction of new general principles from known facts, and the deduction of previously unrecognized conclusions from known principles. This method is, of course, sometimes used; but the advances in knowledge that are made by it are less significant than those that result from mental processes of another sort, in large part subconscious processes. Henri Poincaré, in his essay on mathematical creation, said that knowledge of mathematics and of the rules of logic is not enough to make a man a creative mathematician; he must also be gifted with an intuition that permits him to select, from among the infinite number of combinations of mathematical entities already known, most of them absolutely without interest, those combinations which will lead to useful and interesting results. He illustrated the role of the subconscious by describing his investigations of the Fuchsian functions, which he had discovered while working at Caen.
He left Caen on a geological excursion, and for some time, while traveling, made no conscious effort to attack the problem; then one day, as he put his foot on the step of an omnibus, the idea suddenly came to him that the transformations that he had used to define the Fuchsian functions were identical with those of non-Euclidean geometry. He verified this conclusion, [...] spent a few days at the seaside, and [...] while walking on the [...]

The field of the determination of the structure of crystals by the x-ray diffraction method provides interesting illustrations of the ways that scientific progress is achieved. Work in this field consists in the solution of individual, largely unrelated problems: the determination of the atomic arrangements of individual crystals. If the atomic arrangement (the structure) is sufficiently simple, it can be determined from the observed x-ray diffraction pattern by straightforward, completely logical arguments. The procedure developed by Nishikawa, Wyckoff, and Dickinson before 1920 consisted in the tabulation, with use of the theory of space groups, of all atomic arrangements compatible with the symmetry and size of unit required to account for the x-ray pattern, and the rigorous elimination, with use of the observed x-ray intensities (especially of qualitative inequalities in intensity of pairs of diffraction maxima), of all of these atomic arrangements except one, which was then accepted as the structure of the crystal. During the dozen years after the determination of the first crystal structures by W. L. Bragg and W. H. Bragg, most structure determinations were made in this way.

It is well known that the electron distribution in a crystal can be expressed as a three-dimensional Fourier series in which the coefficients of the various Fourier terms are proportional to the square roots of the intensities of the corresponding diffraction maxima. The electron-distribution function depends, however, on the phases of the Fourier terms, and there is no generally applicable experimental method of determining these phases. For a crystal of even moderate complexity, such as an amino acid, the number of possible atomic arrangements provided by the theory of space groups is so great that the exhaustive consideration of them, and the elimination of all but one by use of the observed x-ray intensities, cannot be carried out even with the aid of electronic calculating machines, and the attack on the problems presented by these crystals must be made in other ways.

During the first few years of my activity as a research man I carried through a number of structure determinations by the rigorous method, in collaboration with my teacher Roscoe G. Dickinson and independently. I was deeply interested, however, in more complex substances than those that could be attacked in this way, and from 1928 on I applied a different method of attack to many substances. One evening in 1933, after I had described the new method of crystal structure determination to him and contrasted it with the rigorous method, Dr. Karl K. Darrow suggested that I call it the stochastic method, and referred me to the introduction of this expression by the chemist Alexander Smith, who had written, in his "Inorganic Chemistry" (1909), the following:

...When Mitscherlich discovered that Glauber's salt gave a definite pressure of water vapor, he at once formed the hypothesis, that is, supposition, that other hydrates would be found to do likewise. Experiments showed this supposition to be correct. The hypothesis was at once displaced by the fact. This sort of hypothesis predicts the probable existence of certain facts or connections of facts; hence, reviving a disused word, we call it a stochastic hypothesis (Greek στοχαστικός, apt to divine the truth by conjecture).
It differs from the other kind in that it professes to be composed entirely of verifiable facts and is subjected to verification as quickly as possible.

In the stochastic method of treating very complex crystals, a plausible structure is guessed with the aid of hints provided by the observed size of unit and space-group symmetry and by knowledge of general principles of molecular structure, and the stochastic hypothesis that this is the actual structure of the crystal is thereupon either verified or disproved by the comparison of calculated and observed x-ray intensities. As a rule, if the agreement between observed and calculated intensities is excellent, the proposed structure may be accepted as the correct one. There is, however, always some possibility that the agreement is fortuitous. This was emphasized by a discovery that Dr. M. D. Shappell and I made in 1930,¹ during our study of the structure of bixbyite, (Mn,Fe)₂O₃, and the C modification of the rare-earth oxides. W. H. Zachariasen² had investigated these crystals, and had assigned positions to the 32 metal atoms in the unit cube. During our reinvestigation of the structure we discovered that the space group Tₕ⁷ provides certain pairs of physically distinct arrangements of atoms that give identical intensities of x-rays from all crystallographic planes, so that no unambiguous structure determination can be made by the consideration of x-ray intensities alone. This ambiguity, which has not turned out to be an important one in practice, has been further investigated by Patterson.

¹ L. Pauling and M. D. Shappell, Z. Krist., 75, 128 (1930).
² W. H. Zachariasen, Z. Krist., 67, 455 (1928).

I shall content myself with a few examples of the application of the stochastic method. The mineral enargite, Cu₃AsS₄, was found on x-ray investigation³ to have an orthorhombic unit of structure with a₀ = 6.46 Å, b₀ = 7.43 Å, and c₀ = 6.18 Å.
This unit contains six copper atoms, two arsenic atoms, and eight sulfur atoms. The dimensions are closely similar to those of the hexagonal mineral wurtzite, ZnS: the dimensions of a double orthohexagonal unit of wurtzite, containing eight zinc atoms and eight sulfur atoms, are a√3 = 6.65 Å, 2a = 7.68 Å, and c = 6.28 Å, where a and c are the hexagonal axes of wurtzite. This suggests that the atomic arrangement is that shown in Figure 1, which results from replacing one fourth of the zinc atoms in wurtzite by arsenic atoms and the remaining three fourths by copper atoms, in such a way as to give [...]. It was found that the calculated intensities for this structure agree well with the observed intensities, and somewhat improved agreement is obtained by moving the sulfur atoms slightly closer to the arsenic atom and away from the copper atoms, giving the distances 2.23 Å for arsenic-sulfur and 2.32 Å for copper-sulfur. Although the structure depends upon thirteen parameters, there is little doubt that it is correct.

³ L. Pauling and S. Weinbaum, Z. Krist., 88, 48 (1934).

A similar investigation, with, however, different results, was carried out for the mineral sulvanite.⁴ Sulvanite, Cu₃VS₄, has been found as a massive mineral in Burra Burra, Australia, and in cleavable masses and a few small crystals near Mercur, Utah. It [...]

⁴ L. Pauling and R. Hultgren, Z. Krist., 84, 204 (1933).

[...] each of the seventy-two outer atoms is shared between two complexes; each complex then contributes thirty-six atoms per lattice point, which with the complex of forty-five atoms gives eighty-one atoms per lattice point, or 162 in the unit cube. The complexity of this structure may be attributed to the difficulty of fitting complexes with icosahedral symmetry into a crystal with cubic symmetry.
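The count of 162 atoms can be checked shell by shell. The short Python sketch below does only that arithmetic. It assumes the shell sizes listed in the legend to Figure 4 (a central atom, an icosahedron of twelve, a pentagonal dodecahedron of twenty and a second icosahedron of twelve, then an outer shell of sixty plus twelve), that every outer-shell atom is shared by exactly two complexes, and that a body-centered cubic lattice has two lattice points per unit cube. The constant and function names are illustrative only.

```python
from fractions import Fraction
from itertools import accumulate

# Shell sizes of one complex, as listed in the legend to Figure 4 (assumed exact).
CORE_SHELLS = [1, 12, 20, 12]   # central atom, icosahedron, pentagonal dodecahedron, icosahedron
OUTER_SHELL = 60 + 12           # truncated icosahedron plus twelve atoms over hexagons

SHARING = 2                     # each outer-shell atom belongs to two neighboring complexes
LATTICE_POINTS_PER_CUBE = 2     # body-centered cubic: 8 corners x 1/8 + 1 body center


def atoms_per_lattice_point(core_shells, outer_shell, sharing):
    """Atoms assigned to one complex when shared outer atoms are divided evenly."""
    return sum(core_shells) + Fraction(outer_shell, sharing)


cumulative = list(accumulate(CORE_SHELLS))  # running totals of the inner shells
per_point = atoms_per_lattice_point(CORE_SHELLS, OUTER_SHELL, SHARING)
per_cube = per_point * LATTICE_POINTS_PER_CUBE

print("cumulative inner shells:", cumulative)
print("atoms per lattice point:", per_point)
print("atoms per unit cube:", per_cube)
```

The running totals 1, 13, 33 and 45 correspond to the thirteen-, thirty-three- and forty-five-atom groups named in the legend. Using `Fraction` keeps the halving of the shared shell exact, so an odd outer shell would show up as a non-integer count instead of being silently rounded.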
It seems not unlikely that the complexity of the intermetallic compound with the simple formula NaCd₂ is to be attributed to the same cause. I have now been working on the problem of the structure of this crystal for thirty years,⁸ and the structure still remains undiscovered. The crystal is cubic, with the edge of the unit cube slightly over 30 Å. This unit contains about 384 sodium atoms and 768 cadmium atoms. The attempt to determine its structure by rigorous methods would, of course, be hopeless; but I think that the stochastic method will ultimately be successful.

⁸ L. Pauling, J. Am. Chem. Soc., 45, 2777 (1923).

The power of the stochastic method is illustrated by its recent application to the problem of the configuration of polypeptide chains in proteins. The history of this application is illuminating also in showing that an investigator who strives to apply the method must have confidence in himself. In 1937, after I had become interested in proteins, and had carried out a number of experimental studies of their properties (especially the magnetic properties of hemoglobin) and, with Dr. Alfred Mirsky, had formulated a general theory of the structure and process of denaturation of proteins,⁹ I spent several months in an unsuccessful effort to apply the stochastic method in the discovery of an acceptable structure for α-keratin.

⁹ A. E. Mirsky and L. Pauling, Proc. Nat. Acad. Sci., 22, 439 (1936); L. Pauling and C. D. Coryell, Proc. Nat. Acad. Sci., 22, 210 (1936).

It was possible at that time to predict that the amide group in a peptide would be planar, because of the resonance of the double bond between the two positions C-O and C-N, and it was possible to predict the interatomic distances and bond angles, essentially as given in Figure [...], with reasonable confidence.
In addition it was recognized that a stable configuration would involve the formation of hydrogen bonds between the N-H group and the oxygen atom of the carbonyl group, with N-H···O distance approximately 2.80 Å. No reasonable configuration was found in this investigation, however, and in consequence the possibility was considered that the structural parameters of the polypeptide chain might be significantly different from those predicted from information obtained through the determination of the structures of somewhat [...] substances. At that time no structure determination had been made of any amino acid, simple peptide, or other simple substance closely related to proteins. My colleague Professor Robert B. Corey, who had earlier made x-ray photographs of several proteins with R. W. G. Wyckoff, and I concluded that we should embark upon a program of precise structure determination of these simple substances. This program has been steadily under way since 1937, and has led to the precise determination of the structures of crystals of half a dozen amino acids, several simple peptides, and several other simple substances (such as acetylglycine) closely related to proteins. Through these investigations it was found that the planarity of the amide group and the interatomic distances and bond angles are as predicted in the simple substances, and can confidently be expected to apply also to the polypeptide chains in proteins. In addition, hydrogen bonds with N-H···O distance 2.79 ± 0.12 Å have been found to be universally present. Even after this program was well under way, and it was recognized that the structural parameters of the polypeptide chain could be reasonably well predicted, there was delay in the application of the stochastic method to the problem of the structure of proteins.
This delay probably resulted from the failure of the earlier effort and from the feeling that the chemical complexity of the proteins, their construction from about twenty different kinds of amino-acid residues, might well indicate a corresponding structural complexity. Then one day in March 1948, while I was at my home in Oxford (where I was again serving as Eastman Professor) with a cold, I decided to attack the problem of the configuration of polypeptide chains, for the first time in eleven years. It occurred to me to make a search for the simplest configurations, those in which all of the amino-acid residues are structurally equivalent. The most general operation that converts an asymmetric element into an equivalent element (not its mirror image) is a rotation about an axis combined with a translation along the axis. The repetition of this general operation automatically leads to a helix. I attempted accordingly to find helical configurations of polypeptide chains involving planar amide groups with known dimensions, such that suitable hydrogen bonds were formed. Within an hour, with the aid of a pencil and a piece of paper, I had discovered a satisfactory helical structure. It did not, however, explain the details of the x-ray diagram of hair and other α-keratin proteins, and nothing more was done along these lines for some months.

After my return to Pasadena, Professor Corey and I suggested to Dr. H. R. Branson, a young man interested in the application of mathematics to chemical problems, that he make a search for other satisfactory helical configurations. He found only one other, and in 1951 a description of the two helixes, the α helix and the γ helix, was published.¹⁰

¹⁰ L. Pauling, R. B. Corey, and H. R. Branson, Proc. Nat. Acad. Sci., 37, 205 (1951).
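The statement that repeating a single rotation-plus-translation must generate a helix can be illustrated numerically. The Python sketch below is a hypothetical example, not the procedure used in 1948: it applies one screw operation again and again to a single point and confirms that every image stays at the same distance from the axis and rises by the same amount, so that all positions are equivalent. The parameters are assumptions chosen for illustration: 3.6 residues per turn and a pitch of 5.4 Å (the α-helix values quoted in the following paragraph), which give a rotation of 100° and a rise of 1.5 Å per residue, and an arbitrary radius of 2.3 Å for the repeated point.

```python
import math


def screw_operation(point, angle_rad, rise):
    """Rotate (x, y, z) about the z axis by angle_rad, then translate by rise along z."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y, z + rise)


# Assumed, illustrative parameters (alpha-helix values): 3.6 residues per turn, 5.4 A pitch.
RESIDUES_PER_TURN = 3.6
PITCH = 5.4                                  # angstroms per full turn
ANGLE = 2 * math.pi / RESIDUES_PER_TURN      # 100 degrees per residue, in radians
RISE = PITCH / RESIDUES_PER_TURN             # 1.5 angstroms per residue
RADIUS = 2.3                                 # hypothetical distance of the repeated point from the axis


def build_chain(n_residues, start=(RADIUS, 0.0, 0.0)):
    """Generate equivalent positions by applying the same screw operation repeatedly."""
    positions = [start]
    for _ in range(n_residues - 1):
        positions.append(screw_operation(positions[-1], ANGLE, RISE))
    return positions


chain = build_chain(19)  # residue 0 plus eighteen successive images
radii = [math.hypot(x, y) for x, y, _ in chain]
rises = [b[2] - a[2] for a, b in zip(chain, chain[1:])]
steps = [math.dist(a, b) for a, b in zip(chain, chain[1:])]  # requires Python 3.8+

print(f"rotation per residue: {math.degrees(ANGLE):.1f} deg")
print(f"rise per residue: {RISE:.2f} A")
print(f"radius spread: {max(radii) - min(radii):.2e} A")
print(f"neighbor-distance spread: {max(steps) - min(steps):.2e} A")
```

With these values the eighteenth image returns to the starting azimuth after exactly five turns, 27 Å along the axis. Any other nonzero angle and rise would still give a helix, only with a different repeat.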
Although the α helix, which has about 3.6 amino-acid residues per turn and a pitch of about 5.4 Å, did not explain in an obvious way the principal feature of the x-ray diagram of the α-keratin proteins, a strong meridional reflection with spacing 5.15 Å, its predicted x-ray pattern was found to be in excellent agreement with the observed pattern for synthetic polypeptides. Moreover, the general similarity in appearance of the x-ray diagram for the α-keratin proteins and that for the synthetic polypeptides gave reasonably strong support to the assignment of the α helix to these fibrous proteins also. The problem of the origin of the 5.15 Å meridional reflection seems to have been solved by the simultaneous suggestion by F. H. C. Crick¹¹ and by Professor Corey and me¹² that in the α-keratin proteins the α helixes are twisted about one another. A detailed structure is that shown in Figures 7 and 8; it involves a seven-strand cable of α helixes, with two additional α helixes in the interstitial positions.

¹¹ F. H. C. Crick, Nature, 170, 882 (1952).
¹² L. Pauling and R. B. Corey, Nature, 171, 59 (1953).

Reasonably convincing evidence has also been obtained, largely through the work of Perutz and Kendrew, who have studied hemoglobin and myoglobin, and of Riley and Arndt, who have prepared radial-distribution curves of a number of proteins, that globular proteins contain segments with the configuration of the α helix. In hemoglobin and myoglobin these α-helix segments lie in approximately parallel orientation to one another. No detailed information has so far been obtained about the way in which the polypeptide chains make the transition from one α-helix segment to another.

Silk fibroin is shown by its x-ray diagram to have a structure which repeats in the distance 7.00 Å along the fiber axis.
The antiparallel-chain pleated-sheet structure, shown in Figure 9, is predicted to have the identity distance 7.00 Å along the fiber axis, and in other respects it accounts satisfactorily for the x-ray diagram of silk fibroin; it can be confidently accepted as representing this protein. It is probable that the β-keratin proteins (such as stretched hair) have a closely similar structure, with, however, the polypeptide chains in parallel, rather than antiparallel, orientation.

The x-ray diagram of collagen and gelatin is characteristic of these proteins, showing that they have a structure different from that of the proteins of the α-keratin class, and also different from that of silk and the β-keratin proteins. The stochastic method was used in the formulation of a structure for collagen and gelatin,¹³ which, however, has since been found to be in disagreement with some of the features of the x-ray diagram. Despite much effort that has been expended on the problem, no satisfactory structure for collagen and gelatin has yet been found.

¹³ L. Pauling and R. B. Corey, Proc. Nat. Acad. Sci., 37, 272 (1951).

The problem of the structure of collagen and gelatin may be used to illustrate an important aspect of the stochastic method. The first step in the application of this method is to make a hypothesis, a guess. The second step is to test the hypothesis by some comparison with experiment. In general the test cannot be sufficiently thorough to provide a proof that the hypothesis is correct: it may easily be shown that the hypothesis is incorrect, through the discovery of a significant disagreement with experiment, but agreement on a limited number of points cannot be accepted as verification of the hypothesis.
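The point that agreement on a limited number of points is not verification, and the one-guess rule that rests on it, can be made concrete with a toy probability model. The Python sketch below assumes, purely for illustration, that a wrong structure agrees with each of k test points by chance with probability p, independently of the other points, so that a single wrong guess passes the whole test with probability p**k. The values p = 0.5 and k = 5 are hypothetical and are not taken from any structure determination.

```python
def chance_survival(p_point, k_points):
    """Probability that one wrong hypothesis matches all k test points by chance."""
    return p_point ** k_points


def chance_any_survives(p_point, k_points, n_guesses):
    """Probability that at least one of n independent wrong guesses passes every test point."""
    return 1.0 - (1.0 - chance_survival(p_point, k_points)) ** n_guesses


# Hypothetical values: each test point is matched by chance half the time,
# and the comparison with experiment checks five independent points.
P_POINT, K_POINTS = 0.5, 5

for n in (1, 10, 100):
    p_fooled = chance_any_survives(P_POINT, K_POINTS, n)
    print(f"{n:>3} wrong guesses -> P(at least one spurious agreement) = {p_fooled:.3f}")
```

With these assumed numbers a single wrong guess passes only about three times in a hundred, but a hundred independent wrong guesses will very probably include at least one that passes, which is the situation the one-guess restriction is meant to exclude.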
In order for the stochastic method to be significant, the principles used in formulating the hypothesis must be restrictive enough to make the hypothesis itself essentially unique; in other words, the investigator who makes use of this method should be allowed one guess. If he were allowed many guesses he would sooner or later make one that was not in disagreement with the limited number of test points, but there would then be little justification for accepting that guess. I may, however, contend that Professor Corey [...] collagen [...]

Legends for Figures

Fig. 1. The structure of enargite, Cu₃AsS₄. The large circles represent sulfur atoms, the small circles copper atoms, and the small shaded circles arsenic atoms.

Fig. 2. The structure of sulvanite, Cu₃VS₄.

Fig. 3. A portion of the structure of zunyite, Al₁₃Si₅O₂₀(OH,F)₁₈Cl. AlO₆ groups are represented by octahedra, and SiO₄ and AlO₄ groups by tetrahedra, the last being marked Al. Smaller spheres represent oxygen atoms, larger spheres chloride ions. Groups of five tetrahedra and twelve octahedra preserve their identity in the structure.

Fig. 4. The structure of the intermetallic compound Mg₃₂(Al,Zn)₄₉. The six drawings (from left to right in the top row, and then from left to right in the bottom row) have the following significance: a central atom surrounded by twelve atoms at the points of a nearly regular icosahedron; the icosahedral group of thirteen atoms surrounded by twenty atoms at the points of a pentagonal dodecahedron; the complex of thirty-three atoms surrounded by twelve atoms at the corners of an icosahedron; the outermost shell of sixty atoms at the corners of a truncated icosahedron, plus twelve atoms out from the centers of twelve of the hexagons of this polyhedron; a packing drawing showing the complex of forty-five atoms plus an outer shell of seventy-two atoms; the structure of the crystal, in which these complexes, located about the points of a body-centered cubic lattice, share all of the seventy-two atoms of the outermost shell with neighboring complexes.

Fig. 5. Diagrammatic representation of the configuration of the polypeptide chain in the α helix.

Fig. 6. The α helix.

Fig. 7. At the left, a compound α helix: an α helix whose axis describes a helical configuration. The diameter, shown as about 10 Å, includes the volume occupied by side chains as well as the main chains of the protein. Center, a seven-strand α cable. In the proposed structures of proteins of the α-keratin type these cables are packed together, with compound helixes as shown at the left in the interstices. At the right, a three-strand rope of α helixes.

Fig. 8. A cross section of the α-keratin structure, showing the seven-strand α cables A and B and the interstitial compound helixes C. The orientation of the cross section of the cable changes with coordinate along the fiber axis. The central cable is shown in the most unfavorable orientation for the interstitial α helixes. The protein chains are not so nearly circular in cross section as indicated in the drawing, and space is filled more effectively than is indicated.

Fig. 9. Drawing representing the antiparallel-chain pleated-sheet structure.