SUMEX
STANFORD UNIVERSITY MEDICAL EXPERIMENTAL COMPUTER RESOURCE

COMPETING RENEWAL APPLICATION RR-00785

BOOK II
COLLABORATIVE PROJECTS AND APPENDIXES

Submitted to
BIOTECHNOLOGY RESOURCES PROGRAM
NATIONAL INSTITUTES OF HEALTH

June 1, 1977

DEPARTMENT OF GENETICS
STANFORD UNIVERSITY SCHOOL OF MEDICINE

Joshua Lederberg, Principal Investigator

DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
PUBLIC HEALTH SERVICE
GRANT APPLICATION
Form Approved O.M.B. 68-R0249

SECTION I

TO BE COMPLETED BY PRINCIPAL INVESTIGATOR (Items 1 through 7 and 15A)

1. TITLE OF PROPOSAL (Do not exceed 53 typewriter spaces): S U Medical Experimental Computer Resource (SUMEX)
2. PRINCIPAL INVESTIGATOR
   2A. NAME (Last, First, Initial): LEDERBERG, Joshua
   2B. TITLE OF POSITION: Professor and Chairman
   2C. MAILING ADDRESS (Street, City, State, Zip Code): Department of Genetics, Stanford University Medical Center, Stanford, California 94305
   (See Instructions): Department of Genetics
   2H. MAJOR SUBDIVISION (See Instructions): School of Medicine
3. DATES OF ENTIRE PROPOSED PROJECT PERIOD (This application): FROM / THROUGH
7. RESEARCH INVOLVING HUMAN SUBJECTS (See Instructions): A. [x] NO   B. [ ] YES Approved   C. [ ] YES - Pending Review
8. INVENTIONS (Renewal Applicants Only - See Instructions): A. [x] NO   B. [ ] YES - Not previously reported   C. [ ] YES - Previously reported

TO BE COMPLETED BY RESPONSIBLE ADMINISTRATIVE AUTHORITY (Items 8 through 13 and 15B)

9. APPLICANT ORGANIZATION(S) (See Instructions): Stanford University, Stanford, California 94305. IRS No. 94-1156365. Congressional District No. 12.
10. NAME, TITLE, AND TELEPHONE NUMBER OF OFFICIAL(S) SIGNING FOR APPLICANT ORGANIZATION(S): D'Ann B. Downey, Sponsored Projects Officer, Sponsored Projects Office. Telephone Number (415) 497-2583.
11. TYPE OF ORGANIZATION (Check applicable item): [ ] FEDERAL   [ ] STATE   [ ] LOCAL   [x] OTHER (Specify) - Private Non-Profit University
12. NAME, TITLE, ADDRESS, AND TELEPHONE NUMBER OF OFFICIAL IN BUSINESS OFFICE WHO SHOULD ALSO BE NOTIFIED IF AN AWARD IS MADE: K. D. Creighton, Associate Vice President - Controller, Stanford University, Stanford, California 94305. Telephone Number (415) 497-2251.
13. IDENTIFY ORGANIZATIONAL COMPONENT TO RECEIVE CREDIT FOR INSTITUTIONAL GRANT PURPOSES (See Instructions): 01 School of Medicine
14. ENTITY NUMBER (Formerly PHS Account Number): IRS No. 94-1156365
15. CERTIFICATION AND ACCEPTANCE. We, the undersigned, certify that the statements herein are true and complete to the best of our knowledge and accept, as to any grant awarded, the obligation to comply with Public Health Service terms and conditions in effect at the time of the award.

SIGNATURES (Signatures required on original copy only. Use ink. "Per" signatures not acceptable.)

The undersigned agrees to accept responsibility for the scientific and technical conduct of the project and for the provision of required progress reports if a grant is awarded as the result of this application.

Date: 5/24/77

NIH 398 (FORMERLY PHS 398) Rev. 1/73

Table of Contents -- BOOK II

Section
5. BIOGRAPHICAL SKETCHES
6. COLLABORATIVE PROJECT PROGRESS AND OBJECTIVES
   6.1 STANFORD PROJECTS
       6.1.1 DENDRAL PROJECT
       6.1.2 HYDROID PROJECT
       6.1.3 MOLGEN PROJECT
       6.1.4 MYCIN PROJECT
       6.1.5 PROTEIN STRUCTURE PROJECT
   6.2 NATIONAL AIM PROJECTS
       6.2.1 ACQUISITION OF COGNITIVE PROCEDURES (ACT)
       6.2.2 CHEMICAL SYNTHESIS PROJECT (SECS)
       6.2.3 HIGHER MENTAL FUNCTIONS PROJECT
       6.2.4 INTERNIST PROJECT
       6.2.5 MEDICAL INFORMATION SYSTEMS LABORATORY
       6.2.6 RUTGERS COMPUTERS IN BIOMEDICINE
   6.3 PILOT STANFORD PROJECTS
       6.3.1 GENETICS APPLICATIONS PROJECT
       6.3.2 BAYLOR-METHODIST CEREBROVASCULAR PROJECT
       6.3.3 COMPUTER ANALYSIS OF CORONARY ARTERIOGRAMS
       6.3.4 QUANTUM CHEMICAL INVESTIGATIONS
   6.4 PILOT AIM PROJECTS
       6.4.1 COMMUNICATION ENHANCEMENT PROJECT
       6.4.2 AI IN PSYCHOPHARMACOLOGY
       6.4.3 ORGAN CULTURE PROJECT
       6.4.4 NEUROPROSTHESES PROJECT
       6.4.5 MATHEMATICAL MODELING OF PHYSIOLOGICAL SYSTEMS
       6.4.6 PUFF/VM PROJECT
Appendix I    OVERVIEW OF ARTIFICIAL INTELLIGENCE RESEARCH
Appendix II   AI HANDBOOK OUTLINE
Appendix III  SUMMARY OF MAINSAIL LANGUAGE FEATURES
Appendix IV   MICROPROGRAMMED MAINSAIL PLANS
Appendix V    AIM MANAGEMENT COMMITTEE MEMBERSHIP
Appendix VI   USER INFORMATION - GENERAL BROCHURE
Appendix VII  GUIDELINES FOR PROSPECTIVE USERS

Privileged Communication - J. Lederberg

II. RESEARCH PLAN - BOOK II

This is an application for renewal of a grant supporting the Stanford University Medical Experimental Computer (SUMEX) research resource for applications of Artificial Intelligence in Medicine (AIM). The research plan has been divided into several logical parts:

1) Book I - Resource research objectives and rationale, progress report, and detailed research plans.
2) Book II - Biographical sketches, collaborating project reports and plans, and supporting appendixes.
3) Budget - First year budget detail, five-year budget summary, and budget explanation and justification.
5 BIOGRAPHICAL SKETCHES

The following are biographical sketches for all professional personnel contributing to the SUMEX-AIM resource project. These do not include sketches for individual collaborating project investigators.

BIOGRAPHICAL SKETCH

NAME: FEIGENBAUM, Edward A.
TITLE: Professor and Chairman, Computer Science Department
BIRTHDATE: January 20, 1936
PLACE OF BIRTH: Weehawken, New Jersey, U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: Male

EDUCATION
  Carnegie Institute of Technology, Pittsburgh, Pennsylvania - B.S., 1956, Electrical Engineering
  Carnegie Institute of Technology, Pittsburgh, Pennsylvania - Ph.D., 1959, Industrial Administration

MAJOR RESEARCH INTEREST: Artificial Intelligence
ROLE IN PROPOSED PROJECT: Co-Investigator
RESEARCH SUPPORT: (See continuation page.)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
  1976 - present  Professor (by Courtesy), Department of Psychology, Stanford University
  1976 - present  Chairman, Department of Computer Science, Stanford University
  1969 - present  Professor of Computer Science, Stanford University
  1965 - 1968     Associate Professor of Computer Science, Stanford University
  1965 - 1968     Director, Stanford Computation Center, Stanford University
  1964 - 1965     Associate Professor, School of Business Administration, University of California, Berkeley
  1960 - 1963     Assistant Professor, School of Business Administration, University of California, Berkeley
  1961 - 1964     Research Appointment, Center for Human Learning, University of California, Berkeley
  1960 - 1964     Research Appointment, Center for Research in Management Science, University of California, Berkeley
  1965 - present  Editor, Computer Science Series, McGraw-Hill Book Company, New York
  1968 - 1972     Member, Computer and Biomathematical Sciences Study Section, NIH

Professional Societies: American
Psychological Association, American Association for the Advancement of Science, Association for Computing Machinery (member, National Council of ACM, 1966-68)

Consultantships: Information Sciences Institute, University of Southern California; The RAND Corporation; System Development Corporation (knowledge-based systems project); Systems Control, Inc. (HASP project)

PUBLICATIONS: (See continuation page.)

BIOGRAPHICAL SKETCH - FEIGENBAUM, Edward A.

RESEARCH SUPPORT

1. Contract No.: DAHC-15-73-C-0435
   Title of Project: Heuristic Programming Project
   Grant Agency: ARPA
   a. Project Period: 7/73 - 7/77
      Annual Funding: $225,762
      % of Effort: 90% summer, 43% academic year
   b. Proposed Renewal: 8/77 - 9/79
      Annual Funding: $375,000 (8/77-9/78), $350,000 (10/78-9/79)
      % of Effort: 33% summer 1977, 17% academic year 1977-78, 100% summer 1978, 13% academic year 1978-79

2. Grant No.: RR-00612
   Title of Project: Resource Related Research - Computers and Chemistry (DENDRAL)
   Project Period: 5/77 - 4/80
   Annual Funding: $213,580 (5/77-4/78) (Direct Costs)
   % of Effort: 5% (no salary)
   Grant Agency: NIH

3. Grant No.: MCS 74-23461
   Title of Project: Automation of Scientific Inference: Heuristic Computing Applied to Protein Crystallography
   Project Period: ?/75 - 4/79 + 6 mos.
   Annual Funding: $75,000
   % of Effort: 5% (no salary)
   Grant Agency: NSF

4. Grant No.: MCS 75-116?9
   Title of Project: MOLGEN: A Computer Science Application to Molecular Genetics
   Project Period: 6/76 - 5/78 + 5 mos.
   Annual Funding: $55,350
   % of Effort: 10% academic year, 100% summer (2 mos. 1977)
   Grant Agency: NSF

Proposal Submitted:
   Title of Project: Biomedical ...

BIOGRAPHICAL SKETCH

NAME: JOHNSON, Suzanne M.
TITLE: Scientific Programmer
BIRTHDATE: November 26, 1944
PLACE OF BIRTH: Pleasantville, New York, U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: Female

EDUCATION
  University of Arizona, Tucson - B.S., 1966, Chemistry

MAJOR RESEARCH INTEREST: Computer applications in medicine and chemistry
ROLE IN PROPOSED PROJECT: Applications Programmer

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
  1974 - present  Scientific Programmer, SUMEX Computer Project, Department of Genetics, Stanford University
  1973 - 1974     Scientific Programmer, Center for Radar Astronomy, Stanford Electronics Laboratories, Stanford University
  1971 - 1973     Research Assistant (crystallographic studies/computer data reduction), Department of Chemistry, University of Iowa, Iowa City
  1970 - 1971     Engineer, Geochemistry Section, Lockheed Electronics, Houston, Texas
  1966 - 1969     Research Assistant (x-ray crystallographer), Department of Chemistry, University of Illinois, Urbana

PUBLICATIONS: (See continuation page.)

BIOGRAPHICAL SKETCH - JOHNSON, Suzanne M.

PUBLICATIONS

1. Johnson, S.M., Newton, M.G., Paul, I.C., Beer, R.J.S. and Cartwright, D.: The Molecular Structure of an Unsymmetrical 6a-Thiathiophthen. Chem. Commun., 1170, 1967.
2. Johnson, S.M., McKechnie, J.S., Lin, B.T.-S. and Paul, I.C.: Crystal Structure of Bullvalene at 25°. J. Am. Chem. Soc. 89:7123, 1967.
3. Johnson, S.M., Paul, I.C., Rinehart, K.L., Jr. and Srinivasan, R.: The Molecular Configuration of Caldariomycin. J. Am. Chem. Soc. 90:135, 1968.
4. Paul, I.C., Johnson, S.M., Paquette, L.A., Barrett, J.H.
and Haluska, H.J.: The Molecular Geometry of Derivatives of 1H-Azepine in the Free and Complexed State. J. Am. Chem. Soc. 90:5023, 1968.
5. Johnson, S.M. and Paul, I.C.: Crystal and Molecular Structure of [16] Annulene. J. Am. Chem. Soc. 90:5555, 1968.
6. Johnson, S.M., Newton, M.G. and Paul, I.C.: Crystal and Molecular Structure of an Unsymmetrical 6a-Thiathiophthen: Single-crystal X-ray Analysis of 3-Benzoyl-5-p-bromophenyl-2-methylthio-6a-thiathiophthen. J. Chem. Soc. (B), 985, 1969.
7. Paul, I.C., Johnson, S.M., Barrett, J.H. and Paquette, L.A.: The Thermal (6 + 4)π Co-cycloaddition of N-alkoxycarbonylazepines: Crystal Structure Analysis of a Derived Monomethiodide. Chem. Commun., 6, 1969.
8. Coates, R.M., Barney, R.F., Johnson, S.M. and Paul, I.C.: The Crystal Structure of Khusimol p-Bromobenzoate. Chem. Commun., 999, 1969.
9. Johnson, S.M. and Paul, I.C.: The Crystal and Molecular Structure of the Perhydromethiodide of an Unsymmetrical N-alkoxycarbonylazepine Dimer. J. Chem. Soc. (B), 1244, 1969.
10. Johnson, S.M. and Paul, I.C.: Crystal and Molecular Structure of 1-Acetonyl-1-thionia-5-thiacyclooctane Perchlorate. Tet. Letters, 177, 1969.
11. Leonard, N.J., Golankiewicz, K., McCredie, R.S., Johnson, S.M. and Paul, I.C.: Synthetic Spectroscopic Models Related to Coenzymes and Base Pairs. III. A 1,1'-Trimethylene-Linked Thymine Photodimer of cis-syn Structure. J. Am. Chem. Soc. 91:5855, 1969.
12. Sabacky, M.J., Johnson, S.M., Martin, J.C. and Paul, I.C.: Steric Effects in ortho-Substituted Triarylmethanes. J. Am. Chem. Soc. 91:7542, 1969.
13. Johnson, S.M., Paul, I.C. and King, G.S.D.: [16] Annulene: The Crystal and Molecular Structure. J. Chem. Soc. (B), 643, 1970.
14. Johnson, S.M., Herrin, J., Liu, S.J. and Paul, I.C.: Crystal Structure of a Barium Complex of Antibiotic X-537A, Ba(C34H53O8)2·H2O. Chem. Commun. 72, 1970.
15. Johnson, S.M., Herrin, J., Liu, S.J. and Paul, I.C.: The Crystal and Molecular Structure of the Barium Salt of an Antibiotic Containing a High Proportion of Oxygen. J. Am. Chem. Soc. 92:4428, 1970.
16. Gibson, E.K. and Johnson, S.M.: Thermal Analysis-Inorganic Gas Release Studies of Lunar Samples. Proc. Second Lunar Science Conference 2:1351, 1971.
17. Gibson, E.K. and Johnson, S.M.: Thermogravimetric-Quadrupole Mass-Spectrometric Analysis of Geochemical Samples. Thermochimica Acta 4:49, 1972.
18. Carhart, R.E., Johnson, S.M., Smith, D.H., Buchanan, B.G., Dromey, R.G. and Lederberg, J.: Networking and a Collaborative Research Community: A Case Study Using the DENDRAL Program. In Computer Networking and Chemistry (Ed. Peter Lykos), American Chemical Society Symposium Series, No. 19, 1975.
19. Levinthal, E.C., Carhart, R.E., Johnson, S.M. and Lederberg, J.: When Computers Talk to Computers. Industrial Research, November, 1975.

BIOGRAPHICAL SKETCH

NAME: KAHLER, Richard Q.
TITLE: Scientific Programmer
BIRTHDATE: November 4, 1952
PLACE OF BIRTH: Los Angeles, California, U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: Male

EDUCATION
  Stanford University (1969-72) - Degree: None; Electrical Engineering, Computer Science

MAJOR RESEARCH INTEREST: Subsystem software development, human engineering of user programs, user/project communications
ROLE IN PROPOSED PROJECT: User Consultant

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
  1975 - present  Scientific Programmer, SUMEX Computer Project, Department of Genetics, Stanford University
  1975            Computer Programmer, Institute for Mathematical Studies in the Social Sciences (IMSSS), Stanford University

PUBLICATIONS: (none)

BIOGRAPHICAL SKETCH

(Give the following information for all professional personnel listed on page 3, beginning with the Principal Investigator.
Use continuation pages and follow the same general format for each person.)

NAME: LEDERBERG, Joshua
TITLE: Professor and Chairman, Department of Genetics
BIRTHDATE: May 23, 1925
PLACE OF BIRTH: Montclair, New Jersey, U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: Male

EDUCATION
  Columbia College, New York - B.A., 1944
  College of Physicians and Surgeons, Columbia Univ., New York (1944-46)
  Yale University - Ph.D., 1947, Microbiology

HONORS
  1957 - National Academy of Sciences
  1958 - Nobel Prize in Medicine

MAJOR RESEARCH INTEREST: Molecular Genetics, Artificial Intelligence
ROLE IN PROPOSED PROJECT: Principal Investigator
RESEARCH SUPPORT: (See continuation page.)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
  1959 - present  Professor and Chairman, Department of Genetics, Stanford University School of Medicine
  1957 - 1959     Chairman, Department of Medical Genetics, University of Wisconsin
  1947 - 1957     Professor of Genetics, University of Wisconsin

SELECTED PUBLICATIONS: (See continuation page.)

BIOGRAPHICAL SKETCH - LEDERBERG, Joshua

RESEARCH SUPPORT

PERSONAL RESEARCH COMMITMENTS:

  Grant No.: CA15896
  Title of Project: Genetics of Bacteria
  Funding, Current Year: $80,000 (5/77-4/78, pending)
  Funding, Project Period: $464,569 (5/77-4/82)
  % of Effort: 15
  Grant Agency: NIH

  Grant No.: NAS1-9592
  Title of Project: Viking Mission participation
  Funding, Current Year: $20,000 (5/77-4/78)
  Funding, Project Period: $82,572 (4/70-?/78) (incl. Indirect Costs)
  % of Effort: 10
  Grant Agency: NASA

PRINCIPAL INVESTIGATOR EX OFFICIO:

  Grant No.: GM00295
  Title of Project: Genetics Training Grant (graduate research training)
  Funding, Current Year: $111,00? (7/77-6/78)
  Funding, Project Period: $528,368 (7/7?-6/79)
  % of Effort: 20
  Grant Agency: NIH

  Grant No.: GM20832
  Title of Project: Genetics Research Project
  Funding, Current Year: $265,587 (5/77-4/78)
  Funding, Project Period: $1,292,113 (5/74-4/79)
  % of Effort: 10
  Grant Agency: NIH

  Grant No.: NGR-05-020-004
  Title of Project: Cytochemical Studies of Planetary Microorganisms
  Funding, Current Year: $137,500 (9/76-12/77) (incl. Indirect Costs)
  % of Effort: ?
  Grant Agency: NASA

SELECTED PUBLICATIONS

1. Lederberg, J.: Topology of Molecules. In The Mathematical Sciences (Ed. Committee on Support of Research in the Mathematical Sciences (COSRIMS) with George A.M. Boehm), MIT Press, p. 37-51, 1969.
2. Lederberg, J., Sutherland, G.L., Buchanan, B.G., Feigenbaum, E.A., Robertson, A.V., Duffield, A.M. and Djerassi, C.: Applications of artificial intelligence for chemical inference. I. The number of possible organic compounds. Acyclic structures containing C, H, O, and N. J. Am. Chem. Soc. 91:2973-76, May 21, 1969.
3. Buchs, A., Delfino, A.B., Duffield, A.M., Djerassi, C., Buchanan, B.G., Feigenbaum, E.A. and Lederberg, J.: Applications of artificial intelligence for chemical inference. VI. Approach to a general method of interpreting low resolution mass spectra with a computer. Helvetica Chimica Acta 53:1394-1417, 1970.
4. Lederberg, J.: Use of Computer to Identify Unknown Compounds: The Automation of Scientific Inference. In Biochemical Applications of Mass Spectrometry (Ed. G.R. Waller), John Wiley and Sons, New York, p. 193-207, 1972.
5. Lederberg, J.: The freedoms and the control of science - notes from the ivory tower. Southern California Law Review 45:535-614, 1972.
6. Lederberg, J.: The control of chemical and biological weapons. Stanford J. International Studies 7:22-44, 1972.
7. Lederberg, J.: The genetics of human nature. Social Res. 40:375-405, 1973.
8.
Lederberg, J.: A System-analytic Viewpoint. In How Safe is Safe? - The Design of Policy on Drugs and Food Additives, National Academy of Sciences, Washington, D.C., p. 66-94, 1974.
9. Masinter, L., Sridharan, N., Lederberg, J. and Smith, D.H.: Applications of artificial intelligence for chemical inference. XII. Exhaustive generation of cyclic and acyclic isomers. J. Am. Chem. Soc. 96:7702-7714, 1974.
10. Harris-Warrick, R.M., Elkana, Y., Ehrlich, S.D. and Lederberg, J.: Electrophoretic separation of B. subtilis genes (EcoR1/agarose gel electrophoresis). Proc. Nat. Acad. Sci. U.S.A. 72:2207-2211, 1975.
11. Carhart, R.E., Johnson, S.M., Smith, D.H., Buchanan, B.G., Dromey, R.G. and Lederberg, J.: Networking and a Collaborative Research Community: A Case Study using the DENDRAL Programs. In Computer Networking and Chemistry (Ed. Peter Lykos), ACS Symposium Series, No. 19, p. 192-217, 1975.
12. Buchanan, B.G., Smith, D.H., White, W.C., Gritter, R., Feigenbaum, E.A., Lederberg, J. and Djerassi, C.: Applications of artificial intelligence for chemical inference. XXII. Automatic rule formation in mass spectrometry by means of the meta-DENDRAL program. J. Am. Chem. Soc. 98:6168-6178, 1976.
13. Sagan, C. and Lederberg, J.: The prospects for life on Mars: A pre-Viking assessment. Icarus 28:291-330, 1976.
14. Ehrlich, S.D., Bursztyn-Pettegrew, H., Stroynowski, I. and Lederberg, J.: Expression of the thymidylate synthetase gene of the B. subtilis bacteriophage phi-3-T in E. coli. Proc. Nat. Acad. Sci. 73:4145-4149, 1976.
15. Klein, H.P., Lederberg, J., Rich, A., Oyama, V.I. and Levin, G.V.: The Viking Mission search for life on Mars. Nature 262:24-27, 1976.
16. Klein, H.P., Horowitz, N.H., Levin, G.V., Oyama, V.I., Lederberg, J., Rich, A., Hubbard, J.S., Hobby, G.L., Straat, P.A., Berdahl, B.J., Carle, G.C., Brown, F.S. and Johnson, R.D.: The Viking biological investigation: Preliminary results. Science 194:99-105, 1976.
17. Chi, N-Y. W., Ehrlich, S.D.
and Lederberg, J.: Functional expression of two Bacillus subtilis chromosomal genes in Escherichia coli. J. Bact., 1977. (In press)

BIOGRAPHICAL SKETCH

NAME: LEVINTHAL, Elliott C.
TITLE: Adjunct Professor of Genetics; Dir., Instrumentation Res. Lab.
BIRTHDATE: April 13, 1922
PLACE OF BIRTH: Brooklyn, New York, U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: Male

EDUCATION
  Columbia College, New York - Physics
  Massachusetts Institute of Technology - Physics and Math
  Stanford University - Physics and Math

HONORS
  Public Service Medal, awarded by NASA, April, 1977, for exceptional contributions to the success of the Viking project

MAJOR RESEARCH INTEREST: Medical instrumentation research
ROLE IN PROPOSED PROJECT: AIM Liaison
RESEARCH SUPPORT: (See continuation page.)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
  1974 - present  Adjunct Professor, Department of Genetics, Stanford University; Director, Instrumentation Research Laboratory, Department of Genetics, Stanford University
  1970 - 1973     Associate Dean for Research Affairs, Stanford University School of Medicine
  1961 - 1974     Senior Scientist/Director, Instrumentation Research Laboratories, Department of Genetics, Stanford University
  1953 - 1961     President, Levinthal Electronic Products
  1952 - 1953     Chief Engineer, Century Electronics
  1950 - 1952     Research Director/Member of Board of Directors, Varian Associates
  1949 - 1950     Research Physicist, Varian Associates
  1946 - 1948     Research Associate, Nuclear Physics, Stanford University
  1943 - 1946     Project Engineer, Sperry Gyroscope Company, New York
  1943            Teaching Fellow in Physics, Massachusetts Institute of Technology

PUBLICATIONS: (See continuation page.)
BIOGRAPHICAL SKETCH - LEVINTHAL, Elliott C.

RESEARCH SUPPORT

  Grant No.: NAS1-9682
  Title of Project: Viking Mission participation
  Funding, Current Year: $85,300 (11/75-9/77)
  Funding, Project Period: $175,552 (11/75-9/78)
  % of Effort: 50
  Grant Agency: NASA

  Grant No.: NGR-05-020-004
  Title of Project: Cytochemical Studies of Planetary Microorganisms
  Funding, Current Year: $137,500 (9/76-12/77)
  % of Effort: 11
  Grant Agency: NASA

  Grant No.: GM20832
  Title of Project: Genetics Research Project
  Funding, Current Year: $256,587 (5/77-4/78)
  Funding, Project Period: $1,292,113 (5/74-4/79)
  % of Effort: 8
  Grant Agency: NIH

  Grant No.: RR-00612
  Title of Project: Resource Related Research - Computers and Chemistry (DENDRAL)
  Funding, Current Year: $213,580 (5/77-4/78)
  Funding, Project Period: $598,399 (5/77-4/80)
  % of Effort: 6
  Grant Agency: NIH

SELECTED PUBLICATIONS AND PAPERS

1. Levinthal, E.C.: Detection of Extraterrestrial Life. Professional and Technical Group of Instrumentation and Measurements of IEEE, April, 1963.
2. Levinthal, E.C.: The Detection of Life within our Planetary System. Presented at WESCON, August, 1963.
3. Levinthal, E.C.: The Biological Exploration of Mars. Presented at the Space Technology Laboratory's Invited Lecture Series, November 5, 1963.
4. Levinthal, E.C.: The Biological Exploration of Mars. Presented at Moffett Field, Fullerton, Los Angeles and San Diego, as part of the University of California Extension Series Lectures - Horizons in Space Biosciences: Exobiology, April 27-30, 1964.
5. Levinthal, E.C., Lederberg, J. and Hundley, L.: Multivator - A Biochemical Laboratory for Martian Experiments. Life Sciences and Space Research II, COSPAR (Committee on Space Research), 1964.
6. Halpern, B., Westley, J.W., Levinthal, E.C. and Lederberg, J.: The Pasteur Probe: An Assay for Molecular Asymmetry. Life Sciences and Space Research, COSPAR (Committee on Space Research), 1966.
7. Levinthal, E.C.: Space Vehicles for Planetary Missions. In Biology and the Exploration of Mars, Nat. Acad. Sci., National Research Council.
8. Levinthal, E.C.: Prospects for Manned Mars Missions. In Biology and the Exploration of Mars, Nat. Acad. Sci., National Research Council.
9. Reynolds, O., Levinthal, E. and Soffen, G.: The Role of the Scientist in Automated Laboratory Systems. AIAA Paper No. 67-532, 1967.
10. Levinthal, E.C., Lederberg, J. and Sagan, C.: Relationship of Planetary Quarantine to Biological Search Strategy. Presented at COSPAR Meeting (Committee on Space Research), London, 1967.
11. Sagan, C., Levinthal, E.C. and Lederberg, J.: Contamination of Mars. Science 159:1191-1195, 1968.
12. Levinthal, E.C.: The Role of Molecular Asymmetry in Planetary Biological Exploration. Presented at Gordon Research Conferences, Nuclear Chemistry Section, 1968.
13. Kriss, J.P., Bonner, W.A. and Levinthal, E.C.: Variable Time-Lapse Videoscintiscope: A Modification of the Scintillation Camera Designed for Rapid Flow Studies. J. Nuclear Med. 10:209, 1969.
14. Reynolds, W.E., Bacon, V.A., Bridges, J.C., Coburn, T.C., Halpern, B., Lederberg, J., Levinthal, E.C. and Steed, E.: A Computer Operated Mass Spectrometer System. Anal. Chem. 42:1122, 1970.
15. Masursky, H., Batson, R., Borgeson, W., Carr, M., McCauley, J., Milton, D., Wildey, R. and Wilhelms, D., Murray, B., Horowitz, N., Leighton, R. and Sharp, R., Thompson, W., Briggs, G., Chandeysson, P. and Shipley, E., Sagan, C. and Pollack, J., Lederberg, J., Levinthal, E., Hartmann, W., McCord, T., Smith, B., Davies, M., de Vaucouleurs, G., Leovy, C.: Television Experiment for Mariner Mars 1971. Icarus 12:10-45, 1970.
16. Masursky, H., Batson, R.M., McCauley, J.F., Soderblom, L.A., Wildey, R.L., Carr, M.H., Milton, D.J., Wilhelms, D.E., Smith, B.A., Kirby, T.B., Robinson, J.C., Leovy, C.B., Briggs, G.A., Duxbury, T.C., Acton, C.H., Jr., Murray, B.C., Cutts, J.A., Sharp, R.P., Smith, Susan, Leighton, R.B., Sagan, C., Veverka, J., Noland, M., Lederberg, J., Levinthal, E., Pollack, J.B., Moore, J.T., Jr., Hartmann, W.K., Shipley, E.N., de Vaucouleurs, G., Davies, M.E.: Mariner 9 Television Reconnaissance of Mars and Its Satellites: Preliminary Results. Science 175(4019):294, 1972.
17. Mutch, T.A., Binder, A.B., Huck, F.O., Levinthal, E.C., Morris, E.C., Sagan, Carl and Young, A.T.: Imaging Experiment. Icarus 16:92, 1972.
18. Sagan, Carl, Veverka, Joseph, Fox, Paul, Dubisch, Russel, Lederberg, Joshua, Levinthal, Elliott, Quam, Lynn, Tucker, Robert, Pollack, James B. and Smith, Bradford A.: Variable Features on Mars: Preliminary Mariner 9 Television Results. Icarus 17:345, 1972.
19. Levinthal, E.C., Green, W.B., Cutts, J.A., Jahelka, E.D., Johnsen, R.A., Sander, M.J., Seidman, J.B., Young, A.T. and Soderblom, L.A.: Mariner 9 - Image Processing and Products. Icarus 18:1088, 1973.
20. Sagan, C., Veverka, J., Fox, P., Dubisch, R., French, R., Gierasch, P., Quam, L., Lederberg, J., Levinthal, E., Tucker, R., Eross, B. and Pollack, J.B.: Variable Features on Mars, 2, Mariner 9 Global Results. J. Geophysical Research 78, No. 20, p. 4153-4196, 1973.
21. Lederberg, J., Feigenbaum, E., Levinthal, E. and Rindfleisch, T.: SUMEX - A Resource for Application of Artificial Intelligence in Medicine. Proc. Ann. Conference, Association for Computing Machinery, November, 1974.
22. Levinthal, E.C., Carhart, R.E., Johnson, S.M. and Lederberg, J.: When Computers Talk to Each Other. Industrial Research 17(12):35-42, 1975.
23. Mutch, T.A., Binder, A.B., Huck, F.O., Levinthal, E.C., Liebes, S., Morris, E.C., Patterson, W.R., Pollack, J.B., Sagan, C.
and Taylor, G.R.: The Surface of Mars: The View from the Viking I Lander. Science 193(4255):791-831, 1976.
24. Mutch, T.A., Arvidson, R.E., Binder, A.B., Huck, F.O., Levinthal, E.C., Liebes, S., Jr., Morris, E.C., Nummedal, D., Pollack, J.B. and Sagan, C.: Fine Particles on Mars: Observations with the Viking I Lander Cameras. Science 194(4260):87-91, 1976.
25. Levinthal, E.C. and Huck, F.O.: Multispectral and Stereo Imaging on Mars. In Astronautical Research 1976 - A New Era of Space Transportation, Pergamon Press, 1976. Proc. of the XXVII International Astronautical Congress, Anaheim, California, 1976.
26. Mutch, T.A., Arvidson, R.E., Aurin, P., Binder, A.B., Huck, F.O., Levinthal, E.C., Liebes, S., Jr., Morris, E.C., Pollack, J.B., Sagan, C. and Saunders, R.: The Surface of Mars: The View from Lander 2. Science 194(4271):1277-1283, 1976.

BIOGRAPHICAL SKETCH

NAME: NII, H. Penny
TITLE: Scientific Programmer
BIRTHDATE: October 6, 1939
PLACE OF BIRTH: Tokyo, Japan

EDUCATION
  Tufts University, Jackson College, Medford, Massachusetts - B.S., 1962, Mathematics
  Stanford University - M.A., 1973, Computer Science

MAJOR RESEARCH INTEREST: Knowledge-based computer systems design
ROLE IN PROPOSED PROJECT: Scientific Programmer - AI tool generalization
RESEARCH SUPPORT: (See continuation page.)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
  1976 - present  Scientific Programmer, Heuristic Programming Project, Department of Computer Science, Stanford University
  1973 - 1975     Associate Investigator for Computer Science, HASP Project, Systems Control, Inc., Palo Alto, California
  1967 - 1968     Systems Engineering Advisor, International Business Machines World Trade Asia Corporation, Tokyo, Japan
  1962 - 1967     Research Staff Programmer, International Business Machines Corporation, Thomas J. Watson Research Center:
                  1965-67  Project Leader, Electronic Coding Pad (ECP) System
                  1965-66  Assistant Manager, Man-Computer Interaction Group
                  1963-64  Programmer, World's Fair Lexical Processing System
                  1962-63  Programmer, applications ranging from text processing to linear programming problems

RECENT PUBLICATIONS: (See continuation page.)

BIOGRAPHICAL SKETCH - NII, H. Penny

RESEARCH SUPPORT

  Grant No.: DAHC-15-73-C-0435
  Title of Project: Heuristic Programming Project
  Grant Agency: ARPA
  Current: $225,762 (7/76-7/77), current year; project period 7/73-7/77 (incl. Indirect Costs); % of Effort: 100
  Proposed renewal: $375,000 (8/77-9/78), current year; $725,000 (8/77-9/79), project period; % of Effort: 80

  Grant No.: MCS 74-23461
  Title of Project: Automation of Scientific Inference: Heuristic Computing Applied to Protein Crystallography
  Funding, Current Year: $75,000 (5/77-4/78)
  Funding, Project Period: $150,200 (5/77-4/79 + 6 mos.) (incl. Indirect Costs)
  Grant Agency: NSF

RECENT PUBLICATIONS

1. Feigenbaum, E.A., Nii, H.P., et al.: HASP (Heuristic Adaptive Surveillance Program) Final Report, Vol. I-IV, Technical Report under ARPA Contract M66314-74-C-1235, Systems Control, Inc., Palo Alto, California, 1975. (Classified document)
2. Engelmore, R.A. and Nii, H.P.: A Knowledge-based System for the Interpretation of Protein X-ray Crystallographic Data. Heuristic Programming Project Memo, HPP-77-2 (also STAN-CS-77-589), January, 1977.
3. Nii, H.P.
and Feigenbaum, E.A.: Knowledge-based Understanding of Signals. Proc. Workshop on Pattern-Directed Inference Systems, May, 1977.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH

NAME: RINDFLEISCH, Thomas C.
TITLE: Senior Research Associate
BIRTHDATE: December 10, 1941
PLACE OF BIRTH: Oshkosh, Wisconsin, U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: Male

EDUCATION
Purdue University, Lafayette, Indiana: B.S., 1962, Physics
California Institute of Technology, Pasadena: M.S., 1965, Physics
    Ph.D. thesis to be completed; all course work and examinations completed.

HONORS
Graduated with Highest Honors, Purdue University
NSF Fellowship, Caltech
Sigma Xi

MAJOR RESEARCH INTEREST: Computer science applications in medical research; image processing and artificial intelligence
ROLE IN PROPOSED PROJECT: Facility Manager

RESEARCH AND/OR PROFESSIONAL EXPERIENCE

Department of Genetics, Stanford University School of Medicine:
1976 - present  Senior Research Associate/Director, SUMEX Computer Project
1974 - 1976     Research Associate/Director, SUMEX Computer Project
1971 - 1976     Research Associate - Mass Spectrometry, Instrumentation Research

Jet Propulsion Laboratory, California Institute of Technology, Pasadena:
1969 - 1971     Supervisor of Image Processing Development and Applications Group
1968 - 1969     Mariner Mars 1969 Cognizant Engineer for Image Processing
1962 - 1968     Engineer, design and implement image processing computer software

PUBLICATIONS (See continuation page.)

BIOGRAPHICAL SKETCH - RINDFLEISCH, Thomas C.

PUBLICATIONS

1. Rindfleisch, T. and Willingham, D.: A Figure of Merit Measuring Picture Resolution. JPL Technical Report 32-665, September, 1965.
2. Rindfleisch, T.: A Photometric Method for Deriving Lunar Topographic Information. JPL Technical Report 32-785, September, 1965.
3. Rindfleisch, T. and Willingham, D.: A Figure of Merit Measuring Picture Resolution. Advances in Electronics and Electron Physics, Vol. 22A, Photo-Electronic Image Devices, Academic Press, 1966.
4. Rindfleisch, T.: Photometric Method for Lunar Topography. Photogrammetric Engineering, March, 1966.
5. Rindfleisch, T.: Generalizations and Limitations of Photoclinometry. JPL Space Science Summary, Vol. III, 1967.
6. Rindfleisch, T.: The Digital Removal of Noise from Imagery. JPL Space Science Summary 37-62, Vol.
III, 1970.
7. Rindfleisch, T.: Digital Image Processing for the Rectification of Television Camera Distortions. Astronomical Use of Television-Type Image Sensors. NASA Special Publication SP-256, 1971.
8. Rindfleisch, T., Dunne, J., Frieden, H., Stromberg, W. and Ruiz, R.: Digital Processing of the Mariner 6 and 7 Pictures. J. Geophysical Research, Vol. 75, No. 2, January, 1971.
9. Pereira, W.E., Summons, R.E., Reynolds, W.E., Rindfleisch, T.C. and Duffield, A.M.: The Quantitation of Beta-Aminoisobutyric Acid in Urine by Mass Fragmentography. Clinica Chimica Acta, 49, 1973.
10. Summons, R.E., Pereira, W.E., Reynolds, W.E., Rindfleisch, T.C. and Duffield, A.M.: Analysis of Twelve Amino Acids in Biological Fluids by Mass Fragmentography. Analytical Chemistry, Vol. 45, No. 4, April, 1974.
11. Pereira, W.E., Summons, R.E., Rindfleisch, T.C. and Duffield, A.M.: The Determination of Ethanol in Blood and Urine by Mass Fragmentography. Clin. Chim. Acta, 51, 1974.
12. Pereira, W.E., Summons, R.E., Rindfleisch, T.C., Duffield, A.M., Zeitman, B. and Lawless, J.G.: Stable Isotope Mass Fragmentography: Quantitation and Hydrogen-Deuterium Exchange Studies of Eight Murchison Meteorite Amino Acids. Geochim. et Cosmochim. Acta, 39, 153, 1975.
13. Dromey, R.G., Stefik, M.J., Rindfleisch, T.C. and Duffield, A.M.: Extraction of Mass Spectra Free of Background and Neighboring Component Contributions from Gas Chromatography/Mass Spectrometry Data. Analytical Chemistry, 48, 1352, 1976.
14. Smith, D.H., Yeager, W.J., Anderson, P.J., Fitch, W.L., Rindfleisch, T.C. and Achenbach, H.: Historical Library Search. An Approach to Quantitative Comparison of GC/MS Profiles of Complex Mixtures. (Submitted for publication)

BIOGRAPHICAL SKETCH

NAME: SCHULZ, Rainer W.
TITLE: Computer Systems Specialist
BIRTHDATE: January 29, 1942
PLACE OF BIRTH: Berlin, Germany
PRESENT NATIONALITY: U.S.
citizen

EDUCATION
California State University, San Jose: B.A., 1964, Mathematics, Engineering

HONORS: Graduated Summa Cum Laude, California State University
MAJOR RESEARCH INTEREST: Computer systems design
ROLE IN PROPOSED PROJECT: System Programmer
RESEARCH AND/OR PROFESSIONAL EXPERIENCE (See continuation page.)
PUBLICATIONS (none)

BIOGRAPHICAL SKETCH - SCHULZ, Rainer W.

RESEARCH AND/OR PROFESSIONAL EXPERIENCE

Work Experience:

1971 - present  Institute for Mathematical Studies in the Social Sciences (IMSSS), Stanford University: System Manager. Responsible for operations of large-scale PDP-10 timesharing system. Manager, system software. Technical evaluation responsibility of software and computer hardware. System design and systems development.

1970 - 1971     Computer Operations, Inc., Costa Mesa, California: Design of operating system for computer to be built by COI.

Berkeley Computer Corporation, Berkeley, California: Project leader of BCC timesharing software. Guided monitor and peripheral processor software design and implementation. Coded approximately 50% of basic system. Wrote some micro code for peripheral processors.

Scientific Control Corporation, Dallas, Texas: Assisted Project Genie at the University of California, Berkeley, refining XDS 940 timesharing system. Involved in design of SCC 5730 timesharing software and hardware, particularly resource allocation and memory management.

Xerox Data Systems, El Segundo, California: Diagnostic programming for I/O channels. Design of peripheral hardware simulators. Design/implementation of multi-programmed system evaluation and diagnostic test for all Sigma computers.

IBM, San Jose, California: Wrote an assembler and loader for IBM 1800 and 1130 systems. Assembler ran on a 1401. Wrote diagnostic programs for process control equipment.
Assisted engineering in debugging prototype 1800 and 1130 machines.

BIOGRAPHICAL SKETCH - SCHULZ, Rainer W.

Research and/or Professional Experience (continued):

Professional Activities:

1975 1974 1974 - 1975 1974 - 1975 1973 1973 1973 1971 1971 - present - present - 1974 - 1976 - 1973

Intel Corporation, Santa Clara, California: Data processing administrative consultant. System performance and hardware evaluation. System improvement proposals.

System Control, Inc., Palo Alto, California: Secure system design. Consultant in system design and computer system evaluation.

University of Southern California (USC-ECL, USC-ISI), Los Angeles: Consultant in system and administrative area regarding computer operations and system development.

Digital Equipment Corporation, Marlboro, Massachusetts: Consultant in system development area and marketing decisions for large-scale system.

National Science Foundation, Washington, D.C.: Consultant in technological innovations. Evaluating proposals for technical feasibility. Reviewing highly technical projects in computer science area.

Computer Curriculum Corporation, Palo Alto, California: System consultant and software management of programming staff for small computer systems.

University of Hawaii, Honolulu: Lecturer in Computer System Design and Computer-Assisted Instruction.

Ames Research Center, Mountain View, California: Consultant in System Design and Development of timesharing systems for the ILLIAC IV Project.

Institute for the Future, Menlo Park, California: Consultant in Computer System Design for Information Retrieval System.

BIOGRAPHICAL SKETCH

PLACE OF BIRTH: Washington, D.C., U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: Male

EDUCATION
University of Pittsburgh, Pennsylvania: B.S., 1965, Mathematics
University of Pittsburgh, graduate school (1965-66): no degree, Mathematics, Computer Science

MAJOR RESEARCH INTEREST: Operating systems
ROLE IN PROPOSED PROJECT: System Programmer

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1976 - present  Head System Programmer, SUMEX Computer Project, Department of Genetics, Stanford University
1974 - 1975     Senior Systems Designer, ILLIAC IV Project, Evans and Sutherland
1970 - 1974     Systems Analyst Supervisor, Computer Center, University of Pittsburgh
1968 - 1969     Computer Specialist, Office of Personnel Operations, Department of the Army, Headquarters the Pentagon
1966 - 1968     Systems Programmer/Analyst, Computer Center, University of Pittsburgh

PUBLICATIONS (none)

BIOGRAPHICAL SKETCH

NAME: VEIZADES, Nicholas
TITLE: R&D Engineer, Instrumentation Research Labs.
PLACE OF BIRTH: Larissa, Greece

EDUCATION
City College of San Francisco, California (1954-55)
University of California, Berkeley: B.S., 1958, Electrical Engineering
Stanford University: M.S., 1961, Engineering Science

MAJOR RESEARCH INTEREST: Electronic circuit design
ROLE IN PROPOSED PROJECT: Electronics Engineer
RESEARCH SUPPORT: (See continuation page.)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1962 - present  Electronics Engineer, Instrumentation Research Laboratories, Department of Genetics, Stanford University
1961 - 1962     Project Engineer, Fairchild Semiconductor (Instrumentation), Division of Fairchild Instrument and Camera Company, Palo Alto, Ca.
1958 - 1961     Senior Engineer, Link Division, General Precision, Inc., Palo Alto, Ca.

PUBLICATIONS (none)

BIOGRAPHICAL SKETCH - VEIZADES, Nicholas

RESEARCH SUPPORT

RR-00612 (NIH): Resource Related Research - Computers and Chemistry (DENDRAL). Current year $213,530 (5/77-4/78); grant period $598,393 (5/77-4/80); 25% effort.
GM20832 (NIH): Genetics Research Project. Current year $265,587 (5/77-4/78); grant period $1,292,113 (5/74-4/79); 18% effort.
NGH-05-020-009 (NASA): Cytochemical Studies of Planetary Microorganisms. $137,503 (9/75-12/77); 7% effort.

BIOGRAPHICAL SKETCH

NAME: WILCOX, Clark R.
TITLE: Student Research Assistant
BIRTHDATE: May 3, 1948
PLACE OF BIRTH: Winston-Salem, North Carolina
PRESENT NATIONALITY: U.S. citizen

EDUCATION
Duke University, Durham, North Carolina: B.S., 1970, Mathematics
Stanford University: M.S., 1973, Computer Science
Stanford University (1973-present): Ph.D. (in progress), Computer Science

HONORS
Phi Beta Kappa, Duke University
Graduated Magna Cum Laude, Duke University

MAJOR RESEARCH INTEREST: Software portability
ROLE IN PROPOSED PROJECT: System Programmer

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1974 - present  Student Research Assistant (MAINSAIL design/implementation), SUMEX Computer Project, Department of Genetics, Stanford University
1970 - present  Ph.D.
Candidate, Department of Computer Science, Stanford University:
                1973-present  Research in software portability and directly executable languages under Dr. Michael Flynn
                1972-73       Research in complexity theory under Dr. Robert Floyd
1969 - 1970     Undergraduate student, Duke University:
                1969-70  Research in symbolic computation under Dr. Robert Caviness, Math.
                1969-70  Design/implementation of medical information system under Dr. William Hammond, Medicine
                1969     Programmer, Computer Center

PUBLICATIONS
Wilcox, C.R.: MAINSAIL - A Machine Independent Programming System. Proc. Digital Equipment Computer Users Society (DECUS), 2(4):975-979, Spring, 1976.

COLLABORATIVE PROJECTS

6 COLLABORATIVE PROJECT PROGRESS AND OBJECTIVES

The following subsections report on the collaborative use of the SUMEX facility, including the formally authorized projects within the Stanford and AIM aliquots and the various "pilot" efforts currently under way. These project descriptions and comments are the result of a solicitation for contributions sent to each of the project Principal Investigators requesting the following information:

I) Summary of research program
   A) Technical goals
   B) Medical relevance and collaboration
   C) Progress summary
   D) Up-to-date list of publications
   E) Funding status
      1) Current funding
      2) Pending applications and renewals

II) Interactions with the SUMEX-AIM resource
   A) Examples of collaborations and medical use of programs via SUMEX
   B) Examples of sharing, contacts and cross-fertilization with other SUMEX-AIM projects (via workshops, system facilities, personal contact, etc.)
   C) Critique of resource services

III) Follow-on SUMEX grant period (8/78 - 7/83)
   A) Long-range user project goals and plans
   B) Justification for continued use of SUMEX by your project
   C) Comments and suggestions for future resource goals, development efforts, etc.
We believe that the reports of the individual projects speak for themselves as rationales for participation; in any case the reports are recorded as submitted and are the responsibility of the indicated project leaders.

6.1 STANFORD PROJECTS

The following group of projects is formally approved for access to the Stanford aliquot of the SUMEX-AIM resource. Their access is based on review by the Stanford Advisory Group and approval by Professor Lederberg as Principal Investigator. As noted previously, the DENDRAL project was the historical core application of SUMEX. Although this is described as a "Stanford project," a significant part of the development effort and of the computer usage is dedicated to national collaborator-users of the DENDRAL programs.

Privileged Communication 41 J. Lederberg
DENDRAL PROJECT Section 6.1.1

6.1.1 DENDRAL PROJECT

DENDRAL - Resource Related Research - Computers & Chemistry

Carl Djerassi, Principal Investigator
Professor of Chemistry
Stanford University

I. OVERVIEW OF RESEARCH ACTIVITIES

Technical Goals

Our research, development and future plans focus on both the question of structure elucidation in general and the problem of providing computer assistance to scientists engaged in specific aspects of this important activity. A simplified representation of major milestones in solving unknown biomolecular structures by manual methods is presented in Figure 1.

Figure 1. Important steps in manual solution of structures of unknown chemical compounds. These steps, indicated as separate boxes, may be performed explicitly or implicitly.

When structures are actually solved, the relationships among the boxes of Fig. 1 are considerably more complex than indicated. Nevertheless, the Figure provides a good introduction to both our recent work and our future directions.
We describe briefly each of the milestones in the following paragraphs. More detailed discussions of each topic follow in subsequent sections.

The first step in identification of an unknown structure is to separate it from other components in a potentially complex mixture and to isolate it in reasonably pure form. These steps are performed by scientists, frequently with the assistance of various instruments. Although our research is not directed toward any part of this separation and isolation procedure (except insofar as these procedures also yield data which are subject to computer-assisted interpretation), information about the chemical and physical characteristics of the compound may be crucial to further efforts to determine its structure.

Depending on the quantity of sample available and its characteristics, various spectroscopic and additional chemical data are then collected on the unknown. A mass spectrum is frequently obtained, e.g., from a combined gas chromatograph/mass spectrometer (GC/MS) system. An important part of our recent proposal to the NIH is directed toward automation of combined GC/MS systems operated at high mass spectrometer resolving powers. Data on elemental compositions and relative ion abundances are then available in computer-readable form for further analysis (see MSRANK).

The chemist possesses an armamentarium of spectroscopic techniques which can be brought to bear on a structure. One advantage of our work is that any data so obtained can be used to help solve the structure as long as it can be expressed, manually or by computer, in substructural statements about the unknown.

The next important phase in structure elucidation is interpretation of the available data (Fig. 1) in terms of structural features of the molecule.
These interpretations may be in terms of known structural units ("superatoms", polyatomic aggregates of atoms in known configurations), or in terms of structural units, ring sizes, proton or carbon distributions. The latter set of features represents constraints on the kinds of structures which are possible.

Our efforts in the area of computer-assisted data interpretation are focussed on mass spectral and carbon-13 nuclear magnetic resonance (13CNMR) data. We are developing general approaches to automated analysis of these data in terms of structural features of unknowns. Our recent efforts are summarized in Figure 2, and discussed in detail subsequently.

We have been concerned with use of these data from two points of view, planning and prediction (Fig. 2). During planning, experimental data are examined in order to extract specific structural information to be used in assembling candidate structures. In prediction, each candidate structure is tested to determine how closely its predicted spectrum agrees with the observed spectrum. The candidates can be ranked accordingly. The Meta-DENDRAL research is directed toward determination of rules of spectroscopic data which can be used either for planning or prediction (see below).

                            DATA INTERPRETATION

    "PLANNING"                                PREDICTION
    EXTRACTION OF STRUCTURAL                  USE OF SPECTROSCOPIC
    INFORMATION DIRECTLY FROM                 DATA TO RANK
    SPECTROSCOPIC DATA.                       CANDIDATE STRUCTURES.
    1. MASS SPECTRA - MDGGEN                  1. MSPRUNE, MSPRED
    2. 13CNMR                                 2. 13CNMR

    FORMATION OF RULES TO BE USED FOR BOTH PLANNING AND PREDICTION.

Figure 2. Relationship between use of rules in either planning or prediction. Both approaches are used in utilizing data for structure elucidation.

Given possible structural fragments of the complete molecule and constraints on how these fragments may be assembled into complete molecules, a process of structural assembly follows (Fig. 1). No proven algorithm for solving this problem existed prior to earlier work supported by the current grant. Traditionally, this process has been left to manual, pencil and paper work. Our CONGEN program, which was designed to solve this problem, is the farthest advanced of programs designed to assist in various aspects of structure elucidation. It performs the structural assembly process, under constraints, and allows the scientist using the program to examine structural candidates and remove those deemed implausible (Fig. 1). A large portion of our recent and future work is directed toward improving the CONGEN program and building other facilities around it (see later sections). We have demonstrated the utility of CONGEN in structural studies, and subsequent sections discuss our recent developments and applications of CONGEN as well as our interactions with other scientists desiring access to our programs.

Given a set of structural candidates, the experimenter examines them to determine what experiments might be performed to focus on the correct structure by stepwise rejection of alternative hypotheses. When there are only a small number of possibilities under consideration, manual methods suffice. But CONGEN provides the capability for exhaustive enumeration of structural possibilities at a point in a structural problem when there may be many hundreds of possibilities. It is very difficult to examine these structures and plan experiments by hand. We have begun exploring ways to provide computer assistance to this important aspect of structure elucidation. We refer to this research area as the Experiment Planner, discussed in more detail below.

When new experiments have been planned the researcher carries them out and uses the results as additional constraints on the structural candidates (Fig. 1).
New experiments may include collecting additional spectroscopic data or performing a sequence of chemical reactions on the unknown. The latter experiments may be chosen to convert the unknown into a related compound which possesses physical or chemical properties more amenable to analysis. During the past year we have developed a program to assist scientists in carrying out representations of chemical reactions in the computer and eliminating undesired structural candidates based on constraints exercised on the products of the reaction. This work is described in two subsequent sections. One section describes use of the program, which we call REACT, to explore structural possibilities exactly as outlined above. A later section describes recent progress in increasing the power of REACT.

Medical Relevance

Structure elucidation is a fundamental problem for medical practice and biomedical research. For example, we are collaborating with physicians in the Department of Pediatrics who monitor the body fluids of newborn infants in order to detect abnormal compounds. Much of the research leading to new drugs and new methods for synthesizing drugs also depends on careful analysis and identification of molecular structures of compounds.

The computer tools that we are developing will aid in the determination of molecular structures by giving working scientists help with data collection, data interpretation, hypothesis testing and, most important, systematic consideration of all molecular structures that are consistent with the interpretations of the available data.

PROGRESS SUMMARY

Experiment Planner

We have begun preliminary considerations of design and implementation of an experiment planner. This program will assist chemists in designing the most effective set of experiments to perform to solve the structure.
Although the experiment planner will be a future activity of our group, we are developing and using other structure manipulation functions which will provide groundwork for future developments.

One important aspect of experiment planning is the ability to examine in some way the set of candidate structures. Although many can be drawn for visual review, drawing is impractical when dozens or hundreds of structures are involved. To assist persons using CONGEN in reviewing their structures we have developed a function auxiliary to CONGEN which we call SURVEY.

    SURVEY

    FUNCTION: AIDS IN PERCEPTION OF ANY OF A PRE-SPECIFIED SET OF
    STRUCTURAL FEATURES IN A GROUP OF STRUCTURAL CANDIDATES, E.G.
        A) FUNCTIONAL GROUPS
        B) TERPENOID SKELETONS
        C) AMINO ACID SKELETONS

Figure 3. Function of the SURVEY program and examples of recent application areas.

The function of SURVEY is summarized in Figure 3. SURVEY simply acts as a reminder to the scientist of the presence or absence of certain structures or structural features. During the past year we have used SURVEY extensively. For example, we have used it to detect implausible functional groups in a set of candidate structures, using a file of substructures representing a wide variety of functionalities. In many problems, implausible functional groups are forgotten and CONGEN is never constrained to remove them.

Another example of use of SURVEY is in conjunction with collaborative work with persons in the Department of Genetics. In analysis of serum or urinary metabolites in patients at high risk of metabolic disorder, we have had occasion to use CONGEN in exploration of unknown structures [Report HPP-77-11]. Some of these structures could formally be conjugates of amino acids with organic acids. If so, such structures will possess backbones of naturally-occurring amino acids.
SURVEY was used to provide a summary of which structural candidates possessed such amino acid skeletons.

We have recently used SURVEY in a related application involving the structure of "polyalthenol", discussed by LeBoeuf, et al. (Figure 4). Superatoms and constraints supplied to CONGEN to derive structural candidates are summarized in Fig. 4. We summarize in Figure 5 the structural possibilities which resulted. There are five structures possessing a bicyclo[2.1.1] system, and six which possess a bicyclo[4.3.1] system (Fig. 5, top). These structures are energetically less favorable. For example, several possess a double bond at a bridgehead atom, which violates Bredt's Rule. There remain, however, 11 structures which are not formally excluded by data presented by LeBoeuf, et al.

Because these workers based their structural assignment on biogenetic grounds, we used SURVEY and REACT to test their hypothesis. We have, in computer-accessible libraries, known terpenoid ring systems which can be used within SURVEY to test sets of structures for known skeletons. None of the 22 structural candidates possesses a previously known skeleton. Because the authors postulated a relationship to a known skeleton via a single methyl shift, we used REACT to exercise a single methyl shift in all possible ways on each of the 22 candidates. SURVEY was then used to test the results for the presence of known terpenoid systems, and the drimane skeleton, the postulated precursor of polyalthenol, was the only known skeleton which resulted. This does not prove the hypothesis of LeBoeuf, et al., but certainly helps strengthen it.

SURVEY is, however, only the barest beginning of an experiment planner, even though it has proven useful. We plan to build from this beginning toward a much more powerful system.

    M. LeBoeuf, M. Hamonnière, A. Cavé, H. Gottleib, N. Kunesch, and E. Wenkert, Tet. Lett., 3559 (1976).

    "POLYALTHENOL" C23H31NO

    1) ALL FREE VALENCES BONDED TO NON-HYDROGEN ATOMS
    2) GOODLIST
          IN-CH2-BI    1 TO ANY   (EVENTUALLY IN-CH2-CHO,$
          ME-(BI)H     1 TO ANY   (EVENTUALLY CH3-CH, EXACTLY 1)
    3) GOODRINGS 2 EXACTLY 5
    4) BADRINGS 3

Figure 4. Superatoms and constraints supplied to CONGEN in investigations of plausible structural alternatives to the proposed structure of Polyalthenol.

Figure 5. Structural candidates for polyalthenol based on data given in Figure 4.

    REACTION CHEMISTRY DEVELOPMENTS

    1. SEPARATION FROM CONGEN - COMMUNICATION VIA FILES OF STRUCTURES.
    2. ADDING CONSTRAINTS - SITE- AND TRANSFORM-SPECIFIC.
    3. CONTROL STRUCTURE - RAMIFICATION
          A. ESTABLISH RELATIONSHIPS AMONG PRODUCTS AND REACTANTS
          B. DEAL PROPERLY WITH RANGES OF NUMBERS OF PRODUCTS
    4. INTERACTION - DEVELOP MANIPULATION COMMANDS WHICH PARALLEL
       LABORATORY OPERATIONS, E.G., SEPARATE INTO FLASKS, TEST CONTENTS
       OF VARIOUS FLASKS, INCOMPLETE SEPARATIONS, ETC.
    5. REPRESENTATION OF REACTIONS
    6. PROSPECTIVE DETECTION OF DUPLICATE PRODUCTS BASED ON SYMMETRY
       PROPERTIES OF: A) STARTING MATERIAL; AND B) TRANSFORMATION.

Figure 6. Current and future direction for improvement and extension of REACT, a program for exploration of applications of reaction chemistry to structure elucidation problems.

Applications of REACT to Structure Elucidation Problems

We have recently described our initial efforts toward representation of chemical reactions and their use in structure elucidation problems [Report HPP-76-5].
These efforts provided the framework for carrying out reactions within the computer which emulate actual laboratory reactions performed on an unknown. Constraints on the numbers and identities of the products are used to constrain the reaction products and, implicitly, the starting materials. Based on the results of that work we drew up a set of steps to be carried out to provide a truly useful tool for the chemist. Although the current program can be used in applications to real problems, it has some fundamental limitations which we have been working to solve.

The developments we have undertaken to improve REACT are summarized in Figure 6. We first undertook to separate REACT from CONGEN, for two reasons. One reason was program size. Many functions of CONGEN are not needed in REACT and become unnecessary when only REACT is being exercised. The procedures of structure generation (CONGEN) and REACT are sequential and a separate program introduces no problems. A second reason was the different uses of certain CONGEN functions in REACT. For example, the ways in which the graph matcher is used are different between the two programs, necessitating keeping two different versions around with the programs together.

The separation has been accomplished. The current version of REACT is now a separate program. It communicates structural information with CONGEN via files. All interactive portions are consistent with the structural manipulation functions of CONGEN so that learning the structural language of CONGEN is sufficient to use either program.

We have also added new constraint types to the reaction to expand greatly the ways in which reactions can be defined and constrained. An example of new extensions to reaction definitions illustrates some of the new features (Figures 7-10). The reaction defined here is one which will perform a dehydration of an alcohol; the site of the reaction is defined in Fig. 7.
The transform is defined as cleavage and loss of the oxygen resulting in formation of a double bond between the two carbon atoms of the original site (Fig. 7). In this particular dehydration the chemist wished to specify a site- specific constraint. It was known that a tertiary butyl group was part of the structure, and the dehydration will be prevented if that group is in close proximity to the reaction site (i.e., in a position alpha to the carbinol carbon). The definition of this constraint is given in Figure 8. Subsequently, this constraint ("HINDERED") is placed on BADLIST for constraints specific to the site as shown in Fig. 9. The completed definition of the reaction is summarized in Figure 10. Privileged Communication 51 J. Lederbeq Section 6.1.1 DENDRAL PROJECT :EDI?-REACT NAME:DEHYDRATION (NEW REACTION> *SITE >CHAIN 3 .ATNAME i 0 .HRANGE 1 1 1 3 13 .ADRA'rl DEHYDRATION: (HRANGES NOT INDICATED> O-C-C >DONE *TRANSFORM dlMJOIN 1 2 >JOIN 2 3 >DELATS 1 PADRAW DEHYDRATION: (HRANGES NOT INDICATED) c=c >DONE Figure 7. Definition of reaction site and chemical transform in REACT. J. Lederberg 52 Privileged Communication DENDRAL PROJECT Section 6.1.1 *DEFINE-CONSTRAINTS :? PiEASE ENTER ONE OF: GRIPE BUGOUT TRANSFORMSPECIFIC GENERAL(G) DONE SITESPECIFECW HALT :SITESPECIFIC NAME: HINDERED (NEld CONSTRAINT) WARNING: THE FINAL CONSTRAINTS MUST HAVE AT LEAST ONE ATOM OF THE SITE) .NDRAW HINDERED: (HRANGES NOT INDICATED) NON-C ATOMS: 1 0 l-2-3 'BRANCH 3 2 4 1 4 I >ADRASJ HINDERED: (HRANGES NOT INDICATED) c o-c-c-i-c C >DONE Figure 8. Definition of a site-specific constraint to be applied to the reaction DEHYDRATION. Privileged Comnunication 53 J. Lederberg Section 6.1.1 DENDRAL PROJECT "CONSTRAINTS :? 
    PLEASE ENTER ONE OF:
    GRIPE
    BUGOUT
    ST FOR CONSTRAINTS ON STARTING MATERIAL
    S FOR SITESPECIFIC CONSTRAINTS
    T FOR TRANSFORMSPECIFIC CONSTRAINTS
    PR FOR CONSTRAINTS ON PRODUCTS
    DONE
    HALT
    :S
    >BADLIST
    BADLIST CONSTRAINTS
    CONSTRAINT NAME:HINDERED
    CONSTRAINT NAME:
    -------
    >DONE
    :DONE

Figure 9. Specification of constraint named HINDERED as a BADLIST constraint for the reaction.

    *SHOW
    SITE:
    NAME=DEHYDRATION
    ATOM#  TYPE  ARTYPE  NEIGHBORS  HRANGE
    1      O     NON-AR  2          1-1
    2      C     NON-AR  1 3
    3      C     NON-AR  2          1-3
    DEHYDRATION: (HRANGES NOT INDICATED)
    NON-C ATOMS: 1 O
    1-2-3
    TRANSFORM:
    UNJOIN 1 2
    JOIN 2 3
    DELATS 1
    DEHYDRATION: (HRANGES NOT INDICATED)
    2=3
    CONSTRAINTS:
    CONSTRAINTS ON STARTING MATERIAL:
    NO CONSTRAINTS
    SITE-SPECIFIC CONSTRAINTS:
    -------
    BADLIST CONSTRAINTS
    NAME
    HINDERED
    -------
    TRANSFORM-SPECIFIC CONSTRAINTS:
    NO CONSTRAINTS
    CONSTRAINTS ON PRODUCTS:
    NO CONSTRAINTS
    *DONE
    (DEHYDRATION DEFINED)
    (DEHYDRATION ADDED TO THE REACTION LIST)

Figure 10. Summary of the completed definition of the DEHYDRATION reaction.

The remaining items summarized in Figure 6 are currently under development. We are redesigning the control structure so that the scientist using the program can use intuitive concepts, such as separation, as commands. To carry this out, important parts of the current mechanism have to be redesigned. Although the current program can be used effectively, its non-intuitive approach to dealing with reactions yielding multiple products, and with the subsequent separation (within the computer) and analysis of each product, presents a barrier to use by a wider community. We are continuing to develop our capabilities for representing reactions to ensure that the user of REACT has a complete descriptive language with which to specify reactions. We continue to study ways to avoid duplication in carrying out reactions.
We know how to implement certain of the symmetry-related constraints and will do so shortly.

CONGEN Developments

The problem solving paradigm that has emerged from DENDRAL work is the so-called "plan-generate-test" paradigm. It is based on heuristic search of a space of possible hypotheses, with planning before generation of hypotheses and testing of each generated candidate. The generator for DENDRAL, named CONGEN, is a general-purpose graph generator which produces a list of all possible graphs containing specified numbers of nodes of various types. The most important features of the generator are that the list of graphs is guaranteed to be complete and non-redundant and, equally important, that the list need not be exhaustively generated. The generator can be constrained to produce only graphs that meet specified criteria that are inferred from the initial problem data.

During the past year, CONGEN has developed along two major lines: 1) tools have been developed which will allow more efficient and "intelligent" use of substructural information supplied by the chemist; and 2) data from chemical reactions and from observed mass spectra can be used to eliminate unlikely structural candidates from a set produced by a CONGEN generation. These extensions are discussed below.

1) Intelligent use of substructural information as constraints

There is sometimes a significant conceptual gap between the intuitive chemical phrasing of a CONGEN problem and the phrasing which is most efficient, in both computer time and storage requirements, for the program. CONGEN provides a rich language for stating structure elucidation problems in precise substructural terms. However, there are usually many ways of defining a given problem, and different definitions can place widely different demands upon the program.
We have a continuing interest in reducing this conceptual gap by making CONGEN responsible for rephrasing a problem in the most efficient way, thus freeing the chemist to concentrate upon the chemical, rather than the algorithmic, aspects of a given case.

One distinction which is frequently puzzling to new CONGEN users is the one between superatoms and GOODLIST items. A superatom is a polyatomic "building block" which CONGEN joins with other superatoms and single atoms to form full structures. GOODLIST items are substructures which are required to be present in those full structures, but they are not incorporated directly into the initial phrasing of a problem as are superatoms. Rather, their presence or absence is tested by a graph-matching routine after the structures are produced. Frequently, a great many structures produced by the structure generator are discarded by this final test, and a significant amount of the program's time can be spent "shooting blanks". The concepts behind these two types of constraints (that specified substructural features must be present) are similar, but their implementations differ substantially in efficiency.

GOODLIST items cannot simply be transferred to the superatom list, though, because GOODLIST items are allowed to share atoms and bonds with other GOODLIST items or with superatoms. For example, if two substructures which are benzene rings are placed on GOODLIST, then a naphthalene derivative will be an acceptable structure even though the two occurrences of the ring have two atoms and one aromatic bond in common. Because of the building-block nature of superatoms, they may be joined to one another by additional bonds in CONGEN, but never "merged" (i.e., overlapped). Thus the price of efficiency is a more restricted interpretation of structural possibilities for superatoms. We have developed a new procedure which captures the best of both situations.
In order to incorporate a GOODLIST substructure into the problem at the earliest stage, it is necessary to find all unique ways that the given substructure can be created using parts of the existing building blocks (atoms and superatoms). This produces a set of new CONGEN problems with more or larger superatoms, each of which is easier to solve than the original one because the GOODLIST item is built in and need not be tested. Figure 11 shows schematically some of the ways this construction might occur: a) by bonding together two (or more) existing superatoms to create one larger one; b) by bonding additional atoms to a superatom to create a larger one; and c) by constructing a copy of the substructure from single atoms, creating a new superatom.

Figure 12 summarizes a CONGEN problem which was attempted but which could not be completed because of the unintelligent use of GOODLIST. The problem amounts to finding all ways of allocating three new bonds to the free valences (the bonds with unspecified termini) in the superatom CEMB such that the three indicated substructures are present in the final molecules. There are perhaps 10,000 unique allocations of those three new bonds, but only 7 pass the GOODLIST tests. Using GOODLIST as a post-test only, CONGEN would generate all 10,000 and discard nearly all of them, a process which would have been so lengthy that it was never completed. The constructive graph-matching routine approaches the problem in a much more efficient and chemically intuitive way: 1) there are only three places in which the first GOODLIST item can be constructed; 2) for each of these, there are four ways of constructing the second; and 3) for each of these, there are 0, 1 or 2 ways of incorporating the third. It quickly arrives at the correct set of solutions.
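The contrast between using a required substructure as a post-test and building it in constructively can be illustrated on a toy problem. The sketch below is not CONGEN's algorithm or data representation: it assumes linear strings over three hypothetical "atom types" stand in for structures, and a required substring stands in for a GOODLIST item. Both routes reach the same answer set, but the constructive route examines far fewer candidates.

```python
from itertools import product

ALPHABET = "CNO"   # toy "atom types" (assumption; not CONGEN's representation)
LENGTH = 6         # size of each toy "structure"
REQUIRED = "NOC"   # toy stand-in for a GOODLIST substructure


def post_test():
    """Generate every candidate, then discard those lacking REQUIRED."""
    generated = 0
    kept = set()
    for letters in product(ALPHABET, repeat=LENGTH):
        generated += 1
        candidate = "".join(letters)
        if REQUIRED in candidate:
            kept.add(candidate)
    return kept, generated


def constructive():
    """Build REQUIRED in at each possible position, then fill the free slots."""
    generated = 0
    kept = set()  # the set removes duplicates reached from two placements
    free = LENGTH - len(REQUIRED)
    for start in range(free + 1):
        for filler in product(ALPHABET, repeat=free):
            generated += 1
            text = "".join(filler)
            kept.add(text[:start] + REQUIRED + text[start:])
    return kept, generated


if __name__ == "__main__":
    slow, n_slow = post_test()
    fast, n_fast = constructive()
    print(n_slow, n_fast, slow == fast)
```

In this toy case the post-test route generates 729 candidates and the constructive route 108, for the same set of survivors. The duplicate handling (the set) mirrors the requirement in the text that only unique ways of creating the substructure be kept.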
Most CONGEN problems contain one or more GOODLIST items which can be processed in this way, and when the constructive graph-matcher is fully integrated into CONGEN, it will make a substantial difference in its ability to use this structural information effectively.

[Diagram: superatom CEMB and GOODLIST substructures]

Figure 11. Example of breaking one GOODLIST substructure into several subproblems for CONGEN, each with different superatoms.

[Diagram panels: CONGEN PROBLEM; GOODLIST ENTRY; CONSTRUCTIVE SUBSTRUCTURE SEARCH]

Figure 12. Example showing the inefficiency of specifying a constraint as a GOODLIST item instead of analyzing its implications for constructing allowable chemical graphs.

2) New tools for post-pruning CONGEN structures

From an algorithmic standpoint, CONGEN is successful if it can, in a reasonable amount of time and without exhausting storage resources, produce a list of candidate structures satisfying the chemist's constraints. However, this list is often quite large, perhaps several hundred structures, and from a chemical standpoint the problem may be far from complete. It remains for the chemist to discriminate among the candidates, eventually reducing the possibilities to just one structure. A SURVEY function is available for classifying the list into groups of chemically related structures using either pre-defined or user-defined libraries of substructural features, and this process can help the chemist perceive groups which might easily be ruled out by additional experiments. Also, the graph-matching (pruning) mechanism of CONGEN allows him to express, in terms of substructural tests on the candidates, new data which he gathers on the unknown.
These are both important aids in dealing with a list of candidates, but are restricted to tests which can easily be phrased purely in terms of structural features of the candidates themselves. There are two informative sources of data which cannot always be phrased in this way: 1) structural features observed in products of the unknown when it undergoes simple chemical reactions; and 2) empirical spectroscopic measurements on the unknown which cannot be interpreted unambiguously in precise structural terms. During the past year, we have made progress in utilizing such information. The program REACT addresses the first problem while MSRANK concerns the second, in the context of mass spectrometric observations.

2.1 REACT

This program [see Report HPP-76-5] has two basic goals: 1) to provide the chemist with a computerized language for defining graph transformations and applying them to structures, thus simulating chemical reactions; and 2) to automatically keep track of the interrelationships between structures in a complex sequence of reactions so that whenever structural claims are made ruling out structures at one level, the implications in terms of structures at other levels can be traced. During the last year some progress has been made toward both of these goals.

EDITREACT, the reaction-editing language, has been extended to allow the user to define subgraph constraints which apply relative to a potential reaction site rather than to the molecule as a whole. For example, in the present version of REACT, we can say either that a hydroxyl group (OH), if present anywhere in the reactant molecule, would inhibit the reaction, or that such inhibition would take place only if the OH group is adjacent to the reaction site. Such site-specific constraints, applied either before or after the transformation (i.e., reaction) has been carried out on the site, are critical to the detailed description of real chemical reactions.
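The difference between a whole-molecule constraint and a site-specific one can be sketched on a small graph. This is an illustration only, assuming a hypothetical heavy-atom representation (a dictionary of atom types plus a set of bonded pairs) in which an "O" atom outside the site stands for a hydroxyl group; it is not REACT's internal data structure or the EDITREACT language.

```python
def neighbors(bonds, atom):
    """Atoms directly bonded to `atom` (bonds is a set of unordered pairs)."""
    return {b for a, b in bonds if a == atom} | {a for a, b in bonds if b == atom}


def blocked_anywhere(atoms, site, element):
    """Whole-molecule constraint: any `element` atom outside the site blocks."""
    return any(el == element and idx not in site for idx, el in atoms.items())


def blocked_near_site(atoms, bonds, site, element):
    """Site-specific constraint: only an `element` atom bonded to the site blocks."""
    adjacent = set()
    for atom in site:
        adjacent |= neighbors(bonds, atom)
    adjacent -= set(site)  # atoms of the site itself do not count
    return any(atoms[idx] == element for idx in adjacent)


# Hypothetical reactant: site O(1)-C(2)-C(3), with a second oxygen far away at atom 6.
atoms = {1: "O", 2: "C", 3: "C", 4: "C", 5: "C", 6: "O"}
bonds = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)}
site = (1, 2, 3)

print(blocked_anywhere(atoms, site, "O"))          # remote O counts here
print(blocked_near_site(atoms, bonds, site, "O"))  # remote O is ignored here
```

For this reactant the whole-molecule test reports a block while the site-specific test does not, which is the distinction the extended reaction language is meant to express.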
The inclusion of this facility in REACT substantially increases its usefulness in real-world chemical problems.

The bookkeeping problem has undergone a complete reconceptualization in the past year, the purpose being to mimic more closely the actual steps taken by a chemist in the laboratory. In the initial implementation, a set of products arising from the application of a given reaction to a given starting structure could be subjected to a multi-level classification which grouped the products based upon user-defined substructural constraints. Each of these classes had an associated minimum and maximum number, representing the numbers of products which were allowed to be members of the class. Any starting materials whose products could not satisfy these conditions were removed from the list of candidates. Structures in any class could be further reacted, their products classified, and so on.

This treatment of bookkeeping was sufficient for stating many chemical problems. For example, suppose a chemist knew that a particular reaction on an unknown compound yielded two carbonyl compounds (i.e., containing C=O), at least one of which was an ester (-O-C=O). He could define a product class CARBONYL using the C=O substructure with a minimum and maximum of two products. He could then define a sub-class of CARBONYL called ESTERS using the substructure -O-C=O with a minimum of one and a maximum of two products. The program would automatically use this information to eliminate candidate starting structures which could not give the indicated product distribution with the given reaction.

There are chemical problems, though, for which the above scheme is too rigid. For example, suppose a reaction gives several products, two of which are isolated and labelled P1 and P2. Suppose that only a small amount of P1 is available, so only mass spectroscopic measurements are practical.
Suppose also that a deuterium-exchange experiment shows that P1 has two exchangeable protons (say, either N-H or O-H). P2 shows a strong carbonyl absorption in the IR. P1 might also contain a carbonyl group, but that was never determined, and neither was the number of exchangeable protons in P2, which could be two. No matter how one attempts to use the above-described classification system, one cannot express this information accurately.

In the new approach, for which the algorithmic design has been completed, one is allowed to express data in a much more natural sequence which parallels the experimental steps. The first experimental step after a reaction is usually the separation and purification of products. An analogous step is to be included in REACT, in which the separation amounts to the setting up of a specified number of labelled "flasks" (analogous to the labels P1 and P2 in the above example), each of which is ultimately to contain a specified number (usually 1) of the products. As experimental data are gathered on each real product, corresponding substructure constraints are attached to the corresponding flask in the program. As each such assertion is made, the bookkeeping mechanism verifies that, for a set of reaction products from a given starting material, there is at least one way of distributing them among the flasks such that each product satisfies the constraints for its flask. If this test is ever violated, the starting material is removed as a candidate structure.

Flasks containing more than one product may be further separated into "subflasks" to any level, and the contents of any flask may be made to undergo further reactions. This capability, the reacting of flask contents, is analogous to common laboratory procedures in which incomplete separations of products are encountered.
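The flask-allocation test described above can be sketched as a small search. The following is a minimal illustration under stated assumptions, not the REACT bookkeeping design: each flask holds exactly one product, products are hypothetical feature dictionaries, flask constraints are plain predicate functions, and a brute-force search over orderings replaces whatever matching method the real program would use.

```python
from itertools import permutations


def allocation_exists(products, flasks):
    """True if distinct products can fill the flasks, one per flask,
    with each product satisfying every constraint attached to its flask."""
    for chosen in permutations(products, len(flasks)):
        if all(all(test(p) for test in tests)
               for p, tests in zip(chosen, flasks.values())):
            return True
    return False


def surviving_candidates(candidates, flasks):
    """Keep starting materials whose products admit at least one allocation."""
    return [name for name, products in candidates.items()
            if allocation_exists(products, flasks)]


# Constraints from the example in the text: P1 has two exchangeable protons,
# P2 contains a carbonyl group. Feature names are hypothetical.
flasks = {
    "P1": [lambda p: p["exchangeable_H"] == 2],
    "P2": [lambda p: p["carbonyl"]],
}

# Hypothetical starting materials and the products the reaction gives for each.
candidates = {
    "A": [{"exchangeable_H": 2, "carbonyl": False},
          {"exchangeable_H": 0, "carbonyl": True},
          {"exchangeable_H": 0, "carbonyl": False}],
    "B": [{"exchangeable_H": 2, "carbonyl": True},
          {"exchangeable_H": 1, "carbonyl": False}],
}

print(surviving_candidates(candidates, flasks))
```

Candidate B is removed even though one of its products meets both constraints, because a single product cannot occupy two flasks. Products that were never isolated are simply left unassigned, as in candidate A. For larger problems a bipartite matching algorithm would be a more economical choice than enumerating orderings.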
Dealing with such situations adds considerable complexity to the bookkeeping mechanism, because the contents of a flask may be ambiguous to the program when the reaction is applied. REACT must keep track of all possible structures which might, based on the current flask constraints, occupy the reacting flask. If such a reaction fails (because the products did not satisfy the constraints specified for them), REACT does not eliminate the starting structure entirely, but notes that the structure may not occupy that flask in future flask-allocation tests.

2.2 MSRANK

This program is an outgrowth of MSPRUNE, described in last year's annual report. It is a combination of a predictor, which uses a very simple theory of mass spectrometry to predict the spectra of candidate structures, and an evaluation function, which compares the predictions with the observed spectrum of the unknown, assigning a goodness-of-fit score to each candidate. The candidates are then sorted based upon how well they match the observations. The basic concept here is not a new one to the DENDRAL project [see, for example, Buchanan, et al. in Machine Intelligence 3 (Meltzer & Michie, eds., Edinburgh Univ. Press, 1969)], but there are some new aspects to the problem when viewed in the overall CONGEN context. Because of the wide variety of structural types which can be produced by CONGEN, it is necessary for MSRANK to use a very general model of mass spectrometry. The best predictive theories of mass spectrometry are limited to families of closely related structures (i.e., class-specific theories), and the Meta-DENDRAL program is designed to help in discovering such theories. There are very few general principles upon which to draw in predicting mass spectra, though, so MSRANK is limited to only the most approximate kinds of evaluation functions.
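The predict-score-sort loop can be sketched as follows. The scoring function here is an assumption chosen for simplicity (the fraction of observed ion intensity that falls at masses the candidate predicts); the report does not give MSRANK's actual evaluation function, and the masses, intensities, and candidate names below are invented for illustration.

```python
def fit_score(predicted_masses, observed):
    """Fraction of observed intensity found at masses the candidate predicts.

    predicted_masses: set of integer masses from some predictor (assumed given)
    observed: dict mapping integer mass -> intensity for the unknown
    """
    total = sum(observed.values())
    explained = sum(inten for mass, inten in observed.items()
                    if mass in predicted_masses)
    return explained / total if total else 0.0


def rank_candidates(predictions, observed):
    """Return candidate names sorted from best to worst fit."""
    return sorted(predictions,
                  key=lambda name: fit_score(predictions[name], observed),
                  reverse=True)


observed = {43: 100, 57: 60, 71: 20, 85: 20}       # hypothetical spectrum
predictions = {"A": {43, 57}, "B": {43, 57, 71, 85}, "C": {85}}
print(rank_candidates(predictions, observed))
```

A fuller scoring function would also penalize complicated explanations and predicted peaks that are not observed; this sketch only shows how a per-candidate score turns the candidate list into a ranking.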
One principle which we noticed being used by practicing mass spectrometrists was: of two candidate structures for an unknown, the more likely structure is the one which explains the observations more "simply", i.e., with the fewest complex explanations involving many bond cleavages and the transfer of many hydrogen atoms. The evaluation function used by MSRANK is based on a quantitation of this principle.

MSRANK is quite new and we have not yet had sufficient experience with it to evaluate its overall usefulness. By using only unit plausibilities for selected characteristics of the mass-spectral cleavages, we are able to duplicate earlier results obtained with the predictor/comparator functions applied to mono- and di-ketoandrostanes. These tests serve to check the accuracy of the MSRANK program. We are now doing a systematic study of various classes of compounds by ranking the spectrum of a known structure against a CONGEN-generated list of structures which contains the correct one among several which are closely related.

Stereochemistry in CONGEN

We have started the complex task of giving CONGEN the capability of recognizing stereochemical features of molecules and using stereochemical information in structure determination. The ability to recognize stereochemical features would allow, for example, the generation of all stereoisomers of a given topological structure with or without constraints. The ability to use stereochemical information would allow the determination of constraints on stereoisomer (and topological isomer) generation caused by, for example, partial knowledge of relative or absolute stereochemistry of structural fragments, knowledge of overall molecular chirality (or lack of it), absolute and relative stereochemistry from circular dichroism measurements, and so forth. Thus far, only the topological information (constitution) has been recognized and used by CONGEN.
The first stage of this development is to produce a program which generates all the stereoisomers of a given topological structure. This program will be placed at the end of the existing CONGEN program. The present report describes the development of the theory and algorithm for stereoisomer generation and the progress on the programming of this algorithm.

The GC/HRMS DATA SYSTEM

New Developments

In addition to upgrading old versions of the high resolution system, work is being done on creating a low resolution system for the MAT 721. The ultimate aim is to collect data that can be run through CLEANUP, a program that resolves multiple spectra under a single GC peak and cleans up the final spectra. The problem with the current system is that we cannot scan fast enough to provide CLEANUP the data it needs. The high resolution system requires resolution good enough to separate sample peaks from the reference peaks. If the scan is sped up past a certain point, SAMRUN can no longer separate the peaks, and therefore cannot calibrate the run. At the same time, CLEANUP requires that at least 7 spectra be taken across a GC peak to ensure resolution of multiple spectra. The fundamental problem, then, is that an alternate method of calibrating the mass spectrum, without using known calibration peaks, must be found before the scan speeds required by CLEANUP can be achieved.

The most direct solution is to measure the magnetic field strength of the instrument directly and use it to calculate the mass that is being observed. To do this we inserted a Hall probe between the poles of the magnet and connected it to the data acquisition system on the PDP-11/20.
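The idea of computing mass from a field measurement can be sketched numerically. For a magnetic sector instrument at fixed accelerating voltage and fixed ion path radius, the mass-to-charge ratio is proportional to the square of the field strength. The sketch below assumes, in addition, that the Hall probe reading is proportional to the field with no offset, and that a few (reading, known mass) pairs from a slow reference scan are available to fix the constant; the numbers and function names are illustrative, not part of the data system described here.

```python
def fit_constant(reference):
    """Least-squares k for the model mass = k * reading**2.

    reference: list of (hall_reading, known_mass) pairs (assumed available
    from a slow scan on which reference peaks can still be identified).
    """
    numerator = sum(mass * reading ** 2 for reading, mass in reference)
    denominator = sum(reading ** 4 for reading, _mass in reference)
    return numerator / denominator


def mass_from_reading(k, hall_reading):
    """Mass implied by a Hall probe reading under the m = k * B**2 model."""
    return k * hall_reading ** 2


reference = [(1.0, 2.0), (2.0, 8.0), (3.0, 18.0)]  # synthetic calibration points
k = fit_constant(reference)
print(k, mass_from_reading(k, 5.0))
```

A real probe would likely need an offset term and a check of scan-rate dependence, which is the reproducibility question raised in the text.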
The main problems with the Hall probe are as follows: 1) to make sure that the ion reading and the Hall probe reading are simultaneous; 2) to ensure that the correct Hall reading can be assigned to the correct ion reading; 3) to determine the reproducibility of Hall readings versus the mass being observed in both dynamic (scanning) and static situations; and 4) to decide whether the probe has the speed and accuracy to calibrate the instrument.

The first two problems are a matter of hardware. The configuration of the original data collection system is as follows: the ion detector goes to an A/D converter, which is connected to a DMA. The DMA is on an 11/20, which has a data collection system, SAQMON, running. This performs various low level filtering and buffering operations. The DMA is actually a low level processor which counts the number of samples taken, stores them into successive memory locations, and interrupts the central processor when a block of data has been collected. The timing of the sample collection is controlled by a quartz crystal clock. On each timing pulse, a signal is sent to the A/D on the ion detector to convert that value to a digital number. To accommodate the Hall probe, the DMA was modified so that on the timing pulse, the start signal is sent simultaneously to both the A/D on the ion detector and the A/D on the Hall probe. The DMA then services both of the A/D's and stores the readings in successive memory locations. The net result is that when the DMA interrupts the central processor, the block of data is a set of pairs of readings, an ion reading and the Hall reading for that time. This solves both of the first two problems, since we now have the ion reading and the Hall reading connected both in time and location.

The second two problems, testing the reliability and reproducibility of the Hall probe, require new software.
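Because the modified DMA stores the two readings in successive memory locations, software receiving a block only has to split it back into pairs. The sketch below assumes the ion reading precedes the Hall reading within each pair (the text does not state the order) and models the block as a flat Python list; it is an illustration, not the SAQMON code.

```python
def split_pairs(block):
    """Split a flat block [ion0, hall0, ion1, hall1, ...] into (ion, hall) pairs.

    The ion-first ordering is an assumption made for this sketch.
    """
    if len(block) % 2:
        raise ValueError("block must hold whole (ion, hall) pairs")
    return list(zip(block[0::2], block[1::2]))


print(split_pairs([10, 500, 12, 501, 9, 502]))
```

Each resulting pair ties an ion intensity to the field reading taken on the same timing pulse, which is what the later mass calculation needs.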
We are currently modifying portions of the calibration mechanism of the high resolution system to calculate masses for a large number of Hall readings.

META-DENDRAL

The success of any reasoning program is strongly dependent on the amount of domain-specific knowledge it contains. This is now almost universally accepted within AI, partly because of DENDRAL's success. Because of the difficulty of extracting specific knowledge from experts to put into the program, many years ago we began to explore the problems of efficiently transferring knowledge into a program. We have looked at two alternatives to "hand-crafting" each new knowledge base: interactive knowledge transfer programs and automatic theory formation programs. In this enterprise the separation of domain-specific knowledge from the computer programs themselves has been a critical component of our success.

One of the stumbling blocks with the interactive knowledge transfer programs is that for some domains there are no experts with enough specific knowledge to make a high performance problem solving program. We were looking for ways to avoid forcing an expert to focus on original data in order to codify the rules explaining those data, because that is such a time-consuming process. Therefore we began working on an automatic rule formation program (called Meta-DENDRAL) that examines the original data itself in order to discover the inference rules for that part of the domain.

The problem solving paradigm for Meta-DENDRAL is also the plan-generate-test paradigm used in Heuristic DENDRAL. In this case one part of the program (RULEGEN) generates plausible rules within syntactic and semantic constraints and within desired limits of evidential support. The model used to guide the generation of rules is particularly important since the space of rules is enormous. The planning part of the program (INTSUM) collects and summarizes the evidential support.
The testing part (RULEMOD) looks for counterexamples to rules and makes modifications to the rules in order to increase their generality and simplicity and to decrease the total number of rules.

Meta-DENDRAL successfully formulated rules of mass spectrometry that were new to the science. These rules, along with a discussion of the methodology, were published in the scientific literature [Report HPP-76-4]. The program was tested to see if it could rediscover the rules of mass spectrometry for two classes of chemical compounds that were already well understood (amines and estrogenic steroids). Then it was applied to three classes of compounds whose mass spectrometry was not as well known (mono-, di-, and tri-ketoandrostanes). The program produced three sets of rules that explained much of the significant data for these classes. The time for manual rule formation for these data was estimated to be several months.

Progress was made on generalizing the Meta-DENDRAL program, and rules for a new domain were successfully discovered by the program. A scientific paper on this application was submitted for publication [Report HPP-77-4]. The new application was learning rules for interpreting signals from C13-NMR spectroscopy. The instrument produces data points in a bar graph in response to the resonance of each carbon-13 nucleus in the sample. The rules describe an environment of a C13 atom and predict a resonating frequency range for every atom that matches the description. The Meta-DENDRAL program needed some modification because the rules predict ranges of data points, not precise processes as in the mass spectrometry version.

The RULEGEN component of Meta-DENDRAL was demonstrated to work with its heuristic search paradigm. Guidance from a model of mass spectrometry is an important feature of RULEGEN.
Also, the program uses problem data for pruning possible rules (and all more specific rules formed from those). The amount of data examined during the search is very large and the space of rules is immense, so the search needs to be rather coarse in order to produce plausible, but not necessarily optimal, rules.

The RULEMOD program for "fine-tuning" Meta-DENDRAL's newly-discovered rules was finished. This program performs a number of important subtasks, including merging similar rules, making rules more specific or more general, and filtering out the weakest rules. RULEMOD checks for counterexamples to rules and uses this information in all of the named tasks. Because of the expense of computing counterexamples to possible rules, this computation is delayed until Meta-DENDRAL has a set of plausible rules, rather than computing counterexamples on each possible rule examined in the search of the rule space.

A report was written on the AI methodology underlying Meta-DENDRAL. The major idea developed in this report is that knowledge of the domain can be used effectively to guide a learning program. The major difference between Meta-DENDRAL and statistical learning programs is that Meta-DENDRAL uses a strong model of mass spectrometry, including any assumptions the user cares to make about the domain, to guide the formation of explanatory rules.

C13 NMR SPECTROMETRY

13C NMR was selected as a new application area for the rule formation program, Meta-DENDRAL. The algorithms used for mass spectrometry rule formation were extended to 13C NMR and used to obtain a set of rules for paraffins and acyclic amines. These two classes were chosen since compounds in these classes are known to show a strong correlation between structural environment and shift. Thus, the programs could be tested knowing that the underlying basis for the form of the rule was valid.
The form of the rule is substructure ---> shift range. A sample rule generated is C-P-C-X- ---> 19.85 <= (delta sub C) <= 21.3. The asterisk in the substructure description denotes the atom for which the shift is predicted. Only topological descriptors were used to construct the substructures. The addition of stereochemical terms is a topic of current work.

It was necessary to change RULEGEN so that the left-hand sides of rules were expanded outward from a carbon atom rather than from a bond. The right-hand side of the rule is associated with a range rather than a precise mass as in the mass spectrometry program. This modification also required changes in the rule search procedure. The user sets two parameters which guide the rule search. These parameters are MINIMUM-EXAMPLES, which requires each rule to explain a given number of peaks in the training set, and MAXIMUM-RANGE, which defines the acceptable shift range for a rule. These parameters regulate the degree of specificity or generality of the rules.

From the set of rules generated, a subset is selected corresponding to the "best" set which still covers all the training set data. The best rule is selected by calculating (number of peaks predicted)/(range ** 2). Data which are predicted by the best rule are removed, and the next best rule is found for the remaining data using the criterion given above. This process is repeated until all data are explained.

In order to test the informational content of the rules generated, a second program was written which applied the rules to a list of candidate molecules and ranked the molecules. First, all possible structural isomers for a given empirical formula were generated using CONGEN. The rules were applied to each of the possible isomers and spectra were predicted. The predicted spectra were compared to a known spectrum from a compound with the same empirical formula.
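The rule-subset selection described above (score each rule by peaks predicted divided by the square of its shift range, take the best, remove the data it explains, repeat) is a greedy covering procedure and can be sketched as below. The data layout is an assumption made for illustration: each rule is reduced to the set of training peak identifiers it predicts plus the width of its shift range, and all names and numbers are hypothetical.

```python
def select_rules(rules, peaks):
    """Greedy selection of rules covering the training peaks.

    rules: dict name -> (set of peak ids the rule predicts, shift-range width > 0)
    peaks: iterable of all training peak ids
    Returns (chosen rule names in order, peaks left unexplained).
    """
    remaining = set(peaks)
    chosen = []

    def score(name):
        explained, width = rules[name]
        # Only peaks still unexplained count toward the score.
        return len(explained & remaining) / width ** 2

    while remaining:
        best = max(rules, key=score)
        if score(best) == 0:
            break  # no rule explains any remaining peak
        chosen.append(best)
        remaining -= rules[best][0]
    return chosen, remaining


rules = {
    "R1": ({1, 2, 3}, 1.5),
    "R2": ({3, 4}, 1.0),
    "R3": ({4, 5}, 2.0),
    "R4": ({5}, 0.5),
}
print(select_rules(rules, {1, 2, 3, 4, 5}))
```

Dividing by the squared range favors narrow, specific rules over broad ones that happen to cover many peaks, which is why the narrow single-peak rule R4 is picked first in this example.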
The structural isomers were ranked according to a comparison score to determine how well the correct compound was distinguished from its isomers on the basis of the predictive rules. The details of the generation of rules and the use of rules for structure selection can be found in a paper recently submitted for publication [Report HPP-77-4].

J. Lederberg 66 Privileged Communication DENDRAL PROJECT Section 6.1.1

The 13C NMR rule formation program was applied to a set of paraffins and acyclic amines. The program generated 138 rules to cover 435 data peaks. The rules generated were applied in a structure selection test for the structural isomers of C9H20 and C6H15N. No structures with these empirical formulas were included in the training set. Twenty-four C9H20 and eleven C6H15N 13C NMR spectra were available to act as unknowns in the structure selection test. The results of the structure ranking applied to these spectra are shown below.

EMPIRICAL    NUMBER OF            RANKING
FORMULA      CANDIDATE ISOMERS    1st      2nd    ...   6th   ...   9th
C9H20        35                   20/24    3/24         1/24
C6H15N       39                   8/11     2/11                     1/11

The performance of the rules in discriminating among similar structures not included in the training set data demonstrated the informational content of the rules.

FUNDING STATUS

Renewal of funding for three years was just received for NIH Grant RR-00612 from the Biotechnology Resources Program (May, 1977 - April, 1980). The award for 1977-78 is approximately $193,000. In addition, support for the basic artificial intelligence research on which this work is grounded is provided by the Advanced Research Projects Agency of the Department of Defense (ARPA Contract DAHC-15-73-C-0435). A new two-year contract was just negotiated for the period July, 1977 - June, 1979.

RECENT PUBLICATIONS

(Only publications related to computers in chemistry are shown.)

HPP-76-1   D.H. Smith, J.P. Konopelski and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference. XIX. Computer Generation of Ion Structures", Organic Mass Spectrometry, 11:86 (1976).

HPP-76-2   Raymond E. Carhart and Dennis H. Smith, "Applications of Artificial Intelligence for Chemical Inference XX. Intelligent Use of Constraints in Computer-Assisted Structure Elucidation", Computers in Chemistry (in press).

HPP-76-3   C.J. Cheer, D.H. Smith, C. Djerassi, B. Tursch, J.C. Braekman and D. Daloze, "Applications of Artificial Intelligence for Chemical Inference XXI. Chemical Studies of Marine Invertebrates - XVII. The Computer-Assisted Identification of [+]-Palustrol in the Marine Organism Cespitularia sp., aff. subviridis", Tetrahedron, 32:1807, Pergamon Press (1976).

HPP-76-4   B.G. Buchanan, D.H. Smith, W.C. White, R.J. Gritter, E.A. Feigenbaum, J. Lederberg, and Carl Djerassi, "Application of Artificial Intelligence for Chemical Inference XXII. Automatic Rule Formation in Mass Spectrometry by Means of the Meta-DENDRAL Program", Journal of the American Chemical Society, 98:6168 (1976).

HPP-76-5   T.H. Varkony, R.E. Carhart and D.H. Smith, "Applications of Artificial Intelligence for Chemical Inference XXIII. Computer-Assisted Structure Elucidation. Modelling Chemical Reaction Sequences Used in Molecular Structure Problems", in "Computer-Assisted Organic Synthesis", W.T. Wipke, Ed., American Chemical Society, Washington, D.C., in press.

HPP-76-6   D.H. Smith and R.E. Carhart, "Applications of Artificial Intelligence for Chemical Inference XXIV. Structural Isomerism of Mono and Sesquiterpenoid Skeletons", Tetrahedron, 32:2513, Pergamon Press (May 1976).

HPP-76-10  Bruce G. Buchanan and Dennis Smith, "Computer Assisted Chemical Reasoning", in Proceedings of the III International Conference on Computers in Chemical Research, Education and Technology, Plenum Publishing (1975).

HPP-77-4   T.M. Mitchell and G.M. Schwenzer, "Applications of Artificial Intelligence for Chemical Inference. XXV. A Computer Program For Automated Empirical 13C NMR Rule Formation" (Submitted to JACS, January 1977).

HPP-77-6 (STAN-CS-77-597)   Bruce G. Buchanan and Tom Mitchell, "Model-Directed Learning of Production Rules", Submitted to the Proceedings for the Workshop on Pattern-Directed Inference Systems in Hawaii (February, 1977).

HPP-77-11  Dennis H. Smith and Raymond E. Carhart, "Structure Elucidation Based on Computer Analysis of High and Low Resolution Mass Spectral Data", Proceedings of the Symposium on Chemical Applications of High Performance Spectrometry, University of Nebraska, Lincoln (in press).

II. INTERACTION WITH THE SUMEX-AIM RESOURCE

The number of persons experimenting with CONGEN has grown as a result of both the continuing practice of issuing an "invitation for program trial use" at the conclusion of publications and continuing personal contact between Dendral project members and potential program users. Three categories of users make up this group:

Chemists Using Exported Programs

The part of CONGEN responsible for teletype output of chemical structures (the DRAW program) is coded in Fortran. Since the paper describing this program appeared in print [R. Carhart, JACS, 15:82, 1975], we have exported the program to half a dozen sites, ranging from Japan, across North America, to England. Similarly, the entire CONGEN program, which is largely coded in Interlisp and SAIL, has been exported to a collaborator in England who is very interested in the methods and programming techniques employed in coding the program.

Another program which we have exported for use by other chemists is the PDP-11 CLEANUP program, which was described in ANALYTICAL CHEMISTRY [48:1363, 1976]. This program "cleans up" new GC/MS data to eliminate noise peaks and to separate the data associated with components in the mixture.
In each case, the requestors were provided with an initial choice of format options from which they could select the one most suitable for their computer installation. They were asked to send a 2400 foot reel of magnetic tape appropriate to the selected format option. The programs were written on the tape and returned to them along with a brief written explanation of program organization. Accurate records are kept of who has received the programs, so that omissions and errors can be corrected by mail at a later date, if ever necessary.

1. Dr. James F. Elder, Dow Chemical U.S.A., Midland, Michigan.
2. Dr. Robert 1. Supnik, Massachusetts Computer Associates, Inc., Wakefield, Massachusetts.
3. Mr. Dan Pearce, Orange County Sheriff-Coroner Department, Santa Ana, California 92702.
4. Dr. H. J. Stoklosa, Central Research & Development Department, E. I. du Pont de Nemours & Company, Wilmington, Delaware.
5. Dr. Douglas W. Kuehl, Environmental Research Laboratory-Duluth, Duluth, Minnesota.
6. Dr. Richard A. Graham, Food Sciences Laboratory, U. S. Army Natick Laboratories, Natick, Massachusetts.
7. Dr. Walter M. Shackelford, United States Environmental Protection Agency, Environmental Research Laboratory, Athens, Georgia.
8. Dr. Richard Gans, Chemical Research Division, American Cyanamid Company, Bound Brook, New Jersey.
9. Dr. John C. Marshall, Department of Chemistry, the University of North Carolina, Chapel Hill, North Carolina.
10. Dr. Graham S. King, Department of Chemical Pathology, Queen Charlotte's Hospital for Women, London, England.
11. Dr. J. Wyatt, Chemistry Division, Naval Research Laboratory, Washington, D. C.
12. Dr. Gareth Templeman, Research and Development Laboratories, The Pillsbury Company, Minneapolis, Minnesota.
13. Dr. J. B. Justice, Department of Chemistry, Emory University, Atlanta, Georgia.
14. Dr.
Thomas Knudsen, Northrop Services, Environmental Sciences Group, Research Triangle Park, North Carolina.
15. Dr. Ingolf Meineke, Fachbereich Chemie, Philipps Universitaet, Lahnberge, West Germany.
16. Dr. M.A. Shaw, Unilever Research, Port Sunlight Laboratory, Wirral, Merseyside, England.
17. Dr. Ernst Weber, Varian MAT, Bremen, West Germany.
18. Paul V. Fennessey, Department of Pediatrics, University of Colorado Medical Center, Denver, Colorado.
19. R. G. A. R. Maclagan, Department of Chemistry, University of Canterbury, Christchurch, New Zealand.
20. James E. Oberholtzer, Arthur D. Little, Inc., Cambridge, Massachusetts.
21. F. Street, AEI Scientific Apparatus Limited, Manchester, England.

Remote Users of SUMEX

Because the SUMEX computer is available via both the TYMNET and ARPANET communication networks, it is possible for scientists in many parts of the world to directly access the Dendral programs on SUMEX. Primary usage is centered on CONGEN, although INTSUM is also beginning to gain a following. Although access points to SUMEX are widespread, they frequently are not diverse enough to accommodate the dispersed group of scientists who have expressed an interest in using one of the Dendral programs. For example, Dr. Joseph Baker of the Roche Institute of Marine Pharmacology in Dee Why, Australia, is looking at the possibility of accessing SUMEX by using International Direct Distance Dialing (IDDD).

Chemists Communicating by Mail

Many scientists interested in using DENDRAL programs in their own work are not located near a network access point. Users of this type choose to use the mail to send details of their structure elucidation problem to a Dendral Project collaborator at Stanford.

Chemical Problems Posed to CONGEN

Following is a list of CONGEN users, and a brief summary of their program interests during the past year.

1. Dr. Roger Hahn, Syracuse University.
While at Stanford he used CONGEN to help solve the structures of photoproducts by obtaining all possibilities under available constraints and designing NMR experiments to differentiate the possibilities. This work will be published soon.

2. Dr. William Epstein, University of Utah. During a demonstration of CONGEN, he posed a problem to verify that the structural possibilities he determined for an unknown were in fact all possibilities. The structure of methyl santolinate has been published (see Epstein, et al., J.C.S. Chem. Commun., 590 (1975)).

3. Dr. Clair Cheer, University of Rhode Island. While on sabbatical at Stanford, Dr. Cheer has worked on a number of structure elucidation problems using CONGEN, including Briareine D and [+]-Palustrol (Cheer et al., Tetrahedron Letters, 1507 (1975)). Work is continuing on the structure of another marine natural product, presumably a cembrenolide, for which there are currently seven possibilities.

4. Dr. Jerrold Karliner, Ciba-Geigy Corporation. Dr. Karliner has solved several structural problems using CONGEN, including material with flame retardant properties, an impurity in a production sample, and nitrogen heterocycles being investigated for pharmacological activity. CONGEN enabled reduction of the number of possibilities to the point where subsequent experiments led to unambiguous structural assignment.

5. Dr. Gino Marco, Ciba-Geigy Corporation. He has used CONGEN to help solve structures of conjugates of pesticides with sugars and amino acids.

6. Dr. Milton Levenberg, Abbott Laboratories. He has worked on the structure of a compound with mild antibiotic activity, isolated from a fermentation broth. There are currently ten structural possibilities, reduced by additional experimental data from the 33 initially determined using CONGEN.

7. Dr. David Pensak, DuPont. He is currently learning to use CONGEN and plans to evaluate its utility for structural problems of some of his coworkers.

8. Dr.
Douglas Dorman, Eli Lilly. He is using CONGEN to assist in structure elucidation of metabolites of microorganisms shown to have pharmacological activity. He has worked on five such problems, including a current one where the developing MSPRUNE capabilities are being used.

9. Dr. L. Minale, Napoli, Italy. We have worked with him by sending him structural alternatives for proposed structures for some marine natural products (Pallescensins, Tetrahedron Letters, 1417 (1975)) and cyclic diethers from the lipid fraction of a thermophilic bacterium (J. C. S. Chem. Commun., 543 (1974)).

10. Dr. K. Nakanishi, Columbia University. We have worked with him by sending him structural possibilities for termite defense compounds (structure finally solved by X-ray crystallography). This trial plus a live demonstration to one of his students has resulted in efforts toward continued collaboration on other insect defense secretions and exploration of the possibility of his direct access to SUMEX.

11. Dr. L. Dunham, Zoecon Corporation. We have collaborated with him on the use of INTSUM for mass spectral fragmentation studies of insect juvenile hormones.

12. Dr. A. G. Gonzales, Tenerife, Spain. We have recently sent him structural alternatives for constituents of Laurencia Perforata (Tetrahedron Letters, 2499 (1973)), and expect to continue discussions on the structures of these compounds.

13. Dr. T. Irie, Sapporo, Japan. We have recently sent him structural alternatives to published structures on constituents of Laurencia Glandulifera (Tetrahedron Letters, 821 (1974)) and expect to continue discussions on this problem.

14. Dr. C. J. Persoons, Delft. We have corresponded with him on structural alternatives for cockroach sex pheromones (Periplanone-B, Tetrahedron Letters, 2055 (1976)), and he has agreed to further collaboration on new problems.

15. Dr. F. Schmitz, University of Oklahoma.
We explored for him structural alternatives for an unknown diterpenoid hydrocarbon. We obtained 25 possibilities, of which only four obeyed the isoprene rule.

16. Dr. J. Baker, Roche Institute of Marine Pharmacology, Australia. We plan collaboration with Dr. Baker on the sterol fractions of various marine organisms and are exploring ways for him to access CONGEN.

17. Dr. E. VanTamelen, Stanford University. We have used the developing reaction features of CONGEN to explore structural possibilities for both chemical and biogenetic cyclization products of squalene-oxide congeners. We have suggested alternatives to proposed structures and helped to design experiments to differentiate them.

18. Dr. J. C. Braekman, Brussels. Dr. Braekman visited Stanford as a part of continuing collaboration in marine chemistry with Dr. Tursch's group. While at Stanford he explored the use of CONGEN for current problems in marine natural products, and worked on the problems of Drs. Irie and Gonzales (see above). He is currently exploring access to CONGEN from Brussels, via TYMNET.

Use of CONGEN by working scientists has turned up one major area in which additional information to the user was thought to be necessary. CONGEN users unanimously indicated their desire for a method of determining what percentage of the whole problem was solved at any moment, i.e., what fraction of the total number of possible structures is represented by the number already generated. In a prototype system we have implemented the Cntrl-I and Cntrl-S user information interrupts to show how far CONGEN has progressed. If, for example, someone who has generated 357 structures is told that this represents 1 percent of the total possible structures, they immediately know that they do not want to finish generating all the structures. Even if there were enough space, 30,000 structures would be far more than they would want to see.
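The arithmetic behind the 357-structure example can be written out as a short helper. This is a minimal sketch of the calculation only. The function name `estimate_total` is hypothetical, and the sketch says nothing about how CONGEN itself determines the fraction completed.

```python
def estimate_total(generated: int, fraction_done: float) -> int:
    """Estimate the total number of structures from a progress report.

    generated     -- structures produced so far
    fraction_done -- reported fraction of the whole problem solved (0 < f <= 1)
    """
    if generated < 0:
        raise ValueError("generated must be non-negative")
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return round(generated / fraction_done)


# The example in the text: 357 structures reported as 1 percent of the total.
projected = estimate_total(357, 0.01)
```

With these inputs the projection is on the order of tens of thousands of structures, which is the point at which a user would choose to add constraints rather than let generation run to completion.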
We implemented another user-oriented facility for an invited paper presented at the 172nd American Chemical Society meeting, in August of 1975. Special features were added for character-oriented, screen-addressable CRT terminals to give users an informative visual interface to CONGEN, an otherwise complex program. The dynamic field of view provided by this type of terminal was used to advantage to give the chemist-user a continuous, graphic summary of both the information he has supplied to the program and the dynamic use of that information by the program.

INTERACTION WITH OTHER SUMEX-AIM PROJECTS

We have had numerous discussions with Prof. Todd Wipke's research group in meetings of our combined groups. Because the problems of manipulating chemical graphs are much the same for both groups, frequent discussions are mutually advantageous.

Almost daily contact with other Stanford-based projects provides new ideas and programming assistance. In particular, there is considerable interaction with members of the MYCIN, MOLGEN and Protein Crystallography projects. Many of our experiment planning ideas have come from discussions with the MOLGEN group. Our ideas about explaining a program's reasoning are derived from the success of MYCIN's explanation package. And our ideas about integrating multiple sources of knowledge in data interpretation have been enhanced through discussions with the Protein Crystallography group. The large number of excellent INTERLISP programmers in all these groups also provides a pool of programming expertise that we draw on frequently.

We are collaborating with Dr. Robert Lindsay on a monograph about the DENDRAL programs, with most of our interaction and all our text preparation taking place over the SUMEX system. We have also discussed helping Dr. Lindsay with a knowledge-based reasoning program to help pathologists at the University of Michigan.
CRITIQUE OF RESOURCE SERVICES

Some problems have arisen as a result of the Dendral commitment to working with outside chemist users. The primary area of difficulty arises from the fact that the Dendral project, as one of the many projects which use the SUMEX facility, is allocated a certain portion of system resources. Therefore, support of an extensive body of outside users means that resources to support these users must be diverted from the research goals of the project. In encouraging new users, Dendral must be careful to state that access to Dendral programs might have to be restricted in the future if system loading becomes excessive. Understandably then, some scientists are reluctant to invest time in learning to use a complicated, although potentially useful, program which they may well be able to use only on a temporary basis. One solution to this problem is to make the available programs as efficient as possible, and/or to make it possible to distribute copies of the program to other sites.

The interactive computing environment provided by the SUMEX-AIM resource and the power of the INTERLISP language give us the capability of building and debugging complex programs rapidly. These are the best tools currently available for AI research. Because these tools are almost always available on command, our researchers are working at the frontier of applied artificial intelligence. The SUMEX staff does an outstanding job of keeping the computer and peripheral devices running reliably; without this professional support we would not be able to build, enlarge, and test programs as complex as the DENDRAL programs.

The large number of persons who use the resource is our single biggest source of frustration. Several of the DENDRAL programmers work frequently from midnight to 8:00 a.m. just to avoid computing during the day.
Although this minimizes their interaction with the rest of the research group, it allows them to work on large, cycle-intensive programs without competing for resources during "prime-time" hours.

III. USE OF SUMEX DURING THE FOLLOW-ON GRANT PERIOD (8/78-7/8X)

LONG-RANGE GOALS

Our primary goal is to build reliable, useful tools for biomolecular structure characterization and make them available for widespread use. The CONGEN program is farthest along in this respect. We will extend its scope and add features to make it easier to use, while working on the problems of increasing its availability. By building onto CONGEN we will develop a broader set of tools with capabilities for helping biomedical scientists in many ways. By increasing the generality of Meta-DENDRAL we intend to provide tools for model-directed learning from empirical data that will complement purely statistical tools.

At the same time as we are building tools, we are also exploring basic AI issues of knowledge representation, use, and acquisition in complex reasoning programs. These are fundamental issues for knowledge-based programs, such as those currently running on SUMEX.

JUSTIFICATION FOR CONTINUED USE OF SUMEX

The research goals and methods of the DENDRAL project fit well within the stated AIM criteria. We are building knowledge-based programs, and extending the art of applying AI to medicine to the benefit of both working biomedical scientists and other groups building similar tools.

We need the SUMEX-AIM resource for our work because of its excellent environment for symbolic computing. The interactive computing facilities and the features of the INTERLISP language on SUMEX give us a several-fold increase in productivity over our previous batch computing environment using LISP-360.

6.1.2 HYDROID PROJECT

HYDROID - Studies in Distributed Processing and Problem Solving

Prof. Gio Wiederhold
Computer Science and Electrical Engineering
Stanford University

I. Summary of Research Program

A. Technical Goals

The objective of this research is the development of a methodology for the analysis and implementation of alternatives in distributed processing and problem solving. One of the primary reasons for interest in this area is its potential to break through the speed limitation barriers imposed by uniprocessing systems. If such a breakthrough can be achieved then the viability of the methods being developed by other projects using the SUMEX-AIM resource will be enhanced.

The rapid development of microprocessor and communications technology has given rise to a large number of proposed implementations of networks employing multiple processors. The computations to which these distributed systems are to be applied include heuristic decision-making problems, mathematical modelling, data reduction, and database search, as well as general purpose multi-access computing. There is, however, a lack of an adequate global understanding of the computational tradeoffs implied by network architectures.

In order to complement the experimental results of other investigators and broaden their applicability to the system-design decision-making process, we are developing a general framework for the study of processor interaction in distributed processing systems. The framework consists of rules to obtain parameters from programs which specify the computations, rules to parameterize descriptions of networks of processors, and procedures to calculate expected system performance from these parameter sets.
The framework is to be sufficiently powerful so that, when it is validated, the methods will be able to assist in the a priori assessment of the potential performance of new system alternatives or of systems with improved system components.

One of the primary tools we are using to analyze the interaction between computations and distributed processor networks is simulation. The behaviour of processor network nodes, interprocessor control and task flow, and problem decomposition all require simulation at different levels of abstraction. Analytic queuing models may provide insight into relationships in networks, but are not adequate to provide quantitative results. Simulation is not seen as the end product of the study, but as a means to develop and assess the validity of our model of the interaction of computations and processor network architecture. Where possible, mathematical results will be used to assess the validity of model simulations.

A number of large computational applications are being analyzed in order to assess their potential for decomposition into modules for distributed processing. The current candidate applications are:

a) Programs which use heuristic methods in decision-making. Heuristic programs frequently employ recursive decomposition of problems into subsidiary problems which themselves may be suitable for distributed processing.

b) Programs which use multi-faceted databases to retrieve and abstract information. The process of intelligent data retrieval and analysis often depends on data or knowledge sources which are being maintained at geographically distributed processing sites.

c) Programs which acquire data from multiple, possibly dissimilar, sensors and attempt to reduce this data to simpler hypotheses.

d) Programs which solve large numerical problems, such as those found in image processing applications.
Parameters which describe the computations to be simulated include:

a) The computational kernel size: the cycle and memory demand of a computational unit between interprocessor reference requirements.

b) The computation definition message size: the amount of data required to transmit sufficient information to initiate a computational kernel.

c) The database size: the amount of data or program text required to sustain a computational kernel, and its availability and residence in the network.

The behaviour of the system can be varied through the adjustment of other parameters. These parameters may be set to reflect the architecture of specific hardware systems, or may be varied to obtain optimum performance. In addition to obvious parameters (such as the number and power of the processors), we expect the following parameter types to be important in developing an understanding of the spectrum of distributed processor architectures:

a) Interconnection density. As the density decreases, the message delay and congestion increase. This parameter will provide a high level abstraction of multi-processor connectivity schemes. Geographical distribution will increase message delay and transmission cost.

b) Computational locality. A high degree of locality (of database or procedural information in the network) will enhance the probability that relevant knowledge exists in closely linked nodes, thus counteracting the effects of a low interconnection density.

c) Database viscosity. A database, including the programs required to carry out the computations at a node, may be more or less fixed to one specific node. This therefore encourages the use of certain nodes for specific functions. Many current processor networks are completely rigid in this sense, and for these networks optimal initial program and database allocations may be determined.
However, we hypothesize that a greater degree of dynamic resource allocation is desirable to cope with changing loads and in order to enhance reliability. For this reason this parameter needs to be included.

d) Redundancy. In order to assess the cost and benefits in terms of responsiveness and reliability, the redundancy of database and computations will also be made a parameter. In order to utilize the redundancy well, the computational resources (programs or data) which affect system performance most must be identifiable.

e) Error rate. In order to test the effectiveness of reliability strategies, node and communications channel failures will be simulated.

An important aspect of this model is that we intend to keep the abstractions at a sufficiently high level to allow analytic and intuitive verification of the model behaviour when applied to well understood computations.

Computations have been mapped into specific parallel machines, but these results are not easily transferred to new architectures. The distributed processor systems now being built may have characteristics with unpredicted effects on system behaviour. We expect to be able to use the model to find potential bottlenecks, which then will define areas where extra design attention has a high payoff.

We do not intend to build hardware which is based literally on the abstract model. We hope to verify results obtained from the model using existing distributed processor systems and, assuming that our model (with appropriate parameters describing the load and architecture) matches the given system, be able to advise on system utilization or development aspects. A local resource of this type may be the Stanford I processor, now being built under ERDA sponsorship. In addition, if we determine that a certain, yet untried, architecture is promising, we would like to encourage and participate in its implementation.

B. Medical Relevance and Collaboration

Many applications at SUMEX consume large quantities of computational resources. The use of multiple distributed processors may provide a means to gain the required processing capabilities in an economic manner. In this sense the medical relevance of this study is indirect. We are attempting to develop tools which will be of use in medical computation problems.

Our studies in distributed data base applications have a more direct medical relevance. To this end, we are maintaining contact with Dr. Jim Fries, whose ARAMIS database network collects data for the analysis of disease progress and treatment efficacy in rheumatoid arthritis from a variety of institutions. Sharing of data to provide a broader base for analysis is also a feature of programs in cardiology and oncology in which physicians at Stanford participate. In each of these instances the distributed nature of the data resources leads to differences in the meaning of data items, so that simple aggregation of the data may not be valid. Distributed processing may provide a powerful alternative.

C. Progress Summary

The HYDROID project got underway in the fall of 1976. We have been involved since that time in developing a basic understanding of important problem areas in distributed processing and problem solving. A weekly research seminar, begun in Dec. 1975, has brought together members of the faculty and students from a variety of disciplines, and has included several speakers from application areas where distributed processing may be beneficial.

We have developed a formalism in which to express the control of distributed problem solving in loosely-coupled processor networks. This CONTRACT NET protocol makes the cost of interprocessor interactions explicit. It is this cost which appears to generate one of the performance boundaries for distributed processor systems.
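To illustrate why an explicit interaction cost can act as a performance boundary, the following toy model combines two of the parameters named earlier (a kernel's compute demand and the cost of the message that initiates it) with the number of processors. It is a deliberately simplified sketch under stated assumptions, and is neither the HYDROID simulator nor the CONTRACT NET protocol. The assumptions are that all kernels are identical and independent, and that a single coordinating node dispatches the initiating messages one at a time. The function names and the cost formula are hypothetical.

```python
import math


def elapsed_time(n_kernels, kernel_time, message_cost, n_processors):
    """Toy estimate of elapsed time for a decomposed computation.

    Assumes identical, independent kernels and a single coordinator that
    pays message_cost once per kernel, serially (an assumption made here,
    not a property taken from the source).
    """
    rounds = math.ceil(n_kernels / n_processors)  # waves of parallel kernels
    compute = rounds * kernel_time
    communication = n_kernels * message_cost      # not reduced by more nodes
    return compute + communication


def speedup(n_kernels, kernel_time, message_cost, n_processors):
    """Speedup relative to one processor running every kernel with no messages."""
    serial = n_kernels * kernel_time
    return serial / elapsed_time(n_kernels, kernel_time, message_cost, n_processors)
```

In this model, adding processors shrinks only the compute term. The communication term stays fixed, so the speedup levels off below roughly `kernel_time / message_cost` however many nodes are added. Varying the ratio of kernel size to message cost shows how larger kernels raise that ceiling.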
We have written a basic simulator with which to investigate the merits of the formalism together with problem solving methods applicable in the distributed processing environment. The simulator is currently being tested with small search problems as a means of determining what information must be transferred from node to node in a distributed processor system for such problems, together with the advantages to be gained from a distributed approach. The simulator is being developed to cover a greater variety of computational interactions.

D. Publications

1) H. Garcia-Molina and Gio Wiederhold, "Application of the Contract Net Protocol to Distributed Data Bases", HPP-77-21, Heuristic Programming Project, Stanford University, April 1977.

2) R. G. Smith, "The Contract Net: A Formalism for the Control of Distributed Problem Solving", HPP-77-12, Heuristic Programming Project, Stanford University, February 1977 (also submitted to the Fifth International Joint Conference on Artificial Intelligence).

E. Funding

The HYDROID project is currently funded as part of ARPA Contract DAHC-15-73-C-0435. Other potential funding sources are currently being contacted for support of the specific areas of Hydroid application and interest.

II. Interactions with SUMEX-AIM

SUMEX-AIM currently provides all computing resources for the project. We thus enjoy a high degree of interaction with other projects involved in the problems which result from construction of large programs. Other points of contact are related to the use of the same programming languages as well as the abundance of AI expertise residing around the resource. This latter point is especially important considering that one of our aims is discovery of suitable mappings of well understood AI methods onto highly parallel asynchronous processor networks.
SUMEX-AIM is also an excellent medium for informal transmission of reports, recent results and bulletins to users with related interests and problems. The powerful screen-oriented editors available greatly enhance our capabilities for writing both text and programs. Finally, the development of simulation programs generally requires a highly interactive computing environment - the sort of environment we feel is provided by SUMEX-AIM.

6.1.3 MOLGEN PROJECT

MOLGEN - An Experiment Planning System for Molecular Genetics

Prof. J. Lederberg (Genetics, Stanford)
Prof. N. Martin (Computer Science, U. of New Mexico)
Prof. E. Feigenbaum (Computer Science, Stanford)

I. Summary of Research Program

A. Technical Goals

The goal of the MOLGEN project is to develop an experiment planning system for the domain of molecular genetics. In order to accomplish this, we hope to create and apply innovative methods of knowledge management and hierarchical planning.

Experiments in molecular genetics are concerned with the study and manipulation of DNA molecules. The MOLGEN knowledge base will include both declarative and procedural information about such structures and the laboratory tools and techniques which experimental geneticists use. Also represented will be much of the strategic information required to join individual experimental steps into a meaningful whole.

We are using the uniform method of schemata for representation of all types of knowledge within MOLGEN. We believe this will facilitate knowledge acquisition and explanation and provide a consistent means of storing hierarchical and other relations among objects and rules in the system. We hope to make the underlying knowledge base flexible enough to allow for experimentation with a wide variety of specific planning strategies.

B. Medical relevance and collaboration

Molecular genetics has at least two major connections to medical research.
Learning about the basic mechanisms which control the operation and transmission of genetic information is necessary to understand and treat the wide range of diseases (and health conditions like aging) which are genetically controlled. Also, recent developments in molecular genetics offer the promise of using genetic mechanisms to produce essentially limitless amounts of drugs and other biomedical substances. The MOLGEN project will develop a system designed to aid the molecular geneticist in planning experiments of these types. The MOLGEN project is a joint effort of the Computer Science Departments of Stanford and the University of New Mexico and the Genetics Department of Stanford. Major participants are Professor Nancy Martin of the University of New Mexico; Professor Edward Feigenbaum, Peter Friedland, Jonathan King, and Mark Stefik of Stanford Computer Science; and Professor Joshua Lederberg and Jerry Feitelson of Stanford Genetics.

C. Accomplishments

MOLGEN is in the first year of formal funding as an independent entity. We have devoted this year to learning and analyzing the basic knowledge of experimental molecular genetics and to building part of the central structure of the knowledge base management system. A wide variety of experiments have been studied with the aim of extracting knowledge about the genetic objects and operators used as well as the higher-level knowledge used to form the overall experimental plan. The object level knowledge is currently being organized into the schemata formalism for an initial attempt at a molecular genetics knowledge base. A representation method for DNA structures and an interactive structure editing and entry system (EDNA) have been built and tested successfully with geneticist users.
Work is proceeding on the schemata storage and access routines and on routines for acquiring and editing the rules which describe the procedural knowledge of the domain. We plan to have the basic MOLGEN system operational for the purpose of testing object and operator knowledge (the practical goal of experiment checking) by the end of July 1977.

D. Publications

1) N. Martin, P. Friedland, J. King, M. Stefik, "Knowledge Base Management for Experiment Planning in Molecular Genetics," submitted to Fifth International Joint Conference on Artificial Intelligence.
2) M. Stefik and N. Martin, "A Review of Knowledge Based Systems as a Basis for a Genetics Experiment Designing System," Feb. 1977, Stanford CS Report STAN-CS-77-596, HPP-77-5.
3) N. Martin, P. Friedland, M. Stefik, "MOLGEN Knowledge Base I: Object System," to appear as HPP Working Paper.
4) N. Martin, P. Friedland, M. Stefik, "MOLGEN Knowledge Base II: Rule System," to appear as HPP Working Paper.

E. Funding

MOLGEN research is supported by NSF grants MCS76-11649 and MCS76-11935 for the two year period from June 1976 - June 1978.

II. Interactions with SUMEX-AIM

All system development has taken place on the SUMEX-AIM facility. We have used the system not only for programming, but also as a major aid in writing and transmitting among ourselves the wide variety of formal and informal reports which are necessary in the MOLGEN design phase. We believe the availability of good interactive text editing facilities like TV-Edit increases our productivity significantly. Active collaboration with remote users at the University of New Mexico will begin in September 1977 (Prof. Nancy Martin has been visiting at Stanford this year). We expect this collaboration to occur over the ARPA network.
We hope also to maintain a collaboration with Dusko Ehrlich, formerly a Stanford geneticist and now doing research at the Institut de Biologie Moleculaire, Faculte de Science, in Paris, over a TYMNET link to SUMEX. We have benefited enormously from the collected expertise in both knowledge-based systems and general programming and design problems available from other SUMEX-AIM projects. We have especially strong ties to the knowledge management expertise of the MYCIN project, but we also share common objectives with parts of the DENDRAL, SECS, and protein crystallography projects. We have also benefited from the intense interaction with many other projects at the AIM conferences. Finally, we have provided small amounts of SUMEX resources to geneticist users as part of a quid pro quo relationship for helping us understand that subset of genetic knowledge necessary for our initial knowledge base. The most outstanding example of this sort of collaboration occurred with Prof. Larry Kedes' group at the VA hospital in Palo Alto, who are using SUMEX to determine the feasibility of automated assistance in analyzing complex DNA base sequences.

6.1.4 MYCIN PROJECT

MYCIN - Computer-based Consultation in Clinical Therapeutics

S. N. Cohen, M.D. (Pharmacology) and B. G. Buchanan, Ph.D. (Computer Science)
Stanford University

I) Summary of research

Technical goals

The MYCIN project is aimed at the development of a computer program capable of functioning as an expert consultant on a range of medical decision making problems. In particular, we have been working on the construction of a system that provides consultative advice on the diagnosis and therapy selection for a number of infectious diseases.
Current areas of competence of the system include bacteremia and meningitis, and work is currently underway to extend this to urinary tract infections, pulmonary infections, and prophylactic use of antibiotics. Our work has been guided by three fundamental objectives:

(1) A major goal of the MYCIN system has been to provide a computer-based therapeutic tool designed to be clinically useful, one that would be used eventually in the clinical setting. This goal requires development of a system that has a medically sound knowledge base, and that displays a high level of clinical competence in its field. The program must first convince clinicians of the quality of the information it is providing before they will be willing to use it.

(2) Since many clinicians are not likely to accept the advice provided by a computer-based system unless they can understand why the recommended therapy has been selected, the system has to do more than just give advice dogmatically. It should have the ability to explain the reasoning behind its decisions, and should be able to do so in terms that suggest to the physician that the program approaches the problem in much the same way that he does. This permits the user to validate the program's reasoning, and modify (or reject) the advice if he believes that some step in the decision process is not justified. It also gives the program an inherent instructional capability that allows the physician to learn from each consultation session.

(3) A third major goal is to provide the program with capabilities that enable augmentation or modification of the knowledge base by clinical experts in infectious disease therapy, in order to improve the validity of future consultations. The system therefore requires some capability for acquiring knowledge by interacting with experts in the field, and for incorporating this knowledge into its knowledge base.
Three separate parts of the MYCIN system accomplish these goals. The consultation system uses the knowledge base, along with patient-related data entered by the physician, to generate therapeutic advice. The explanation system has the ability to explain the reasoning used during the consultation, and to document the motivation for questions asked or the rationale for conclusions reached. Finally, the knowledge acquisition system enables experts in antimicrobial therapy to update MYCIN's knowledge base, without requiring that they know how to program a computer. We have also sought to use MYCIN as a framework for understanding the process of medical decision making and the nature of clinical judgment. Physicians are constantly faced with the necessity of making decisions based on information that is both incomplete (missing historical data or test results not yet available) and inexact (results are rarely definitive). In addition, those decisions are often based on rules that are only approximate (e.g., "a gram-negative anaerobic rod in the blood is probably a Bacteroides"). But decisions are made despite these problems, and the results are often proven later to be valid. We have attempted to understand how this is done by developing in our system a parallel set of capabilities. We have relied on the "production rule" encoding of information, in which individual decision rules are specified in an "if/then" format. For example, the rule indicated just above is encoded in the system as:

If 1) the gram stain of the organism is gram negative, and
   2) the morphology of the organism is rod, and
   3) the aerobicity of the organism is anaerobic,
Then there is suggestive evidence (.6) that the identity of the organism is Bacteroides.

This encoding of knowledge offers a number of advantages over some of the more traditional approaches to diagnosis like decision trees, Bayesian analysis, and utility theory.
Unlike decision trees, it can deal with both inexact and incomplete information. Unlike the Bayesian and utility theory approaches, it does not need extensive amounts of conditional probability data. A collection of independent rules is also far easier to augment than a complex decision tree; the rules thus provide a much more flexible body of knowledge to which new information is more easily added. The rules also make possible an explanatory capability: the system can justify any of its actions or decisions by displaying the relevant rules it invoked in reaching that decision. This provides an explanation that is far more comprehensible than any we might be able to provide by recapping the actions of a program based solely on statistical considerations. A more specific goal of our research involves understanding the process of infectious disease diagnosis and therapy selection. This process is not as yet well understood, and we believe that by dissecting it down to individual decision rules, we can gain insight into how it works. In addition, the resulting set of rules may prove to be a useful compendium of knowledge about the task. Since we believe this set of rules will also be quite large, we are studying the problems of accumulating, managing, and using large stores of such task-specific knowledge. We are working on a range of techniques to provide capabilities like ensuring the consistency of the set of rules and making it easy to modify existing rules or add new ones. Finally, since computer consultants are designed for use by people who might not otherwise make use of computers, we have devoted a great deal of attention to the issue of human engineering, and the "habitability" of the system. This ranges from such minor items as the automatic correction of misspelled answers, to the range of sophisticated explanation capabilities available.
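
As an illustration only, the production-rule encoding described in this section can be sketched in a few lines of Python. This is not MYCIN's actual implementation: the class name `Rule`, the function `consult`, and the dictionary of findings are hypothetical names introduced for the sketch, and the example rule is the Bacteroides rule quoted earlier in this section, with its attached weight read as .6. The sketch shows two of the properties claimed above: a rule whose premises cannot all be confirmed simply does not fire (incomplete data is tolerated), and the rule itself can be displayed as the explanation for a conclusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One independent if/then decision rule (hypothetical structure)."""
    premises: tuple      # (attribute, value) pairs; all must hold
    attribute: str       # attribute the rule draws a conclusion about
    value: str           # concluded value
    weight: float        # strength of the evidence attached to the rule

    def applies(self, findings: dict) -> bool:
        # A missing finding never matches, so the rule is skipped
        # rather than failing when data is incomplete.
        return all(findings.get(a) == v for a, v in self.premises)

    def explain(self) -> str:
        # The explanation is just the rule, rendered in English.
        conds = ", and ".join(
            f"the {a} of the organism is {v}" for a, v in self.premises
        )
        return (
            f"If {conds}, then there is suggestive evidence ({self.weight}) "
            f"that the {self.attribute} of the organism is {self.value}."
        )


BACTEROIDES_RULE = Rule(
    premises=(
        ("gram stain", "gram negative"),
        ("morphology", "rod"),
        ("aerobicity", "anaerobic"),
    ),
    attribute="identity",
    value="Bacteroides",
    weight=0.6,
)


def consult(rules, findings):
    """Return (attribute, value, weight, explanation) for each rule that fires."""
    return [
        (r.attribute, r.value, r.weight, r.explain())
        for r in rules
        if r.applies(findings)
    ]
```

Because each rule is an independent record, adding a new rule in this sketch means appending one more `Rule` to the list passed to `consult`; no existing rule has to be edited, which is the flexibility argued for above.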
Medical relevance and collaboration

A number of recent studies indicate a major need to improve the quality of antimicrobial therapy. Almost one-half of the total cost of drugs used in treating hospitalized patients is spent on antibiotics [1,2], and if results of a number of recent studies are to be believed, a significant part of this therapy is associated with serious misuse [2,3,4,5]. Some of the inappropriate therapy involves incorrect selection of a therapeutic regimen [4], while another serious problem is the incorrect decision to administer any antibiotic [2,4,5]. One recent study concluded that one out of every four people in the United States was given penicillin during a recent year, and nearly 90% of these prescriptions were unnecessary [6]. Other studies have shown that physicians will often reach therapeutic decisions that differ significantly from the decisions that would have been suggested by experts in infectious disease therapy practicing at the same institution. Nonexperts sometimes choose a drug regimen designed to cover for all possibilities, prescribing either several drugs or one of the so-called "broad spectrum" antibiotics, even though appropriate use of clinical data might have led to more rational and less toxic therapy. Within a hospital environment in which professional resources are often overburdened, and in environments where expert sources are not readily available, a computer-based consultant will be highly useful. Such a system will also have broad fringe benefits in its educational impact on staff physicians and in providing a framework for quality control and peer-review evaluations. Antimicrobial therapy appears to be an especially suitable area for the initial development of a computer-based system to assist physicians with decisions in clinical therapeutics.
The components of the decision making process in antimicrobial therapy are more readily definable than in many other areas of medicine, and the consequences of the physician's decision can usually be assessed in terms of the direct therapeutic action. Nevertheless, the general approach used here is applicable to other areas of clinical decision making. The basis of rational antimicrobial therapy decisions is identification of the microorganisms causing the infectious disease. Accurate identification is important because of the specificity of antibiotic action: drugs that are highly effective against certain organisms are often useless against others. The patient's clinical status and history (including information such as prior infections and treatments) provide data that may be valuable to the physician in identifying the disease-causing organisms. However, bacteriological cultures that use specimens taken from the site of the patient's infection usually provide the most definitive identifying information. Initial culture reports from a microbiological laboratory may become available within 12 hours from the time a clinical specimen is obtained from the patient. While the information in these early reports often serves to classify the organism in general terms, it does not often permit precise identification. It may be clinically unwise to postpone therapy until such identification can be made with certainty, a process that usually requires 24 to 48 hours, or longer. Thus it is commonly necessary for the physician to estimate the range of possible infecting organisms, and to start appropriate therapy even before the laboratory is able to identify the offending organism and its antibiotic sensitivities.
In this setting MYCIN plays two roles: (a) providing consultative advice that will assist the physician in making the best therapeutic decision that can be made on the basis of available information, and (b) by its questioning of the physician, pinpointing the items of clinical data that are necessary to increase the validity of the clinical decision. Our project is an interdisciplinary effort involving computer scientists from the Stanford Computer Science Department, and clinicians from both the Department of Clinical Pharmacology at Stanford and the Department of Infectious Disease at the University of Arizona. The task of the clinicians has been to specify the decision rules necessary for diagnosis and therapy selection, while the computer scientists have been devising ways to represent and use this information in the computer. The system is then tested by the clinicians using real cases obtained from journals and medical records. A complete listing of the staff is given below.

Stanley N. Cohen, MD, Clinical Pharmacology
Bruce G. Buchanan, PhD, Computer Science
Stanton Axline, MD, Infectious Disease (now at University of Arizona)
Randall Davis, PhD, Computer Science
Frank Rhame, MD (to 9/75), Infectious Disease
Edward Shortliffe, MD PhD (to 6/76, returning 6/77), Infectious Disease
Victor Yu, MD, Infectious Disease
Rudolpho Chavez-Pardo, MD (to 9/75), Clinical Pharmacology
A. Carlisle Scott, MS, Computer Science
Sharon Wraith, BS, Clinical Pharmacology
Jan Aikins, BS, Computer Science
Robert Blum, MD, presently in Computer Science
William Clancey, AB, Computer Science
Larry Fagan, AB, Computer Science
William van Melle, AB, Computer Science

Progress Report

Period covered: June 1, 1974 through September 30, 1975

Summary

Over the past three years we have designed, built and partially evaluated a computer program capable of diagnosis and therapy selection for certain varieties of infectious diseases.
The program is intended to function as a consultant, and "interviews" a doctor about his patient, requesting information on clinical findings and results of laboratory tests. It relies on a store of judgmental knowledge (obtained from experts in infectious disease) to determine the conclusions which can be drawn from the answers it receives. This judgmental knowledge is in the form of some 400 decision rules dealing with the wide range of topics that must be considered in determining the likely identity of causative organisms and selecting appropriate antimicrobials. MYCIN is composed of the three systems described earlier (the consultation, explanation, and knowledge acquisition systems), all of which reference the knowledge base of decision rules. The program is currently capable of dealing with bacteremia and meningitis infections. It can diagnose the likely presence of more than 35 different organisms and can recommend therapy for 100 organisms, selecting drugs from a "pharmacopoeia" of 30 antimicrobials. The system can tailor its therapy recommendations to a specific organism and infection, can adjust dosage levels and durations in response to impaired renal status, and can combine drugs to create combination therapies, giving it a wide range of clinical applicability.

Detailed Report

Our work in the past several years has been organized around five main areas of investigation. We have

a) increased the system's competence in existing areas of clinical expertise while expanding its scope
b) developed a number of user-oriented features to increase the program's attractiveness to clinicians
c) developed a range of knowledge acquisition capabilities to speed the process of expanding the system's clinical competence
d) solved a number of technical problems to insure that the program does not outgrow the computer resources available to it
e) evaluated the system's level of expertise.
Clinical Capabilities

Since the primary qualification for any clinical consultant is competence in the domain, we have devoted significant effort to expanding MYCIN's knowledge base and widening its scope of competence. For instance, while the system was directed initially at patients with positive blood cultures, the basic methodology was generalized to support a much broader approach to the problem. MYCIN has now gained the ability to deal with infections from which the causative pathogen hasn't been isolated (e.g., pneumonia), or which haven't even been cultured (e.g., brain abscess). With this broadening of scope, it has also become necessary to be able to evaluate the meaningfulness of isolates for cultures taken from sites other than blood. For urine and sputum isolates, for example, the system gained the ability to base its evaluation of sterility of an isolate on both the method of collection and the user's estimation of conscientiousness of collection. An extensive review of the program's approach to drug selection has led to a major revision in the basis for therapy selection during the course of program development. The program was given the ability to consider both the infectious disease diagnosis and the significance of the organism as further determinants of therapy, in addition to organism identity. These three together have become the primary factors in drug selection, with drug toxicity and ecological factors as secondary considerations. The result is a more appropriate, more sharply focussed drug selection that also includes dose, route, and duration. While the initial development of the knowledge base focussed on rules concerned with the diagnosis and therapy for blood infections (bacteremia), the complexity of infectious disease therapy and the frequent occurrence of multiple infections in a single patient require a broader knowledge if the system is to be clinically useful.
In response we have extended MYCIN's knowledge base, while at the same time improving the degree of sophistication with which the system deals with bacteremia. The second major area has been the diagnosis and treatment of meningitis, and more than 100 rules were added to provide the ability to deal with it. In the process the program was also extended beyond bacteria, as it gained the ability to consider and treat both fungi and viruses. This area has proved to be an especially useful domain because it has presented several new challenges. In particular, meningitis requires the ability to deal with a disease that is often diagnosed on clinical grounds alone, before any specific microbiological evidence is available. (By comparison, the diagnosis of bacteremia on clinical grounds alone is far less certain, and usually requires establishment of the fact that bacterial growth has occurred in blood cultures.) For this reason, extension of the project into the meningitis area has made it necessary for MYCIN to consider a larger range of clinical factors, and has resulted in a system which has a broader picture of the whole patient. Other contributions to the system's competence have come from expansion of the knowledge base to include information about normal bacteriological flora for a wide range of culture sites. This enables the program to distinguish between normal and pathological flora, and it can as a result decide more precisely on whether to treat.

User Oriented Features

Clinicians traditionally shun computer programs, and we believe this is in large measure due to insufficient attention paid to user oriented features. As a result, we have devoted significant effort to insuring that MYCIN is responsive to its users in a number of unique ways. The development of the explanation and question answering capabilities has been essential for this work, and both have grown extensively in power.
The system's ability to explain the motivations for its questions, for instance, underwent a major design revision. It is now based on a more powerful approach that relies on the program's knowledge of its own control structure and ability to examine its own rules. The user can now fully explore the system's current line of reasoning, rather than just a single level, as initially implemented. The language understanding capabilities of the question answering system have also been extensively revised. They now allow a broader range of questions to be asked and offer more precise answers. The use of this feature was also simplified so that the user no longer needs to classify his questions. A comprehensive review of the kinds of questions asked by users of the system has led to a number of important features. MYCIN can now answer a much wider range of questions, and can, in particular, explain why it did not take a specific action, as well as why positive conclusions were reached. It is our feeling that capabilities such as these are of great importance in enabling the project's staff and clinical experts to understand the program's rationale for its actions in instances where its recommendations do not appear to be the most appropriate and most correct. Thus, the line of reasoning of the program can be evaluated, and requirements for new or modified rules can be uncovered. These kinds of capabilities are also important in optimizing user acceptance of the system. A substantial addition to the question-answering facility enables the system to explain the process of therapy selection. In comparison to the diagnostic process, therapy selection is complicated somewhat by the need to consider a range of different factors simultaneously, such as the total number of drugs recommended, the degree of sickness of the patient, possible interactions between drugs, toxicity and other side effects, etc.
Despite this complexity, explanations of therapy selection are phrased at a conceptual level that makes them comprehensible to the physician. As before, this makes it possible for the physician to verify the validity of the system's decisions, and makes it clear to him that the system reaches its results in much the same way that he does. The explanation consists of a step-by-step review of the reasoning which led to recommending a particular drug for a specific organism. It considers such issues as why a drug was first considered for an organism, why a drug may have been chosen as the best therapy for that organism, how the total number of drugs was reduced by considering common drug classes among the candidates, and consideration of possible contraindications based on the patient's allergies, age, and other factors. By characterizing each drug according to this scheme, the program can explain why a drug was or wasn't prescribed, as well as why one drug is to be preferred over another. This offers an important explanatory capability that will make the system more attractive and acceptable to clinicians. Several capabilities have been added to make the program easy to use. The system is now more tolerant of erroneous or inappropriate responses, and is able to provide a reworded question, along with a list of acceptable answers. In addition, it has the ability to recognize responses which are not sufficiently precise, and can rephrase its questions accordingly. We have recently added to the system the ability to modify drug dosage in cases of renal failure. Where, previously, the system only issued a warning to modify doses, it is now able to use either creatinine clearance or serum creatinine levels to compute the level of renal function. The program then uses drug-specific information (e.g., half-life, percent loss of the drug via renal excretion, etc.) to adjust the regimen.
It can either (a) adjust dose levels downward and leave dosing interval unchanged, or (b) increase dosing interval and leave levels unchanged, or (c) allow the physician to select a dose interval, for which it chooses an appropriate dose level. Since the problem of determining renal status and the proper adjustment of drug dose is important in the use of aminoglycoside antibiotics, cephalosporins, and other antimicrobial agents, the customization of drug dosage recommendations will be an important addition to the power of the system. We have found, in addition, that there is a substantial amount of information that is routinely collected in every consultation, like the date and site of each of the cultures, gram stain and morphology results for each of the organisms that grew out, etc. Currently, the program exhaustively analyzes each culture and all of its organisms in turn. Some users of the program appear to be impatient with this method, and would much prefer to enter all the relevant data on all the cultures and organisms at once. This is faster and easier, since the information can be gathered in a single review of the chart, instead of having to review it several times as each culture is processed. In response to this, we have reorganized the consultation slightly, so that it is possible to enter all of this data at once, at the beginning. This offers two other advantages in addition to improving the program's acceptability to its users. First, it provides a basis for our future efforts to write rules which deal with interactions between infections (see below, "Specific Aims"), and second, it suggests a mechanism for eventually merging our work with the product of existing efforts to organize and automate the recording and handling of medical record data.
This latter development may in time make it possible for MYCIN to obtain a large part of the information it requires directly from such automated records, sharply reducing the number of questions it has to ask, and speeding up the consultation considerably. Finally, several new capabilities make the system convenient to use, in anticipation of its evaluation in the clinical setting. Among these is the option for the user to type a comment about system performance at any time during the consultation. His comment is recorded in a special file which is reviewed periodically by our medical staff, and provides an on-going opportunity for users to offer feedback aimed at improving the usefulness of the system. The user can also indicate his belief that the system has "broken down" in some way, and he is invited to describe the problem. His description is saved along with information about the current state of the program, so that our systems programmers can deal with the problem later.

Knowledge Acquisition

A preliminary knowledge acquisition program was completed in the middle of 1974, and demonstrated the feasibility of having a physician teach the system new rules using a rather stylized subset of English. Building on the experience gained here, work began on a revised program designed to allow the user to examine and modify the program's knowledge and behavior as a single, unified action. This program was designed to make the explanation and knowledge acquisition capabilities available together, to make use of the fact that the nature of the explanations requested can give a clear hint about the content of a new rule. The program was also designed to advise the user about the effect of his rule on the original deficiency, indicating, for instance, whether or not it corrects the problem he noticed.
Work on a preliminary version of this new program was completed in 1976, making available a broad range of useful features enabling our clinical experts to add rules to the system without requiring that they have a knowledge of programming. If the expert finds that MYCIN's handling of a particular problem is at variance with his own expert knowledge, he can use the explanation capabilities to discuss the line of reasoning in use at that time, can add or modify rules in the knowledge base, and can determine the effects of the changes on MYCIN's subsequent performance. (Quality control is maintained on the overall system by regular meetings of our clinical and pharmacological experts who determine the "official" MYCIN knowledge base.)

Technical Issues

As MYCIN's clinical capabilities have expanded, efficiency has improved as a result of a number of modifications to the system's technical capabilities. Early in our work, for instance, a comprehensive review and modification of the control structure was undertaken to improve efficiency and generality. The resulting program was both more direct and faster. More recently, modifications have been made so that the large English dictionary can be kept on the disk and accessed only as needed, rather than keeping it in core, which slows down the system's response speed. The self documenting features of the program have also been improved to make them faster, and the system's interaction with the terminal has been made more uniform, to prepare for the time when different users of the system may have various different kinds of terminals.

Evaluation Activities

Since clinicians are likely to require documentation of MYCIN's competence and utility before seeking its advice, considerable time has been spent on evaluating the system and on implementing a range of program features to support these efforts.
In the past two years we have obtained many useful suggestions from clinicians when the system was presented to several different conferences. In February 1975 it was presented to the Western Society for Clinical Research, in September 1975 to the International Symposium on Clinical Pharmacy and Clinical Pharmacology, and more recently (June 1976) to the Drug Information Association.

A large scale formal study and evaluation of MYCIN's performance was begun in January 1976. The same set of clinical data was provided to both MYCIN and a set of experts in infectious disease therapy. (Five of the experts were nationally recognized authorities in the field; the other five were clinical fellows in the Infectious Disease Division at Stanford. A complete list of names, titles and affiliations is found in Appendix 3.) The judgments of the program and the experts were compared, and the experts were asked to evaluate MYCIN's performance.

To do this, we first designed a form to allow us to separate the variables requiring analysis. The parameters evaluated include

A. the "quality" of the interaction - were any questions irrelevant or missing
B. the program's ability to determine organism identity
C. the program's ability to determine organism significance
D. the program's ability to select proper therapy
E. overall performance evaluation
F. potential impact as a clinical tool or teaching facility

The evaluation form was designed to be informative yet simple to complete. It was tested in a pre-evaluation trial run, then used for the formal study.
Consecutive patients with positive blood samples were evaluated for inclusion in the study by project personnel, until we obtained at least 10 patients for whom MYCIN recommended therapy, and 15 patients overall. (Patients were rejected if they were outpatients when the sample was drawn, if they had a previous blood culture in the preceding seven days, or if they had a diagnosis of meningitis or infectious endocarditis.) For each of the patients accepted, a one to two page clinical summary was prepared and combined with a summary of the laboratory test data as of the time when the first blood culture was obtained. This information was then used to obtain a therapeutic evaluation from MYCIN.

Each of the participating experts received a set of fifteen evaluation forms (one for each patient). Each form contained: (a) the clinical summary and lab data; (b) space for the expert to record his conclusions about the nature of the infection, likely causative organisms, and appropriate therapy; and (c) a transcript of the MYCIN consultation along with space for the expert to record his opinion of various aspects of MYCIN's performance. By presenting the information in this order, we obtained a therapeutic regimen from the expert based on the same information supplied to MYCIN. This allowed us to compare the expert's answers to MYCIN's, and also gave us the expert's opinion of the system's performance.

In the past few months a sufficient number of the forms have been returned that we were able to do a preliminary analysis. The figures below are based on the nine sets of forms (out of ten) which have been returned.
Since it is difficult to select a single number which summarizes performance, we have in general measured each of the parameters listed above in three ways: (i) the percent of instances in which the program was judged exactly correct, (ii) the percent of instances in which the program's performance was judged exactly correct or an acceptable alternative, and (iii) the percent of cases in which a majority of the experts judged its performance exactly correct or an acceptable alternative. By using all three measures, we obtain a range of figures which give a good picture of the program's performance.

All of these attempts to evaluate performance are complicated by the fact that (as expected) the experts' own choices about each patient were not unanimous. Thus, we cannot ask whether MYCIN's answers were "correct" in any absolute sense, since there was no agreement on what constitutes "correct". Instead, we ask how often each individual expert rated the program's responses as correct. But given the variation among experts themselves, the program can never be expected to reach 100%, and depending on the extent of the intra-group variation, the absolute limit may in fact be much lower. Thus the ideal question to ask is "Do experts rate MYCIN's performance correct at least as often as they rate each other's performance correct?" This would give a good indication of how close the system's performance was to that of the group of experts as a whole.

We have been able to do this in a few isolated cases, but in general it requires more information than we were able to collect. This is discussed in more detail below, but in general terms the problem is that we were able to ask each expert for his choices for each patient, and ask him to rate MYCIN's choices.
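The three measures defined above can be made concrete with a small sketch. It is an illustration only, not the procedure used in the study: the function name `summarize`, the rating labels, the data layout, and the toy data are all assumptions introduced here, and "majority" is taken to mean a strict majority of the experts who rated a given case.

```python
from collections import defaultdict

# Hypothetical rating labels; the evaluation form's own wording may have differed.
EXACT, ACCEPTABLE, UNACCEPTABLE = "exact", "acceptable", "unacceptable"


def summarize(ratings):
    """Compute the three summary measures from (case, expert, rating) tuples."""
    if not ratings:
        raise ValueError("no ratings supplied")

    # (i) share of individual ratings judged exactly correct
    n_exact = sum(1 for _, _, r in ratings if r == EXACT)
    # (ii) share judged exactly correct or an acceptable alternative
    n_ok = sum(1 for _, _, r in ratings if r in (EXACT, ACCEPTABLE))

    # (iii) share of cases in which a strict majority of the experts judged
    # the program exactly correct or an acceptable alternative
    votes_by_case = defaultdict(list)
    for case, _, r in ratings:
        votes_by_case[case].append(r in (EXACT, ACCEPTABLE))
    n_majority = sum(1 for v in votes_by_case.values() if 2 * sum(v) > len(v))

    return {
        "pct_exact": 100.0 * n_exact / len(ratings),
        "pct_exact_or_acceptable": 100.0 * n_ok / len(ratings),
        "pct_cases_majority_ok": 100.0 * n_majority / len(votes_by_case),
    }


# Invented toy data (not study data): 2 cases, each rated by 3 experts.
toy = [
    ("case1", "expertA", EXACT),
    ("case1", "expertB", ACCEPTABLE),
    ("case1", "expertC", UNACCEPTABLE),
    ("case2", "expertA", UNACCEPTABLE),
    ("case2", "expertB", UNACCEPTABLE),
    ("case2", "expertC", EXACT),
]
print(summarize(toy))
```

Note that measures (i) and (ii) are computed over individual expert ratings, while measure (iii) is computed over cases, which is why the N for the third measure is so much smaller than for the first two.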
Without a second round of questionnaires asking each expert to rate the acceptability of the other 9 experts' responses, however, we lack direct information about intra-expert variability. The figures below should be reviewed with this caveat in mind.

A. "Quality" of the interaction

To measure the first item, the experts were instructed to mark any questions in the consultation which they felt were irrelevant, and to note any questions which they felt were omitted by the system. Overall MYCIN did quite well, as there were no consultations in which a majority of the experts felt that any particular question was irrelevant or omitted. On the average, there were 0.53 questions judged irrelevant and 0.55 indicated as omitted.

Table I summarizes the next four measurements.

                     | MYCIN 1st choice     | MYCIN 1st choice        | MYCIN 1st choice        |
                     | identical to an      | identical to, or an     | identical to, or an     |
                     | expert's 1st choice  | acceptable alternative  | acceptable alternative, |
                     |                      | to, an expert's 1st     | as judged by a majority |
                     |                      | choice                  | of experts              |
---------------------+----------------------+-------------------------+-------------------------+
ORGANISM             | 56.3%                | 75.6%                   | 81.8%                   |
IDENTITY             | N = 414              | N = 414                 | N = 11                  |
---------------------+----------------------+-------------------------+-------------------------+
ORGANISM             | 91.7%                | NA                      | 100%                    |
SIGNIFICANCE         | N = 36               |                         | N = 4                   |
---------------------+----------------------+-------------------------+-------------------------+
THERAPY              | 12%                  | 75%                     | 91%                     |
SELECTION            | N = 99               | N = 99                  | N = 11                  |
---------------------+----------------------+-------------------------+-------------------------+
OVERALL              | 17.0%                | 59.3%                   | 60.0%                   |
PERFORMANCE          | N = 135              | N = 135                 | N = 15                  |
---------------------+----------------------+-------------------------+-------------------------+

Table I  Summary of nine experts' responses to MYCIN's performance on 15 cases

B.
Organism Identity

For organism identity, the experts were asked to rate each of MYCIN's selections as exactly correct (they agreed that the organism was likely to be present), an acceptable alternative (they had not chosen that organism, but agreed it might be present), or an unacceptable choice (they disagreed with its selection). Since 11 of the cases were not contaminants, and there was a total of 46 organisms chosen by the system, with 9 experts rating each of those choices we have an N of 414 for the first two columns and 11 for the third. In 56% of the instances the system's choices were identical to the experts', 75% of them were either identical or acceptable alternatives, and in 82% of the cases its results were acceptable to a majority of the experts.

In addition, the experts were asked to indicate which organisms they felt MYCIN had overlooked in its diagnosis. For the 11 non-contaminant cases, the experts indicated an average of only 0.35 organism identities that were overlooked by the system. In no case did a majority of experts feel that any particular organism had been overlooked, suggesting that even the 0.35 figure is a result of intra-expert variation.

C. Organism Significance

The first question on the evaluation form gave the expert a chance to indicate that he felt the patient did not need to be treated. The first column of the second row indicates how often the expert indicated no treatment was necessary for a case in which MYCIN also judged the organism to be a contaminant. (There is no number in the second column since we did not ask about a "close call" on whether or not to treat. In addition, the measurement is based only on the contaminant cases, since in many of the cases where both MYCIN and the expert determined that treatment was necessary, they based that decision on different organisms.
We felt that it would be misrepresentative to call these situations "agreements".) As the figures show, in only three out of 36 instances was there any disagreement with the system's decision on whether or not to treat.

D. Therapy Selection

The expert was asked to select therapy for the organisms which he felt were likely to be present before looking at MYCIN's therapy recommendation. He was then asked to judge MYCIN's choice of therapy for that patient. Since MYCIN was selecting therapy for the organisms which it felt were present (which may have differed from those chosen by the expert), this provides a fundamental comparison of performance: it compares the therapy selection performance of the two when they are faced with the same clinical situation.

This comparison is a difficult one to make, since it is complicated by the difficulty noted above, of variability in the experts' performance and the need to judge MYCIN with respect to that variability. Looking only at exact agreements (i.e., two identical therapies) produces the figure in the first column, which indicates that 12% of the time MYCIN's recommendation was identical to that of an expert. Comparing each expert's therapy choice with the other 8 indicates that 35% of the time (N = 396) any pair of experts chose identical regimens.

The experts were also asked to judge whether MYCIN's therapy was an acceptable alternative (if it was not identical to their own), producing the figure in the second column. This indicates that it was either identical, or they felt it was an acceptable alternative, 75% of the time. (Unfortunately, we have no reliable way of judging the intra-expert variability here, without a second round of questionnaires which asked each expert to rate the acceptability of the other experts' choices.)

[As an alternative, we have attempted to develop a measure of how "far apart" two non-identical regimens are.
But the problem is difficult: for example, for gram negative rods with Salmonella most likely, is gentamicin and chloramphenicol "very different" from gentamicin and ampicillin? We have been working on a "drug metric" to solve this problem, attempting to base the "difference" between two drugs on factors like organism susceptibility, toxicity, and drug efficacy, but this work is still in progress.]

The figure in the third column gives a crude overall measure of therapy selection performance, and indicates that in 91% of the cases (10 out of 11), a majority of the experts rated MYCIN's regimen as either identical to their own or an acceptable alternative.

[The evaluation form also asked each expert to choose a regimen for the organisms which MYCIN had selected. The intent here was to compare the system's performance against the expert when both were faced with the same set of organisms (rather than compared with the same clinical situation, as above). Unfortunately, inconsistent answers on the part of the experts indicated that they were not answering the question according to the instructions. It appeared that they were not able to suspend their own judgments about organism identity sufficiently to select a regimen based on MYCIN's organisms alone. For this reason, we believe the data to be unreliable, and have not included it here.]

E. Overall Performance

At the end of each evaluation form, the expert was asked to rate the system's overall performance as either excellent, good, fair, or poor. The first two columns of the last row indicate that 17% of these evaluations were "excellent", and almost 60% were either "excellent" or "good" (only 13% were "poor"). In 60% of the cases (9 out of 15), a majority of the experts felt that MYCIN's overall performance was either "excellent" or "good".

F.
Present Utility and Future Potential

Finally, after completing the entire set of 15 patients, each expert was asked to rate MYCIN's present utility and future potential as a clinical tool and as an educational tool, rating it as having "considerable", "some", or "none". The table below summarizes their response.

Evaluation of Present Utility
                       | "considerable" | "some"         | "none"         |
-----------------------+----------------+----------------+----------------+
clinical tool          | 11%            | 67%            | 22%            |
-----------------------+----------------+----------------+----------------+
educational tool       | 11%            | 89%            | 0%             |
-----------------------+----------------+----------------+----------------+

Evaluation of Future Potential
                       | "considerable" | "some"         | "none"         |
-----------------------+----------------+----------------+----------------+
clinical tool          | 11%            | 89%            | 0%             |
-----------------------+----------------+----------------+----------------+
educational tool       | 67%            | 33%            | 0%             |
-----------------------+----------------+----------------+----------------+

Table II  Opinions of 9 experts on MYCIN's present utility and future potential

To aid these evaluation efforts, we have also implemented a number of useful features in the system. For instance, MYCIN now keeps continuing statistics of the use of rules in its knowledge base. This will help us to monitor its long term performance, to study the interrelationship between rules, and perhaps detect automatically any inconsistencies or gaps in the knowledge base.

We have also designed and implemented a mechanism for "on-line" evaluation. At the end of each consultation, the system asks the clinicians who are using it a few questions about the quality of its performance. This interchange will be brief to avoid being a burden to the user, but it is expected to represent an important addition to the other evaluation efforts.
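The continuing rule-usage statistics mentioned above could be kept along the following lines. This is a minimal sketch under stated assumptions, not a description of MYCIN's implementation: the class `RuleUsageStats`, its method names, and the rule identifiers are hypothetical, and "rules that never fire" is used here only as one simple signal of a possible gap in a knowledge base.

```python
from collections import Counter
from itertools import combinations


class RuleUsageStats:
    """Running tallies of which rules fire, alone and together (hypothetical)."""

    def __init__(self, rule_ids):
        self.rule_ids = set(rule_ids)
        self.fired = Counter()      # rule id -> number of consultations it fired in
        self.co_fired = Counter()   # (rule id, rule id) -> consultations both fired in
        self.consultations = 0

    def record_consultation(self, fired_rules):
        """Update the tallies with the set of rules used in one consultation."""
        fired = set(fired_rules) & self.rule_ids   # ignore unknown identifiers
        self.consultations += 1
        self.fired.update(fired)
        # Sorted pairs give each unordered pair a single canonical key.
        for pair in combinations(sorted(fired), 2):
            self.co_fired[pair] += 1

    def never_used(self):
        """Rules that have not fired in any recorded consultation."""
        return sorted(self.rule_ids - set(self.fired))


# Invented identifiers, for illustration only.
stats = RuleUsageStats(["RULE001", "RULE002", "RULE003"])
stats.record_consultation(["RULE001", "RULE002"])
stats.record_consultation(["RULE001"])
print(stats.fired.most_common(), stats.never_used())
```

The pairwise tally is one way to approach the "interrelationship between rules" noted in the text; rules that never fire, or pairs that always fire together, would be candidates for review by the clinical staff rather than automatic evidence of an error.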
The on-line evaluation mechanism will, for instance, make possible a new form of evaluation of the system. Rather than using a series of "prepackaged" cases as was done in our initial evaluation, the next stage will be carried out using information entered at a terminal by the evaluator. The participating panel of experts will be selecting patients in areas covered by the MYCIN knowledge base, and will engage in a dialogue with the system about those patients. Following completion of the session, the on-line evaluation feature will ask questions about system performance, and the responses will be tabulated and evaluated on-line by appropriate biostatistical programs. Specific recommendations which may point out problem areas in the consultation will be reviewed by our staff. By this process we expect to be able to maintain a continuing evaluation of MYCIN's capabilities in various areas, and pinpoint specific areas where performance is suboptimal.

MYCIN Project Publications

THESES --

Davis R, Applications of meta level knowledge to the construction, maintenance, and use of large knowledge bases, Thesis: Ph.D. in Computer Science, AI Memo 283, 304 pp, Stanford University, July 1976.

Shortliffe E H, MYCIN: A rule-based computer program for advising physicians regarding antimicrobial therapy selection, Thesis: Ph.D. in Medical Information Sciences, Stanford University, Stanford CA, 409 pages, October 1974. Also, Computer-Based Medical Consultations: MYCIN, American Elsevier, New York, 1976.

PAPERS --

Buchanan B G, Davis R, Yu V, Cohen S N, Rule-based medical decision making by computer, Proc. MEDINFO 1977, to appear.

Clancey W, Chronicler: an explanation system based on set-predicate representation of computational processes, submitted to 5th IJCAI.

Aikins J S, Use of models in a rule-based consultation system, short paper submitted to 5th IJCAI.

Davis R.
Interactive transfer of expertise: acquisition of new inference rules, submitted to 5th IJCAI.

Davis R, Knowledge acquisition in rule-based systems: knowledge about representations as a basis for system construction and maintenance, to appear in Pattern Directed Inference Systems, Waterman and Hayes-Roth (Eds.), Academic Press, in press. Also to be presented at Pattern Directed Inference Systems Workshop, Honolulu, May 1977.

Davis R, Buchanan B G, Meta-level knowledge: overview and applications, submitted to 5th IJCAI, Cambridge, MA, August 1977.

Davis R, A decision support system for medical diagnosis and therapy selection, Data Base (SIGBDP newsletter), 8 (Winter 1977) pp 58-72.

Wraith S, Aikins J, Buchanan B G, Clancey W, Davis R, Fagan L, Scott A C, van Melle W, Yu V, Axline S, Cohen S, Computerized consultation system for selection of antimicrobial therapy, American Journal of Hospital Pharmacy, 33 (December 1976) pp 1304-1308.

Scott A C, Clancey W, Davis R, Shortliffe E H, Explanation capabilities of knowledge based production systems, American Journal of Computational Linguistics, Microfiche 62, 1977. Also, HPP Memo 77-1, Stanford Computer Science Department, February 1977.

Shortliffe E H, Davis R, Some considerations for the implementation of knowledge-based expert systems, SIGART Newsletter, 55:9-12, December 1975.

Davis R, Buchanan B, Shortliffe E H, Production rules as a representation for a knowledge-based consultation system, Artificial Intelligence, 8 (Spring 1977) pp 15-45. (Also, AI Memo 266, Stanford University, October 1975.)

Davis R, King J J, An overview of production systems, in Elcock and Michie (Eds.), Machine Intelligence 8: Machine Representations of Knowledge, John Wiley, to appear, 1977. (Also AI Memo 271, Stanford University, October 1975.)

Shortliffe E H, Judgmental knowledge as a basis for computer-assisted clinical decision making, Proceedings of the 1975 International Conference on Cybernetics and Society, pp 256-7, September 1975.
Shortliffe E H, Axline S, Buchanan B G, Davis R, Cohen S, A computer-based approach to the promotion of rational clinical use of antimicrobials, in Gouveia, Tognoni and Van der Kleijn (Eds.), Clinical Pharmacy and Clinical Pharmacology, pp 259-274, Elsevier/North Holland Biomedical Press, 1976.

Shortliffe E H, Davis R, Axline S G, Buchanan B G, Green C C, Cohen S N, Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system, Computers and Biomedical Research, 8:303-320 (August 1975).

Shortliffe E H, Buchanan B G, A Model of Inexact Reasoning in Medicine, Mathematical Biosciences, 23:351-379, 1975.

Shortliffe E H, Rhame F S, Axline S G, Cohen S N, Buchanan B G, Davis R, Scott A C, Chavez-Pardo R, van Melle W J, MYCIN: A computer program providing antimicrobial therapy recommendations (abstract only). Presented at the 28th Annual Meeting, Western Society for Clinical Research, Carmel, CA, 6 Feb 1975. Clin. Res. 23:107a (1975). Reproduced in Clinical Medicine, p. 34, August 1975.

Shortliffe E H, MYCIN: A rule-based computer program for advising physicians regarding antimicrobial therapy selection (abstract only), Proceedings of the ACM National Congress (SIGBIO Session), p. 739, November 1974. Reproduced in Computing Reviews 16:331 (1975).

Shortliffe E H, Axline S G, Buchanan B G, Cohen S N, Design considerations for a program to provide consultations in clinical therapeutics, presented at San Diego Biomedical Symposium 1974 (February 6-9, 1974).

Shortliffe E H, Axline S G, Buchanan B G, Merigan T C, Cohen S N, An artificial intelligence program to advise physicians regarding antimicrobial therapy, Computers and Biomedical Research, 6:544-560 (1973).

Articles About MYCIN

"Which Antibiotic?" Emergency Medicine, January 1977, pp 152-162.
Current Funding

Mycin is currently in the last year of a three year grant (HS-01544, Dr. Stanley Cohen, principal investigator) from the Bureau of Health Sciences Research and Evaluation. The grant is for $149,982, and expires May 30, 1977.

Applications pending

A two year renewal of HS-01544 has been submitted to begin June 1, 1977, for $140,000 (direct costs) for the first year. A site visit has been held and the proposal approved, but a decision for funding is still pending.

A grant from NSF (Dr. Bruce Buchanan, PI) has been approved for two years, to begin June 1, 1977, for $50,000 a year (direct costs).

A joint application (with Dr. Jon Heiser of UC Irvine) is currently pending with the Biomedical Engineering Division of NIH. The Stanford part of the grant (Dr. Bruce Buchanan, PI) requests a total of $146,751 over 3 years ($46,609 in the first year), to begin June 1, 1977. Dr. Heiser's budget requests $147,655 over 3 years ($46,423 in the first year), to begin July 1, 1977.

A 5-year proposal to the Biotechnology Resources Program is being prepared for submission by June 1, 1977.

II) Interactions with Sumex-Aim resource

Collaborations and medical use of programs

Dr. Jon Heiser

We have been working with Dr. Jon Heiser of the Department of Psychiatry of the University of California at Irvine, in an effort to create a consultant for the use of psychoactive drugs. We began by creating a version of Mycin that had all of the infectious disease knowledge removed from it, and showed Dr. Heiser how to build up the required base of knowledge about the new field. He has, with his students, developed a small but functional system that demonstrates encouraging performance on the task. Work has now begun in earnest to extend the competence of this pilot system, to produce a consultant with a useful level of performance.
It is interesting to note that the explanation capabilities required no modification whatever, and worked in the new system exactly as designed for the original system, despite the change in domains.

INTERNIST Project

The Sumex computer has made possible a valuable interaction between researchers on the MYCIN project at Stanford University and those working on the INTERNIST project at the University of Pittsburgh. These researchers are studying the possible representations and uses for disease models in a medical diagnosis system. Both research groups have been able to run each other's programs and to study the medical knowledge bases which are stored on the Sumex computer. Communication between project members has also been greatly facilitated through use of the Sumex system.

Stanford Infectious Disease Faculty

Dr. Victor Yu of our group has been actively soliciting the involvement of the Stanford ID faculty in the development and evaluation of Mycin. He recently presented the system to the faculty and fellows of the Department, and has been seeking ways to involve the system in the Department's educational activities. For instance, medical students under his supervision have used the system during their ID rotation, comparing its results and reasoning process with their own on problems encountered in patients on the wards.

The Pulmonary Function Facility

Members of the Mycin project have also been collaborating with Dr. John Osborn and his co-workers of the Presbyterian Hospital/Pacific Medical Center in San Francisco on the development of a program to interpret the results of standard pulmonary function tests.
The program is designed to perform a range of tasks, including: identifying the need to repeat tests because of poor patient effort; identifying the need for additional information in order to make a more definitive diagnosis; reporting and explaining the reasons for primary and secondary diagnoses and severity of any disease state; identifying the relation between diagnosis and any referral diagnosis; and interpreting any change from previous tests, or limitations on the interpretation because of the test methodology and the patient effort.

Sharing with other projects

Groups at Rutgers University, the University of Pittsburgh, Rochester University, and the University of Virginia Medical School have all been involved in varying degrees with running Mycin and evaluating its performance. They have suggested to us improvements in its design and stock of medical knowledge, and made useful contributions to its development.

In addition, we have made use of the programs developed at both Rutgers and Pittsburgh. The former has been instructive to us in its handling of dynamically changing situations, while the latter has helped us to develop our own ideas about the modelling and use of prototypical descriptions of disease states.

The Molgen group at Stanford has also profited from much of our experience in acquiring knowledge and building large knowledge bases. Several of their techniques for accumulating knowledge about genetics are based on extensions to ideas first suggested in some of our work.

In all of these cases, the use of Sumex as a national resource has clearly been a critical factor in making possible this sort of interaction.

Critique of resource services

Local management of the existing resources has been carried out in exemplary fashion.
The utility of the facilities has consistently increased, as a direct result of the staff's efforts to identify and respond to needs of the user community. They have actively sought out user comments on current and future services and developed programs to support the research work of the community. In particular, the numerous programs for file editing, searching, manipulation, and storage allocation have helped both in data and program management, and in making the best use of available disk storage.

There are, however, additions to the existing resources that would help overcome shortcomings in the available services. In particular, we feel that the addition of more main memory to the system would be an important investment with a significant payoff. First, with the increasing size of the user community, the typical daytime load on the system has increased to the point where running anything but the smallest program requires substantial patience. Second, our project, like several others, is LISP-based, and uses a large address space. Such programs receive lower priority from the scheduler, and especially with the recently changed scheduling algorithm, our effective service level has decreased significantly. The addition of more main memory would ease both of these problems considerably for a number of users.

The addition of more disk space would also be an important improvement in the existing facilities. While it is typically true that disk usage can expand to meet the storage available, we feel that once again the growth of the user community has put a strain on the available resources. We have made extensive use of the archiving facilities, and feel that additional disk space would contribute to the system's utility.

As noted a moment ago, the recently revised scheduling algorithm has also made its impact felt. We have seen our effective service level on the system decrease, as compared to the amount of service we had been getting at a given load average.
While we recognize the national scope of the Sumex charter, and the importance of providing adequate service to the whole community, there are a number of major projects located at Stanford. The majority of large projects are thus competing for the same share of the system. It seems unreasonable for, say, three sizable LISP programs to be competing for the same part of the machine, just because they are at Stanford, while a single remote user is receiving nearly all the remaining resource. We recognize the desirability of keeping Sumex a national resource, but wonder if there is a way this can be done without penalizing systems just because they originate at Stanford.

Finally, there is a smaller scale project which would also make a substantive contribution to the utility of the resource. Currently a program called PUB is the major text formatting ("word processing") program in use. It is something of an historical relic, and is quite large, not totally reliable, and rather difficult to use. It is remarkably powerful, but most users make relatively little use of its more impressive powers. Since preparing technical reports, progress reports, and thought-pieces on proposed or in-progress work are all an integral part of doing research, facilities that ease the task can make an important contribution to the progress of work. A new program, designed along the lines of PUB, but much smaller and of proven reliability, would be an important contribution to the research efforts of the community. It would require on the order of one man-year to create, but given the anticipated drain on system resources presented by the amount of technical writing done by the community, this investment would quickly be paid back many times over.

III) Follow-on

Long range project goals

The long-term goals of our project center around further development of our ideas on computer-based medical consultants.
We intend, for instance, to extend both the depth and breadth of the system's range of competence. The extension in breadth will be an important demonstration of the power of the approach we have taken, since the problem of scale is a traditional pitfall that has trapped a number of other efforts in AI. We believe that our techniques provide the basis for continued effective performance, even with a much larger knowledge base that handles a wider scope of medical problems. This can only be tested, of course, by actually enlarging the knowledge base and widening the program's scope.

By extending the "depth" of the program's competence, we mean dissecting still further the concepts on which its judgments are based. The current system, for instance, asks the doctor if the patient is "febrile due to the infection". In practice, this is a difficult judgment to make, and it is precisely on such difficult judgmental issues that Mycin should be able to offer assistance. By asking our clinicians to specify how they decide that a patient is febrile due to an infection, we can break down this vague notion into a number of distinct decision rules. The resulting program will make fewer demands on the user, and hence will offer a more effective source of consultative advice.

We also believe that the best hardware for many AI research efforts lies in the direction of independent minicomputers arranged as satellites to a central system, and capable of running high level languages (like LISP). A second of our long-term goals, then, is to develop a version of our program capable of running on such a system. Since there are currently a number of efforts aimed at developing both high level languages for mini-machines, and minicomputer architectures capable of running high level languages, Sumex could benefit substantially from this work if the AIM Committee begins now to plan to take advantage of these developments.
We also plan to extend the generality of the system we have developed, to make it possible for experts in other medical (and medically-related) areas to use it as a framework for assembling their own set of decision rules, to create consultants for their own specialties. We have already attempted several pilot studies along these lines (the work with Dr. Heiser on psychopharmacology, and with Dr. Osborn on pulmonary function). Each of these has demonstrated to us a number of generalizations that our current techniques require. We plan to make these changes, and continue to develop a system usable by a wide range of specialists, as part of our interest in the art of building expert systems.

A necessary parallel development to this will be improvements in the rule-based representation of knowledge and a better understanding of the process of clinical decision making. While our decision rules offer a number of advantages, we have also seen some drawbacks in them, and plan to work on overcoming the problems without losing the advantages they offer. Our present model of decision making under uncertainty is still elementary and intuitive; further work is needed to make it more formal and ground it firmly in well understood principles. This will also facilitate work on other problems, such as checking the internal consistency of the entire set of rules.

Justification

Our project is concerned with a range of problems that are central to both medical care and AI research. Earlier sections of this report covered the significance of the specific problem of antibiotic misuse. More generally, the problem of medical decision making is one that has received much attention, and has not yet yielded to a definitive solution. The availability of computer-based advisors for difficult clinical problems would be a useful step in combatting the current maldistribution of specialists.
With network links to centralized machines, or mini-machines inexpensive enough to be exported as a unit, hospitals in outlying rural areas might have available a sophisticated source of medical advice.

The development of computer-based consultants is a mainstream issue in AI research. Its specific goals are to produce expert performance on a "real world" problem, and to make that expertise available to users who might not normally be involved with computers. Producing a system that both offers high performance and presents a reasonable interface to the user means solving a difficult problem with a number of constraints. High performance alone is not enough, since the system must be usable by a computer-naive audience. This means more than simply reasonable I/O facilities, and implies the need for such things as the explanatory capabilities currently a part of Mycin.

More generally, the issue of accumulating, representing, and using large stores of task-specific knowledge is an important thrust of current AI research. Ever since the failure of the original GPS-type approach to problem solving (in which problem solving power comes from a single, domain-independent paradigm), interest has been focussed on the use of large stores of domain-specific knowledge as a source of high performance. This has been a primary theme of the work on Mycin from the outset, and our efforts have produced a number of insights about the design and construction of such systems. We have emphasized, for instance, the importance of keeping a sharp distinction between the base of task-specific knowledge and the interpreter which uses that information to solve problems. This design pays off both by easing the task of building the knowledge base, and by increasing the range of applicability of the underlying system (i.e., different knowledge bases can be "plugged in" to the same underlying system).
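The separation between knowledge base and interpreter described above can be pictured with a minimal sketch. Everything in it is an illustrative assumption: the rule contents, the fact names, and the use of simple forward chaining without any treatment of uncertainty are chosen for brevity and are not a description of Mycin's actual rules or interpreter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    premises: tuple   # facts that must all be known for the rule to fire
    conclusion: str   # fact asserted when the rule fires


def infer(rules, facts):
    """Generic interpreter: forward-chain until no rule adds a new fact.

    The interpreter knows nothing about medicine; all domain content
    lives in the `rules` argument.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in known and all(p in known for p in rule.premises):
                known.add(rule.conclusion)
                changed = True
    return known


# Two hypothetical knowledge bases "plugged in" to the same interpreter.
INFECTION_KB = [
    Rule(("fever", "positive_culture"), "infection_suspected"),
    Rule(("infection_suspected", "gram_negative"), "consider_gram_negative_coverage"),
]
PULMONARY_KB = [
    Rule(("reduced_fev1", "normal_fvc"), "obstructive_pattern"),
]

print(infer(INFECTION_KB, {"fever", "positive_culture", "gram_negative"}))
print(infer(PULMONARY_KB, {"reduced_fev1", "normal_fvc"}))
```

Swapping `INFECTION_KB` for `PULMONARY_KB` changes the consultant's domain without touching `infer`, which is the property the paragraph above attributes to the design.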
Finally, a number of other projects have been "spun off" as a direct result of ours. The pulmonary function work and the work by Dr. Heiser's group are both outgrowths of Mycin, and have both begun to produce their own substantive results.

Future resource goals

As noted earlier, we see the development of minicomputers that run high level languages as an important future trend that will affect much of the work in AI. We believe it will be especially advantageous for Sumex to take advantage of these developments. Adding a small number of these minicomputers as satellites to the main system would present a number of important advances. First, many of the research efforts currently underway involve large, LISP-based programs that significantly impact the system load. By providing satellite machines to which those large systems could be shifted, the system load would lighten considerably and the large systems would themselves run much faster. Second, it would mean more efficient use of resources, since adding these satellite systems would require little or no additional tapes, disks, printers, etc.

Finally, many projects are in a situation parallel to ours, in that work proceeds on two fronts simultaneously. On one hand, new ideas are being generated about how a program should work, or what tasks it might perform. These are implemented and tried out in a test version of the program. On the other hand, once those ideas prove practical, there is often an extensive period of development that requires a more stable version of the program. The architecture suggested here, of a main system with satellite machines, offers an excellent environment for this work, since smaller test versions of a program can be used as a "proving ground" on the main machine, while the larger, stabilized versions are further developed by running them on the satellite machines.
This sort of arrangement is most effective when transition between systems is almost invisible -- that is, when little or nothing need be done to shift from the central machine to a satellite. This is easiest to do when there are high-bandwidth data links between machines, and satellite machines capable of running the same programming language as the central machine. We believe it would be important to provide Sumex support for both the software and the hardware problems involved in creating this sort of environment. One effort in this direction (Mainsail) is currently underway, and parallel efforts at other locations are involved in producing a version of LISP that will run on small machines. While there is no need to duplicate these latter efforts, we feel it would be important for Sumex to stay closely coupled to them, so that their results can easily and quickly be implemented here. Given the number of projects which could make significant use of these results, and the impact those projects currently have on the system, we believe the investment in time and effort would pay off quite well.

References

[1] Reiman H H, D'Ambola J, The use and cost of antimicrobials in hospitals, Arch Environ Health, 13:631-636 (1966).
[2] Kunin C M, et al., Use of antibiotics: a brief exposition of the problem and some tentative solutions, Anns Int Med, 79:555-560 (1973).
[3] Sheckler W E, Bennett J V, Antibiotic usage in seven community hospitals, J Amer Med Assoc, 213:264-267 (1970).
[4] Roberts A W, Visconti J A, The rational and irrational use of systemic antimicrobial drugs, Amer J Hosp Pharm, 29:825-93[illegible] (1972).
[5] Simmons H E, Stolley P D, This is medical progress? Trends and consequences of antibiotic use in the United States, J Amer Med Assoc, 227:1023-1026 (1974).
[6] Kagan B M, Fanin S L, Bardie F, Spotlight on antimicrobial agents, JAMA, 226:306-310 (1973).
6.1.5 PROTEIN STRUCTURE PROJECT

Protein Structure Modeling Project

Prof. J. Kraut and Dr. S. Freer (Chemistry, U. C. San Diego) and Prof. E. Feigenbaum and Dr. R. Engelmore (Computer Science, Stanford)

I. Summary of research program

A. Technical goals

The goals of the protein structure modeling project are to 1) identify critical tasks in protein structure elucidation which may benefit by the application of AI problem-solving techniques, and 2) design and implement programs to perform those tasks. We have identified two principal areas which have both practical and theoretical interest to both protein crystallographers and computer scientists working in AI. The first is the problem of interpreting a three-dimensional electron density map. The second is the problem of determining a plausible structure in the absence of phase information normally inferred from experimental isomorphous replacement data. Current emphasis is on the implementation of a program for interpreting electron density (e.d.) maps.

B. Medical relevance and collaboration

The biomedical relevance of protein crystallography has been well stated in a recent textbook on the subject (Blundell & Johnson, Protein Crystallography, Academic Press, 1976): "Protein Crystallography is the application of the techniques of X-ray diffraction . . . to crystals of one of the most important classes of biological molecules, the proteins. . . . It is known that the diverse biological functions of these complex molecules are determined by and are dependent upon their three-dimensional structure and upon the ability of these structures to respond to other molecules by changes in shape. At the present time X-ray analysis of protein crystals forms the only method by which detailed structural information (in terms of the spatial coordinates of the atoms) may be obtained.
The results of these analyses have provided firm structural evidence which, together with biochemical and chemical studies, immediately suggests proposals concerning the molecular basis of biological activity."

The project is a collaboration of computer scientists at Stanford University and crystallographers at the University of California at San Diego (under the direction of Prof. Joseph Kraut) and at Oak Ridge National Laboratories (Dr. Carroll Johnson).

C. Progress summary

During the past year we have been designing and implementing a system of programs for interpreting three-dimensional e.d. maps. Progress has been made by attacking the problem from two directions: working upward from the primary data (i.e. the array of e.d. values) to higher level symbolic abstractions, and working downward from the given amino acid sequence and other experimental information to generate candidate structures which can then be confirmed by the abstracted data. In the "bottom-up" area of research we have developed and implemented programs for analyzing topological features of the skeletonized e.d. map in terms of protein structural elements (e.g., side chains, chain ends, bridges, etc.), for finding local maxima, and, recently, for generating a critical point network, i.e. a three-dimensional spanning tree which connects all critical points (peaks, saddle points) found in the map. In the "top-down" area we have designed and implemented, in INTERLISP, a structure inference program which generates structural hypotheses at several levels of detail. At present the program can infer, from the amino acid sequence and other chemical information, and the symbolic abstractions of the e.d. map, the location of heavy atoms, cofactors and chain ends. Those features provide toeholds, i.e. islands of certainty, from which additional structure is inferred by extension.
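The critical point network mentioned above is described only as a three-dimensional spanning tree over the critical points. As a rough illustration of that data structure, the following sketch builds a spanning tree over a handful of 3-D coordinates with Prim's algorithm. The use of Euclidean distance as the edge weight and the sample coordinates are assumptions made for the example; the text does not say how the project's programs choose which points to connect.

```python
import math


def spanning_tree(points):
    """Connect all 3-D points with a minimum spanning tree (Prim's algorithm).

    Returns a list of (i, j) index pairs, one per tree edge. Edge weight is
    Euclidean distance, which is an assumption of this sketch.
    """
    n = len(points)
    if n == 0:
        return []
    # best[j] = (distance to the tree, index of nearest tree point) for each
    # point j that is not yet in the tree; point 0 seeds the tree.
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        _, i = best.pop(j)
        edges.append((i, j))
        # The newly added point may be closer to the remaining points.
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


# Hypothetical critical points (peaks or saddle points) in map coordinates.
critical_points = [(0, 0, 0), (1, 0, 0), (5, 0, 0), (1, 1, 0)]
print(spanning_tree(critical_points))
```

A tree over n points always has n - 1 edges, so every critical point is reachable from every other without cycles.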
Work is currently in progress on identification of the main chain, disambiguation of multiply connected regions and classification of side chain regions. The system under development is knowledge-based. Both the corpus of knowledge of the task domain and the problem-solving strategy knowledge are incorporated as production-like rules.

D. List of Publications

1) Robert S. Engelmore and H. Penny Nii, "A Knowledge-Based System for the Interpretation of Protein X-Ray Crystallographic Data," Heuristic Programming Project Memo HPP-77-2, January, 1977. (Alternate identification: STAN-CS-77-589)
2) E.A. Feigenbaum, R.S. Engelmore, C.K. Johnson, "A Correlation Between Crystallographic Computing and Artificial Intelligence," in Acta Crystallographica, A33:13 (1977). (Alternate identification: HPP-77-25)

E. Funding status

The project recently received a renewal of its funding from the National Science Foundation. The new research period began on May 1, 1977, and is for a two year period at a funding level of $75,000 per year. No other applications are pending.

II. Interaction with the SUMEX-AIM resource

A. Collaborations

The protein structure modeling project has been a collaborative effort since its inception, involving co-workers at Stanford and UCSD (and, more recently, at Oak Ridge). The SUMEX facility has provided a focus for the communication of knowledge, programs and data. Without the special facilities provided by SUMEX the research would be seriously impeded. Computer networking has been especially effective in facilitating the transfer of information. For example, the more traditional computational analyses of the UCSD crystallographic data are made at the CDC 7600 facility at Berkeley.
As the processed data, specifically the e.d. maps and their Fourier transforms, become available, they are transferred to SUMEX via the FTP facility of the ARPA net, with a minimum of fuss. (Unfortunately, other methods of data transfer are often necessary as well -- see below.) Programs developed at SUMEX, or transferred to SUMEX from other laboratories, are shared directly among the collaborators. Indeed, with some of the programs which have originated at UCSD and elsewhere, our off-campus collaborators frequently find it easier to use the SUMEX versions because of the interactive computing environment and ease of access. Advice, progress reports, new ideas, general information, etc. are communicated via the message and/or bulletin board facilities.

B. Interaction with other SUMEX-AIM projects

Our interactions with other SUMEX-AIM projects have been mostly in the form of personal contacts. We have strong ties to the DENDRAL, Meta-DENDRAL and MOLGEN projects and keep abreast of research in those areas on a regular basis through informal discussions. The SUMEX-AIM workshop in June, 1976 provided an excellent opportunity to survey all the projects in the community. Common research themes, e.g. knowledge-based systems, as well as alternate problem-solving methodologies were particularly valuable to share. (That workshop was very likely the most significant conference for applied AI to be held in 1976.)

C. Critique of Resource services

On the whole the services provided by SUMEX have been excellent, considering the large demand on its resources. With the important exception of high peaks in the weekday prime-time load average, the ratio of CPU time to total wait time during program execution is usually acceptable. The facility provides a wide spectrum of computing services which are genuinely useful to our project -- message handling, file management, Interlisp, Fortran and text editors come immediately to mind.
Moreover, the staff, particularly the operators, are to be commended for their willingness to help solve special problems (e.g., reading tapes) or providing extra service (e.g., an immediate retrieval of an archived file). Such cooperative behavior is rare in computer centers.

A serious fault in the system is the lack of reliable tape drives, and the paucity of the present software for handling tape files. Much of our data from the outside world is received on magnetic tape, and almost never in the unusual PDP-10 format. We urge that the existing tape drives be replaced, and software be provided to facilitate the input of data in non-standard formats. (At the present time there is not even a program to provide a byte-by-byte dump when all else fails.)

III. Use of SUMEX during the follow-on grant period (8/78 - 7/83)

A. Long-range goals

Our current research grant extends through April, 1979. During that time we intend to bring the structure modeling system to a level of performance that permits reliable qualitative interpretation of high resolution e.d. maps, derived from real data and a correct amino acid sequence. We also plan to exploit the flexibility of the rule-based control structure to permit investigation of alternate problem-solving strategies and modes of explanation of the program's reasoning steps. Beyond the next two years, emphasis will be placed on expanding and generalizing the system to relax the constraints of resolution and accuracy in the input data.

B. Justification for continued use of SUMEX

The biomedical relevance of the protein structure modeling project, coupled with the need for building a computational system with a significant component of symbolic inference, qualifies the project as an AIM-relevant endeavor.
SUMEX provides an excellent computing environment for creating and debugging programs (in a variety of languages), for sharing and distributing information among geographically dispersed co-workers, and for keeping up with current research in other AIM areas. Our project is clearly too small to justify an independent computing facility, and other large computer centers that are conveniently accessible do not fulfill our requisites. Consequently SUMEX has been and hopefully will continue to be an integral research tool in this project.

C. Comments and suggestions

Two improvements to the system which, though not critical, would appreciably upgrade the service provided:

1. Connection of SUMEX to a non-military network which permits file transfer at a reasonably high rate (at least 4800 baud). The restrictions imposed on the use of the ARPA network prohibit using it to transmit large program and/or data files between SUMEX and the UCSD computing facilities. The availability of such a connection would, for example, permit us to use their E&S interactive graphics system to display and visually examine the structures hypothesized by our automated modeling system.

2. Addition of 256K of main memory, to give more rapid response during the peak hours. This would seem to be a natural extension to the system, to complement the second KI-10 installed last year, and would more fully realize the potential of the second CPU.

6.2 NATIONAL AIM PROJECTS

The following group of projects is formally approved for access to the AIM aliquot of the SUMEX-AIM resource. Their access is based on review by the AIM Advisory Group and approval by the AIM Executive Committee.

6.2.1 ACQUISITION OF COGNITIVE PROCEDURES (ACT)

Acquisition of Cognitive Procedures (ACT)

Dr.
John Anderson
Yale University
(Grant NIMH MH29353 $25,000 this year)
(Contract ONR N00014-77-C-0242 $74,030 this year)

I. Summary of Research Program

A. Technical goals:

To develop a production system that will serve as an interpreter of the active portion of an associative network. To model a range of cognitive tasks including memory tasks, inferential reasoning, language processing, and problem solving. To develop an induction system capable of acquiring cognitive procedures with a special emphasis on language acquisition.

B. Medical relevance and collaboration:

1. The ACT model is a general model of cognition. It provides a useful model of the development of and performance of the sorts of decision making that occur in medicine.

2. The ACT model also represents basic work in AI. It is in part an attempt to develop a self-organizing intelligent system. As such it is relevant to the goal of development of intelligent artificial aids in medicine.

We have been evolving a collaborative relationship with Dr. James Greeno and Allan Lesgold at the University of Pittsburgh. They are applying ACT to modeling the acquisition of reading and problem solving skills. We plan to make ACT a guest system within SUMEX. ACT is currently at the state where it can be shipped to other INTERLISP facilities. We have received a number of inquiries about the ACT system. ACT is a system in a continual state of development but we periodically freeze versions of ACT which we maintain and make available to the national AI community.

C. Progress and accomplishments:

ACT provides a uniform set of theoretical mechanisms to model such aspects of human cognition as memory, inferential processes, language processing, and problem solving. ACT's knowledge base consists of two components, a propositional component and a procedural component. The propositional component is provided by an associative network encoding a set of facts known about the world.
This provides the system's semantic memory. The procedural component consists of a set of productions which operate on the associative network. ACT's production system is considerably different than many of the other currently available systems (e.g., Newell's PSG). These differences have been introduced in order to create a system that will operate on an associative network and in order to accurately model certain aspects of human cognition.

A small portion of the semantic network is active at any point in time. Productions can only inspect that portion of the network which is active at the particular time. This restriction to the active portion of the network provides a means to focus the ACT system in a large data base of facts. Activation can spread down network paths from active nodes to activate new nodes and links. To prevent activation from growing continuously there is a dampening process which periodically deactivates all but a select few nodes.

The condition of a production specifies that certain features be true of the active portion of the network. The action of a production specifies that certain changes be made to the network. Each production can be conceived of as an independent "demon." Its purpose is to see if the network configuration specified in its condition is satisfied in the active portion. If it is, the production will [text missing in source]

[...] resource, we have decided not to allow extensive use of ACT by other researchers through our SUMEX account. We feel that extensive use of the ACT system in SUMEX by another researcher must have the status of an independent project and must be able to justify independently its use of the SUMEX-AIM resource.

B. Justification for continued use of SUMEX:

We feel that the justification for our use of SUMEX has only been strengthened since the time of our original application for user status.
The project meets a number of criteria for SUMEX relevance: Project support comes from NIMH. The project is concerned with cognitive modeling which is a SUMEX goal. The project is also developing an AI tool which can be used to help automate various medically-relevant tasks. We also think we represent the type of need that the SUMEX facility was designed to meet. That is, we do not have nearly as powerful computing facilities locally at Yale; we are a non-local user; we are using SUMEX as a base for collaborating with scientists in other parts of the country; and we are trying to develop a system that will be of general use.

C. Comments and suggestions for future resource goals:

We would, of course, be delighted if the computational capacity of the SUMEX facility could be increased. We suffer most severely with the file space limitation. The other limitation is the slowness of the system at peak hours. This problem is perhaps less grievous for us than for Stanford-based users because of our ability to use morning hours. We do not feel any urgent need for development of new software.

Our work is growing to such a size that we would find it useful to have a local ARPANET TIP. We are currently discussing this possibility with our ONR officials. Such a TIP might be justifiable given additional needs of other AI people at Yale. The consequence of such a TIP for the future planning of SUMEX resources is that we would then change our access to SUMEX from the TYMNET to the ARPANET, thus relieving SUMEX of the need to support our TYMSHARE costs.

6.2.2 CHEMICAL SYNTHESIS PROJECT (SECS)

W. Todd Wipke
Department of Chemistry
University of California at Santa Cruz

I. Summary of Research Program

A. Technical Goals.
The long range goal of this project is to develop the logical principles of molecular construction and to use these in developing practical computer programs to assist investigators in designing stereospecific syntheses of complex bio-organic molecules. Our specific goals this past year focused on improvement of the library of chemical transforms, completion of the perception of molecular symmetry and integrating the use of symmetry information throughout SECS including the strategy module. We also wanted to improve the execution speed of SECS, and the speed of graphical interaction over remote communication lines. We planned to simplify the program from the user's viewpoint by including automatic file failsafing, improvement of HELP commands, and non-fatal handling of all errors, as well as production of user's manuals for operation of the program and the writing of chemical transforms. Additionally we intended to initiate applications of SECS to the areas of biosynthesis and metabolism of compounds, as well as phosphorus chemistry. Finally we hoped to improve the strategic constraints and controls that guide SECS in growing a synthesis tree.

B. Medical Relevance and Collaboration.

The development of new drugs and the study of how drug structure is related to biological activity depends upon the chemist's ability to synthesize new molecules as well as his ability to modify existing structures, e.g., incorporating isotopic labels into biomolecular substrates. The Simulation and Evaluation of Chemical Synthesis (SECS) project aims at assisting the chemist in designing stereospecific syntheses of biologically important molecules. The advantages of this computer approach over manual approaches are manifold: 1) greater speed in designing a synthesis; 2) freedom from bias of past experience and past solutions; 3) thorough consideration of all possible syntheses using a more extensive library of chemical reactions than any individual person can remember; 4) greater capability of the computer to deal with the many structures which result; and 5) capability of the computer to see molecules in a graph theoretical sense, free from bias of 2-D projection.

SECS was designed to be able to apply any kind of chemical transformation, and because of this generality we see SECS finding application in biogenesis and metabolism (see section II A below). The objective of using SECS in biogenesis is to predict possible biogenetic pathways for a given natural product and also to predict related compounds which might also co-occur in nature. This can be a great aid in searching for new natural products and in structure elucidation. The objective of using SECS in metabolism is to predict the plausible metabolites of a given xenobiotic in order that they may be analyzed for possible carcinogenicity. Metabolism researchers may also find this useful in the identification of metabolites in that it suggests what to look for, and in the identification of possible metabolic pathways connecting a [illegible] to a xenobiotic.

C. Progress and Accomplishments.

RESEARCH ENVIRONMENT: At the University of California, Santa Cruz, we have a GT-40 graphics terminal connected to the SUMEX-AIM resource by a 1200 baud leased line and a TI 725 thermal printing teletype connected via TYMNET at 300 baud. UCSC has only a small IBM 370/145 and a PDP-11/45 (limit of 12 K words per user) available, both of which are unsuitable for this research. From July until December our research group had to occupy temporary space during renovation, but is now finally in permanent space in Thimann Laboratories where we have close collaboration with other organic chemists.
CHEMICAL TRANSFORMS: The library of chemical transforms has been reorganized and reevaluated during the past year by [illegible] Dolata, a student of Professor D.A. Evans of Cal Tech. New reactions were added and the scope and limitations of others were updated and leading references provided. Additionally, Merck, Sharp, and Dohme Research Laboratories provided revisions of many transforms which a group of 25 synthetic chemists had carefully researched.

SYMMETRY: An efficient algorithm for recognizing molecular symmetry was developed last year. This year that algorithm has been tested against all possible molecular point groups and a few problems which developed were corrected. The algorithm has been documented, and initial studies begun on actually determining the point group of a molecule. The symmetry group is now utilized in conjunction with the symmetry of a chemical transform so the transform is applied in all possible unique ways, to generate a non-redundant set of precursors. This symmetry of course takes into account stereochemistry of saturated centers and double bonds. We have surveyed literature syntheses for examples of existing heuristics based on symmetry which can be used for automatically generating high level strategies. This information has never been pulled together before and should make an interesting contribution also to organic synthesis.

STRATEGIC CONTROL: Last year we began developing an implementation of strategic control for SECS, and a simple language for expressing strategies independent of chemical transforms. Since these strategies contain expressions which refer to the molecular structure, it was also necessary to incorporate symmetry here too. For example, if a particular bond is designated as strategic to break, but a transform breaks another bond, the [illegible in source], including the ability to recognize enantiomers.
The ALCHEM language for representing chemical transforms was extended to facilitate manipulation of TBP's, including changes from trigonal and tetrahedral configurations to square base pyramid and TBP. Queries may deal with apicophilicity, and axial or equatorial orientation. The fine details of phosphorus chemistry are handled, such as the fact that groups entering or leaving the phosphorus coordination sphere normally do so from the apical position. Pseudorotation, apicophilicity, and strain energy are considered in evaluating the stable TBP configurations and in checking for invalid structures. A library of phosphorus chemistry is now being prepared in collaboration with a group at the University of Strasbourg, France.

COMPUTER-AIDED ELUCIDATION OF BIOGENETIC PATHWAYS: Although a great amount of effort has been spent on various areas of biogenesis, there have been few attempts to develop general techniques for the elucidation of biogenetic schemes. As a result, the formulation of biogenetic schemes has often been criticized for its lack of rigor and explicit criteria. Our approach is to develop general techniques which lead to the postulation of plausible biogenetic pathways, using SECS as an aid in obtaining and analyzing solutions to this complex problem. It is our hope that this application of computer problem-solving techniques will not only uncover new ways of recognizing and evaluating biogenetic pathways but also provide added support to deductions made from biogenetic schemes, such as the generality of a scheme which may be tested in only a few species.

With the proper input information and goals well defined there may be explicit rules to guide the chemist to the plausible biogenetic pathways for a particular natural product. Unfortunately, the vast majority of solutions to this problem are determined by a particular [illegible] chemist's ability to consider the most important rules involved and his unique set of experience-based prejudices. There may be some means to represent and utilize all of the known relevant rules, data and possibly even experience-based prejudices to arrive at the best plausible pathways. The most precise method for representing, developing and testing such a theory is in the form of a computer program. To implement such a computer program, known rules and constraints must be clearly defined, then those that are applicable can be applied at each step of the analysis toward the desired goal. This will keep the solution pathways logically pure and ensure that all alternatives which satisfy the rules and constraints are considered. This guarantee of completeness simply can not be made using hand analysis.

A new reaction library containing biogenetic transformations has been written. After inputting a natural product the program will apply the biogenetic transforms which fit the natural product. This generates a set of plausible biogenetic precursors to the target natural product. By continuing this process with the precursors generated, the plausible biogenetic pathways for the natural product can be discovered. The structures of marine natural products were entered into the program and the plausible biogenetic pathways for these compounds were generated and analyzed. Biogenetic pathways which had been proposed in the literature were among the pathways discovered, as were other plausible pathways which would now have to be considered. The success we attained in this research effort verified the applicability of the SECS program as an aid in the analysis of metabolic pathways.
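The process described above, applying every fitting transform to the target and then repeating on the precursors, amounts to an exhaustive backward search. The sketch below shows that idea only. The compound labels, the two lookup tables standing in for biogenetic transforms, and the depth limit are hypothetical; SECS operates on real molecular structures and a transform library, neither of which is modeled here.

```python
def pathways(target, transforms, max_depth):
    """Enumerate every backward pathway from `target`.

    Each transform is a function taking a compound and returning the list of
    precursors it can produce (empty if the transform does not fit). A pathway
    ends when no transform fits or when `max_depth` steps have been taken.
    """
    def extend(path, depth):
        current = path[-1]
        precursors = []
        if depth < max_depth:
            for transform in transforms:
                # Skip compounds already on this path to avoid cycles.
                precursors.extend(p for p in transform(current) if p not in path)
        if not precursors:
            yield path
            return
        for precursor in precursors:
            yield from extend(path + [precursor], depth + 1)

    return list(extend([target], 0))


# Hypothetical transforms, written as lookup tables keyed by product.
CYCLIZATION = {"product_C": ["intermediate_B1"]}
OXIDATION = {"product_C": ["intermediate_B2"], "intermediate_B1": ["precursor_A"]}
transforms = [
    lambda compound: CYCLIZATION.get(compound, []),
    lambda compound: OXIDATION.get(compound, []),
]

for path in pathways("product_C", transforms, max_depth=3):
    print(" <- ".join(path))
```

Because every fitting transform is tried at every step, no pathway permitted by the rules is skipped, which is the completeness property the text contrasts with hand analysis.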
COMPUTER-AIDED PREDICTION OF METABOLITES FOR CARCINOGENICITY STUDIES:

We have initiated a research project in collaboration with the Chemical Carcinogenesis group at the National Cancer Institute. The objective of this research is to establish a computer program by which a biochemist or metabolism expert can explore the metabolism of a chemical compound.

The investigator enters the substrate molecule by interacting with an input and structure editing module. Then the program will apply the biological transforms which "fit" the structure, taking into consideration all the context information (2-D, 3-D, and electronic) available about the transform and all perceived information about the structure. This will generate a set of metabolites which are one step away from the substrate structure. The metabolites will be ranked according to expected probability or yield. The exact parameters which should be monitored will be determined during the course of this research. An evaluation module may then screen these metabolites according to criteria specified by the investigator. Duplicate metabolites arising from different pathways will be labelled to indicate that fact. Finally the investigator will be shown the set of metabolites together with data about the transform which produced each one and the values of the parameters being monitored.

The investigator may select one metabolite for further metabolism or may request that all be processed for a specified number of steps. In this way a "tree" of metabolites is produced and displayed. The entire state of the user's tree may be saved to permit continuation of the analysis at another time. Exploration of the metabolism tree will be guided predominantly and interactively by the expert investigator.
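The workflow described above (apply fitting transforms, rank by expected yield, screen, label duplicates, grow a tree, save its state) can be sketched as follows. This is a hedged illustration under stated assumptions, not the program under development: structures are strings, the two transforms and their yield figures are hypothetical, and the names `METABOLIC_TRANSFORMS`, `metabolize`, `grow`, `keep`, and `saved_state` are invented for the example.

```python
import json

# Hypothetical one-step transforms (assumption for illustration only):
# name -> (substrate fragment, product fragment, assumed relative yield).
METABOLIC_TRANSFORMS = {
    "hydroxylation": ("CH3", "CH2OH", 0.6),
    "epoxidation": ("C=C", "C1OC1", 0.3),
}


def metabolize(structure):
    """Return (yield, transform, product) for each transform that fits, best first."""
    hits = [
        (score, name, structure.replace(old, new, 1))
        for name, (old, new, score) in METABOLIC_TRANSFORMS.items()
        if old in structure
    ]
    return sorted(hits, reverse=True)


def grow(substrate, steps, keep=lambda product: True):
    """Grow a metabolite tree, pruning with `keep` while the tree is generated."""
    seen = {substrate}

    def expand(structure, depth):
        node = {"structure": structure, "children": []}
        if depth == 0:
            return node
        for score, name, product in metabolize(structure):
            if not keep(product):  # rejected by the expert or evaluation module
                continue
            duplicate = product in seen
            seen.add(product)
            # A duplicate is labelled and is not expanded a second time.
            child = ({"structure": product, "children": []} if duplicate
                     else expand(product, depth - 1))
            child.update(transform=name, score=score, duplicate=duplicate)
            node["children"].append(child)
        return node

    return expand(substrate, steps)


def saved_state(tree):
    """Serialize the whole tree so the analysis can be resumed later."""
    return json.dumps(tree)
```

Passing a `keep` function stands in for the expert's pruning during growth; in the toy library, `grow("CH3C=C", 2)` reaches the same product by two routes and marks the second occurrence as a duplicate.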
We feel that at this stage of development of the field of metabolism and carcinogenicity, interactive guidance by the expert is necessary. There are many areas where the theory is very thin and a given biological transformation may have been observed for only a few substrates. When this transform is applied to a new substrate, some unrealistic metabolites may be generated owing to the deficiency of contextual information and constraints. An expert is necessary to prune the tree and prevent the automatic processing of those unreasonable intermediates. It is much more efficient for the expert to do this pruning as the tree is being grown, rather than later after an enormous tree has been completed.

At some point, either during tree generation or at the end, the metabolites will be passed to another program which will identify those metabolites which are identical or "similar" to known carcinogens. Those will be so marked in the tree.

Presently, the major task is the acquisition of the metabolism knowledge base, i.e. the writing of the transformation library to be utilized. Metabolism experts at the National Institutes of Health are gleaning this information from both their own research and the metabolism literature. This information will be encoded and the first testing of this new application for the SECS program will begin in June 1977.

E. Funding Status.

Sandoz Unrestricted Grant to support Computer Synthesis, $2500.

National Cancer Institute Contract N01-CP-75816, "Computer-Aided Prediction of Metabolites for Carcinogenicity Studies," $56,328 for 18 months.

Proposal RR-01953 submitted 1 March 1975 to Division of Research Resources, "Resource-Related Research: Biomolecular Synthesis", $227,816 for 3 years, approved 1 Oct 76, but still awaiting funding.

Note: Were it not for the leased line and computer access granted to us by SUMEX, the entire SECS project would not have been able to continue for the past 18 months.

D.
Current List of Project Publications

W.T. Wipke and P. Gund, "Simulation and Evaluation of Chemical Synthesis. Congestion: A Conformation Dependent Function of Steric Environment at a Reaction Center. Application with Torsional Terms to Stereoselectivity of Nucleophilic Additions to Ketones," J. Am. Chem. Soc., 98, 8107 (1975).

W.T. Wipke, G. Smith and H. Braun, "SECS-Simulation and Evaluation of Chemical Syntheses: Strategy and Planning," ACS Symposium Proceedings, 1977.

W.T. Wipke, Computer Planning of Research in Organic Chemistry, Proceedings of the Third International Symposium on Computers in Chemical Education, Research, and Technology, Caracas, Venezuela, 1976.

S.A. Godleski, P.v.R. Schleyer, E. Osawa, and W.T. Wipke, "The Systematic Prediction of the Most Stable Neutral Hydrocarbon Isomer," J. Am. Chem. Soc., 99, 0000 (1977).

F. Choplin, R. Marc, G. Kaufmann, and W.T. Wipke, "Computer Design of Synthesis in Phosphorus Chemistry. Automatic Treatment of Stereochemistry," J. Am. Chem. Soc., 99, 0000 (1977).

Manuals:
SECS Users Manual, June 1976.
SECS Users Guide, Aug. 24, 1976.
ALCHEM Tutorial, Sep 21, 1975.

II. Interactions with SUMEX-AIM Resource

A. Examples of Collaborations and Medical Use of Programs via SUMEX.

SECS is available in the GUEST area of SUMEX and has been accessed experimentally by many others as well. Professor R. V. Stevens (UCLA) explored some syntheses of lycopodine while visiting Santa Cruz and as a result has requested UCLA to obtain a graphics terminal so he and others at UCLA can access SECS via SUMEX. Professor W. G. Dauben's group (Berkeley) has utilized the SECS model builder on SUMEX and is now extending the capabilities of that module of SECS. Mr. Mel Spann of the National Library of Medicine toxicology program is collaborating with us in developing a metabolism library for the metabolism of catecholamines.
Also collaborating with us on metabolism are Drs. Ted Gram from Guarino's lab, Harry Gelboin, Dhiren Thakker and Haruhiko Yagi from Jerina's lab, Lance Pohl from Gillette's lab, Sidney Nelson from Mitchell's lab, Lionel Poirier from Weisburger's lab, and Ken Chu and Sidney Siegel, all of whom are from the National Cancer Institute.

Dr. Steve Heller of the EPA and Dr. G.A. Milne of the National Heart and Lung Institute have expressed interest in putting SECS on the Cyphernetics network as a part of the NIH chemical information system. Restrictions on the allowed core image on that system have so far held up the negotiations.

For the past two years SECS has been available over TELENET from First Data Corporation and has been accessed by industry: Squibb; Merck, Sharp and Dohme; Pfizer; Searle; Lederle Labs; FMC; and recently 3M Corporation and Stauffer. Dr. Beryl Dominy of Pfizer recently presented a paper before the Pharmaceutical Manufacturer's Association entitled "SECS and the Information Scientist" in which he describes his experiences with SECS, including an example in which a synthetic chemist who was having difficulty with a particular synthesis went to SECS for possible solutions. SECS suggested another route as being better, and indeed that is what he found when he tried it later in the lab.

The availability of SECS on SUMEX-AIM has also served health-related research at the University of California, Santa Cruz. Model building using the SECS model builder is being performed for Professor Edward Dratz (UCSC) to generate conformations of fatty acids isolated from visual membranes ("Structure and Function of Visual Photoreceptors," [grant number illegible]), and for Professor Howard Wang (UCSC) to study how conformations of steroids may affect the local anesthetic - membrane interaction ("Role of Membrane Proteins in Local Anesthetic Action," GM22242).
We have assisted Professor J. E. McMurry in his synthetic work towards Aphidicolin and Digitoxigenin by using the model builder for predicting possible reaction pathways. An example is given below, where the conformation of the epoxy-ylide was calculated along with the strain energies of the two possible closure products.

[Structure diagram of the epoxy-ylide; not recoverable from the source.]

Utilizing the SECS model builder, we have shown that attack on the epoxide to form the fused system should be much more favorable than attack to form the bicyclo compound.

Similar studies have been undertaken to predict the stereochemistry resulting from the acid catalyzed cyclization of McMurry's Digitoxigenin precursor (HL-18118, "Total Synthesis of Cardiac Aglycones"): application of SECS using a special library of cationic sigmatropic rearrangement transforms generated the possible products, which facilitated identification of some of the side products in the early cyclization experiments.

We have also collaborated with Professor Phil Crews (UCSC) on marine natural product biogenesis.

Dr. Wipke has also used several SUMEX programs such as CONGEN in his course on Computers and Information Processing in Chemistry.

B. Examples of Sharing, Contacts and Cross-fertilization with other SUMEX-AIM projects.

In collaboration with Dr. Ray Carhart and Dr. Dennis Smith of the DENDRAL/CONGEN Project, a Computers in Chemistry Workshop was held at U.C. Santa Cruz on the weekend prior to the Fall 1975 American Chemical Society National Meeting held in San Francisco. The workshop attracted participants representing all parts of the chemical community: academia, industry and government. Morning lecture/discussion sessions introduced the SECS and CONGEN programs running on SUMEX and the afternoon and evening sessions allowed "hands-on" experience for the participants. The response of the workshop participants was very positive, with many showing so much interest that future collaboration and/or use of the powerful non-numerical computing tools available on SUMEX was discussed.

The SECS project has held joint research group meetings at Stanford with the DENDRAL and AI groups to discuss common problems and research goals. This has been very rewarding since the groups are complementary in orientation. These joint meetings also let the members meet in person after having met on-line on the network.

Last year's AIM Conference at Rutgers was also a valuable experience, which allowed us to meet people interested in similar problems in different disciplines. It was particularly useful to have the opportunity to talk with experts designing new languages for knowledge representation and to hear them compare their systems.

C. Critique of Resource Services.

We find the SUMEX-AIM network very well human engineered. The ability to leave messages on the network, and to LINK to other users on-line for advice, has been extremely useful to us since we have only the network to keep us educated about what is changing on the system, etc. The fact that we have been able to get productive research accomplished remote from Stanford speaks well for the SUMEX-AIM concept.

The SECS project finds the SUMEX-AIM staff and community extremely helpful, and anxious to extend themselves to meet our needs. SUMEX provided a leased line and modems to us and provided TYMNET access as well. Were it not for SUMEX, this research effort would have perished since there is no adequate computer facility on the UCSC campus or even in the UC system.
The only problems we have experienced are 1) until recently we were short of disk space, and 2) response time during the day can get pretty bad at times, particularly when using interactive graphics, so consequently most interactive graphics work is done at off hours.

Basically we have found that SUMEX-AIM provides a productive and scientifically stimulating environment and we are thankful that we are able to access the resource and participate in its activities.

III. Follow-on SUMEX Grant Period (8/78-7/83)

A. Long-range User Project Goals and Plans:

Over this period of time the SECS project will continue research aimed at synthesis design and planning. Areas of interest include the formation of high level plans to guide the detailed chemical analysis, the capability for depth-first analysis, the evaluation of proposed synthetic pathways by forward simulation, and bidirectional search from target to key intermediate. At some time during this period the SECS program should be reimplemented in MAINSAIL to allow renovation of the SECS control structure and allow more machine independence. We also hope to incorporate an explanation system to justify the decisions made by the program, which we feel is important for the same reasons MYCIN needed this capability. A new model builder will also be implemented to increase the speed and generality of 3-D model building.

The metabolism project development will parallel the SECS project, but has special requirements for ALCHEM and aromatic chemistry, as well as for a pattern recognition module. A major problem here is how to develop and maintain such a complicated data base on metabolism. We expect to benefit from the experience gained by others in the medical diagnosis programs.

We hope at UCSC to have some local data handling capability such as printing, plotting, and tape handling to facilitate our work.
Of course interactive graphics will continue to be our method of man-machine communication and we plan to add a GT-44 graphics terminal in the near future to expand current capability. Another graphics terminal is planned for the more distant future. We would continue to depend on SUMEX for host computing and file storage. We would hope that higher speed communication lines might become possible in the future.

B. Justification for Continued Use of SUMEX by Our Project:

The SECS project requires a large interactive timesharing capability with high level languages and support programs. UCSC is not likely in the future to be able to provide this kind of resource. Thus from a practical standpoint, the SECS project really needs access to SUMEX for survival.

Scientifically, interaction with the SUMEX community has been extremely important to the SECS project. Many of our future goals involve incorporation of ideas from other AIM projects into the chemical synthesis project. We would like to believe some of the ideas from the SECS project are also influencing other AIM projects.

Our metabolism project requires collaboration with the metabolism experts at the National Cancer Institute 3000 miles away. The networking aspects of SUMEX-AIM will be very valuable to this important project. Several collaborations for development of strategies in SECS are also being planned and would require networking.

C. Comments and Suggestions for Future Resource Goals, Development Efforts, etc:

From our standpoint, multiplexing to Stanford might give us higher speed communication for graphics and file transport. Development of MAINSAIL seems important, but until that materializes, support of FORTRAN and standard DEC compatibility is crucial to the SECS project. FORTRAN-10 and LINK-10 are becoming the DEC standard and provide overlay capabilities which are needed in moving programs from machines with virtual memory to ones with limited memory.
It would be useful if there were a good file transfer program; the standard DEC FAILSAFE should be implemented so we can send out files and have their names, versions, etc. preserved. It would also be convenient to have a way to send files over TYMNET and TELENET to other machines. We could use this in updating programs at First Data Corporation.

The SUMEX-AIM resource should have an annual workshop for the individuals actually implementing and building systems on SUMEX (the students, postdocs, etc.). The purpose of this would be to spread innovation and techniques as well as actual sharing of programs among users of SUMEX. It would also be an opportunity to plan collaborations, software development, and plans for SUMEX. Importantly, it would also develop personal contacts to complement network contacts. This could be in conjunction with or in addition to the current annual AIM workshop. The current AIM workshop should alternate between coasts.

6.2.3 HIGHER MENTAL FUNCTIONS PROJECT

Modeling of Higher Mental Functions

Kenneth M. Colby, M.D.
Professor of Psychiatry and Biobehavioral Sciences
University of California at Los Angeles

I. Summary of Research Program

A. Technical Goals:

There are three technical goals of the Higher Mental Functions Project:

(1) To improve and "therapeutically" experiment with a computer simulation of paranoid processes in order to make treatment recommendations to clinicians based on experience with the model.

(2) To develop a new taxonomy of psychiatric patients based on the conceptual patterns appearing in accounts of their illnesses.

(3) To develop an intelligent speech prosthesis for patients suffering from communication disorders.

B. Medical Relevance and Collaboration:

The Higher Mental Functions Project is located in the Neuropsychiatric Institute at UCLA. The medical relevance of its research concerns the fields of psychiatry and neurology. The Project collaborates with clinicians and investigators in psychiatry, neurology, the neural sciences and neurolinguistics.

C. Progress Summary:

We have improved the paranoid model to the point where it can be utilized for therapy experiments. (The model has now passed a true Turing Test in which it cannot be distinguished from real patients.)

The taxonomy effort is just under way, using the language recognition program which serves as the front end of the paranoid model. This program will have to be extended and modified to serve the purpose of finding and classifying the conceptual patterns appearing in patients' accounts of their illnesses.

We have interfaced a micro-processor with a voice-synthesizer to provide a speech prosthesis for patients unable to speak. The next step is to write an "intelligent" algorithm which attempts to figure out what the patient is trying to say from his partial input information.

D. Funding Status:

(1) Current funding. This project is currently funded by research grant NIMH MH 27132-02 and by a General Research Support Grant from the UCLA Neuropsychiatric Institute.

(2) Pending applications and renewals. Four additional grant applications have been submitted and are pending at the NIH for support of the above-described research.

II. Interactions with the SUMEX-AIM Resource

A. Collaborations:

The project collaborated with Professor Jon Heiser, Department of Psychiatry, University of California, Irvine, and consulted with Professor Robert K. Lindsay, Department of Psychology, University of Michigan, in conducting a Turing Test of the paranoid model.
Other users of SUMEX have received advice and suggestions regarding their problems as well as opportunities to contrast their simulations with ours. We have benefitted greatly from others' comments on the adequacy and inadequacy of our paranoid model.

B. Sharing, etc.:

Members of the project have participated in two workshops held at Rutgers, presenting several papers, chairing panels, and conducting discussion groups. Informal discussions with large numbers of workers in Artificial Intelligence in Medicine have led to a helpful sharing of ideas and techniques.

SUMEX is valuable to us as a communication channel combining the advantages of a telephone and the U.S. mail without the disadvantages of either. For widely scattered researchers, it facilitates the intimate, low-level communication which is normally accomplished in hallways or around water coolers. The individual discussions are not very profound, but the cumulative effect subtly improves our research.

The existence of SUMEX as an independent project naturally relieves numerous researchers of the burden of separately financing and staffing a large computer facility.

C. Critique of Resources:

The few complaints we had regarding difficulties of network access have been remedied. The computer system performance is admirable, with the staff being most receptive to suggestions.

III. Follow-on Period

A. Long-range Project Goals:

We anticipate working on the above-described projects for at least 5 years. The problems are highly complex and will require years of sustained effort to solve.

B. Justification for Continued Use of SUMEX:

The paranoid model and the conceptual pattern recognizer require a large time-shared computer because of the large size (100+K) of these programs written in a high-level programming language (MLISP-UCI LISP).
The speech prosthesis effort does not require a large system in itself because it stands as an independent unit. However, constructing and developing dictionaries for types of speech prostheses is most efficiently done on a large and fast system such as SUMEX.

C. Comments and Suggestions for Future Research Goals:

It seems that the resource fulfills all of its stated goals of facilitating research in the field. The only drawback is that there isn't more of a good thing. Doubling the computing power and memory storage capabilities would not be unreasonable.

D. Up-to-date List of Publications:

Colby, K.M., Parkison, R.C. and Faught, B. Pattern-matching Rules for the Recognition of Natural Language Dialogue Expressions. Am. J. Computational Linguistics, Microfiche 5, Sept., 1974.

Colby, K.M. Clinical Implications of a Simulation Model of Paranoid Processes. Archives of General Psychiatry, 33, 854-857, 1976.

Faught, W., Colby, K.M. and Parkison, R.C. Inferences, Affects and Intentions in a Model of Paranoia. Cognitive Psychology, 9, 153-187, 1977.

Colby, K.M. An Appraisal of Four Psychological Theories of Paranoid Phenomena. J. of Abnormal Psychology, 85, 54-59, 1977.

Parkison, R.C., Colby, K.M. and Faught, W.S. Conversational Language Comprehension Using Integrated Pattern Matching and Parsing. Artificial Intelligence (In Press) 1977.

Colby, K.M., Christinaz, D. and Graham, S. A Computer-driven, Personal, Portable and Intelligent Speech Prosthesis for Aphasic Disorders. Brain and Language (In Press) 1977.

Colby, K.M. On the Way People and Models Do It. Perspectives in Biology and Medicine (In Press) 1977.

Heiser, J., Colby, K.M., Faught, W. and Parkison, R.C. Testing Turing Test (Forthcoming).

Faught, W.S. Conversational Action Patterns in Dialogs. Proceedings of the Workshop on Pattern-directed Inference Systems, May, 1977.

6.2.4 INTERNIST PROJECT

INTERNIST - Diagnostic Logic Project

J. Myers, M.D. and H. Pople, Ph.D.
University of Pittsburgh

I. SUMMARY OF RESEARCH PROGRAM

A. Objectives

The principal objective of this research project has been and continues to be the development, evaluation, and implementation of a computer-based diagnostic consultation system for internal medicine. This work, which was initiated at the University of Pittsburgh approximately six years ago, has been supported for the past three years by a grant from the Bureau of Health Resources Development.

A heuristic diagnostic program called INTERNIST has been developed, along with an extensive medical database now comprising more than four hundred disease categories and two thousand manifestations of disease. The system has been tested with a wide variety of difficult clinical problems: cases published in the medical journals, CPC's, and other interesting and unusual problems arising in the local teaching hospitals. In the great majority of these test cases, the heuristic INTERNIST program has proved to be effective in sorting out the pieces of the puzzle and coming to a correct diagnosis. In some cases, as many as six distinct disease entities have been identified correctly.

We believe that by the time of the expiration of the BHRD grant in June, 1977, our original objective, which was to develop a system providing expert diagnostic capability with regard to the major diseases of internal medicine, will have been accomplished to the extent possible in the current laboratory framework. At that time, we propose to initiate a broader collaboration, which will invite the participation of remote users in

(a) further evaluation of the INTERNIST programs and data-base.

(b) development of specialized data-bases and procedures for various medical subspecialties.

(c) refinement of the user interface.

(d) investigation of alternate uses of the INTERNIST data-base.
We believe that the expansion of the experience base of INTERNIST users, which will result from this type of collaboration, will significantly enhance the further course of INTERNIST development.

B. Progress Summary

Expansion of the medical data-base to encompass new areas of disease is an on-going activity of the project. Much of this work is carried out by medical students who elect to take part in the project as part of their fourth year clinical rotation, with the period of participation varying from 6 to 18 weeks. Each student is assigned a group of diseases, usually in a specific clinical area, for study. The literature on a disease is studied exhaustively for all quantitative data available. Frequently clinical experts on the faculty are consulted, particularly about controversial data. The student compiles a complex list of the manifestations of the disease under study and assigns tentative measures of strength of association. The clinical principal investigator, together with any other clinicians working on the project, then reviews the data exhaustively in order to assure the appropriateness and completeness of the disease profile. The profile is then entered into the computer and tested for completeness and reliability against a typical or "textbook" example of clinical cases. If available, other cases of the disease from the floors of our university hospital and from published cases such as the clinical-pathological conferences from the New England Journal of Medicine and the American Journal of Medicine are also used. Further refinement occurs in the course of the continued use of the data-base.

In addition to this data-base development, work on a refined diagnostic program has also been an on-going activity during this period.
The present INTERNIST process employs a 'problem-formation' heuristic, which identifies one of perhaps several problems in a clinical case as its initial focus of problem-solving attention. Although only one problem is considered at a time, the process recycles after each problem is solved, thereby uncovering the entire complex of diseases present.

In the great majority of clinical cases tested, this strategy of iterative problem formation and solution has proved to be effective in sorting out the complexities of a case and rendering a correct diagnosis. In many respects, however, it seems clear that performance could be significantly enhanced if the program were to attend to the various component problems and their inter-relationships simultaneously. Use of a more global problem-formation strategy could be expected to yield more rapid convergence on the correct diagnosis in many cases, and in at least some cases to prevent missed diagnoses.

Alternative problem formation strategies that exploit the type of pseudoparallel processing facilitated by the INTERLISP 'spaghetti stack' are presently being investigated. We believe that this research will also set the stage for subsequent development of a therapeutic management component of the INTERNIST consultation facility; however, at the present time it is not possible to project a precise timetable for the development of these additional capabilities.

C. Publications

1. Pople, H.E., Myers, J.D., & Miller, R.A., "The DIALOG Model of Diagnostic Logic and its use in Internal Medicine". Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, USSR, September 1975.

2. Pople, H.E., "Artificial-Intelligence Approaches to Computer-based Medical Consultation", Proceeding IEEE Intercon, New York, 1975.

3. Pople, H.E., "The Syntheses of Composite Hypotheses in Diagnostic Problem Solving: An Exercise in Hypothetical Reasoning". Proceedings of the Fifth International Joint Conference on Artificial Intelligence, August 1977 (forthcoming).

D. Funding Status

1. Current Funding:
Granting agency - BHRD; Number: 1 R01 MB 00144-03
Total period of the award - 3 years (6-30-74 to 6-29-77)
Current year of the award - 1977
Current annual funding - 148,636

2. Pending Applications:
1. Granting agency - NIH; Title: Clinical Decision Systems Research Resource
First year request - 1,023,883
2. Granting Agency - BHRD; Title: DIALOG: A Computer Model of Diagnostic Logic
Fourth year request - 190,176

II. INTERACTION WITH SUMEX-AIM RESOURCE

A. Medical Use of Programs and Collaborations

Because of the research and development nature of our work on the INTERNIST system over the past several years, we have been somewhat limited in our ability to establish wide-spread collaborations. However, members of the medical house staff in the local hospitals having some prior experience with the project have continued to work with INTERNIST while pursuing their medical training. In addition, project staff often have occasion for interaction with individuals and groups who have interest in the characteristics of the diagnostic system from both medical and computer science perspectives. Future plans for more extensive collaboration are discussed in section III.

B. AIM Interactions

We have benefitted considerably from interactions with other members of the SUMEX-AIM community. In June '76 we participated in the AIM workshop at Rutgers, which provided an excellent perspective as to what else is going on in the field. During the past several months we have had useful exchanges with Randy Davis, Victor Yu, and John Foy, three individuals participating in the MYCIN project.
In addition, we rather routinely interact with SUMEX staff regarding fine points and problems relating to our use of system facilities. The opportunity to keep abreast of developments in a fast changing field is one of the principal benefits to be derived from the collegial environment fostered by SUMEX-AIM.

C. Critique Of Services

We have found the SUMEX-AIM resource to be a superb facility for the conduct of research and development activities related to the INTERNIST project. The general high level of user services, documentation, staff support and reliable operation, which characterizes this unique resource, has contributed significantly to the rate of progress our project has achieved.

III. FOLLOW-ON SUMEX GRANT PERIOD (8/78 - 7/83)

A. Long-Range User Project Goals And Plans

Continued research and development of the medical data base and diagnostic programs characteristic of our past and current work at SUMEX is anticipated. We estimate that two to three years will be required to complete the medical data-base presently envisioned for INTERNIST. However, by the end of this grant period (June 30, 1977) we expect that the knowledge base should have reached "a critical mass" sufficient to allow initial clinical trial on a routine basis.

Sometime in mid-1977, we intend to begin limited field trials of the INTERNIST system by installing terminals in selected wards of Presbyterian University Hospital in Pittsburgh. A number of the members of the house staff have indicated their desire to participate in the evaluation studies, and several have expressed willingness for all cases entering their service to be run and rerun as necessary, in order to enhance our understanding of the strengths and weaknesses of the INTERNIST system.
As we move from the R&D stage to this more production-oriented phase of activity, it seems inevitable that the requirements for support of INTERNIST activities will become increasingly incongruous with the general purpose nature of the facility provided by SUMEX.

Our expectation is that on the services initially supported at Presbyterian University Hospital, there will be as many as 20 INTERNIST case analyses run each day. Based on our experience operating INTERNIST at SUMEX, we would anticipate that each of these studies would require 3 to 5 minutes of CPU time and entail an elapsed time on the order of 30 to 53 minutes during lightly loaded periods on the system. We have also found, however, that the only feasible time to perform such studies is in the early morning hours, and that by 11:00 or 12:00 Eastern time the response provided by SUMEX is unacceptable for such activities.

While marginally capable of supporting the heavy case load anticipated in the local evaluation studies, SUMEX-AIM will clearly not serve the more extensive collaboration - involving up to fj remote user sites - which is presently contemplated for the second stage of field evaluation which we hope to have underway before January 1978.

We believe it to be critically important during these field trials, that highest priority be given to providing a responsive system, scheduled for the convenience of those clinical personnel asked to participate in the project. This suggests that dedicated hardware facilities, which can be optimized to support this central user service, be made available for the exclusive use of INTERNIST staff and collaborators. For this purpose, we have proposed to NIH the establishment of a Clinical Decision Systems Research Resource, which would be a node in the AIM network having DECSYSTEM-20 hardware and software, a TYMNET interface, and the specialized mission described above.
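The load estimate given above for the local field trials can be worked through as simple arithmetic. The sketch below is illustrative only: the figures (20 cases per day, 3 to 5 CPU minutes and 30 to 53 elapsed minutes per case) are the ones quoted in the text, while the function and variable names are hypothetical, and the assumption that cases run strictly one after another is ours.

```python
def daily_totals(cases_per_day, per_case_low, per_case_high):
    """Scale a per-case range of minutes to a per-day range."""
    return cases_per_day * per_case_low, cases_per_day * per_case_high


CASES_PER_DAY = 20  # "as many as 20 ... case analyses run each day"

cpu_low, cpu_high = daily_totals(CASES_PER_DAY, 3, 5)            # CPU minutes
elapsed_low, elapsed_high = daily_totals(CASES_PER_DAY, 30, 53)  # elapsed minutes

print(f"CPU minutes per day: {cpu_low} to {cpu_high}")
print("Elapsed hours per day if run one at a time: "
      f"{elapsed_low / 60:.1f} to {elapsed_high / 60:.1f}")
```

Under these assumptions the local trials alone would need on the order of 60 to 100 CPU minutes a day, and serial runs would occupy well over a working morning of elapsed time, which is consistent with the concern about relying on lightly loaded early-morning hours.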
Our hope is that this new facility can be in operation by January 1, 1978.

B. Justification For Continued Use of SUMEX By The INTERNIST Project

SUMEX will be used in the initial field trials of INTERNIST, which we hope can be accomplished without overload and interference with the work of other users. With establishment of a dedicated INTERNIST resource, this production case load will be removed from SUMEX, but at present it is not possible to define precisely when this changeover will take place. In any case, a continuing research effort requiring SUMEX facilities can be expected to require approximately the same level of resource utilization as in the past.

C. Comments and Suggestions

1) The members of the INTERNIST project agree that the plans to augment the SUMEX resource by the addition of more core memory and disk storage and retrieval facilities can be expected to provide quite tangible improvement in system performance.

2) In the experience of developing program access to the large INTERNIST data base, project members have perceived the potential value of a general system designed to facilitate the interface of user programs and structured data-bases. We would be interested in collaborating with the SUMEX staff in such a development, which might prove beneficial for the user community at large.

3) Another potentially valuable research area would be the investigation of methods to provide support for a project's efforts to improve real-time performance of its programs. While the design of program specific algorithms must be the concern of project staff, it is in the interest of the SUMEX community that users be provided with information and tools to enable efficient use of SUMEX's languages and operating system. This is one of few areas in which we have found documentation of system features and facilities to be less than adequate.
Perhaps special performance workshops, involving systems personnel from the various AIM sites, could be convened to address these issues.

6.2.5 MEDICAL INFORMATION SYSTEMS LABORATORY

MISL - Medical Information Systems Laboratory
M. Goldberg, M.D. and B. McCormick, Ph.D.
University of Illinois at Chicago Circle

I) SUMMARY OF RESEARCH PROGRAM

A.) TECHNICAL GOALS

The Medical Information Systems Laboratory (MISL) was established under grant KM-0114 in Chicago to pursue three activities: i) Construction of a database in ophthalmology, ii) Clinical knowledge system support, and iii) Network-compatible database design. Priorities in year 04 of MISL's operation are the same as in previous years: investigations into how to construct a database in ophthalmology, and into distributed database design, are ancillary to the exploration of a clinical knowledge system to support clinical decision making. We are developing ways to get reliable clinical information into the ophthalmic database primarily because we are interested in getting out significant clinical decision support.

B) APPROACH AND MEDICAL RELEVANCE

B.1) Construction of the database in Ophthalmology

A specific aim of this project is to construct a workable database in ophthalmology, using the outpatient population of the Illinois Eye and Ear Infirmary. We view this database as a testbed for developing clinical decision support systems. The Ophthalmology Department of the Illinois Eye and Ear Infirmary provides an excellent environment for evaluating new techniques for capturing and using clinical information.
B.2) Clinical knowledge support system

The goals for clinical knowledge system development are to provide a flexible user interface for a prototype relational database system, to devise means of accessing alphanumeric and pictorial information stored in the database system, and to provide efficient means for logically restructuring a database so that it can be adapted to different operating environments in a network-compatible distributed medical information network.

No clinical database, however, has intrinsic significance beyond its ability to support the diagnosis and management of disease. Additional goals for the clinical knowledge system are therefore to devise computer-based consultation systems for glaucoma and selected retinal/choroidal diseases, and to provide formal models which permit the relational development and evaluation of rule-based consultation systems containing 2,000 - 10,000 rules.

In recognition that a continuum exists between physician-guided decision support and computer-based consultation, we choose to describe these services as a Clinical Knowledge System: a consortium of a clinical database and rules for its interpretation.

C) PROGRESS SUMMARY (INCLUDING ITEMS OF INTEREST TO SUMEX-AIM COMMUNITY ONLY)

C.1) The database in ophthalmology

Physician terminals and interfaces to ophthalmic instruments have been positioned in the general eye clinic and several key ophthalmic subspecialty clinics. Systematic, modular hardware and software for clinical source data acquisition have been established. The clinical support system computer will shortly be transferred to the newly dedicated Goldberg Research Center, adjacent to the Illinois Eye and Ear Infirmary. We look forward to stabilizing the hardware configuration, telecommunication linkages and software support.
C.2) Clinical knowledge system support

C.2.a) Development of the relational database includes the following:

- A user interface through which unsophisticated users communicate with the database.

- An intelligent coupler that serves as an intermediary between the end user and the distributed database system. The coupler listens to the user's retrieval requests; helps the user formulate his requests correctly; efficiently translates user's retrieval requests into a network-compatible retrieval command language; and obtains authorization from the system for data retrieval and/or update.

- Tools for picture data management. Graphical indexing techniques are provided so that the clinical researcher and physician can easily retrieve pictorial/graphical information from the medical database.

- Means for logical database synthesis. This involves conversion of the user's view of the database into a logically coherent physical organization.

C.2.b) Development of a computer-based consultation system for diagnosis and management of glaucoma.

This involves on-going collaboration between Dr. Jacob Wilensky at MISL and, through SUMEX-AIM, other investigators around the United States. Included are the original investigators in glaucoma consultation: Dr. Casimir Kulikowski (Rutgers), Dr. Shalom Weiss (Mt. Sinai Hospital, NY), and Dr. Aaron Safir (Mt. Sinai Hospital).

C.2.c) Development of a consultation system for diagnosis and management of retinal/choroidal diseases.

A design has been proposed (in Walser and McCormick, see below) for MEDICO, a consultation system that advises non-expert physicians in the management of chorioretinal diseases. In addition, a major subsystem of MEDICO, responsible for mediating the acquisition and organization of rules, has been implemented.

C.2.d) Formal models for consultation systems.
Petri nets have been studied, primarily by Murata (see below), as a formal representation for interacting parallel processes. Petri nets are similar to causal networks, as described by Kulikowski and Weiss at Rutgers, except that, with Petri nets, cyclic activity is easily represented. The similarity between Petri nets and inference nets has also been noted (Walser and McCormick). The utility of the Petri net framework for modelling physical processes was explored by Walser, with the construction of a simulated coffee maker. Further studies are planned.

D.) LIST OF MISL PUBLICATIONS

Chang S. K., Donato N., McCormick B. H., Reuss J., and Rocchetti R. (1977) A relational database system for pictures. Proc. IEEE Workshop on Picture Data Description and Management, April 20-22, 1977, Chicago, Illinois.

Chang S. K. and Cheng W. H. (1975) A database skeleton and its application to logical database synthesis. MISL report M.D.C. 1.1.17.

Chang S. K. and McCormick B. H. (1975) An intelligent coupler for distributed database systems. MISL report M.D.C. 1.1.7.

Malone J. E. (1976) Interval generalization of structure representation. MISL report M.D.C. 1.1.22.

Malone J. E. (1975) User's guide to uniclass cover synthesis. MISL report M.D.C. 4.4.1.

Malone J. E. (1975) Addendum to AQVAL/1 (AQ7), part 1: User's guide and program description. MISL report M.D.C. 4.4.1.

Manacher G. K. (1977) The case for strong loops and selection structures in ordinary computer languages. MISL report M.D.C. 1.1.21.

Manacher G. K. (1975) On the feasibility of implementing a large relational data base with optimal performance on a minicomputer. Proc. International Conference on Very Large Data Bases, Framingham, Mass.

McCormick B. H. and Nordmann B. J. Jr. (1977) Modular asynchronous control design. Forthcoming in IEEE Transactions on Computers. Also MISL report M.D.C. 1.1.25.

McCormick B. H.
and Amendola R. C. (1977) Cytospectrometers for subcellular particles and macromolecules: design considerations. Presented at Workshop on Theory, Design and Biomedical Applications of Solid State Chemical Sensors, Case Western Reserve University, March 23-30, 1977. Also MISL report M.D.C. 1.1.24.

McCormick B. H. and Wilensky J. (1975) Clinical knowledge acquisition: design of a relational data base in ophthalmology. Proc. Second Annual Medical Information Systems Conference, Urbana, Ill.

McCormick B. H., Goldberg M. F., and Read J. S. (1974) Clinical decision-making: design of a data base in ophthalmology. Proc. First Annual Medical Information Systems Conference, Urbana, Ill.

Michalski R. S. and Chang S. K. (1976) A self-model for a relational database. MISL report M.D.C. 1.1.15.

Michalski R. S. (1975) On the selection of representative samples from large relational tables for inductive inference. MISL report M.D.C. 1.1.9.

Murata T. (1975) On liveness and other properties of E-Nets. MISL report M.D.C. 1.1.15.

Murata T. (1975) Bibliography on Petri nets and related topics. MISL report M.D.C. 1.1.20.

Murata T. (1976) A method for synthesizing marked graphs from given markings. Presented at 17th Annual Symposium on Foundations of Computer Science, October 25-27, Houston, Texas.

Murata T. (1976) On deadlock and the liveness of E-nets. Presented at the 17th Annual Symposium on Foundations of Computer Science, October 25-27, Houston, Texas.

Murata T. (1975) State equation, controllability, and maximal matchings of Petri nets. MISL report M.D.C. 1.1.10.

Murata T. and Church R. W. (1975) Analysis of marked graphs and Petri nets by matrix equations. MISL report M.D.C. 1.1.5.

Vere S. A. (1975) Induction of concepts in the predicate calculus. Proc. Fourth IJCAI.

Vere S. A. (1975) Relational production systems. Forthcoming in Artificial Intelligence. Also MISL report M.D.C. 1.1.5.

Walser R. L. and McCormick B. H.
(1976) Organization of clinical knowledge in MEDICO. Proc. Third Illinois Conference on Medical Information Systems, Urbana, Ill.

Walser R. L. and McCormick B. H. (1977) A system for priming a clinical knowledge base. Forthcoming in Proc. 1977 National Computer Conference, June 13-16, Dallas, Texas.

E.) FUNDING STATUS

Year 03 -- 6/30/76 - 6/30/77: $228,000.
Year 04 (projected, pending renewal) -- 7/1/77 - 6/30/78: $278,109.

II) INTERACTION WITH SUMEX-AIM RESOURCE

A.) COLLABORATION

Major collaboration at present is through the ONET, involving the ophthalmology departments of five medical schools. Dr. Jacob Wilensky is actively engaged in evaluating and modifying the Glaucoma Consultation Program, written originally by Shalom Weiss.

B.) CRITIQUE OF RESOURCE SERVICES

Users at MISL are pleased with SUMEX-AIM services. The availability of up-to-date on-line documentation makes it easy to learn how to use the system and stay abreast of new developments. The on-line bulletin board is especially commendable. Since documentation is so readily available, consultation with SUMEX staff has rarely been necessary.

III) FOLLOW-ON SUMEX GRANT PERIOD

A.) LONG RANGE USER PROJECTS AND GOALS

In the future, we expect to become more involved in the development of software for decision support. We also anticipate more extensive collaboration, especially sharing of databases, with investigators at other sites.

B.) SPECIFIC PROJECTS AND JUSTIFICATION FOR CONTINUED USE OF SUMEX

While much of our development to date has been conducted in a minicomputer environment, we have now reached a stage at which we can benefit greatly from software available from SUMEX. Access by our staff to SUMEX facilities and opportunity for inter-institutional collaboration will be enhanced by a SUMEX (PDP-10) - MISL (PDP-11) phone connection, which we plan to implement shortly.
This connection will be valuable to our decision support group, since it will be possible to develop and test programs in INTERLISP at SUMEX, then to translate them into the lower level HARVARD LISP, which is available on our UNIX (PDP-11) operating system. It will also be possible to edit programs on our machine (which is an advantage for us since we can operate at 9600 baud), then execute the programs on the SUMEX PDP-10.

Also, using SUMEX, we have recently implemented the planning system described by Earl Sacerdoti in his thesis "A structure for plans and behavior" (Stanford, 1975). We are impressed by the potential power of the system and are considering it as a basis for our consultation system for managing chorioretinal diseases. Since our version has only been tested in a blocks world, further development is necessary, and we would, of course, require continued access to SUMEX and INTERLISP.

It has also been proposed that the planning system be used to construct sequences of database retrieval statements in RAIN, a relational algebraic interpreter developed by Dr. S. K. Chang at MISL. This could benefit our user interface, since physicians' requests could be phrased at a high level, and then translated into appropriate RAIN commands. The planning system provides a convenient, procedural representation for the database semantics necessary to make the translation from a high level language.

INTERLISP is also being used by Dr. Brian Phillips and his students to code a model of knowledge developed over a period of years at the State University of New York at Buffalo, and later in the Department of Information Engineering and MISL in Chicago. While the model of knowledge is well-developed, and has been implemented at another site in SNOBOL, the INTERLISP version requires further work.
It is anticipated that the implementation, when complete, will be useful to the decision support group.

C.) SUGGESTIONS FOR FUTURE RESOURCE DEVELOPMENT EFFORTS

As mentioned above, we are very interested in coupling our PDP-11 based UNIX operating system with the SUMEX-AIM network, and would like to encourage similar connections at other sites. There are several advantages. Maintaining voluminous patient-related data on minicomputer systems would provide for local security, and help to keep SUMEX secondary storage free for service and development programs and documentation. The enhanced opportunity for inter-site collaboration and database sharing is obvious, and would be beneficial to the SUMEX-AIM community as a whole.

6.2.6 RUTGERS COMPUTERS IN BIOMEDICINE

Rutgers Research Resource - Computers in Biomedicine
Principal Investigator: Saul Amarel
Rutgers University, New Brunswick, New Jersey

I) SUMMARY OF RESEARCH PROGRAM

A) Goals and Approach

The fundamental objective of the Rutgers Resource is to develop a computer based framework for significant research in the biomedical sciences and for the application of research results to the solution of important problems in health care. The focal concept is to introduce advanced methods of computer science - particularly in artificial intelligence - into specific areas of biomedical inquiry. The computer is used as an integral part of the inquiry process, both for the development and organization of knowledge in a domain and for its utilization in problem solving and in processes of experimentation and theory formation.

The Resource community includes 48 researchers - 30 members, 8 associates and 10 collaborators. Members are mainly located at Rutgers.
Collaborators are located in several distant sites and they interact, via SUMEX-AIM, with Resource members on a variety of projects, ranging from system design/improvement to clinical data gathering and system testing. At present, collaborators are located at the Mt. Sinai School of Medicine, N.Y.; Washington University School of Medicine, St. Louis, Mo.;

The knowledge base and the strategies of our CASNET glaucoma consultation system are being strengthened and refined continuously in the ONET environment. The system is now at a point where it is considered by leading ophthalmologists as "highly competent to expert" in several subspecialties of glaucoma. The ONET group was confident enough about the system to demonstrate it at the October 1976 meeting of the American Academy of Ophthalmology and Otolaryngology. The reactions to the system were most favorable. The response of an independent sample of ophthalmologists taken at this meeting strongly emphasized the importance of the system for glaucoma research.

In addition to the main glaucoma research activities, the Resource has collaborated with the Mt. Sinai-Rutgers Health Care Computer Laboratory in the development of models for refraction and visual fields. These will be used by clinical prototype programs for guiding paramedical personnel in data acquisition and decision-making. These programs run on the PDP-11 computers of the clinical ophthalmological system at Mt. Sinai, which are to be linked to the PDP-10 at Rutgers for accessing the more complex models of disease when they are needed. The activities in conjunction with the Health Care Computer Laboratory reflect the more applied aspects of our work in the medical area.

The collaboration with Dr. R. Nordyke of the Straub Clinic on thyroid disease consultation systems has continued at a low level of activity during 1976.
In the area of Belief Systems, collaboration has continued with Professor Andrea Sedlak and her group at the University of North Carolina. This collaboration is focusing on developmental aspects of action perception.

In the AI Area we had extensive interactions with researchers in several institutions on problems of representation, problem solving systems, natural language processing, automatic programming, data base systems, and interactive systems. Contacts continued with the natural language group at BBN (Woods, Bruce) on the design of natural language processors for medical systems. Also, we had contacts with the Stanford-Xerox group (Winograd, Bobrow) which is involved in the development of KRL (Knowledge Representation Language). Following the Rand Workshop on Biomedical Modeling (February 18-20, 1976), in which S. Amarel participated, preliminary contacts started with Dr. D. Garfinkel from the University of Pennsylvania in connection with possible applications of AI methods to the modeling of metabolic processes.

Our close contacts with the Stanford projects on Heuristic Programming (Drs. Buchanan, Feigenbaum, Lederberg) are continuing. The orientation and approach of these Stanford projects are very similar to ours. We continue to share with the investigators in DENDRAL and METADENDRAL a strong interest in computer-based methods of scientific inference and in AI ideas and techniques for representation of knowledge in computers, diagnostic problem solving and theory formation.

One of the significant collaborative developments this period was the joint work of Ed Feigenbaum and his students at Stanford, and Saul Amarel and his students at Rutgers, on the development of an AI Handbook. This handbook is being prepared on the SUMEX-AIM and RUTGERS-10 computers, and it is intended to provide a network-accessible encyclopedic coverage of the AI field for the AIM community and AIM guests.

C) Progress Summary

1. Areas of Study and Projects

a) Medical Modeling and Decision-Making

The consolidation of the ophthalmological network (ONET) of collaborating glaucoma investigators using the SUMEX-AIM shared resource facility, the testing and improvement of the CASNET consultation system with the help of the collaborators, the design and implementation of a time-oriented database system and a set of analysis programs for aiding joint clinical research activities within ONET, and the development of a new knowledge-based consultation system (IRIS), represent the main achievements in the last year.

The network of investigators in glaucoma is designed to foster development of consultation systems that embody sufficient depth for knowledge and expert opinion in a variety of subareas to be useful as research and teaching tools. The collaborative activities, coordinated by Dr. A. Safir at Mt. Sinai, bring together selected scientist-users with complementary interests and strengths in different aspects of glaucoma, and Resource investigators who are concentrating on the development of new computer science methodologies in modeling and problem solving.

During this period, there has been more extensive testing of the CASNET glaucoma consultation program. The collaborators had several meetings to discuss the structure of the glaucoma model and suggested many improvements and additions. A significant new capability of the program is the inclusion of alternative interpretations that capture differences of opinions among the experts on aspects of the model that are currently under debate.
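The idea of keeping alternative expert interpretations alongside a shared model can be sketched in a few lines. This is a hypothetical illustration, not a description of how the consultation program is actually built: the text does not specify its internal representation, and the class name, state names and weights below are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CausalLink:
    """One cause -> effect link with a consensus weight and per-expert overrides."""
    cause: str
    effect: str
    consensus: float                                  # illustrative value in [0, 1]
    alternatives: dict = field(default_factory=dict)  # expert name -> weight

    def weight(self, expert=None):
        # Fall back to the consensus weight when this expert states no alternative.
        return self.alternatives.get(expert, self.consensus)


def effects_of(links, cause, expert=None):
    """Return {effect: weight} for one cause under a chosen interpretation."""
    return {link.effect: link.weight(expert) for link in links if link.cause == cause}


# Placeholder states and made-up weights; not taken from the glaucoma model.
links = [
    CausalLink("state_A", "state_B", 0.8),
    CausalLink("state_A", "state_C", 0.5, alternatives={"expert_2": 0.2}),
]

print(effects_of(links, "state_A"))
print(effects_of(links, "state_A", expert="expert_2"))
```

Storing the disagreement on the link itself means the same network can be consulted under the consensus view or under any one expert's view without duplicating the model.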
A new development during this period has been the implementation of a time-sequenced data base for glaucoma, which has the dual purpose of aiding the clinical research of ONET collaborators and of providing a systematic means for evaluating and improving the performance of the consultation programs.

In the area of general methods and systems we have developed a multilevel-semantic network representation for characterizing disease processes, their anatomical descriptions and their taxonomic identification. This is used by a set of normative rules for diagnostic, prognostic and therapeutic reasoning, which results in a very general and flexible system for clinical consultation. A prototype model called IRIS is being developed using the glaucoma knowledge-base. We have also continued our investigations of other representation paradigms: a frame-based approach and the relationship to mathematical models of optics and refraction.

Another subproject is concerned with developing methods of inference over network structures that will permit us to incorporate the results of clinical experience with different groupings of case-types into the models of consultation, aiding at the same time in the evaluation of the programs.

b) Modeling of Belief Systems and Common-Sense Reasoning

During this period a major achievement was the development and implementation of the AIMDS system. This is an MDS-based system that is specialized and augmented for use in modeling reasoning about actions. A noteworthy aspect of the system is the use of the MDS concepts of Consistency Conditions and Residues to guide frame instantiations and the drawing of further inferences from such frame instantiations.

The BELIEVER theory is a psychological model of the processes involved in the interpretation and common-sense reasoning about observed human actions.
The AIMDS system is being constructed to provide a framework for formulating, studying and testing the BELIEVER theory. The computer system and the psychological theory are growing together, and they are strongly influencing each other's development.

The domain of common-sense reasoning about actions represents a prototypical example of knowledge based reasoning. The richness of the psychological data that this theory must explain, namely, persons' linguistic descriptions and summarizations of everyday behavior, has forced us to think very carefully about how knowledge is to be represented and used. Out of this has emerged a general scheme that not only seems psychologically plausible but also appears to provide a useful framework for viewing a wide variety of problems of interpretation including medical diagnosis and theory-based interpretive problems involved in organic chemistry.

Along with the implementation of the system, we have developed the representation of the central knowledge components of the BELIEVER theory. The central common-sense concepts of Person, Plan and Act have been represented as frames. These frames are highly articulated structures which express the core assumptions of the common-sense psychological theory. By expressing these concepts as frames we have been able to provide a representation of these assumptions that can be used to guide and control the overall processes of reasoning about particular persons, plans and actions. The procedural components of the theory have been defined and are closely linked to these frames. This interplay and association between processes and highly articulated structures promises to provide a basis for strongly decomposing the knowledge of the domain.
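To make the notion of Person, Plan and Act frames with attached conditions more concrete, here is a minimal sketch. It is only loosely analogous to the system described: the text does not define how Consistency Conditions or Residues are actually implemented, so the `Frame` class, the `make_act` helper, the slot names and the single ownership condition are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A named frame: slot values plus conditions the slots must satisfy."""
    name: str
    slots: dict = field(default_factory=dict)
    conditions: list = field(default_factory=list)  # (description, predicate) pairs

    def violations(self):
        # Each predicate receives the slot dictionary and returns True when satisfied.
        return [desc for desc, holds in self.conditions if not holds(self.slots)]


def make_act(agent, plan, action):
    """Build an Act frame whose agent must be the owner of the plan it serves."""
    return Frame(
        "Act",
        slots={"agent": agent, "plan": plan, "action": action},
        conditions=[
            ("agent of the act owns the plan",
             lambda s: s["plan"].slots["owner"] is s["agent"]),
        ],
    )


alice = Frame("Person", slots={"name": "Alice"})
bob = Frame("Person", slots={"name": "Bob"})
plan = Frame("Plan", slots={"owner": alice, "goal": "make coffee"})

print(make_act(alice, plan, "fill kettle").violations())  # no violations expected
print(make_act(bob, plan, "fill kettle").violations())    # ownership condition fails
```

Because the dependency between Act, Plan and Person lives in the structure of the frames, a violated condition is available as data that other processes can inspect, which is the kind of structural rather than procedural coupling the text emphasizes.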
Since the interdependencies of these concepts are represented structurally rather than procedurally, the active database of our MDS-based system provides the basis for communication and cooperation between the processes that monitor these person, plan and act frames.

The definition of these central structural components together with the general system components have also provided a competence theory within which detailed predictions of the BELIEVER theory were specified. These predictions about the structure of summary protocols were tested and borne out by the data. This provides one of the few examples of the verification of predictions derived from work on the development of psychological theory using AI concepts in the process of theory formation.

c) Artificial Intelligence: Representations, Reasoning and Systems Development

Our work in this area continues to be oriented to collaboration with investigators in other Resource projects and to study of basic AI problems that are related to Resource applications. The collaborations involve adaptation and augmentation of existing AI methods and techniques to handle specific key problems identified in the application projects.

The close collaboration with investigators in the Belief Systems area has resulted this year in the development of the AIMDS System for handling problems of action interpretation of the type encountered in the domain of the BELIEVER theory. This system has provided one of the first examples of a working frame-based AI system. In addition, it has led to several important AI results, such as elucidation of the "frame problem" and unification of previous approaches to planning in heuristic problem solving.

Our research in language processing has led this period to two important applications - in Medical Systems and in Belief Systems.
In one project, the PEDAGLOT system is being adapted to provide a natural language interface for communicating patient case histories to our glaucoma system. In a second project, PEDAGLOT is providing the basis for implementing the experimental component of a competence theory within which the BELIEVER theory can be evaluated. Empirical work in this area requires the ability to process summaries and other natural language data. In the basic component of our work on language processing, we continued to develop a language inference system based on a "developmental paradigm" for grammar acquisition. We made progress in the area of coalescing rules of hypothesized grammars, and we started to look into ways of using semantic information to guide the hypothesis formation process.

In another project, which is also focusing on hypothesis formation, we are studying processes of computer assisted acquisition of domain knowledge from empirical data, where knowledge is in the form of weighted production rules. This type of knowledge can be represented as a stochastic graph. This year we obtained several new results in this area. We explored the implications of these results with the help of an experimental program which constructs a stochastic graph from empirical data. Also, we wrote a program which makes use of a file of graph-structured knowledge to make decisions about a domain.

In our work on theory formation in programming, we developed a formation strategy which combines a global, model-guided, approach with a local analysis of special cases. In order to study experimentally this strategy, we are now developing a system for acquiring and handling information about programs in various stages of specification, as well as other knowledge which is relevant to the formation task.

During this period we made important progress in building a strong basis of AI languages for our work.
The UCI-LISP and FUZZY programming languages were adapted to the RUTGERS-10 and they were further improved. The availability of these languages made possible the implementation of major parts of AIMDS over a relatively short period of time. Work has now started on exploring the use of FUZZY (including its features for effective use of incomplete and/or uncertain knowledge) and AIMDS in certain problems of medical decision making.

2. AIM Workshop -- The Second AIM Workshop took place June 1 to 4, 1976 near the Rutgers campus, and it was attended by about 150 participants. The program included reviews of recent AI developments in Medicine, Biochemistry and Psychology; lectures and panel discussions on knowledge representation and AI system design; papers summarizing recent AI work in other application areas (outside AIM); and presentations of current research on computer-based biomathematical models. The Workshop included panels on networking and shared resources; in addition, there were a number of informal meetings in which specific projects or issues were discussed in depth. Hands-on experimentation and demonstration of AI systems (which were accessed via TYMNET and ARPANET) were an important feature of the Workshop. All indications are that the Workshop was very effective in stimulating scientific interactions and in disseminating work being done in the area of AIM.

In support of the AIM Workshop series we devoted considerable effort this period to systems development, to related computer and networking enhancements, to preparation of proceedings for the first Workshop, and to comprehensive supporting documentation for the second.

A panel on Applications of AI to Science and Medicine was organized for the week following the Second AIM Workshop at the National Computer Conference in New York.
It was intended to further augment the dissemination activities of AIM by bringing recent developments in the AIM community to a wide audience of professionals in the computer field.

D) Up-to-Date List of Publications

Amarel, S. and Kulikowski, C. (1972) "Medical Decision Making and Computer Modeling", Proc. of 5th International Conference on Systems Science, Honolulu, January 1972.

Amarel, S. (1974) "Inference of Programs from Sample Computations", Proc. of NATO Advanced Study Institute on Computer Oriented Learning Processes, 1974, Bonas, France.

Amarel, S. (1974) "Computer-Based Modeling and Interpretation in Medicine and Psychology: The Rutgers Research Resource", Proc. of Conference on the Computer as a Research Tool in the Life Sciences, June 1974, Aspen, by FASEB; also appears as Computers in Biomedicine TR-29, June 1974, Rutgers University; also in Computers in Life Sciences, W. Siler and D. Lindberg (eds.), FASEB and Plenum, 1975.

Amarel, S. (1976) Abstract of Panel on "AI Applications in Science and Medicine", in 1976 National Computer Conference Program, N.Y., June 7-10, 1976.

Bruce, B. (1972) "A Model for Temporal Reference and its Application in a Question Answering Program", in "Artificial Intelligence", Vol. 3, Spring 1972.

Bruce, B. (1973) "A Logic for Unknown Outcomes", Notre Dame Journal of Formal Logic; also appears as Computers in Biomedicine, TR-35, Nov. 1973, Rutgers University.

Bruce, B. (1973) "Case Structure Systems", Proc. 3rd International Joint Conference on Artificial Intelligence (IJCAI), August 1973.

Bruce, B. (1975) "Belief Systems and Language Understanding", Current Trends in the Language Sciences, Sedelow and Sedelow (eds.), Mouton, in press.

Chokhani, S. and Kulikowski, C.A. (1973) "Process Control Model for the Regulation of Intraocular Pressure and Glaucomas", Proc. IEEE Systems, Man & Cybernetics Conf., Boston, November 1973.
Chokhani, S. (1975) "On the Interpretation of Biomathematical Models Within a Class of Decision-Making Procedures", Ph.D. Thesis, Rutgers University; also Computers in Biomedicine TR-43, May 1973.

Fabens, W. (1972) "PEDAGLOT. A Teaching Learning System for Programming Language", Proc. ACM SIGPLAN Symposium on Pedagogic Languages, January 1972.

Fabens, W. (1975) "PEDAGLOT and Understanding Natural Language Processing", Proc. of the 13th Annual Meeting of the Assoc. for Computational Linguistics, October 30 - Nov. 1, 1975.

Kulikowski, C.A. and Weiss, S. (1972) "Strategies for Data Base Utilization in Sequential Pattern Recognition", Proc. IEEE Conf. on Decision and Control, Symp. on Adaptive Processes, December 1972.

Kulikowski, C.A. and Weiss, S. (1973) "An Interactive Facility for the Inferential Modeling of Disease", Proc. 7th Annual Princeton Conf. on Information Sciences and Systems, March 1973.

Kulikowski, C.A. (1973) "Theory Formation in Medicine: A Network Structure for Inference", Proc. International Conference on Systems Science, January 1973.

Kulikowski, C.A., Weiss, S. and Safir, A. (1973) "Glaucoma Diagnosis and Therapy by Computer", Proc. Annual Meeting of the Assoc. for Research in Vision and Ophthalmology, May 1973.

Kulikowski, C.A. (1973) "Medical Decision-Making and the Modeling of Disease", Proc. First Internatl. Conf. on Pattern Recognition, October 1973.

Kulikowski, C.A. (1974) "Computer-Based Medical Consultation - A Representation of Treatment Strategies", Proc. Hawaii Internatl. Conf. on Systems Science, Jan. 1973.

Kulikowski, C.A. (1974) "A System for Computer-Based Medical Consultation", Proc. Natl. Computer Conf., Chicago, May 1974.

Kulikowski, C.A. and Safir, A. (1975) "Computer-Based Systems Vision Care", Proceedings IEEE Intercon, April 1975.

Kulikowski, C.A. and Trigoboff, M.
(1975) "A Multiple Hypothesis Selection System for Medical Decision-Making", Proc. 8th Hawaii Internatl. Conf. on Systems.

Kulikowski, C. and Sridharan, N.S. (1975) "Report on the First Annual AIM Workshop on Artificial Intelligence in Medicine", SIGART Newsletter No. 55, December 1975.

Kulikowski, C. (1976) "Computer-Based Consultation Systems as a Teaching Tool in Higher Education", 3rd Annual N.J. Conf. on the Use of Computers in Higher Education, March 1976.

Kulikowski, C., Weiss, S., Safir, A. et al. (1976) "Glaucoma Diagnosis & Therapy by Computer: A Collaborative Network Approach", Proc. of ARVO, April 1976.

Kulikowski, C., Weiss, S., Trigoboff, M. and Safir, A. (1976) "Clinical Consultation and the Representation of Disease Processes: Some AI Approaches", AISB Conference, Edinburgh, July 1976.

LeFaivre, R. and Walker, A. (1975) "Rutgers Research Resource on Computers in Biomedicine", SIGART Newsletter No. 54, October 1975.

LeFaivre, R. (1976) "Procedural Representation in a Fuzzy Problem-Solving System", Proc. Natl. Computer Conf., New York, June 1976.

LeFaivre, R. (1977) "Fuzzy Representation and Approximate Reasoning", submitted to IJCAI-77, MIT.

Mathew, R., Kulikowski, C. and Kaplan, K. (1977) "A Multileveled Representation for Knowledge Acquisition in Medical Consultation Systems", Proc. MEDINFO 77 (in press).

Mauriello, D. (1974) "Simulation of Interaction Between Populations in Freshwater Phytoplankton", Ph.D. Thesis, Rutgers University, 1974.

Schmidt, C. (1972) "A comparison of source unidimensional, multidimensional and set theoretic models for the prediction of judgements of trait implication", Proc. Eastern Psych. Assoc. Meeting, Boston, April 1972.

Schmidt, C.F. and D'Addamio, J. (1973) "A Model of the Common Sense Theory of Intention and Personal Causation", Proc. of the 3rd IJCAI, August 1973.

Schmidt, C.F. and Sedlak, A. (1973) "An Understanding of Social Episodes", Proc. of Symposium on Social Cognition, American Psych. Assoc.
Convention, Montreal, August 1973.

Schmidt, C.F. (1975) "Understanding Human Action", Proc. Theoretical Issues in Natural Language Processing: An Interdisciplinary Workshop in Computational Linguistics, Psychology, Artificial Intelligence, Cambridge, Mass., June 1975; also appears as Computers in Biomedicine, TR-47, June 1975, Rutgers University.

Schmidt, C. (1975) "Understanding Human Action: Recognizing the Motives", Cognition and Social Behavior, J.S. Carroll and J. Payne (eds.), New York: Lawrence Erlbaum Associates, in press; also appears as Computers in Biomedicine, TR-45, June 1975, Rutgers University.

Schmidt, C.F., Sridharan, N.S. and Goodson, J.L. (1975) "Recognizing plans and summarizing actions", Proceedings of the Artificial Intelligence and Simulation of Behavior Conference, University of Edinburgh, Scotland, July 1976.

Schmidt, C. (1975) "Understanding human action: Recognizing the plans and motives of other persons", in J. Carroll and J. Payne (eds.), Cognition and Social Behavior, Potomac, Maryland: Lawrence Erlbaum Associates, 1976.

Schmidt, C.F. and Goodson, J.L. (1975) "The Subjective Organization of Summaries of Action Sequences", 17th Annual Meeting of the Psychonomic Society, St. Louis, 1976.

Sedlak, A.J. (1974) "An Investigation of the Development of the Child's Understanding and Evaluation of the Actions of Others", Ph.D. Thesis, Rutgers University.

Sridharan, N.S. (1976) "The Frame and Focus Problems in AI: Decision in Relation to the BELIEVER System", Proceedings of the Conference on Artificial Intelligence & the Simulation of Human Behavior, Edinburgh, July 1976.

Sridharan, N.S. (1976) "An Artificial Intelligence System to Model and Guide Organic Chemical Synthesis", Planning in Chemical Synthesis by Computer, American Chemical Society Press, September 1976.

Sridharan, N.S. and Schmidt, C.F.
(1977) "Knowledge-Directed Inference in BELIEVER", Workshop on Pattern-Directed Inference Systems, Hawaii, May 1977.

Srinivasan, C.V. (1973) "The Architecture of a Coherent Information System: A General Problem Solving System", Proc. of the 3rd IJCAI, August 1973.

Trigoboff, M. (1976) "Propagation of Information in a Semantic Net", Proc. of the Conference on Artificial Intelligence and the Simulation of Behaviour, Edinburgh, Scotland, July 1976; updated version appears in CBM-TR-57, Dept. of Computer Science, Rutgers University, 1977.

Tucker, S.S. (1974) "Cobalt Kinetics in Aquatic Microcosms", Ph.D. Thesis, Rutgers University.

Van der Mude, A. and Walker, A. (1976) "Some Results on the Inference of Stochastic Grammars", abstract in Proc. Symposium on New Directions and Recent Results in Algorithms and Complexity, Dept. of Computer Science, Carnegie-Mellon University.

Vichnevetsky, R. (1973) "Physical Criteria in the Evaluation of Computer Methods for Partial Differential Equations", Proc. 7th Internatl. AICA Congress, Prague, Sept. 1973; reprinted in Proc. of AICA, Vol. XVI, No. 1, Jan. 1974, European Academic Press, Brussels, Belgium.

Vichnevetsky, R., Tu, K.W. and Steen, J.A. (1974) "Quantitative Error Analysis of Numerical Methods for Partial Differential Equations", Proc. 8th Annual Princeton Conference on Information Science and Systems, Princeton University, March 1974.

Walker, A. (1975) "Formal Grammars and the Regeneration Capability of Biological Systems", Journal Comp. and Syst. Sciences, Vol. 11, No. 2, 252-261.

Weiss, S. (1974) "A System for Model-Based Computer-Aided Diagnosis and Therapy", Parts I and II, Ph.D. Thesis, Rutgers University; also Computers in Biomedicine TR-27, Feb. 1974.

Weiss, S., Kulikowski, C. and Safir, A. (1977) "Glaucoma Consultation by Computer", Computers in Biology and Medicine (in press).
E) Funding Status

1) Granting Agency: Biotechnology Resources Program, DRR, NIH.

2) Grant number: RR-643.

3) Period of award: This is the 3rd year of the second 3-year period of the Resource.

4) Direct cost funds for the period September 1, 1976 to August 31, 1977: $336,314.

5) A proposal for a five-year extension of the Rutgers Resource was submitted in October 1976. The proposal is currently being evaluated by NIH. In our proposal we are requesting a substantially higher level of funding in order to cover increased levels of effort in all areas of the Rutgers Resource, and also to support the acquisition/enhancement of the RUTGERS-10 computer which we propose to use, in coordination with the SUMEX-AIM facility, as a shared resource for the national AIM community.

II) INTERACTIONS WITH THE SUMEX-AIM RESOURCE

During the past year we have continued to use the SUMEX-AIM resource for program development and testing, for communications between collaborators distributed in different parts of the country, and for preparation and running of the AIM Workshop. We continue to access SUMEX-AIM via TYMNET, and to a smaller extent via ARPANET.

SUMEX-AIM played a key role in consolidating our network of collaborators in ophthalmology (ONET) and in providing the support needed for establishing a productive collaboration among the ONET investigators. Also, it has been most useful in communicating, planning, and helping to set up the information pool for the Second AIM Workshop.

Computing in the Rutgers Research Resource continues to be distributed between SUMEX-AIM and the RUTGERS-10. The two computers are providing complementary resources for our research and for our national collaborations. At present, the distribution of our computing is about 3 to 1 between RUTGERS-10 and SUMEX-AIM.
Our total demand at SUMEX-AIM is estimated at about 5000 connect hours for the current year, with most of the work done in INTERLISP (about 80% of our total connect hours) and the rest devoted mainly to communications and to limited program testing within ONET.

The SUMEX-AIM facility was used for demonstrations of AIM programs in first year classes and in second year seminars at the Rutgers Medical School, CMDNJ; CASNET, MYCIN, INTERNIST and PARRY were interactively accessed in these classes and seminars.

Another innovative use of SUMEX-AIM has been the collaborative development of the AI HANDBOOK, which is intended to provide a computer-based and network accessible encyclopedic coverage of the AI field for the AIM community and AIM guests. The AI HANDBOOK was initiated by Dr. E. Feigenbaum and his students at Stanford. During the year, a graduate class at Rutgers, given by Dr. S. Amarel, worked on the AI HANDBOOK and contributed several articles.

We find that the SUMEX-AIM bulletin board plays an important role in communicating ideas and information on services among users. Since the MYCIN group at Stanford regularly posts summaries of meetings and other technical information on the MYCIN bulletin board, we have been able to keep track of their program and problems. This was particularly useful for our work on IRIS, where concepts close to the MYCIN CF formalism are being studied.

System support at SUMEX-AIM has been more than good; it has been friendly. Problems or questions concerning the system are consistently handled quickly and competently by SUMEX-AIM staff. Service is simply outstanding.

The system is under heavy usage for most of the day, which causes painfully slow response times for large jobs; thus, it is usable for Rutgers users in the early morning or in the late evening. On most days the load average stays over 7 from noon EST to about 7 p.m.
During these hours, the computer is only marginally useful for work with a large LISP system such as IRIS (currently this system has 245 pages of an INTERLISP core image). For relatively small jobs (about 70 pages), the response time has improved consequent to the change in the scheduler in early Spring.

Access to SUMEX-AIM via TYMNET has improved considerably. Occasionally, however, problems persist with spurious characters and with broken connections.

In the last year, several new areas of collaboration between Rutgers and SUMEX-AIM have developed, mostly along the lines of systems and support software. These include the following specific efforts:

1. MAINSAIL. During the past year, the design of the MAINSAIL system has been stabilized to a great degree, and Rutgers has followed the development of the MAINSAIL effort in order to be in a position to apply it to Rutgers' AIM activities, particularly in the ophthalmology area. We have made several passes over the MAINSAIL design during this period, with particular interest in the issues of memory allocation and the possibilities of doing list processing in MAINSAIL. During April, Clark Wilcox and others from the Stanford group installed a prototype MAINSAIL system on the Rutgers PDP-10, and it is presently being used by a group from NIH who are interested in evaluating MAINSAIL for their own work.

2. SOFTWARE. Two text processing programs, TVEDIT and PUB, were brought over from SUMEX and installed at Rutgers, and are now being used on the RUTGERS-10. These tools, which were developed at Stanford's IMSSS and AI Laboratories, reduce the overhead in program and document preparation and maintenance.

3. ALLOCATION and ARCHIVING. The design of the allocation and archiving systems that have been in use at SUMEX has been adopted, with some modification, for use at Rutgers.
One of the important products of the SUMEX research has been the models for interaction between a variety of collaborators; the way in which the allocation of system file space and the archiving of unneeded files have been accomplished at Stanford has been adopted at Rutgers.

4. CG: A Program for Explanation of an AI System. In a somewhat different area, Prof. David Levine of the Rutgers faculty collaborated with Dr. Ray Carhart of the Stanford Heuristic Programming Project to produce a program that provides a dynamic, display-oriented interface to the CONGEN program. CONGEN examines the chemical formulas that are possible from a particular empirical formula, under a set of constraints on the generation of formulas. CG, the program that effects this interface, was written at Rutgers, and can run either at Rutgers or at SUMEX-AIM; CONGEN, which is currently written in INTERLISP, runs only at SUMEX-AIM.

5. SYSTEM MODEL. The SUMEX staff has continued to be a model of cooperation and support for research. More importantly, the protocols that the SUMEX staff have developed for solving problems of system/user and user/user interaction continue to be models that we find it possible to apply to the Rutgers environment.

III. FUTURE PLANS OF THE RUTGERS RESOURCE; RELATIONS TO SUMEX-AIM

Our plans for the future are to continue along the main lines of our current research. We expect our computing needs to grow at a rate of about 20% per year. About a quarter of our total computing will be done at SUMEX-AIM; most of this work will be concerned with large program development (mainly in INTERLISP).
In our application for renewal of the Resource grant (which is currently being reviewed at NIH) we propose to acquire and augment the RUTGERS-10 computer in order to provide sufficient capacity to satisfy the projected computing demand of the Rutgers Resource, and also to provide added computing capacity for the national AIM community and to enlarge the scope of the AIM resource sharing activities. We are proposing a KL-10 configuration with TOPS-20 software, which promises compatible operation with the TENEX system at SUMEX-AIM. We expect the configuration to have 50% more capacity than the present RUTGERS-10 in the first year of the renewal period. Two thirds of the enhanced system capacity will be allocated to the Resource; this capacity share will be evenly divided between internal Resource projects and the national AIM community. We expect the RUTGERS-10 to be operated in close coordination with the SUMEX-AIM facility, within a common management framework.

This plan will provide an additional node to the AIM network. We envision a move towards specialization and differentiation of functions among the nodes in the network. We propose to use the Rutgers AIM center for promotion of AI applications in clinical medicine (and in related biological modeling) with special emphasis on collaborative network-based projects of the type that have developed within our Resource to date.

In addition to our computing plans, we propose to increase our AIM dissemination and training efforts (AIM Workshops, conferences, post doctoral program), and to continue our system development activities with the aim of enhancing scientific communications within the AIM community and between AIM researchers and other interested scientists. We expect increased collaboration with SUMEX-AIM in these areas.

6.3 PILOT STANFORD PROJECTS

The following are descriptions of the informal pilot projects currently using the Stanford portion of the SUMEX-AIM resource pending funding, and full review and authorization.

6.3.1 GENETICS APPLICATIONS PROJECT

Computer Science Applications in Genetics

Prof. L. L. Cavalli-Sforza
Department of Genetics
Stanford University School of Medicine

We have been quite satisfied with the use of programs such as REDUCE, MLAB and SPSS. REDUCE has been used by graduate student D. Wagener to check algebra, and also by L. Cavalli-Sforza, and has been of great help in circumstances in which algebraic manipulations were too lengthy for hand verification. Unfortunately REDUCE has a maximum length of algebraic expansions that can be manipulated by computer, which is not always generous enough for our purposes; the maximum allowed was increased, but there is now no warning when the length of an expression overruns the new limits. The penalty is the total loss of the information. If this could be mended, the program would be much more useful. MLAB is very useful for least square fitting of complex systems of equations. SPSS is widely used and well known; it is working fine in the system.

Special modelling efforts involved:

1) A program of information storage and retrieval which may be useful also for analysis of multi-dimensional contingency tables. The material to which it was applied derives from anthropological and archeological survey and excavation data in Calabria, Italy by A. Ammerman. The information collected on coordinates of sites, material found, elevation, land form, soil, ecological and geological data etc. refers to hundreds of sites and will eventually be subject to analysis according to models of growth and spread of Neolithic populations.
It is eventually hoped to investigate the power of new techniques of statistical analysis, employing spectral analysis of the matrices representing the data.

2) Similar situations, on the basis of other data available from the literature, are also being investigated by means of simulations of the population growth and spread, e.g. for the Bandkeramik populations in Central Europe. It is thus hoped to obtain, eventually, an explanation of the geographic distribution of genes in Europe, the Middle East and nearby areas, based on the hypothesis that the present distribution reflects predominantly a major radiation of a population of farmers which took place with the spread of agriculture from the Middle East, from 9000 to 5000 years ago.

3) The geographic distribution of genes, as observed today, is analyzed by means of gene frequency maps. We have developed many methods of interpolation of data for map construction, and many methods of graphical display of the maps obtained. We are currently comparing the methods of construction of maps. Some of the methods of construction are fairly sophisticated, but more work will be necessary to develop our programs further so that they can be considered to interpolate intelligently. Our tests of validity are based on eliminating each observation in turn, computing its expected value from the remaining observations, and comparing it with the observed one (a sort of jack-knifing). It is clear that results could be improved if this procedure could be carried out simultaneously for several genes and alleles; at the moment it is done for one allele at a time. The simultaneous analysis is an ambitious program but would considerably improve present results. At the moment, for instance, we have no way to make gene frequencies of all alleles at a locus sum to 100% (except approximately, because we cannot consider more than one allele at a time).
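The validity test described above (drop each observation in turn, predict it from the rest, compare with the observed value) can be sketched as follows. This is only an illustration under stated assumptions: inverse-distance weighting stands in for the interpolation methods, which the text does not specify, and the function names and sample values are hypothetical.

```python
import math
from typing import List, Tuple

# One observation: (x, y, observed gene frequency for a single allele).
Observation = Tuple[float, float, float]


def idw_estimate(x: float, y: float, obs: List[Observation],
                 power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (x, y).

    Assumed stand-in interpolator, not the project's method.
    Requires at least one observation in `obs`.
    """
    num = 0.0
    den = 0.0
    for ox, oy, freq in obs:
        d = math.hypot(x - ox, y - oy)
        if d == 0.0:
            return freq  # exact hit on a sampled site
        w = 1.0 / d ** power
        num += w * freq
        den += w
    return num / den


def leave_one_out_errors(obs: List[Observation],
                         power: float = 2.0) -> List[float]:
    """For each site, predict it from all other sites; return predicted - observed.

    Requires at least two observations.
    """
    errors = []
    for i, (x, y, freq) in enumerate(obs):
        rest = obs[:i] + obs[i + 1:]
        errors.append(idw_estimate(x, y, rest, power) - freq)
    return errors


def rms_error(errors: List[float]) -> float:
    """Root-mean-square of the leave-one-out errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Comparing `rms_error` across candidate interpolators gives one way to rank map-construction methods for a single allele; it does not address the constraint that frequencies of all alleles at a locus sum to 100%.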
In addition, other information on the populations (whether they are isolates, etc.) could be introduced, and verified by the program. Also, specific hypotheses on the evolutionary factors affecting the gene frequencies could be tested more directly. At the moment, the major limitation to these more sophisticated analyses is the availability of computer space.

6.3.2 BAYLOR-METHODIST CEREBROVASCULAR PROJECT

Baylor-Methodist Cerebrovascular Project

John L. Gedye, M.D.
Data Services Research Laboratory
Department of Neurology, Baylor College of Medicine

During the year the Data Services Research Laboratory has had a total of about 2,500 hours of man-effort available, of which about 5% has been devoted to activities directly related to the SUMEX pilot study.

I) Summary of research program

A) Technical goals

The general goal of the laboratory - the creation of a computer-based system for the support of clinical research in neurology, as described in the 1975-76 annual report - remains unchanged. In spite of the limited manpower available during the year, good progress has been made toward the specific goal of developing the PDP11/35-based clinical research system 'CLINSYS' to a point where it can begin to give real support to Departmental projects.

We have made good progress in recent weeks with the development of software which will allow easier access to the resources of SUMEX for users of our local system. It is now possible to give the command 'SUMEX' to our local system executive and have the entire login procedure through to receipt of the "final" SUMEX '@' carried out automatically. Control characters allow the user's terminal to be switched between SUMEX and the local system, and these have been chosen to be compatible with the BANANARD control characters, so that this can be operated without interference.
Facilities have been provided which allow ASCII files to be created on either system and transferred to the other. These facilities will operate under our local PDP11/35 batch system, and we have tested them by creating a test data file of about 1,000 ASCII characters on an account on the PDP11/35, and submitting a batch job (to run at a specified time) which logs into SUMEX, transfers the test data file, copies it back again onto the PDP11/35 account and logs out. It then logs in again and repeats the whole process with the latest copy of the file. In this way we hope to estimate the reliability of this form of data transmission - at present it looks as if the error rate will be less than 1 in 16,000 characters - and to lay the foundations for a system that will allow us to make maximum use of SUMEX off-peak time in the projects described below.

B) Medical relevance and collaboration

The development of CLINSYS has continued on the general lines described in the 1975-76 annual report. Specific data acquisition procedures have been designed and implemented for: clinical psychology - both conventional and automated testing techniques have been accommodated; clinical physiology - facilities for the manual entry of Xe133 inhalation regional cerebral blood flow measurements have been provided, and work is now in progress on a system for direct transmission of data to the PDP11/35 from the integral PDP11/05 which is part of the equipment; and hematology - provision has been made for the acquisition of data from tests of platelet function.

Because of its central importance, a major emphasis has been placed on making provision for the acquisition of suitably summarised CT scan data, and a number of exploratory studies have been carried out with the result that we hope to have the first edition of a 'CT scan system' working in the near future.
This will have an important part to play in future projects.

No further progress has been made with the implementation of a work station incorporating the hand-held OCR wand developed by Recognition Equipment Incorporated - which was described in the 1975-76 report - but we intend to make use of such a 'wand' work station in the context of a system for acquiring data from the radiologist's 'CT scan report' as part of the 'CT' record.

C) Progress summary

The aim of our 'pilot study' remains unchanged - to formulate a project relevant to the activities of the Department which will provide an acceptable and legitimate 'point of entry' for artificial intelligence research, and which will allow the systematic formulation of objectives for the future.

Work has continued along the lines discussed in the 1975-76 report, using, as test data, results from 69 demented patients and 15 controls who had had regional cerebral blood flow measurements. This work has led to a promising 'AI' approach which is now being applied to CT scan data, and when the feasibility of this has been demonstrated the way will be open for work to go ahead on the implementation of a general purpose program.

D) Publications

There are as yet no publications dealing with the 'pilot study' as such. Certain aspects of the work referred to in this report have been mentioned in publications but these are all currently 'in press'. Details are available on request.

E) Funding status

1) Current funding

The work is currently supported by a section of the 3-year grant for the Center for Cerebrovascular Research, but at the present time this is only approved up to January 31st, 1977.

2) Pending applications and renewals

Work is currently in progress on a grant application for submission by July 1st for support for the laboratory from April 1st, 1978.
This will concentrate on the use of CLINSYS to support of the study of brain-behavior relations in demented patients using CT scan data and the results of automated behavioural assessment. II) Interactions with the SUKX-AIM resource --- A) Little has so far been achieved by way of collaborations through the network, althou$l the SNDi%G facility has been useful for keeping in touch with contacts made at the 1975 workshop. It is hoped thoqh, that in the future we may be able to test out the concept of a CT scan archive created by the joint efforts of a dispersed community of users. B) For some reason I did not hear about the 1976 workshop until it was over, and so far have heard nothins about. a 1977 one. I found the 1975 workshop very useful, and would strongly support the continuation of the workshops in some form - particularly if one could get down to fundamentals with people working on similar problems. I have kept in close contact with Paul Blackwell at Columbia, itissouri since the 1975 workshop, and we last met at an N.S.F. Conference on 'MATHE13ATICAL STRUCTURZ IN THE HWIAM SCIEi\ICES' at Penn State in March. C) I have no criticisms of resource services beyond the usual one of slowness of response time at peak periods. III Follow on SUi6ZX Trant (S/78 -Z/m --- - -- ZAP A) The main long range user goal of relevance is the establishment of a demonstration CT scan reference archive usin the resources of SUiW. It is not clear just what resources this will need, but at the present. time it looks as if the feasibility of the approach could be established with an allocation of 500 pages of storage, and possibly less. B) The main justification for continued use of SUEZ is that it provides a unique opportunity to explore the possibility of settina, up a dispersed CT scan research community rdith a reasonably hi$h chance of being able to Privile-;ed Communication 163 J. 
demonstrate something of potential clinical value in the relatively short term.

C) I would like to see attention given to the communications potential of the SUMEX resource. We have not been able to make full use of this in the last two years because of a lack of local resources, but now that we have our local system interfaced we are beginning to get a real feel for the potentialities. We have also found that visitors to our laboratory are very impressed with the ease of setting up the interface, and many - including computer company representatives - have confessed to being unaware of the possibilities provided by the existing technology. In particular we have found little experience of the use of autodiallers.

6.3.3 COMPUTER ANALYSIS OF CORONARY ARTERIOGRAMS

Computer Analysis of Coronary Arteriograms

Donald C. Harrison, M.D., Edwin L. Alderman, M.D., and Lynn Quam, Ph.D.
Division of Cardiology, Stanford University Medical School

The goal of this project is to develop computer techniques for automatic acquisition of the anatomic distribution of coronary arteries and a quantitation of the degree of narrowing of these vessels. In order to do this, two different types of image processing techniques will be developed. First, a three-dimensional representation of the coronary arterial tree will be automatically constructed from coronary arteriograms taken sequentially from several different views. Second, the amount of stenosis will be measured by combining information from multiple sequential frames in order to improve resolution and reduce radiographic noise.

BACKGROUND: Coronary arteriography is the definitive test for the evaluation of patients with coronary artery disease.
There is no other test currently available which provides information concerning the location and severity of coronary narrowings and the distribution of coronary blood vessels in the myocardium. Numerous studies document that prognosis in patients with coronary disease reflects the severity of anatomic disease. Coronary vascular anatomy and the extent of lesions are, in an epidemiologic sense, more precise indicators of prognosis than are clinical symptoms. At the present time, categorization of the extent of coronary vascular disease is based somewhat simplistically on the number of major coronary vessels involved and a rough estimate of the percentage obstruction. Computer representation of the coronary tree, coupled with either interactive or automatic entry of degree of stenosis, will permit the development of more precise indices of anatomic disease of the myocardium. Computer image processing techniques offer the possibility of objectively measuring the severity of coronary stenosis, both at the point of maximal narrowing and averaged over a segment of the vessel.

APPROACH: An extensive set of image processing functions has been developed and applied to detect the regions of the arteriograms which correspond to the arterial tree. These regions are then transformed to a "skeleton" which roughly corresponds to the midlines of the vessels in the arterial tree. This skeleton is then transformed to a graph representation which can be topologically and geometrically analyzed to distinguish vessel intersections (in the 2-d projection, not real 3-space intersections) from vessel bifurcations. The result is a graph structure interpretation of the arterial tree with quantitation of the locations (2-d) of bifurcations, and for each vessel segment the path of the vessel midline and the vessel diameter.
The computer algorithms are described in more detail in the following sections.

Data Acquisition: We have digitized a number of 35 mm cine frames from three subjects using both an Optronics film scanner and a Dicomed film digitizer operating at 25 and 50 micron pixel resolution. For each subject, frames are manually selected to provide good contrast in the proximal vessels from both LAO and RAO projections and to be approximately synchronized within the cardiac cycle.

Pre-processing: The digitized frames are computer enhanced using high frequency filtering to eliminate the x-ray exposure gradient and emphasize sharp edges which tend to correspond to the vessels. High contrast areas in the enhanced frames are detected by a simple threshold region detector. Currently, many regions are detected which do not correspond to the arterial tree, but are caused by background features such as vertebrae. We are in the process of digitizing another set of frames which have been chosen to include time synchronized pre-injection frames in order to permit background subtraction. The result of this step is a binary image corresponding to high density areas in the frame.

The root of the arterial tree is manually specified by the operator, and a connected point region grower finds all points connected to the root. This usually finds all medium and large sized vessels, and some smaller vessels. Unconnected background is totally eliminated. Sometimes, substantial pieces of the arterial tree are not connected to the root. When this occurs, the operator can run the region grower from new starting points. The result of this step is a binary image corresponding to most of the arterial tree. We expect that by using background subtraction we can very reliably detect the arterial tree and eliminate most of the manual "hand-holding" in the previous steps.
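The thresholding and region-growing steps described above can be illustrated with a minimal sketch. It is not the project's program: the function names `threshold` and `grow_region`, the list-of-lists image layout, and the choice of 8-neighbour connectivity are all assumptions made here for illustration, since the text does not say which connectivity the region grower uses.

```python
from collections import deque


def threshold(image, level):
    """Return a binary image: 1 where the pixel value exceeds `level`, else 0."""
    return [[1 if value > level else 0 for value in row] for row in image]


def grow_region(binary, seed):
    """Return the set of (row, col) points set to 1 and connected to `seed`.

    Connectivity is 8-neighbour (an assumption; the source does not specify).
    """
    rows, cols = len(binary), len(binary[0])
    if not binary[seed[0]][seed[1]]:
        return set()  # the operator picked a point outside any detected region
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    if binary[nr][nc] and (nr, nc) not in region:
                        region.add((nr, nc))
                        queue.append((nr, nc))
    return region


# Toy frame: a diagonal "vessel" from the root at (0, 0) plus an
# unconnected bright background feature in the last column.
frame = [
    [9, 0, 0, 0, 0],
    [0, 9, 0, 0, 8],
    [0, 0, 9, 0, 8],
    [0, 0, 0, 0, 0],
]
binary = threshold(frame, 5)
tree = grow_region(binary, (0, 0))
```

In this toy frame the bright feature in the last column passes the threshold but is not reached from the root, which mirrors the statement that unconnected background is eliminated. Running `grow_region` again from a new seed corresponds to the operator restarting the region grower from a new starting point.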
Arterial Tree Graph Formation: The binary image of the arterial tree is "skeletonized" by computing the distance transform of the image and connecting peaks and ridges in distance. The distance transform computes, for each point in the image, the Euclidean distance to the nearest zero (point not in region). Points at vessel midlines are easily detected because they are local maxima (ridges) in distance from their vessel walls. The 2-dimensional array of ridge-peak information is next processed to form a graph structure describing the connectivity of vessel segments (distance ridges) to nodes (points where 3 or more ridges converge).

The graph is simplified by detecting and eliminating insignificant terminal segments which are usually the result of noise in the image.

We have now accomplished a significant simplification of the data from the original 2-dimensional array of x-ray density data to an essentially 1-dimensional description of the vessel midlines and points of bifurcation and intersection. This data (when vessel width is included) is sufficient to completely reconstruct the binary image of the arterial tree.

Topologic and Geometric Graph Analysis: The graph is next analyzed to determine the proximal-distal orientation of each vessel segment. Starting at the distal node of a vessel segment, all segments which are attached to that node must be within 90 degrees in pointing direction. Any segment violating this rule is identified as an intersection. Starting from the root of the arterial tree, all segments are classified by this procedure. Nodes which have been identified as intersections are now analyzed in order to match distal segments with proximal segments according to a set of rules about arterial topology and geometry.
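Two of the steps above, the distance transform and the 90-degree pointing rule, can be sketched briefly. This is an illustration under stated assumptions, not the project's algorithm: the names `distance_transform` and `flag_intersections` are hypothetical, the distance transform is a brute-force version suitable only for small images, directions are assumed to be (dx, dy) vectors, and a difference of exactly 90 degrees is assumed to satisfy the rule.

```python
import math


def distance_transform(binary):
    """Euclidean distance from each region point to the nearest zero point.

    Brute force, for illustration only; assumes at least one zero point exists.
    """
    rows, cols = len(binary), len(binary[0])
    zeros = [(r, c) for r in range(rows) for c in range(cols) if not binary[r][c]]
    if not zeros:
        raise ValueError("image contains no zero (background) points")
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c]:
                dist[r][c] = min(math.hypot(r - zr, c - zc) for zr, zc in zeros)
    return dist


def flag_intersections(incoming_dir, attached_dirs):
    """For each attached segment, return True if it violates the 90-degree rule.

    Two directions are within 90 degrees exactly when their dot product is
    non-negative (treating 90 degrees itself as allowed is an assumption).
    """
    flags = []
    for dx, dy in attached_dirs:
        dot = incoming_dir[0] * dx + incoming_dir[1] * dy
        flags.append(dot < 0)
    return flags


# A 3-point-wide block: the centre point is farthest from the walls,
# so it is the local maximum (ridge) in distance.
block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = distance_transform(block)

# A segment arriving with direction (1, 0) meets three attached segments;
# the one pointing back at (-1, 0) violates the rule.
flags = flag_intersections((1, 0), [(1, 1), (1, -1), (-1, 0)])
```

The value of the distance transform on a ridge point is the local half-width of the vessel, which is consistent with the remark that midlines plus widths suffice to reconstruct the binary image.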
Having resolved vessel intersections, we now transform the graph to a simple tree structure which corresponds topologically to the arterial tree.

Future Directions: The above computer algorithms have been successfully applied to the images in a few sets of digitized data. We plan to digitize frames prior to injection to enable background subtraction, which we believe will greatly improve the reliability and accuracy of the initial vessel detection. The algorithms have not yet been tried on cases with abnormal angiograms, and we expect that as more cases are incorporated into our image library, it will be necessary to develop more rules and analytical techniques in order to properly interpret the 2-dimensional images.

Based on the encouraging progress which has been made in processing coronary arteriograms and based on other areas of expertise in image processing within the Stanford University Medical Center, we have developed and submitted on November 1, 1976 to the NHLBI a new grant proposal titled "Computerized Medical Image Processing Laboratory". This proposal contains a detailed report of the progress that had been made up to that time and details the further steps which we propose to pursue.

USE OF SUMEX RESOURCE: Work of this project has been dependent on the SUMEX facility for several reasons. First, this project has not been funded to provide its own computer facilities. Second, although the Stanford Division of Cardiology does have minicomputer systems which could be used for this project, it is considerably easier to develop image processing and artificial intelligence techniques on a larger scale system in which many powerful tools already exist.
It is important in the research phase of this project to be able to easily and quickly perform experiments, without the difficulties of fitting the experimental programs into the small computer memory environment.

We believe that our use of the SUMEX facility is completely within the guidelines for SUMEX use, since our primary purpose is to develop image analysis and understanding techniques for the quantitation of coronary artery disease. A secondary result of this research project is the development of general purpose image analysis and modelling algorithms.

SPONSORSHIP:
Granting agency: NIH
Grant: 5 R01 HL18873-02
Period of award: 06/01/76 - 05/31/78
Current annual funding: $20,807 + indirect costs

6.3.4 QUANTUM CHEMICAL INVESTIGATIONS

Theoretical Investigations of Heme Proteins and Opiate Narcotics

Dr. Gilda Loew
Department of Genetics
Stanford University
(Grant, PCX 76 07324, 2 years, $20,500 this year)

SUMEX is used for the calculation of various one-electron electronic properties of iron containing compounds. The programs were formulated and written by David Steinberg, Michael Chadwick and David Lo. David Lo was responsible for converting the program for interactive use on the PDP system. Slight improvements were made by Robert Kirchner, and Sheldon Aronowitz has expanded the formulation to include additional spin and oxidation states of the iron atom.

The properties that are calculated include the electric field gradient at the iron nucleus, quadrupole splitting, isotropic and anisotropic hyperfine interaction, spin-orbit coupling and zero field splitting, g values and temperature dependent effective magnetic moments. The calculated values are compared directly to experimental results obtained from published Mossbauer resonance and electron spin resonance spectra.
Such a comparison determines not only the reliability with which these properties can be calculated but also gives an indication of the ability of the model of the iron active site to mimic the actual environment found in a particular compound or iron containing protein.

The major input to these properties programs is a description of the electron distribution of the compound under consideration. This description is obtained using a semi-empirical molecular orbital method employing the iterative extended Huckel procedure. Such a calculation requires up to 650K core and is performed elsewhere. When the calculated electron distribution yields a set of calculated properties in agreement with observation, we have increased faith in the description of the model of the active site and can carry the model one step further to make qualitative inferences about certain properties relevant to the biological functioning of the compound.

We are currently performing a systematic study of heme proteins. The electromagnetic properties of these proteins and of synthesized model compounds which mimic the observed behavior of the proteins have been well studied experimentally. Specifically, we have addressed the following problems:

(1) Cooperativity of oxygen binding to hemoglobin. Calculations have been made for high and low affinity forms of deoxyhemoglobin. This work has been submitted to Nature (Loew and Kirchner).

(2) The nature of oxygen binding to the heme unit. Calculations were made of model oxyheme compounds with varying oxygen geometry and electron configuration. This work is now in press in the Journal of the American Chemical Society (Kirchner and Loew).

(3) The enzymatic cycle of an oxidative metabolizing heme enzyme called cytochrome P-450. This enzyme is responsible for drug metabolism and toxicity and for activation of many chemical carcinogens.
Preliminary characterization of the enzymatically active state has been made. This work is in press in the Journal of the American Chemical Society (Loew, Kert, Hjelmeland and Kirchner).

In a completely different context, we have been using SUMEX to calculate the conformation of pentapeptides (enkephalins) which have been recently found to be endogenous opiates. The aim of this study is to determine in what way, if any, they can mimic the structure of prototype opiates such as morphine and meperidine. For this work, we use a protein conformation program with empirical interaction potentials. Quantum mechanical conformation calculations of the same peptides are being performed by us elsewhere and the results of the two methods are being compared.

6.4 PILOT AIM PROJECTS

The following are descriptions of the informal pilot projects currently using the AIM portion of the SUMEX-AIM resource pending funding, and full review and authorization.

6.4.1 COMMUNICATION ENHANCEMENT PROJECT

Communication Enhancement Project

John B. Eulenberg, Ph.D. and Carl V. Page, Ph.D.
Department of Computer Science
Michigan State University

I) Summary of research program.

A) Technical goals.

The major goal of this research is the design of intelligent speech prostheses for persons who experience severe communication handicaps. Essential subgoals are:

(1) Design of input devices for persons with greatly restricted movement.
(2) Development of software for text-to-speech translation.
(3) Research in knowledge representations for syntax and semantics of spoken English in restricted real world domains.
(4) Development of micro-computer based portable speech prostheses.

B) Medical Relevance and Collaboration.

We have exchanged visits and had many conversations with Dr.
Kenneth Colby of UCLA who is working on similar problems for a domain of people who have aphasia. The need for such technology in the medical area is very great. Millions of people around the world lead isolated existences, unable to communicate because of stroke, traumatic brain injury, cerebral palsy, and other causes. The emergence of inexpensive micro-processors and sound synthesizers makes it possible to develop devices now that can be the prototypes for widespread use.

We have organized institutes to bring together the many professionals who have an interest in this area. Together with the Tufts New England Medical Center, the TRACE Center of the U. of Wisconsin, and the Children's Hospital at Stanford, we have begun the first newsletter for dissemination in this area. Dr. John B. Eulenberg helped to organize the first Federal workshop for governmental agencies who have some interest in funding work in these areas. Represented were the Bureau of Education for the Handicapped, The Veterans Administration, NIMH, NINCDS, NSF, and others. We have also been in touch with United Cerebral Palsy associations at the state and national levels. There is much interest in this area from medical, educational, and governmental communities, but no traditional means of supporting it.

C) Progress summary.

Although some facets of the research have been underway at MSU for several years, we have been using SUMEX-AIM for only six weeks at this time, having received our password in March, 1977.

During the last six weeks, we have:

1) Designed and built hardware and software allowing us to transmit files to SUMEX from our Nova 2/10 at 330 baud.

2) Organized a research team of 4 students possessing background in artificial intelligence led by Dr. Carl V. Page to develop a semantics-based speech generator. We expect to have a prototype running in June (written in SAIL).
To this end we are concentrating on semantics associated with personal needs, small talk (weather etc.), and perhaps obtaining geographic directions.

3) Have begun conversion of ORTHOPHONE, MSU's large English text-to-speech program, from its CDC 6500 Fortran implementation to a SAIL version.

4) Obtained temporary local support for terminals and tie-lines to use the SUMEX-AIM facility. We requested these in our original proposal but were not granted them. We have to share the use of our tie-lines and terminals with others. At present the lack of a dedicated tie-line from East Lansing to Tymshare in Ann Arbor or Detroit is a problem for us during 0600 to 0900 PST.

During the past few months, Dr. Richard Reid of our project has:

5) Developed a personal communication system for a 10-year-old person who has cerebral palsy. It is micro-computer-based and can accept inputs via an adaptive switch from a series of menus displayed on a TV screen, via Morse code, or by a keyboard. Its outputs can be TV display, hard copy, Morse code, spoken English, or musical sounds. We expect to use knowledge gained from the SUMEX-AIM semantics project to specify the content and connection of the choice menus for this project.

During the past three months,

6) We have begun to experiment with the interaction of knowledge sources (letter and word frequencies, syntactics, semantics and pragmatics) as a means of anticipating likely inputs and displaying them for a person to choose from.

7) Built and tested a myoelectric interface and used it (together with a miniature FM transmitter) for input of changing muscle potentials into a computer. There is reason to believe that this means of input may provide a higher bit rate than any other known means for those people who experience severe motoric problems due to cerebral palsy.

D) Up-to-date list of publications. (1976 to date)

For John B.
Eulenberg:

"Technical Systems Development, Headend", Interim Report, April, 1976, Experimental Applications of Two-Way Cable Delivery, NSF Grant No. APR 75-14286.

"Interactive iuew Hired Information System with Both Voice and Hard Copy Output: User's Guide to NHQU3RY", April 11, 1976 (with Steven Kludt and Jerome Jackson) (Artificial Language Laboratory Report AEB 041176)

"Language Individualization in a Computer-Based Speech Prosthesis System", National Computer Conference, New York, June 9, 1976.

"Individualization in a Speech Prosthesis System", Proceedings of 1975 Conference on Systems and Devices for the Disabled, June 10, 1976.

"The LEAF Language", Interim Report, September, 1976, NSF Grant No. APR 75-14286.

"A Programmable Multi-Channel Modem Output Switch", September 22, 1976, with Joseph C. Gehnan and Juha Koljonen (Artificial Language Laboratory Report ASB 092276)

"SMPTE Time Code Interface and Computer-Controlled Video Switcher", with Michael Gorbutt and Dennis Phillips, Interim Report, March, 1977, NSF Grant APR 75-14285.

For Carl V. Page:

"Heuristics for Signature Table Analysis as a Pattern Recognition Technique", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-7, No. 2, February 1977.

"Discriminant Grammars, an Alternative to Parsing", with Alan Filipski, Proceedings of the IEEE Workshop on Picture Processing, Computer Graphics, and Pattern Recognition, April 22, 1977.

"Pattern Recognition and Data Structures", chapter in "Data Structures in Computer Graphics and Pattern Recognition", edited by Allen Klinger, Academic Press, 1977.

During 1976 Dr. Eulenberg presented 15 lectures around the country on his research, was interviewed for TV eight times and was on radio five times.

E) Funding Status.

1) Current funding.

Wayne County (Detroit) Intermediate School District. $230,000. (Second year)
Jackson County Intermediate School District. $21,500. (Second year)

Both of these are on a one year at a time basis.
Some of this money is being used to purchase equipment which is the property of WCISD or JCISD for use in demonstration classrooms in the schools. Very little of it can be used to support the research goals which we have communicated to SUMEX-AIM because of other commitments in the grant. However, the special communication devices, students, and other research facilities provide the critical mass which will allow us to do the work that we have proposed.

2) Pending applications and renewals.

State of Michigan Vocational Rehabilitation Services $30,000. (application)
United Cerebral Palsy Association of Michigan $50,000. (application)
United Cerebral Palsy Association (National) $60,000. (For study of control by myoelectric inputs) (application)
Oakland County Intermediate School District $200,000. (application)
Genessee County Intermediate School District $200,000. (Being written)

As one can see from this list of sources, there is a lot of interest in this area from agencies which are not experienced in funding high-technology and research, since a mandatory special education act has become law in Michigan.

II) Interactions with the SUMEX-AIM resource

Again we point out that we have been a part of this community for only about 6 weeks and we will have more to say next year.

A) Examples of medical collaboration and medical use of programs via SUMEX.

The faculty in the MSU College of Human Medicine who teach medical decision making were shown a demonstration of the SUMEX system, MYCIN and PARRY. We plan to present a demonstration to advanced medical students and faculty at the Medical School in the near future. A member of our Medical School faculty, Dr. Richard Ropple, an expert on myoelectronics, is a member of our research group.
The Dean of our College of Human Medicine visited our laboratory in April, 1977 and we expect encouragement and collaboration.

B) Examples of sharing, contacts, and cross-fertilization with other SUMEX-AIM projects.

1. We have met with Dr. Kenneth Colby on many occasions including the SUMEX-AIM workshop in June, 1975. Our work in many ways complements his and we have had several worthwhile interchanges of information. We are converting our major software programs for speech generation and adaptive inputs to the SUMEX-AIM system in part so that they can be used by Dr. Colby and his group.

2. Mr. Douglas Appelt, a doctoral student at SU-AI, was our principal systems programmer last summer. He is currently doing research in the same area as ours with Dr. Gary Hendrix of SRI. We have used his knowledge of your system (via the message sending routines) to assist us in starting our project. Mr. Appelt will be working with us at MSU again this summer (June-Sept., 1977), and he will be using the SUMEX-AIM system.

C) Critique of resource services.

We have found the HELP files to be a lot of help. We are beginning to understand our own needs and your services to the extent that it may be helpful to meet with one of your staff. Dr. Eulenberg will be in California in early June and plans to visit your facility. However, we have found that your system is easy to use and do not feel more distant from you than from other computer installations on our own campus.

III) Follow-on SUMEX grant period (8/78-7/83).

A) Long-range user project goals and plans.

We want to do fundamental research in artificial intelligence in the context of the generation of speech from very minimal amounts of input. This problem seems closely related to the understanding of speech.
It seems that the methods of representation of knowledge used for speech or vision understanding can be used in a natural way for fluent generation of speech. Our area seems almost unique in AI in that it is socially desirable (without question). Even relatively primitive systems can improve the quality of life for hundreds of thousands of people.

Major long range goals are:

1) To do research in transposing the vocal tract to another region of the body in which an individual has suitable myoelectric control for the generation of speech.

2) To define a suitable system of semantics and to encode world knowledge in that system that would be useful for the generation of speech fluently.

3) To discover primitive operations on semantics which allow new and appropriate combinations of speech to be generated. (Using other sources of knowledge.)

4) To develop means for individuals who are physically unable to use standard input devices to program and personalize their own speech and environmental control system.

5) To study means of using speech output to aid blind persons both through experiments with simplified text-to-speech devices and through means of training blind persons to write in cursive and manuscript.

6) To study the educational consequences of communication aid systems for individuals who, because of previous misdiagnoses as mentally impaired, have been excluded from the mainstream education system.

7) To improve the prosodic qualities of generated speech, using its semantic aspects.

8) To design portable speech prostheses which allow maximum use of state of the art kno\

Summary of research program - The goal of this research project is to develop new methods for the design and analysis of organ culture experiments, using techniques of artificial intelligence. The cultivation of organ fragments is an important method for the study of disease processes.
In contrast to cell culture, organ culture is designed to inhibit outgrowth of cells and to deal with normal tissue relationships as they exist in the body, divorced from the complexities of organ interaction. The technique involves the maintenance of differentiated cells as a group within their normally associated tissues. With an ability to maintain differentiated tissues in culture, a direct histologic and biochemical assessment of factors influencing an organ is possible. Such a biologic model would permit investigation of the structural and functional effects of various substances directly on the target organ. With a chemically defined medium, the technique would allow a simultaneous evaluation of metabolites or hormones released by the organ fragments.

The research is being done in collaboration with Professors Raymond Kahn, Theodore Fischer, and William Burkel of the Department of Anatomy, the University of Michigan Medical School.

We have been working on methods of image analysis of microscope slides. This has been approached from two directions. On the one hand we are writing programs for special image analysis hardware. These programs will calculate various indices of the condition of the cultivated organ fragments based upon measured morphological features. The second approach is to translate the biologist's verbal descriptions of microscope slides into computer data structures which encode conditions not detectable by our image analysis programs, though readily seen and reported by trained human observers. We have developed a dictionary of anatomical terms and programs for morphological analysis. At present we are working on the syntactic analysis of the scientist's verbal descriptions.

A grant application titled "Application of Computer Science to Organ Culture" has been written and will be submitted to the National Institutes of Health on June 1, 1977. Current support is from the University of Michigan with computer services supplied by SUMEX-AIM.
II) Interactions with the SUMEX-AIM resource

We have had valuable contacts with members of the DENDRAL project and the MOLGEN project, which share certain goals and methods with our own work. The resource services received from SUMEX-AIM continue to be excellent. The staff is very helpful, and the system is well-maintained and reliable. The only serious difficulties which arise are due to system saturation and limited file space.

III) Follow-on SUMEX grant period

Our proposal, if funded, would commit us to expanding our efforts to develop a histology knowledge base and methods to rationalize the design of organ culture experiments. This would involve heavier use by a larger group of the SUMEX-AIM resource. Our work to date, though of limited scope, is encouraging. The work is dependent upon continued availability of the SUMEX-AIM system, which we would like to see expanded not only to provide more services for present projects, but to include a wider range of relevant bio-medical and artificial intelligence research.

The commonality of resource and the opportunities for communication which SUMEX-AIM provides are extremely valuable in our view. Given the community of resource consumers attracted by SUMEX-AIM, we think it would be an excellent focus for the encouragement of new techniques, new ideas in programming languages, and increased variety of input and output media.

6.4.4 NEUROPROSTHESES PROJECT

Neuroprostheses Project

M. G. Mladejovsky, Ph.D., Director
Division of Artificial Organs
University of Utah Medical Center
Salt Lake City, Utah 84112

I. Research Summary

Our research involves the investigation of artificial vision by electrical stimulation of visual cortex and artificial hearing by electrical stimulation of the cochlea.
This effort has involved the collaboration of several people from many disciplines, not only from the University of Utah, but also from the Ear Research Institute, Los Angeles; University of Western Ontario, London, Ontario; and Columbia University, New York. The instrumentation involved is controlled by a minicomputer system consisting of a PDP-8 and a PDP-11/05. Experimental protocols are implemented by programs running in the PDP-11.

We sought access to SUMEX in order to use the BLISS-11 compiler which runs on the PDP-10. We are using BLISS-11 as the implementation language for an interactive programming system which will enable more flexible control and variation of our experiments. The base language we are using is BALM (Malcolm Harrison, "BALM Programmer's Manual", Courant Institute, NYU, 1974). This language is defined in terms of an abstract machine called the MBALM machine. The plan of attack is as follows:

1) implement the MBALM machine in BLISS-11
2) bring up BALM, using a dummy garbage collector and no virtual memory
3) implement garbage collection and virtual memory
4) add floating point operations
5) add a graphics package
6) add real-time capabilities
7) provide an interface to PDP-11 machine language

The project has progressed to the point that step 2 is almost complete. This has involved installing a new version of BLISS-11 at SUMEX, writing software to allow file transfers between SUMEX and our PDP-11 (which is connected to the Utah-TIP as a terminal), writing MBALM and various support routines in BLISS-11, implementing an I/O package for BALM in assembly language, and performing a bootstrapping process with the BALM self-definition. Our schedule calls for completing steps 3, 4, and 5 by 1 July 1977. Steps 6 and 7 have not been planned in detail at this time.
We are planning to run the resulting programming system on our PDP-11/05 with 28K core, GT-40 graphics system, and running the RT-11 operating system. Modifying the system to run under a different operating system should be straightforward. However, whether the system will run efficiently on a machine with less than 2UK core is questionable. It is too early now to say.

There have been no new publications by our group since our application was filed last year. Currently several papers are in progress but have not yet been submitted for publication. A partial list of previous publications is attached. When the BALM system has reached a stable state, we will be happy to provide documentation and sources for it to anyone who requests them.

The support for our human experiments is provided by a grant from the Max C. Fleischmann Foundation. This grant expires 30 June 1977, and a renewal proposal is now being prepared.

II. Interactions with SUMEX

We have been perfectly satisfied with our use of SUMEX. By far our greatest use of the system has been of text editors and the BLISS-11 compiler. We have also become acquainted through SUMEX with the OMNIGRAPH graphics package available from NIH and have obtained a copy of the OMNIGRAPH manual. We have not used OMNIGRAPH yet but may wish to in the future. We are considering the features of OMNIGRAPH in the design of the graphics package for our interactive system. We are quite interested in using the MAINSAIL system being developed at SUMEX and have been told that RT-11 is one of the first operating systems under which it will be available.

III. Long-range Plans

Our plans for the period beyond July 1978 will depend to a large extent on results of experiments which have not yet been performed. Our use of SUMEX for the purpose of developing an interactive programming system will presumably be complete sometime in 1977.
It is possible that future needs will require non-real-time access to a machine of greater capabilities than our PDP-11/05 and PDP-8.

IV. Publications

Dobelle, W. H., Mladejovsky, M. G., and Girvin, J. P. Artificial vision for the blind: electrical stimulation of visual cortex offers hope for a functional prosthesis. Science, 183, 1 February 1974, 440-444.

Dobelle, W. H., and Mladejovsky, M. G. Phosphenes produced by electrical stimulation of human occipital cortex and their application to the development of a prosthesis for the blind. J. Physiol., 243, 1974, 553-576.

Dobelle, W. H., Mladejovsky, M. G., Evans, J. R., Roberts, T. S., and Girvin, J. P. 'Braille' reading by a blind volunteer by visual cortex stimulation. Nature, 259, 15 January 1976, 111-112.

Mladejovsky, M. G., Eddington, D. K., Evans, J. R., and Dobelle, W. H. A computer-based brain stimulation system to investigate sensory prostheses for the blind and deaf. IEEE Trans. Biomed. Eng., BME-23, 4, July 1976, 285-295.

Mladejovsky, M. G., Eddington, D. K., Dobelle, W. H., and Brackmann, D. E. Artificial hearing for the deaf by cochlear stimulation: pitch modulation and some parametric thresholds. Transactions of ASAIO, 21, 1975, 1-6.

6.4.5 MATHEMATICAL MODELING OF PHYSIOLOGICAL SYSTEMS

Mathematical Modeling of Physiological Systems

John J. Osborn, M.D., Director
Research Data Facility
The Institutes of Medical Sciences
San Francisco, California 94115

The overall goal of the Institutes of Medical Sciences' collaboration with SUMEX is the application of computer technology to clinical medicine. Our efforts during the past year have been in the fields of knowledge-based engineering and mathematical modeling.
We are using our available computer-based physiological measurement systems to provide the basis on which physiological interpretation is being developed using knowledge engineering, and to provide the data with which mathematical models are being developed using the SUMEX modeling facility.

Project support:

Granting agency: NIH
Grant number: NE%00134
Total period of the award: 3 years
Current year: 3
Current funding: $45,570

Granting agency: NIH
Grant number: HRr)2917
Total period of the award: 3 years
Current year: 3
Current funding: $198,839

BIOMEDICAL KNOWLEDGE ENGINEERING IN CLINICAL MEDICINE (KEMED)

The KEMED system is conceived as an application of the discipline of heuristic-based programming to the interpretation of measurements made in clinical medicine. The long range goal of the project is to do research on a biomedical knowledge-based system for interpreting the clinical significance of physiological data. This interpretation will be used to aid in diagnostic decision making and the selection of therapeutic action.

Even the best measurements often go unused because of the reasonable reluctance of clinical staff to make measurements whose results they only poorly understand and whose relation to clinical management is ambiguous. We will use techniques of biomedical knowledge engineering to extract and systematize the heuristic knowledge used by experts in the practice of their clinical art. These techniques will be used to construct and utilize a knowledge base to guide inference making by computer programs.

The first program in the KEMED system is designed for interpretation of standard pulmonary function laboratory test data. A knowledge base was developed for interpreting the relationship between measured flows, lung volumes, pulmonary diffusion capacity and pulmonary mechanics and the standard diagnoses of pulmonary function.
The knowledge base includes interpretation of measured test results and diagnosis of the type and severity of any pulmonary disease which may be present. The program is being developed as an extension to the MYCIN formalism, and it makes extensive use of the MYCIN structures and programming system. Funding has been requested to continue this work.

MATHEMATICAL MODELING OF PHYSIOLOGICAL SYSTEMS

Mathematical models of the cardio-pulmonary system are being developed to extract clinical physiological information from data acquired by the patient monitoring system. Two approaches are being taken: 1) parsimonious models of the dynamic behavior of CO2 following an increase in inspired oxygen concentration are being developed for automated patient monitoring application, and 2) a detailed model of the regional behavior of radioactive tracers in the lung is being used as a standard for evaluation of the previous models.

The MLAB (Modelling Laboratory) program, available on SUMEX, is being used extensively for model development by simulating hypothesized models and for data analysis, i.e., identification of model parameters from experimental data. The CO dilution method has been applied successfully in the ICU and additional funding has been requested. Two new methods for measuring regional lung function with radioactive tracers have been developed, for which MLAB was essential, and further funding has been requested. MLAB was used to perform an error analysis of the method for measuring regional pulmonary shunt fraction. Also, using MLAB model simulation to understand the complex dynamics of 133-Xenon in the lung-tissue system, a method for measuring intraregional ventilation/perfusion ratio maldistribution has been developed which significantly extends the sensitivity of previous methods.

A model of the oculomotor system is presently being developed on MLAB in collaboration with the Smith-Kettlewell Institute of the Visual Sciences.
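To make "identification of model parameters from experimental data" concrete, the following is a minimal sketch in Python rather than MLAB. It assumes a single-exponential washout curve C(t) = C0 * exp(-k * t), which is an illustrative model form and not one of the project's models; the function name and the sample values are likewise hypothetical. The fit is ordinary least squares on the logarithm of the data.

```python
import math


def fit_exponential_washout(times, concentrations):
    """Estimate (C0, k) for C(t) = C0 * exp(-k * t).

    Taking logarithms gives ln C = ln C0 - k * t, a straight line in t,
    so the two parameters follow from an ordinary least-squares line fit.
    All concentrations must be positive.
    """
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two matched (time, concentration) points")
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated from assumed "true" parameters.
TRUE_C0, TRUE_K = 5.0, 0.3
sample_times = [0.0, 1.0, 2.0, 4.0, 8.0]
sample_data = [TRUE_C0 * math.exp(-TRUE_K * t) for t in sample_times]

estimated_c0, estimated_k = fit_exponential_washout(sample_times, sample_data)
print(f"C0 = {estimated_c0:.3f}, k = {estimated_k:.3f}")
```

With real, noisy measurements the log transform weights small concentrations heavily, so a nonlinear least-squares fit on the untransformed data would often be preferred; the log-linear form is used here only because it needs no external libraries.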
We anticipate that their model will be used in the future for treatment of patients with strabismus.

Interface with SUMEX

We use SUMEX through the Tymshare network using a terminal. The text facilities of SUMEX, including both text editing and message sending, are excellent additions to our in-house facilities (PDP-11 based system). The message system is particularly useful for communicating ideas and questions with other colleagues using the SUMEX system.

Our principal difficulty with SUMEX is turn-around time. Both the MYCIN and MLAB systems are interactive, and the 30-60 second response times associated with MYCIN and MLAB jobs are at best discouraging.

We have a strong desire to develop in-house capabilities in artificial intelligence. We have already invested significant numbers of hours in developing competence with the MYCIN system, and we are confident of developing an extremely capable staff in heuristic programming. An in-house AI computational capability is a more difficult capability to conceive. Developing an artificial intelligence programming facility on a PDP-11 based system remains a significant long-term interest. The satellite capability offers both the potential of not continuing to provide additional load on SUMEX and the potential of more rapid interaction with the user.

The SUMEX facility contributed to the following grant applications and articles:

Requested Funding:

1) Biomedical Knowledge Engineering in Clinical Medicine (NIH)
2) Pulmonary Function in Acute Illness (NIH)
3) Computer Laboratory for Clinical Support (NIH)
4) Improvement in Regional VA/Q Resolution (NIH, USAF, USN)

Bibliography

1) Simulation to Relate Measured Gas Concentrations at the Mouth to Pulmonary Mechanics and Perfusion. J.C. Kunz, R.R. Mitchell, D.H. McClung, J.J. Osborn. Submitted to the 1977 ACEMB.
2) Identifiability of Pulmonary and Recirculation Parameters Following Sequential Bolus Inputs of 133 Xe. R.R. Mitchell, R.J. Fallat. Submitted to the 1977 ACEMB.
3) Simulation of Intraregional Ventilation-Perfusion Ratio Maldistribution. J.C. Glaub, R.R. Mitchell, R.J. Fallat. Submitted to the 1977 ACEMB.
4) Measurement of Residual Volume and Ventilation Distribution Using Helium and a Five Vital Capacity Breath Maneuver. R.R. Mitchell, Technical Report 32, Institutes of Medical Sciences, Feb. 1977.
5) Identification of Human Oculomotor System Parameters with Application to Strabismus. N.K. Gupta, A.V. Phatak, Systems Control; R.R. Mitchell, Heart Research Institute and Carter Collins, Smith-Kettlewell Institute, Institutes of Medical Sciences. Submitted to Joint Automatic Control Conference, 197..

6.4.6 PUFF/VM PROJECT

PUFF/VM - Pulmonary Function and Ventilator Management Project

John J. Osborn, M.D.
The Institutes of Medical Sciences (San Francisco)
and
E. A. Feigenbaum
Computer Science Department, Stanford University

Note: The PUFF/VM project is the outgrowth of the efforts of Prof. Feigenbaum's group at Stanford to establish new applications areas for AI in medical research. It represents a collaboration with Dr. Osborn's group, which has been working on another AIM pilot project titled "Mathematical Modeling of Physiological Systems". A PUFF/VM proposal is currently pending with NIH, and PUFF/VM is being reviewed in parallel by the AIM Executive Committee for separate pilot status.

1. General Problem

Measurements of patient physiology have become universally accepted as important parts of the delivery of clinical medicine. Good, useful measurements often go unused, however, because of the legitimate resistance of attending staff to using measurements which they poorly understand.
Thus, technology contributes to clinical medicine if:

-- It's so useful, economical and easy to use that everyone can use it (e.g.: SMA-12, Brain scanner, Paps)
-- It's so useful, economical and has been around long enough that many people have been trained to use it (e.g.: ECG in ICU).

The dissemination of new technology in clinical medicine is limited by the ability of the system of medical care delivery to accept and assimilate the interpretation of the results of the technology. Given that the technology is useful in knowledgeable hands, this rate of assimilation is related somewhat to cost, but more to the rate at which education progresses. The new computer axial tomography systems have been accepted rapidly (two neighboring hospitals near San Francisco made headlines when each tried to purchase $200,9flc) devices) because the measurements they make are useful, and they are readily interpreted by staff.

A system of medical technology should:

-- Make clinically important physiological measurements;
-- Get data automatically, accurately [done often];
-- Recognize irrelevant data, poor data and artifact [rarely done];
-- Interpret clinical significance of data in light of limitations of the data collection and analysis [almost never done];
-- Operate economically.

Systematic interpretation of test data is both possible (if the problem has a restricted domain) and desirable (because interpretation will be consistent for all and usable without direct supervision of a specialist).

2. Objectives

2.1. Overall Objectives:

Our immediate objective is to develop a computer programming system for interpreting the clinical significance of measures of pulmonary function. We hope to develop this system for diagnostic use in the pulmonary function laboratory and to aid diagnosis and ventilator management of respiratory insufficiency in the intensive care unit.
We hope to demonstrate the clinical effectiveness of such a system for improving the accuracy and timeliness of diagnosis.

Our long range goal is to develop an integrated system for making and interpreting measures of pulmonary function. We believe that this is possible because of the present and potential contribution of instrumentation and data analysis systems to the diagnosis and clinical management of pulmonary distress. We believe, in addition, that the discipline of knowledge-based heuristic programming is potentially the best basis on which to develop a system for automatically interpreting the results of the measures of pulmonary function. We aim, in the long run, to develop an inexpensive enough implementation that the system will find wide acceptability in the delivery of clinical care.

2.2. Pulmonary Laboratory:

Our objective for this project is to develop a heuristic program for interpreting the results of standard pulmonary function tests. The program will identify the need for repeated measurements because of poor patient effort; identify the need for additional information in order to make a more definitive diagnosis; report and explain the reasons for primary and secondary diagnoses and severity of any disease state; identify the relation between diagnosis and any referral diagnosis; interpret any change from previous tests or limitations on the interpretation because of the test methodology and the patient effort.

We propose to: implement the system using a significant extension of an existing system of heuristic methods; extend the existing system to add new pulmonary disease diagnosis decision rules; develop models for directing program execution, achieving faster performance, and detecting and interpreting the clinical situation in terms of any inconsistent data; facilitate model acquisition.

2.3.
Intensive Care Unit (ICU):

Our objective for this project is to develop computer programs for a system to interpret results of tests of pulmonary function in the hospital Intensive Care Unit. The program will interpret and explain the results of test measurements used to diagnose respiratory insufficiency; suggest initial settings for a ventilator for the patient with respiratory insufficiency; diagnose need for change in ventilation for the patient on a mechanical ventilator; and diagnose appropriateness of moving forward or back in the process of weaning the patient from the ventilator.

We will implement the system using a new heuristic-based interpretation system capable of interpreting continuous data from the changing patient situation. The system will allow goal-oriented and data-driven invocation of interpretation rules from the knowledge base.

2.4. Progress Evaluation:

Our objective for this project is to conduct major evaluations of the direction and schedule of the above projects. These evaluations will be conducted near the end of the first and second years of the project. The evaluations will help assure the soundness of the computer science and the clinical investigations. Outside experts in clinical medicine and computer science will participate in the evaluation process.

2.5. Advantages of Collaborative Effort between IMS and the Stanford Heuristic Programming Project

The collaboration offers a complementary blend of medical and computer science knowledge:

-- Clinically important problems: Interpretation of pulmonary measurements, both in lab and ICU.
-- Auto data collection and analysis in pulmonary lab and in ICU using computer. Data has demonstrated value in clinical medicine; well understood procedures for collection, interpretation, use of data.
-- Having computer data collection, automated interpretation is logical next step.
-- Use all power computer science has available; discard excess in application specific implementation after designing into implementation the important features.
-- The SUMEX charter from NIH includes exporting artificial intelligence techniques (AI) to a larger community, and IMS is an excellent potential colleague. IMS has real clinical problems which can use AI effectively; biomedical engineering, statistics, and mathematical formulation of problems to contribute to AI; strong clinical orientation to give AI practical use.

3. Specific Aims

A. Develop an integrated knowledge-based system for interpreting standard pulmonary function test results.

B. Develop an integrated knowledge-based system for interpreting tests and observations used for diagnosis and treatment of respiratory insufficiency in the ICU.

C. Conduct major project evaluations, using outside experts in clinical medicine and computer science, to review progress to date and to help identify promising directions for continuing research.

To these ends, we will:

1. Develop a knowledge base for pulmonary function laboratory test interpretation, including rules to:

-- Interpret results from spirometry, body plethysmography and measurement of diffusion capacity for CO;
-- Diagnose the presence and severity of obstruction, restriction and diffusion defects;
-- Diagnose the presence and severity of obstructive subtypes (asthma, bronchitis, emphysema); and
-- Identify poor test results and the need for new information to make a more definitive diagnosis.

2. Implement rules for pulmonary function test interpretation using a significant extension of the existing MYCIN formalism.
Heuristic and mathematical models of "prototype" disease states will be used to:

-- Identify the presence of supporting and conflicting evidence for a primary interpretation;
-- Interpret the clinical significance of measured data both in terms of measured data, the patient history, and expected values for the typical case;
-- Recognize and interpret the significance of inconsistent data; and
-- Direct the invocation of rules, thereby speeding program operation.

3. Develop a knowledge base for interpreting tests and observations relevant to diagnosis and ventilator management of respiratory insufficiency:

-- Interpret results of measurements of vital capacity, blood gases, respiratory pressures, volumes, gas concentrations; hemodynamics;
-- Recommend procedure for setting up a ventilator for a patient;
-- Diagnose need for change in patient ventilation;
-- Identify indications for proceeding forward or back in the process of weaning from a ventilator; and
-- Make interpretations in light of measured test results, patient history, record of therapies and results of therapies and observations.

4. Implement rules for interpretation of respiratory insufficiency data with a new heuristic interpretation system including the following major features:

-- Forms time-dependent hypotheses about the patient state;
-- Infers desired courses of action based on measured patient state, observations, and expectations of future course;
-- Uses models, both heuristic and mathematical, for generating an expectation of the immediate patient course.

5. Create an advisory committee, including outside experts in clinical medicine and computer science, to review the progress to date. They will review conceptual formulations, system design, scope and detail of the clinical knowledge and system operation.
The advisory group will be asked to help to identify additional important considerations for the clinical knowledge base and the computer implementation, suggest improved ways to conceptualize or implement problems, and evaluate the soundness of the results to date.

4. Significance

Science advances by quantitation and development of general theories. The practice of medicine advances along one path by integrating quantitative measurements and general theories into the routine of existing clinical practice. The world of clinical medicine includes a complicated interaction among human patients, complex physiology, and proud, human clinical staff. This project is based on the assertion that good, quantitative measurements of physiological state are useful if effectively related to the human and physiological complexities of the clinical world. The best possibility we see for making new quantitative measurements far more generally useful in clinical medicine lies in knowledge-based interpretation of well understood physiologically relevant measurements.

The improved care of the sick patient is our objective. This project, if successful, will directly improve the ability of the clinical staff to properly diagnose and manage the patient with respiratory insufficiency. It will lay the foundation for extension of successful methodologies of interpretation to the general problem of interpreting measurements of physiological state.

Appendix I

OVERVIEW OF ARTIFICIAL INTELLIGENCE RESEARCH

ARTIFICIAL INTELLIGENCE RESEARCH
What is it? What has it achieved? Where is it going?

Excerpt from a report by
Professor Edward A. Feigenbaum
Stanford University

INTRODUCTION

In this briefing, these questions will be discussed as succinctly as possible:

I. What is the scientific field of artificial intelligence research, as seen from various viewpoints?
What are the general goals of the field?

II. What are its practical working goals? What are some achievements relative to these goals (circa 1973)?

III. What steps (new goals, problems, potential achievements) seem to lie ahead, within a five year horizon?

ARTIFICIAL INTELLIGENCE (alias INTELLIGENT COMPUTER SYSTEMS): General View

Artificial Intelligence research is that part of Computer Science that is concerned with the symbol-manipulation processes that produce intelligent action. By "intelligent action" is meant an act or decision that is goal-oriented, arrived at by an understandable chain of symbolic analysis and reasoning steps, and is one in which knowledge of the world informs and guides the reasoning.

Some scientists view the performance of complex symbolic reasoning acts by computer programs as the sine qua non for artificial intelligence programs, but this is necessarily a limited view.

Yet another view unifies AI research with the rest of Computer Science. It is an oversimplified view, but worthy of consideration. The potential uses of computers by people to accomplish tasks can be "one-dimensionalized" into a spectrum representing the nature of instruction that must be given the computer to do its job. Call it the WHAT-TO-HOW spectrum. At one extreme of the spectrum, the user supplies his intelligence to instruct the machine with precision exactly HOW to do his job, step-by-step. Progress in Computer Science can be seen as steps away from that extreme "HOW" point on the spectrum: the familiar panoply of assembly languages, subroutine libraries, compilers, extensible languages, etc.

At the other extreme of the spectrum is the user with his real problem (WHAT he wishes the computer, as his instrument, to do for him).
He aspires to communicate WHAT he wants done in a language that is comfortable to him (perhaps English); via communication modes that are convenient for him (including perhaps, speech or pictures); with some generality, some abstractness, perhaps some vagueness, imprecision, even error; without having to lay out in detail all necessary subgoals for adequate performance - with reasonable assurance that he is addressing an intelligent agent that is using knowledge of his world to understand his intent, to fill in his vagueness, to make specific his abstractions, to correct his errors, to discover appropriate subgoals, and ultimately to translate WHAT he really wants done into processing steps that define HOW it shall be done by a real computer.

The research activity aimed at creating computer programs that act as "intelligent agents" near the WHAT end of the WHAT-TO-HOW spectrum can be viewed as the long-range goal of AI research. Historically, AI research has always been the primary vehicle for progress toward this end, though science as a whole is largely unaware of the role, the goals, and the progress.

HISTORICAL TRACE: The Working Goals of the Science; Progress toward those goals

The root concepts of AI as a science are 1) the conception of the digital computer as a symbol-processing device (rather than as merely a number calculator); 2) the conception that all intelligent activity can be precisely described as symbol-manipulation. (The latter is the fundamental working hypothesis of the AI field, but is controversial outside of the field.) The first inference to be drawn therefrom is that the symbol-manipulations which constitute intelligent activity can be modeled in the medium of the symbol-processing capabilities of the digital computer.
This intellectual advance -- which gives realization in a physical system, the digital computer, to the complex symbolic processes of intelligent action and decision -- with detailed case studies of how the realization can be accomplished, and with bodies of methods and techniques for creating new demonstrations -- ranks as one of the great intellectual achievements of Science, allowing us finally to understand how a physical system can also embody mind. The fact that large segments of the intellectual community do not yet understand that this advance has been made does not change its truth or its fundamental nature.

Three global "working goals" have dominated the AI field for the 17 years of its existence. These are:

1. Understanding heuristic search as a processing scheme sufficient to account for much intelligent problem solving behavior; and exploring the scope and pervasiveness of heuristic problem solving.

2. Semantic information processing: developing precise formulations of "understanding" by programs, and "meaning" of symbols that are input or stored; the acquisition, storage, and deployment of knowledge of the world in the service of symbolic problem solving.

3. Information Processing Psychology: developing precise models of human behavior in symbolic-processing tasks.

The first two goals represent the fundamental paradigms that have dominated the field. The third cuts across these orthogonally, and involves intense interdisciplinary contact with Psychology and Linguistics.

GOAL 1. HEURISTIC SEARCH, HEURISTIC PROGRAMMING, SYMBOLIC PROBLEM SOLVING PROGRAMS

In the first decade, the dominant paradigm of AI research was heuristic search. In this paradigm, problem solving is conceived as follows: A tree of "tries" (aliases: subproblems, reductions, candidates, solution attempts, alternatives-and-consequences, etc.)
is sprouted (or sproutable) by a generator. Solutions (variously defined) exist at particular (unknown) depths along particular (unknown) paths. To find one is a "problem". For any task regarded as nontrivial, the search space is very large. Rules and procedures called heuristics are applied to direct search, to limit search, to constrain the sprouting of the tree, etc. While some of this tree-searching machinery is entirely task-specific, other parts can be made quite general over the domain of designs employing the heuristic search paradigm.

Two notions are critical. The first is that problem solvers generally face a "maze" of alternative courses of decision and action that is huge compared with their processing resources. The second is the use of heuristic knowledge to steer carefully through large mazes toward a solution, seeking the plausible and potentially fruitful avenues, avoiding the absurdities and the high-risk paths.

Heuristic knowledge is usually informal knowledge -- to be distinguished from formal knowledge that is assertable with the rigor of proof. Polya, the famous mathematician who wrote Patterns of Plausible Inference and other books on problem solving, calls heuristic reasoning "the art of good guessing." Heuristic knowledge is often "common sense" knowledge of the world, rules-of-thumb for generally acceptable performance, or rules of good practice in specific situations. When we speak of the "expertise" of an expert, and the "good judgment" he brings to bear on complex problems in his domain, we often are speaking of the heuristics he has developed to search effectively.

Provocative essays by Polya notwithstanding, the first serious and detailed studies of heuristic problem solving ever done by Science were done as AI research in its first decade.
As with any other science, progress came by the detailed examination of specific cases, from which gradually emerged both a broad picture of the nature of the phenomena being studied and, within this, more formal theories for specific parts.

Three subgoals of heuristic programming are discernible.

SUBGOAL 1A. Demonstrate sufficiency of heuristic search for tasks of intellectual difficulty.

These heuristic programming efforts dealt with almost "pure" symbolic reasoning tasks (i.e., tasks not requiring much coupling to real-world knowledge), and used inference schemes that were either ad-hoc or of limited scope. Notable successes during this "prove-the-concept" phase were: the Logic Theory Program, that proved theorems in Whitehead & Russell's propositional calculus; the Geometry Theorem Proving program, that proved theorems in Euclidean geometry at a level of competence exceeding that of the excellent high school geometry student; the Symbolic Integration program, that solved college freshman symbolic integration problems about as well as MIT freshmen; chess-playing programs that play respectable "club player" C or B Class chess; a checker playing program that was virtually unbeatable, except by the country's top few players (notable also for remarkable self-improvement in performance by analysis of its own play and "book-move" good play); and a number of competent management science applications (assembly-line balancing, warehouse location, job-shop scheduling, etc.).

To recapitulate briefly, the key concepts are: search in problem solving; and the use of generally informal knowledge to guide search effectively. The AI community was the first to devote serious scientific effort to developing the idea of the use of informal knowledge in problem solving, with notable successes. Few in Science recognize that this achievement has been made and is ready for exploitation.

SUBGOAL 1B.
Generality in Problem Solving Programs

Generality here means the use of a small set of problem solving methods of wide applicability to solve problems of many different types. Each of the problems posed is stated to the program in a particular representation (or framework) which the set of methods is constructed to handle. The subgoal of generality arises first as a reaction to the array of "specialty" programs mentioned above; second, from the general observation that the ability to do a wide range of tasks is a special touchstone of intelligence; third, from a direct assessment that as the diversity and heterogeneity of the tasks handled by an agent increases, the likelihood that it can do them all without intelligent action decreases; and fourth, from the argument that any ultimate intelligent agent must have wide generality, since it must take the world and its problems as they come without any intermediary, making generality an important independent desideratum.

This subgoal was pursued with vigor for ten years in a number of projects, was important for its feedback value in clarifying issues for the AI field, and has temporarily (at least) been put back on the shelf as the field begins to explore knowledge-based problem solvers and issues in the representation of knowledge.

There were two discernable subthemes. The first was an attempt to create abstract heuristic search methods that were divorced from any particular content. Examples were: the General Problem Solver, which used a variant of heuristic search known as means-ends analysis; MULTIPLE, which introduced adaptivity in the selection of what subproblem to choose "next" in a search; and REF-ARF, which extended the generality of ordinary procedural programming languages to include the embedding of non-procedural problems of constraint satisfaction.
The second subtheme was the construction of theorem provers that take problems expressed as theorems to be proved in the first-order predicate calculus. This line of work was motivated by the (correct) observation that the scope for representing real-world facts and situations in first-order predicate calculus is very great; and by the invention of the resolution method, a computational method for finding proofs for theorems in this calculus. There has been continuous improvement on the basic method, taking the form of proposing more powerful inference techniques, rather than the form of specific ways for programs to adapt to particular problems. The very strength of the formulation in terms of generality, namely its complete homogenization of the particular task (all tasks are seen and dealt with in the same logical formalism), turns effort away from how to exploit the particularities of special classes of tasks. But it appears that only by exploiting the particularities can significant reduction in search be achieved. From a practical point of view the only proofs produced by such problem solvers were "shallow" proofs.

Much of this line of research has been temporarily "shelved", awaiting further knowledge on how best to represent knowledge for computer processing. Problems that are essentially simple when represented in their "natural" representation appear extraordinarily complicated when translated into first-order predicate calculus. The current search for theorem provers using higher-order logics is based not on the attempt to increase the raw expressive power, so to speak, of first-order logic, but on the belief that naturalness of expression will ultimately pay off.
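As a rough illustration of the resolution method mentioned above, the following sketch performs refutation by resolution in the propositional case only. This is a deliberate simplification: the provers discussed here work in first-order predicate calculus, which additionally requires unification of terms. The clause encoding (frozensets of literal strings, with "~" marking negation) and the function names resolve and refutes are assumptions made for this example.

```python
from itertools import combinations


def resolve(c1, c2):
    """Return all resolvents of two clauses (frozensets of literal strings)."""
    resolvents = []
    for lit in c1:
        neg = lit[1:] if lit.startswith("~") else "~" + lit
        if neg in c2:
            # Cancel the complementary pair and merge what remains.
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {neg})))
    return resolvents


def refutes(clauses):
    """True if the empty clause is derivable, i.e. the clause set is unsatisfiable."""
    known = set(clauses)
    while True:
        new = set()
        for a, b in combinations(known, 2):
            for r in resolve(a, b):
                if not r:          # empty clause: a contradiction was found
                    return True
                new.add(r)
        if new <= known:           # nothing new can be derived: no refutation
            return False
        known |= new


# To prove Q from P and (P implies Q), add the negated goal ~Q and seek a contradiction.
premises = [frozenset({"P"}), frozenset({"~P", "Q"})]
print(refutes(premises + [frozenset({"~Q"})]))
```

The uniform treatment of every clause is what gives the method its generality. It is also why the procedure, as written, makes no use of any particular feature of the problem to cut down the number of clause pairs it tries.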
SUBGOAL 1C: High-Performance Programs that perform at near-human level in specialized areas

As the heuristic programming area matured to the point where the practitioners felt comfortable with their tools, and adventuresome in their use; as the need to explore the varieties of problems posed by the real world was more keenly felt; and as the concern with knowledge-driven programs (to be discussed later) intensified, specific projects arose which aimed at and achieved levels of problem solving performance that equalled, and in some cases exceeded, the best human performance in the tasks being studied.

The example of such a program most often cited is the Heuristic DENDRAL program, which solves the scientific induction problem of analyzing the mass spectrum of an organic molecule to produce a hypothesis about the molecule's total structure. This is a serious and difficult problem in a relatively new area of analytical chemistry. The program's performance has been generally very competent and in "world's champion" class for certain specialized families of molecules.

Similar levels of successful performance have been achieved by some of the MATHLAB programs that assist scientists in doing symbolic mathematics. The effectiveness of MATHLAB's procedures for doing symbolic integration in calculus is virtually unexcelled.

Yet another example, with great potential economic significance, involves a program for planning complex organic chemical syntheses from substances available in chemical catalogs. The program is currently being used as an "intelligent assistant" in a new and complex organic synthesis.

GOAL 2. SEMANTIC INFORE