Date: Wed, 10 May 1995 18:10:47 PDT
From: Ron Sapolsky
To: gwelz@panix.com
Subject: Re: Genome Program II

I saw your posts in bionet.genome.chromosomes, and what you were discussing reminds me of the work being done by Peter Karp at SRI. Here are two of his references from Medline:
1. Title: A knowledge base of the chemical compounds of intermediary metabolism.
Author: Karp PD.
Journal: Computer Applications in the Biosciences, 1992 Aug, 8(4):347-57.
Abstract: This paper describes a publicly available knowledge base of the chemical compounds involved in intermediary metabolism. We consider the motivations for constructing a knowledge base of metabolic compounds, the methodology by which it was constructed, and the information that it currently contains. Currently the knowledge base describes 981 compounds, listing for each: synonyms for its name, a systematic name, CAS registry number, chemical formula, molecular weight, chemical structure and two-dimensional display coordinates for the structure. The Compound Knowledge Base (CompoundKB) illustrates several methodological principles that should guide the development of biological knowledge bases. I argue that biological datasets should be made available in multiple representations to increase their accessibility to end users, and I present multiple representations of the CompoundKB (knowledge base, relational data base and ASN.1 representations). I also analyze the general characteristics of these representations to provide an understanding of their relative advantages and disadvantages. Another principle is that the error rate of biological data bases should be estimated and documented; this analysis is performed for the CompoundKB.
2. Title: Artificial intelligence methods for theory representation and hypothesis formation.
Author: Karp PD.
Journal: Computer Applications in the Biosciences, 1991 Jul, 7(3):301-8.
Abstract: This article describes artificial intelligence methods for representing theories in molecular biology, and for improving the predictive power of these theories using experimental data. A program called GENSIM provides a framework for representing theories that includes descriptions of classes of biological objects (genes, enzymes, etc.), and processes that specify potential interactions among these objects (such as enzymatic reactions). GENSIM can employ a theory specified within this framework to predict the outcomes of biological experiments. A program called HYPGENE comes into play when the observed outcome of an experiment does not match the outcome predicted by GENSIM. HYPGENE works backward from the error in GENSIM's prediction to postulate changes to both the theory embodied by GENSIM, and the presumed initial conditions of the experiment. I view HYPGENE's hypothesis generation task as a design problem, and I have adapted AI methods developed for design and planning to this task. These techniques were developed in conjunction with an in-depth study of the discovery of the gene regulation mechanism of attenuation in the E. coli tryptophan operon. Both GENSIM and HYPGENE have been tested on sample problems from the history of attenuation, and produced many of the same solutions as biologists did.