Is a Genome Like a Computer Program?

Gary Welz
email: garywelz@yahoo.com

An organism's genome is its set of chromosomes, its complete set of genetic information. Many have compared the genome to a massive database, or to a blueprint for every protein and organ in the organism. Certainly it is an extraordinary storage device. But can the computer analogy be taken further? Can the genome be thought of as a program that controls the moment-to-moment functioning of the organism? Can it be viewed as a "self-installing and self-launching application" that enables an organism to develop, or "build", itself?

In order to make this metaphor concrete, I propose that computer scientists and biologists begin describing the processes the genome participates in as though they were parts of a large computer program. Specifically, they could create flowcharts with genes as objects connected by logical terms like "and" and "or" and, of course, "while" loops.
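To show roughly what such a description might look like, here is a minimal Python sketch. It is an illustration of the proposal only, not an established model: the gene "A", the signals "X", "Y" and "Z", and the idea that each unit of product uses up one unit of X are all hypothetical. A gene is an object whose activation condition is built from "and" and "or", and a "while" loop keeps the gene producing until its condition fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical cell state: each named signal is simply present or absent.
State = Dict[str, bool]


@dataclass
class Gene:
    """A gene treated as an object whose activation is a Boolean condition."""
    name: str
    condition: Callable[[State], bool]  # built from "and" / "or" / "not"

    def is_active(self, state: State) -> bool:
        return self.condition(state)


# Hypothetical gene "A": switched on when signal X AND (Y OR Z) are present.
gene_a = Gene("A", lambda s: s["X"] and (s["Y"] or s["Z"]))


def run(gene: Gene, state: State, supply_of_x: int) -> int:
    """The "while" loop: keep making product while the gene stays active.

    Assumption for illustration: each unit of product uses up one unit
    of signal X, so the gene eventually switches itself off.
    Returns the number of product units made. Note: mutates `state`.
    """
    made = 0
    state["X"] = supply_of_x > 0
    while gene.is_active(state):
        made += 1
        supply_of_x -= 1
        state["X"] = supply_of_x > 0  # X disappears once the supply is gone
    return made
```

For example, `run(gene_a, {"X": True, "Y": False, "Z": True}, 3)` makes three units and stops, while the same call with both Y and Z absent makes none, because the "or" branch of the condition is never satisfied.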

Schematic representation of genetic processes has a long history. The "Central Dogma" of molecular biology, a phrase coined by Francis Crick and popularized by James Watson, is represented by:

DNA --> RNA --> Proteins

This expresses the sequence of processes:

"DNA is transcribed into RNA, and RNA is the template upon which proteins are constructed."

As Watson himself knows better than anyone, the picture is far more complex than that. Robert Robbins of the DOE told me that one can begin to approximate it with the chart:

DNA --> primary transcript --> messenger RNA --> primary polypeptide --> processed polypeptide --> final protein --> does stuff

Each item in the sequence gives feedback to all the earlier items, and DNA even gives feedback to itself.
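One way a programmer might first encode Robbins's chart is as a directed graph. The Python sketch below assumes that "gives feedback to" simply means "can influence"; the chart says nothing about how each feedback works, so the graph records only who can affect whom. The names `STAGES` and `build_graph` are illustrative.

```python
# Stages of Robbins's chart, in order.
STAGES = [
    "DNA",
    "primary transcript",
    "messenger RNA",
    "primary polypeptide",
    "processed polypeptide",
    "final protein",
    "does stuff",
]


def build_graph(stages):
    """Return {stage: set of stages it can influence}.

    Assumption: "feedback" is modelled as a plain directed edge.
    """
    graph = {stage: set() for stage in stages}
    for i, stage in enumerate(stages):
        if i + 1 < len(stages):
            graph[stage].add(stages[i + 1])  # forward step of the pathway
        for earlier in stages[:i]:
            graph[stage].add(earlier)        # feedback to every earlier item
    graph["DNA"].add("DNA")                  # DNA also feeds back on itself
    return graph


graph = build_graph(STAGES)
```

Even this crude encoding makes the point of the chart visible: of the 28 edges, only 6 run forward along the familiar pathway, while 22 are feedback.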

This past spring I posted a note to the bionet.genome.chromosome and bionet.general discussion groups asking whether a genome can be regarded as a computer program. Quite a lively discussion ensued, and I want to make it available to a larger audience. Excerpts of the discussion are linked to my synopsis of it below.

It began with my original posting on April 13, 1995, which was followed by a very thoughtful and detailed reply from Robert Robbins of the US Dept. of Energy Genome Database Project. Robbins is a biologist with a serious interest in having computer scientists consider my questions. He was encouraging while politely pointing out the naive errors in my thinking.

Robbins himself then heard from G. Dellaire of McGill, who raised some interesting points of his own. Robbins replied in detail to Dellaire's comments.

David Baillie of the Institute of Molecular Biology and Biochemistry at Simon Fraser University in Burnaby, Canada; Vahe Bedian of the University of Pennsylvania; and Paul O'Neill of the University of Utah Computer Center offered some short but useful comments.

Tengleong Chew from the St. Louis University Medical Center replied in detail to my posting and closed with the tantalizing remark that "There are potential Nobel Prizes hidden in this field."

I sent a few people the collected comments, and G. Dellaire replied with some detailed remarks on the comments of others.

Next, I posted my first attempt to create a flowchart of a genetic process: the production of beta-galactosidase, the enzyme that the bacterium E. coli uses to digest the sugar lactose. The gene that encodes it (lacZ, part of the lac operon) is activated if glucose is not present and lactose is.
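Written as a program, the simplified rule above is a single Boolean expression. The Python sketch below encodes only that rule; the real lac operon is more graded than this (for instance, there is some low basal expression, and glucose acts partly through catabolite repression), so treat it as the flowchart's logic rather than a model of the bacterium. The function name is illustrative.

```python
from itertools import product


def lac_gene_active(glucose_present: bool, lactose_present: bool) -> bool:
    """Simplified rule from the text: on if lactose is present AND glucose is not."""
    return lactose_present and not glucose_present


# Print the full truth table, which is all a flowchart of this rule encodes.
for glucose, lactose in product([False, True], repeat=2):
    print(f"glucose={glucose}, lactose={lactose} -> "
          f"active={lac_gene_active(glucose, lactose)}")
```

Of the four combinations, only "no glucose, lactose present" switches the gene on.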

The chart seemed fairly simple, but Keith Robison of Harvard pointed out that the processes of detecting the presence of glucose and lactose took place in parallel, not in a linear order as my chart implied.
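In programming terms, Robison's correction turns a sequence of two checks into two concurrent tasks whose results meet at an "and" gate. The Python sketch below illustrates that structure with threads; the sensor functions and the dictionary standing in for the cell's environment are hypothetical, and a cell does not, of course, schedule anything like threads. The point is only the shape of the flowchart: sense both sugars at once, then combine.

```python
from concurrent.futures import ThreadPoolExecutor


def sense_glucose(environment: dict) -> bool:
    # Hypothetical sensor: reports whether glucose is present.
    return environment.get("glucose", False)


def sense_lactose(environment: dict) -> bool:
    # Hypothetical sensor: reports whether lactose is present.
    return environment.get("lactose", False)


def lac_gene_active_parallel(environment: dict) -> bool:
    """Run both sensors at the same time, then combine their answers.

    Neither check waits for the other; the "and" gate is evaluated only
    after the results come back.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        glucose = pool.submit(sense_glucose, environment)
        lactose = pool.submit(sense_lactose, environment)
        return lactose.result() and not glucose.result()
```

A linear chart would have to pick an order ("check glucose, then lactose"); the concurrent version makes no such commitment, which is closer to what Robison described.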

I responded to Robison, saying basically that this type of discussion was precisely what I had hoped my posting would produce. This was not, after all, an obvious fact to a naive non-molecular biologist.

Vahe Bedian commented more enthusiastically on the rough chart and Robison's remarks.

Guy Tantenzopf suggested a few candidate organisms for this type of analysis. Ron Sapolsky gave references to two papers by P.D. Karp that deal with some of the same questions that I had raised.

This discussion has been very enriching: first, because of the intelligence and generosity of the electronic acquaintances I have made in the international molecular biology community, and second, because it has made me realize that there is a place, perhaps even a need, for naive computer science thinking in the world of molecular genetics.

Appendix

A list of useful WWW sites for Genome Research

An Introduction to Molecular Genetics

The Genome Database Project of the US Dept. of Energy

The DOE runs the Genome Database Project. Search engines will allow you to look up a great many genes for humans and a number of other organisms.

National Institutes of Health

The NIH's Division of Computer Research and Technology is a major supporter of biological research that employs computers. Most of what it now does involves databases, but it may be ready to support some more theoretical research of the kind I've described.

US Government Labs

US Universities and Private Labs

Non-US Labs

Lists of Genomic Resources