The gene is dead…oh wait…alive but sloppy

The concept of the gene almost died this morning. I was just sitting at my desk minding my own business, reading a journal article when, blam!, the gene was dead. Well, almost: it turned out to be more of a near-death experience. I’m sorry to report there was no white light or anything mysterious like that. But there was coffee, so that’s good.

A new article in Genome Research entitled “What is a gene, post-ENCODE? History and updated definition” by Mark B. Gerstein et al. first takes a walk down memory lane. All those definitions! Here we go:

1860s-1900s: Gene as a discrete unit of heredity

1910s: Gene as a distinct locus

1940s: Gene as a blueprint for a protein

1950s: Gene as a physical molecule

1960s: Gene as a transcribed code

1970s-1980s: Gene as open reading frame (ORF) sequence pattern

1990s-2000s: Gene as annotated genomic entity, enumerated in the databanks (current view, pre-ENCODE)

Under the current view, a common metaphor applied to the definition of genes is that they are subroutines in the genomic operating system. The authors explain:

“…one metaphor that is increasingly popular for describing genes is to think of them in terms of subroutines in a huge operating system (OS). That is, insofar as the nucleotides of the genome are put together into a code that is executed through the process of transcription and translation, the genome can be thought of as an operating system for a living being. Genes are then individual subroutines in this overall system that are repetitively called in the process of transcription”

They go on to explain that there are a number of problematic issues with the current definition of a gene. Gene regulation, overlapping and spliced genes, junk DNA, and parasitic and mobile genes can all complicate efforts to give genes a definition that is nice and neat…and accurate. It is so difficult that the authors at one point describe a potential escape route [Spoiler alert: this is the part where the gene almost dies]: “we could declare the concept of the gene dead and try to come up with something completely new that fits all the data.”

Well, that seemed like a bold move to the authors too! Nobody likes to wear black in the summertime anyway. Instead they opted for a definition that would at least attempt to be backward compatible. I was delighted to see that they were also sensitive to the fact that everyone wants to know just how many genes are in the human genome. So, the authors note that another criterion for a new definition of the gene is that it “should be practical enough so that one can readily enumerate genes” (emphasis theirs). OK, this is sounding good.

After working toward a definition for a few pages, the authors finally deliver the goods and propose the following definition:

The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
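To make “a union of genomic sequences” a little more concrete, here is a minimal Python sketch. It rests on my own simplifying assumptions, not the paper’s: each functional product is reduced to a list of half-open (start, end) coordinates on a single strand, and the function name and toy coordinates are invented for illustration. It also ignores the “coherent set” part of the definition, which is where the real biology lives.

```python
def union_of_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical exon coordinates for two overlapping transcripts.
transcript_a = [(100, 200), (300, 400)]
transcript_b = [(150, 250), (300, 350), (500, 600)]

# Under this toy reading, the "gene" is the union of both products' sequences.
gene = union_of_intervals(transcript_a + transcript_b)
print(gene)  # [(100, 250), (300, 400), (500, 600)]
```

Two transcripts that share sequence collapse into one set of genomic segments, which is roughly the sense in which overlapping products still count as a single, enumerable gene.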

It’s not Shakespeare, that’s for sure, but I guess most technical definitions are not. Luckily, there is an updated metaphor to go along with this definition:

The new ENCODE perspective does not, of course, fit with the metaphor of the gene as a simple callable routine in a huge operating system. In this new perspective, one enters a gene “routine” in many different ways in the framework of alternative splicing and lattices of long transcripts. The execution of the genomic OS does not have as neat a quality as this idea of repetitive calls to a discrete subroutine in a normal computer OS. However, the framework of describing the genome as executed code still has some merit. That is, one can still understand gene transcription in terms of parallel threads of execution, with the caveat that these threads do not follow canonical, modular subroutine structure. Rather, threads of execution are intertwined in a rather “higgledy-piggledy” fashion, very much like what would be described as a sloppy, unstructured computer program code with lots of GOTO statements zipping in and out of loops and other constructs.

So, there you go. Your genome is essentially sloppy, unstructured computer code. And the gene lives on, in a sort of higgledy-piggledy fashion.
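For the programmers in the audience, here is a toy Python contrast between the two metaphors. Everything in it is made up for illustration (the exon names, the three-letter sequences, both function names), and since Python has no GOTO, “entering the routine in many different ways” is imitated by letting the caller choose which exons to run through.

```python
# Hypothetical exons; names and sequences are invented for this sketch.
EXONS = {"e1": "ATG", "e2": "GCC", "e3": "TTT", "e4": "TAA"}


def tidy_gene():
    """Old metaphor: one entry point, same product on every call."""
    return "".join(EXONS[name] for name in ("e1", "e2", "e3", "e4"))


def sloppy_gene(path):
    """Post-ENCODE metaphor: the caller picks where to enter and what to skip."""
    return "".join(EXONS[name] for name in path)


print(tidy_gene())                  # ATGGCCTTTTAA
print(sloppy_gene(("e1", "e3")))    # ATGTTT
print(sloppy_gene(("e2", "e4")))    # GCCTAA
```

The tidy version always returns the same thing; the sloppy one returns a different product for every path through the same stretch of “code”, which is about as close to higgledy-piggledy as Python will let you get.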

The one noticeable weakness in this paper is the total lack of poetry. Definitions, metaphors, but no haikus. How would Basho define a gene? Haikus please!

Update: I’ve worked up the courage. Here is my attempt at a Basho-ripoff haiku:

Zea mays
Transposons jump in

ENCODE Project


4 Responses to “The gene is dead…oh wait…alive but sloppy”

  1. Ricardo on August 14th, 2007 7:50 pm

    Great summary of the paper, but I find that the new metaphor is sort of a stretch. They could have made up some other metaphor to better depict the idea instead of the sloppy code.

    Anyhow, loved the first paragraph, especially the coffee part. Too funny :D
