50 million personal genome sequences by 2015?

Just how quickly will the market for personal genome sequences grow? My back-of-the-napkin calculation pegs it at 50 million sequences obtained by 2015, give or take. While this is far from a scientifically derived calculation, my rationale is simply to assume that the trend line for the personal genome sequencing market might look a lot like the one experienced in the personal computer market.

Welcome to 1980
The personal computer industry grew from several thousand units sold in 1975 to 50 million units in 1995. If the personal genome sequencing market follows suit, we might say that 2007 for personal genome sequences was like 1979 for personal computers, and we’ve just turned the corner into 1980, where units sold remain below 1 million but growth is noticeable. If growth continues apace, we’ll hit 50 million sequences obtained somewhere around 2015-2020.
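
To make the trend-line comparison concrete, here is a rough sketch of the annual growth rate implied by the PC figures above. The 5,000-unit starting point is an assumption standing in for "several thousand"; only the 50 million endpoint and the two years come from the post.

```python
def implied_annual_growth(start_units, end_units, years):
    """Compound annual growth rate that takes start_units to end_units."""
    return (end_units / start_units) ** (1 / years) - 1


# Assumption: 5,000 stands in for "several thousand" units sold in 1975.
PC_UNITS_1975 = 5_000
PC_UNITS_1995 = 50_000_000

pc_growth = implied_annual_growth(PC_UNITS_1975, PC_UNITS_1995, 1995 - 1975)
print(f"Implied PC growth: {pc_growth:.0%} per year")
```

With that assumed starting point, the implied rate is roughly 58% per year. A different guess for "several thousand" shifts the figure by a few points either way.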

PCs 1980-1995: sparkline
pDNA 2007-2015: sparkline


Deus ex machina
A difficulty with predicting the future is human agency, or choice. How quickly will people warm to the idea of surfing their personal genome sequence? Although we’re in the early days, personal genome sequencing evangelists are starting to appear in unexpected places. Enter Christopher Hitchens. In a recent interview about his newest book on the Hoover Institution’s show Uncommon Knowledge, he paused and said: “As it happens, I’ve had my DNA sequenced recently. You can get yours done too. And you should, by the way…”. Here is the clip (you can skip to the 8 min point):


History, on repeat
How else might the personal genome sequencing market resemble the personal computing industry? For some ideas, witness this absolutely brilliant piece of video from Britain in 1969, a year when computers were not yet “personal” computers but were clearly heading in that direction:

Part 2, and Part 3

The interviews with people on the street are amazing. When asked, “So what do you think of computers?”, responses range from the aloof “What are computers?” to the utopian “a revolution like we’ve never seen” to the dismissive “yeah, they’re great, but I don’t know what all the fuss is about” to the completely dystopian “the government will use them to control us”.

At the close of the video, renowned professor Donald Michie (University of Edinburgh) had this to say:

“[Computers are] bringing about the greatest revolution the human race has ever known…This revolution could lead to terrible consequences, or it could lead to the greatest advances ever for the human race. Which of these things are to happen, is up to us.”

Fool’s Game
Attaching a number to a 10-year forecast is a fool’s game. The timescale may be off-base in either direction. The point of writing this post, though, is to help set expectations about a near-term future where many millions of people have obtained personal genome sequences (including partial sequences). This point still escapes many people who work in and around the genetics field.


17 Responses to “50 million personal genome sequences by 2015?”

  1. Steven Murphy MD on January 6th, 2008 8:48 pm

    Great post Jason!
    I think computers are grand. And so is your genome. But……Couldn’t even the old computers at least do something useful? If you build the 2000 USD full genome, then people will definitely buy it like a PC. There is a little problem though….No need to upgrade your genome every 2 years.
    for now….

  2. jasonbobe on January 6th, 2008 9:01 pm

    Hi Steve -

    Don’t forget that a principal reason people purchased computers was to play games. I suspect the varied uses of a DNA sequence, many of them recreational and non-medical, will be enough of an incentive for people to desire them. Witness National Geographic’s Genographic Project (>200,000 paying customers thus far).

    My calculations are not based on full human genomes either, just personal genetic sequences.

    In this way, people might be buying or “upgrading” their sequence every few years.

    2008: 1M snp chip
    2009: selective sequencing + 1M snp
    2010: exome + 1M snp
    2012: exome, improved CNV, 1M snp, +microbiomics
    2015: full genome
    2018: single-cell sequencing of specific tissues for low level mosaicism, telomere length, etc

    As it gets cheap, easy, and useful people might start eyeing their cats, dogs, and houseplants as well.


  3. jasonbobe on January 6th, 2008 9:11 pm

    I should also note that in 2008 we still aren’t using computers in medicine :)

    Tongue in cheek: Imagine if we waited until computers were valued by the medical community before we began using them. It might be brash of us to try to start using genomes for medicine right from the get go :-P

  4. David M Jackson on January 7th, 2008 1:40 pm

    Interesting post…unfortunately I can’t help but do a little math:

    2015 - 2008 = 7ish years (~2,700 days)

    50M / 2,700 = 18,000+ PER DAY!!!
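
The division above can be reproduced with a short sketch. The 50M target and the ~2,700-day figure come from the comment; the second calculation, using exactly 7 × 365 days, is an added assumption for comparison.

```python
TARGET_SEQUENCES = 50_000_000


def sequences_per_day(total, days):
    """Average daily throughput needed to reach `total` in `days` days."""
    return total / days


# Figure used in the comment: roughly 2,700 days.
comment_rate = sequences_per_day(TARGET_SEQUENCES, 2_700)

# Assumption for comparison: exactly 7 calendar years of 365 days.
calendar_rate = sequences_per_day(TARGET_SEQUENCES, 7 * 365)

print(f"~2,700 days: {comment_rate:,.0f} per day")
print(f"7 x 365 days: {calendar_rate:,.0f} per day")
```

Either way the required average lands somewhere between 18,000 and 20,000 sequences per day, so the comment's "18,000+" holds under both readings of "7ish years".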

    Can we sequence 18,000 genomes THIS YEAR ALONE? [not a rhetorical question]

    Without the cost of the technology dropping out of the stratosphere, the number per day required to hit 50M simply rises until the price comes down. Then instead of $$$, the bandwidth limitation becomes one of multiply parallel, multiplexed, HTP sequencers being (1) available / installed / functional, (2) fully utilized, and (3) running 24/7/365.

    While I assume the 50M number has been used to shock the reader, even an order of magnitude less will be an amazing milestone should it be achieved. Obviously the acceleration towards the under-$10K genome is underway…but when it will be fully enabled remains highly speculative and, IMO, unlikely to be achieved in ‘08 alone.

    Just my two cents…

  5. jasonbobe on January 7th, 2008 2:03 pm


    This is a great way to slice the data. To achieve 50 million sequences per year, it would require more than the combined annual output of LabCorp and Quest (2005):

    LabCorp 2005: 13 million samples
    Quest 2005: 20 million samples

    No small task! The size of investment in capital equipment needed to put together one or two sequencing operations that could handle this magnitude of throughput is astronomical.

    Another way to slice the data might be to consider the Genographic Project, which has already surpassed 200K volunteer samples processed. Even though they are sequencing only a handful of basepairs, they are still providing personal genome sequence data.

    In this way, to achieve 50M sequences obtained (albeit not full genome sequences), we would need:

    (50M)/(200K) = 250 projects (or companies) similar to the Genographic Project, perhaps each with their own equipment and providing a wide mix of sequence data (from a few snps, to 1M snps, to selective sequencing, to whole genomes).
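
Both comparisons in this comment can be checked with a quick sketch. All figures are the ones quoted above; the variable names are illustrative only.

```python
import math

TARGET_SEQUENCES = 50_000_000

# Figures quoted in the comment.
LABCORP_2005_SAMPLES = 13_000_000
QUEST_2005_SAMPLES = 20_000_000
GENOGRAPHIC_SAMPLES = 200_000

# Combined annual output of the two clinical labs, and how far 50M exceeds it.
combined_lab_output = LABCORP_2005_SAMPLES + QUEST_2005_SAMPLES
target_vs_labs = TARGET_SEQUENCES / combined_lab_output

# Number of Genographic-sized projects needed to reach the target.
projects_needed = math.ceil(TARGET_SEQUENCES / GENOGRAPHIC_SAMPLES)

print(f"Combined lab output: {combined_lab_output:,} samples")
print(f"Target is {target_vs_labs:.2f}x that output")
print(f"Genographic-sized projects needed: {projects_needed}")
```

The target is about one and a half times the two labs' combined 2005 volume, and it corresponds to 250 projects of the Genographic Project's current size.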

    That seems eminently doable (and likely). How many sequencing outfits will appear in China alone by 2012?

    I agree that 50M whole genomes are unlikely by 2015, but that was not the point of my post. I should probably clarify.


  6. jasonbobe on January 7th, 2008 3:08 pm

    Perhaps we might adopt some language that distinguishes whole genome sequences from partial genome sequences:

    pDNA = partial personal genetic sequence data
    PGs = whole personal genomes

    I mixed the two in the above post; it might be best if I didn’t use them interchangeably.


  7. Ryan Phelan on January 7th, 2008 5:37 pm

    Nice piece Jason—sounds like a perfect longbet to me!

    Maybe you should post a prediction: http://www.longbets.org/

  8. David M Jackson on January 7th, 2008 5:59 pm


    Excellent point about the terminology - words do indeed matter!

    To assume that any healthcare consumer will not be considered for genetic sequencing of some sort in the relatively near future is naive at best. As for consumer-directed sequencing, there will be some of that too, but I suggest proportionately fewer than orders written by clinicians.

    As for the number of individual genomes sequenced, that is likely to be very, very small no matter the technology that comes out. Why do I say that? Because the best technology out there to do it still costs ten$-of-thousand$ per genome, is not even on the market yet and takes 8+ weeks to do just one!


  9. jasonbobe on January 7th, 2008 11:49 pm

    Ryan -

    Thanks for the suggestion…I’ve always wanted to place a longbet, perhaps now is the time :)


  10. jasonbobe on January 7th, 2008 11:55 pm

    David -

    One point where I would consider drawing a line in the sand around pDNA touches upon clinical genetic testing (and some non-medical consumer genetics for that matter). These tests by and large are interpretations and individuals don’t get raw basepairs back except for variant alleles (e.g. deltaF508 for CF).

    This is of course starting to split hairs, but I think the mindshift toward transacting on basepairs is an important one. One of the blog posts I’ve been planning on writing for months involves imagining whether genetics will stay vertically integrated, or will balkanize into service groups:

    (1) biology: tissue/dna sample mgmt and conversion (tissue into basepairs), e.g. cell lines, sequencing
    (2) interpretation: converting basepairs into knowledge, e.g. Dx, visualization, counseling, referrals, alerts, etc.
    (3) data stewardship: of basepairs, phenotypes, knowledge/interpretation, access rights, access history, preferences, etc.

    What I think everyone will really want is data portability, which enables (1) sequence accumulation and re-sequencing; (2) interpretation accumulation and re-interpretation, i.e. 2nd opinions; (3) a natural division of labor around specialty (sequencing, IT, interpretation) and a market in each. These items might be bundled at point of transaction, or they might not.


  11. A. Marques on January 8th, 2008 6:18 pm

    Are we talking about full genome sequences or marker zones? If it’s full, I’d say that 50 mill is probably a bit too much.

  12. nick gogerty on January 8th, 2008 7:12 pm

    50m seems very reasonable. They may not even need to be fully sequenced to have value. My guess is that, similar to the cheap SNP analysis today, a statistical approximation of sequential similarities will allow a “sequence” to be performed very cheaply. My only argument is with the number: I think it is too low. I would predict that the figure will be close to 100m, as the most likely to be “sequenced” would be those who are “in process” in a healthcare system in a developing country, the newly born, or those being treated. The really interesting thing will be metagenomic sequencing. A sequence can tell you about the potentials of where you started but maybe not where you are or your “potential” going forward.

  13. nick gogerty on January 8th, 2008 7:14 pm

    The earlier commenter who said 50m sounds high should read “The Singularity Is Near” by Kurzweil. The SNP cost has been dropping by about 2 orders of magnitude per year. 50m genomes @ $100 each is only $5b. A drop in the health care bucket, which now consumes 10-15% of most developed countries’ GDP.

  14. Janette on March 21st, 2008 10:40 pm

    Interesting article and comments.
    One of the biggest hurdles facing genome centres is that the technology used for sequencing has not grown and improved exponentially like computers. For almost 10 years genome sequencing was performed using capillary electrophoresis. Only in the last two years have the next-gen platforms been released. However, these are still very expensive to run, and as the scientific world runs on money, the expenditure may still be too great to offer such large-scale screening.
    Another issue is data quality and use of the data. We can produce it, but data analysis is still in its infancy; we hardly even know how the genome works yet. This could cause all sorts of problems for the lay person: look at the results of BRCA1/2 screening and susceptibility vs. a confirmed 100% chance of breast cancer.
    Data quality still has its limitations. It may be easy to generate 50 million DNA sequences, but how many extra have to be generated to account for quality clipping, especially if they are to be used in medical situations?

  15. jasonbobe on March 22nd, 2008 10:53 am

    Hi Janette -

    RE: the pace of change in sequencing capability, you should check out Rob Carlson’s piece:

    The Pace and Proliferation of Biological Technologies, Originally published in Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science Volume 1 Number 3, August 2003. Published on KurzweilAI.net March 4, 2004.

    RE: Data quality and the number of “extra” sequences needed…

    This is an important point. As you allude to, it might take 50x coverage to achieve quality data. In other words, one might need to sequence the same DNA 50 times in order to get one high quality sequence. In this way, in order to generate 50M personal DNA sequences, we’ll really need to generate 50M × 50 = 2.5 billion sequences. Or, to flip it around, we might need to generate 50M DNA sequences to achieve quality DNA for 1M people.
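
The coverage arithmetic can be sketched both ways round. The 50x figure is the hypothetical coverage from this comment, not a measured requirement, and the function names are illustrative.

```python
# Assumption from the comment: each finished sequence needs 50 raw passes.
COVERAGE = 50


def raw_runs_needed(finished_sequences, coverage=COVERAGE):
    """Raw sequencing runs required to deliver the finished sequences."""
    return finished_sequences * coverage


def finished_sequences_from(raw_runs, coverage=COVERAGE):
    """Finished, full-coverage sequences obtainable from a raw-run budget."""
    return raw_runs // coverage


print(f"Raw runs for 50M finished sequences: {raw_runs_needed(50_000_000):,}")
print(f"Finished sequences from 50M raw runs: {finished_sequences_from(50_000_000):,}")
```

At 50x coverage, 50M finished sequences imply 2.5 billion raw runs, while a budget of 50M raw runs yields only 1M finished sequences.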

  16. Trackbacks on March 23rd, 2019 10:09 am
