Primary Keys with a Twist

Alex Tabarrok over at Marginal Revolution points to an advertisement portraying a
customer trying to order food from a pizza shop. [Editor: You should really click
through and watch/listen to the pizza shop piece before you continue.] The customer is
rattled by how much the clerk on the phone knows about his personal life, from recent
purchases at other stores and books checked out from the library to health conditions
and financial information. The advertisement doesn’t describe exactly how this
pizza shop is able to pull all of this information together into its
point-of-sale computer. All we know is that the hypothetical pizza shop
recently "got wired into the new system", presumably a centralized
database system that uses the customer’s national ID number, which figures
prominently in the dialogue, as the primary key.

In an article published in 2002, Clay Shirky described the threat that shared
primary keys pose to personal privacy:

"The current worst-case scenario is a single
universal database in which all records — federal, state, and local, public and
private — would be unified with a single set of primary keys."

The difficulty in creating such a universal database lies in the sheer variety of
primary keys that different databases use to identify records as belonging to a
particular individual. For example, the database at the local Department of Motor
Vehicles may use driver’s license numbers, while the local video store uses
membership IDs and someone else uses Social Security numbers.

Making databases interoperable becomes much less difficult if everyone uses the
same primary key to uniquely identify records. This is why the advertisement draws
attention to the national ID number. Presumably, once the government issued the
IDs, pharmacies, libraries, insurance companies, etc. widely adopted them as
primary keys, thus providing the opportunity to create the centralized database
that the pizza shop is “wired into.”
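
To make the point concrete, here is a minimal Python sketch of why a shared key matters. Everything in it is hypothetical: the ID format, the store names, the record contents, and the `merge_on_key` helper are made up for illustration and are not taken from the advertisement or from Shirky's article.

```python
# Hypothetical records. Two databases key on a shared national ID;
# the video store keys on its own membership number instead.
pharmacy = {"ID-1001": {"prescription": "allergy medication"}}
library = {"ID-1001": {"last_checkout": "a travel guide"}}
video_store = {"M-77": {"last_rental": "a comedy"}}


def merge_on_key(key, *databases):
    """Combine every record stored under `key` across the given databases."""
    profile = {}
    for db in databases:
        # A database that does not use this key contributes nothing.
        profile.update(db.get(key, {}))
    return profile


profile = merge_on_key("ID-1001", pharmacy, library, video_store)
print(profile)
```

The pharmacy and library records fall together with a single lookup, while the video store's record stays out of reach because nothing ties `M-77` to `ID-1001`. That missing link is the obstacle a universal primary key removes.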

In such a world, individuals concerned about personal privacy may go to great
lengths to keep their national ID numbers under wraps. Since the primary key is a
required field, without it the interoperable record can no longer be easily
created, and placing an order for pizza once again becomes an unremarkable part
of life.

This is not the end of the story; Shirky takes the scenario several steps further.
There is no need for the government to figure in such a universal database at all.
Remember, in the pizza shop scenario it was the government that made it all
possible by issuing unique national IDs to all citizens. Here is the twist that
Shirky delivers:

“DNA will provide a cheap way to link a database record with a particular
person, in a way that is much harder to change or forge than anything we’ve
ever seen…Everyone has a biological primary key embedded in every cell of their
body in the form of DNA…keys are issued every time someone is born, and
re-issued every time a new cell is created. And if the keys already exist, then
the technological argument is not about creating new keys, but about reading
existing ones… If primary keys are issued by nature, rather than by each
database acting alone, then there is no more need for central databases or
advance coordination, because the contents of any two DNA-holding databases can
be merged on demand in something close to real time.”

Unlike a national ID number, the information contained in DNA is all but
impossible to keep under wraps. Strictly speaking, it can be kept under wraps
today, but only because DNA sequencing is prohibitively expensive, and that is
changing quickly. This doesn’t get past Shirky:

“The race is on among several biotech firms to be able to sequence a person’s
entire genome for $1000. The $1 DNA ID will be a side effect of this price drop,
and it’s coming soon. When the price of reading DNA markers drops below a
dollar, it will be almost impossible to control who has access to reading a
person’s DNA.”

As I have noted elsewhere, genetic privacy experts have joked that the
largest tissue bank in the world is owned by Publishers Clearing House, since
there are often enough cells on the back of licked stamps and envelopes to
extract plenty of DNA for sequencing. Conveniently, names and addresses are
right there on the return address label. In light of Shirky’s article, the
business of Publishers Clearing House could morph from hawking magazines to
hawking biological primary keys. They could sell these gems along with the names
and addresses they are linked to in a nice, neat little package, like a database.

Before this starts to sound too much like 1984 all over again, I should note
that, like Alex Tabarrok, I’m torn between the specter of every pizza shop
clerk having access to my personal information and the advantages I can reap by
having my information “out there.”

Stephen Brobst and Richard Hackathorn co-authored a nice article in Teradata
Magazine about the social issues surrounding new massive storage technologies.
In this article they describe one of the opportunities open to drivers who put
their personal information “out there.” Brobst and Hackathorn call it extreme
automotive insurance, which:

involves monitoring driving habits to facilitate
premium adjustments based on actual driving behaviors. Such driving behaviors
can be observed via satellite collection of data during automobile operation.

To do this, an individual must be identifiable when
getting into the driver’s seat. That way, subsequent driving habits can be
collected during vehicle use. Those who drive beyond the speed limit and weave
in between lanes will pay higher premiums than those who do not. The data
warehouse is used to analyze the vast amount of data from these monitoring
activities and provide individual premium pricing recommendations… The goal behind this…is to make insurance pricing more
"fair" by limiting the subsidies from good drivers to poor drivers.

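As a rough illustration of the pricing idea in that passage, here is a minimal Python sketch of a per-driver premium adjustment. It is not Brobst and Hackathorn's method: the `DrivingLog` fields, the `adjusted_premium` function, and every rate and cap in it are assumptions picked only to show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class DrivingLog:
    """Hypothetical summary of one driver's monitored behavior."""
    miles: float
    speeding_events: int
    lane_weaving_events: int


def adjusted_premium(base_premium, log,
                     surcharge_per_event=0.02,  # assumed: +2% per event per 100 miles
                     max_surcharge=0.50,        # assumed cap on the total surcharge
                     safe_discount=0.10):       # assumed discount for a clean record
    """Scale a base premium by the driver's own record, not the group's."""
    if log.miles <= 0:
        return base_premium  # no monitored driving, so nothing to adjust
    events = log.speeding_events + log.lane_weaving_events
    events_per_100_miles = events / log.miles * 100
    if events_per_100_miles == 0:
        return round(base_premium * (1 - safe_discount), 2)
    surcharge = min(events_per_100_miles * surcharge_per_event, max_surcharge)
    return round(base_premium * (1 + surcharge), 2)


careful = DrivingLog(miles=1000, speeding_events=0, lane_weaving_events=0)
reckless = DrivingLog(miles=1000, speeding_events=40, lane_weaving_events=20)
print(adjusted_premium(1000.0, careful), adjusted_premium(1000.0, reckless))
```

Under these made-up numbers the careful driver pays less than the base premium and the reckless one pays more, which is the sense in which the subsidy from good drivers to poor drivers shrinks.
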
Responsible teenage males (or their parents) may be the group most
interested in extreme automotive insurance. No matter how well a male teenager
drives, his car insurance rates are set based on, among other things, the
aggregate driving behavior of other teenage males. Needless to say, as a
group, they are demented drivers. They
suffer from what Steven Austad calls “testosterone dementia”:

“Another peculiarity of men, at least in modernized
countries, is that they exhibit a pronounced spike in their death rate
beginning after puberty. That is, their
death rate increases dramatically (tenfold!) between 11 and 23, then gradually
declines for about a decade before settling into an eight-year mortality-rate
doubling for the rest of their lives. I
call this period of male life, between 12 and 23, the time of testosterone
dementia, because it is a behavioral, not a physiological, phenomenon. During those years, more than two-thirds of
male deaths are due to accidents and suicides…Testosterone dementia makes
maturing males enthusiastic warriors, hunters of large game, or fighter pilots,
but poor insurance risks, as the driving records so graphically show.”

Brobst and Hackathorn provide other examples where having loads of
personal information out there could bring real benefits, and there are many
more. At the same time, this doesn’t
make the pizza shop scenario any less unsettling.


Clay Shirky. "DNA, P2P, and Privacy." November 2002.

Mark A. Rothstein (ed.). Genetic Secrets: Protecting Privacy and
Confidentiality in the Genetic Era. Yale UP, 1997.

Steven Austad. Why We Age. Wiley,

Stephen Brobst and Richard Hackathorn. “The future: eXtreme data warehousing.” Teradata Magazine.
September 2004.

