February 04, 2005 | A Search By Any Other Name
Slate has been running excerpts from a book called Safe: The Race to Protect Ourselves in a Newly Dangerous World (affiliate url here, normal url here).
I plan to get to it soon, but want to talk first about today's excerpt, which concerns data mining, Total Information Awareness and a technique called one-way hashing. I'll include a snippet from today's Slate story, but if you want to really follow the argument here, first go read the whole piece.
This excerpt is about technology created by Jeff Jonas, a computer scientist who founded a company called SRD, which gained venture capital from the CIA and was recently bought by IBM. His software was originally used to look for casino cheaters, by looking for hidden links between individuals.
But the death of TIA was not the end of data mining's application for security questions. In fact many of the most controversial TIA projects simply switched funding sources to classified ones. Finding a way to scan and exchange data remains an active interest of intelligence agencies. One question, then, is whether there are technical ways to mine networked data and preserve both secrecy and privacy at the same time. Jonas thinks he has an answer, which [Jeff Jonas] says came to him after he heard that the government had trouble keeping its watch-list data under wraps. He also knew -- from the TIA controversy and the firestorm of criticism over airlines such as JetBlue giving passenger data to the government -- that Americans are becoming increasingly skeptical of corporations handing over their personal data to the government. What Jonas came up with is a means to anonymize information but still allow it to be searched for links. He named it ANNA, and he says it's the answer to "how to know everything about everyone without knowing anything about anyone."
ANNA works like this: The software takes a set of data and applies a mathematical encryption formula that converts each piece of data -- a name, an address, a phone number -- into an indecipherable string of characters. The name al-Midhar, for example, could be transformed into cbd034409c22929518fa494f99dc9964. It's called a one-way hash, and in the case of ANNA, the hash function serves to create an anonymous version of the information stored in the database. Each string of numbers is unique, so if two pieces of data differ by even a letter or a comma, the resulting hash will be completely different. ANNA also takes the common data errors found by NORA -- misspellings of names, transposed birth dates -- and hashes them as well. Then it does the same for the names and other information on the watch list (which might include birth dates, addresses, or Social Security numbers).
Once all the data is hashed, NORA or another system could search for matches between the unique numbers without ever revealing the underlying data.
Let's say the government is looking for a particular suspect, John Doe, and wants to find out if certain companies have any data about him. It runs a hash on "John Doe," his birth date, Social Security number, and any other information it has on him. The result is a string of letters and numbers. It then hands that string over to the companies, which have run the same hash function on all of their data. Then the company simply looks for matching strings in its database. If it finds one, it alerts the government, which then could obtain a court order to un-anonymize the data.
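For readers who want to see the mechanics, here is a minimal Python sketch of the hash-and-match step the excerpt describes. It is an illustration, not ANNA's actual code: the choice of SHA-256, the `normalize` and `one_way_hash` helpers, the record layout and the sample records are all my assumptions.

```python
import hashlib


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences hash alike."""
    return " ".join(value.lower().split())


def one_way_hash(value: str) -> str:
    """Return a hex digest of the normalized value (SHA-256 is an assumption)."""
    return hashlib.sha256(normalize(value).encode("utf-8")).hexdigest()


# Hypothetical watch-list entry, hashed by the government before it is shared.
watch_list_hashes = {one_way_hash("John Doe|1970-01-01")}

# Hypothetical company records, hashed by the company with the same function.
company_records = ["Jane Roe|1982-03-04", "john  doe|1970-01-01", "Jon Doe|1970-01-01"]
company_index = {one_way_hash(record): record for record in company_records}

# The company compares digests only; the watch list itself is never revealed.
matches = [h for h in company_index if h in watch_list_hashes]
for h in matches:
    print("match on digest", h[:16], "...")
```

Note that "Jon Doe" does not match. Unless a misspelling is anticipated and hashed ahead of time, as the excerpt says ANNA does, the two sides simply never line up.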
First, Jeff Jonas is a really smart guy. I've met him before and interviewed him for at least one story.
Jonas used to be a severe critic of Total Information Awareness-style data mining, which looks for patterns of behavior to find possible suspects. He contrasted that with his system, which starts with a suspicion about an individual and then looks to see who that individual is connected to. I assume, though I don't know, that he is still of this opinion.
Now, Jonas's system would anonymize data, but despite the attempt to look for misspellings, transposed digits and variations on names, there is huge room for error in such a matching system. For instance, every David Nelson would share the same hash number. Databases also differ significantly in their ability to differentiate between individuals. A National Rifle Association email list may only contain a name and an email address, while a bank would have much more. Unless every individual has a unique identifier attached to their every transaction, there is a huge problem with incorrect identification.
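The namesake problem is easy to demonstrate. In this sketch (the `digest` helper, the field layout and both made-up people are assumptions for illustration), hashing the name alone makes two different David Nelsons indistinguishable, and adding a birth date only helps when both databases actually hold one.

```python
import hashlib


def digest(*fields: str) -> str:
    """Hash one or more fields joined with '|' (SHA-256 is an assumption)."""
    joined = "|".join(f.strip().lower() for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Two different, hypothetical people who happen to share a name.
nelson_a = {"name": "David Nelson", "dob": "1950-02-03"}
nelson_b = {"name": "David Nelson", "dob": "1988-11-30"}

# A name-only database (say, a mailing list) cannot tell them apart.
name_only_collision = digest(nelson_a["name"]) == digest(nelson_b["name"])

# A database that also holds a birth date can.
name_dob_collision = (
    digest(nelson_a["name"], nelson_a["dob"])
    == digest(nelson_b["name"], nelson_b["dob"])
)

print("name only:", name_only_collision)
print("name + birth date:", name_dob_collision)
```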
Moreover, what kinds of databases does Jonas envision his system having immediate access to? As part of the Markle Task Force report on the need for a centralized national security IT structure, he wrote (.pdf) this:
Counterterrorism officers should be able to identify known associates of the terrorist suspect within 30 seconds, using shared addresses, records of phone calls to and from the suspect’s phone, emails to and from the suspect’s accounts, financial transactions, travel history and reservations, and common memberships in organizations, including (with appropriate safeguards) religious and expressive organizations.
Now, Jonas is here talking about a system that would have access to your emails, online behavior, your purchase records, lists of what numbers you called, where you have traveled and what political and religious groups you belong to.
Maybe that's something the country would agree to, but do not think it is not a massive change.
Two other points. Anonymization was much talked about with Total Information Awareness.
Just because government agencies or algorithms don't know your name, that does not mean you are not being surveilled.
Imagine if a little spider robot sneaked into your house every day, poking around for drugs and plugging into your computer's USB port to search for child pornography or unauthorized MP3s.
It doesn't know your name, but if it finds something suspicious, it alerts an officer, who gets authorization from his superior or a judge to reveal your identity and further search your house.
Now, to be fair, that's a real world analogy to an anonymized Total Information Awareness model that has suffered from mission creep.
In the original TIA conception, the little partly-blind spider would only look in your computer and the ones of every company and hospital for indicators that you were part of a terrorist conspiracy, though it might come by a couple times a day.
In Jonas's ANNA model, the blinkered spider would only visit your house, your work and your church if you were a terrorism suspect or if you had a connection to a terrorist suspect, such as living in the same apartment building or visiting the same chat room.
And finally, there is nothing that prohibits the government from using information about other possible crimes when conducting a legitimate search and there's no technical barrier to using a system like ANNA to track down mobsters, file traders or recreational drug users.
Those are political questions. I don't mean to disparage the idea of anonymization using one-way hashes -- it could be a very useful tool for protecting privacy and civil liberties.
However, I'm wary of its misuse in political arguments, and I'm distrustful of Wired Magazine-style techno-evangelism.
That said, I'm still planning on buying Safe later today.
(And just a note about the history of Total Information Awareness -- it was not The New York Times that first revealed the program's existence in November 2002 as today's Slate excerpt would have it. Wired News freelancer Elliot Borin beat the newspaper of record by almost three months.)
Posted by Ryan Singel at 08:45 AM | Comments (1) | TrackBack
January 30, 2005 | Reviewing the Review - A Closer Look at a Hatchet Job
Heather Mac Donald, the City Journal's resident apologist for racial profiling and abusive interrogation techniques, published a review of Robert O'Harrow Jr.'s book, No Place To Hide, in the January 25 edition of the Wall Street Journal.
Mac Donald is perhaps best known for her full-throated defense of the Patriot Act from any and all criticism, including this essay, which was reprinted in full (scroll down) on the Justice Department's website defending the legislation.
Here Mac Donald offers a snarky review of O'Harrow's book, a broadside written in bad faith that dismisses the book simply based on the premise that no one should even question the implications of surveillance, government use of massive corporate collections of data, or law enforcement powers.
Here are two examples of her inability to engage the book fairly:
One:
Mr. O'Harrow presents every horror story he can find about a data system gone awry. Florida authorities bar an eligible voter from voting in the 2000 presidential election in Florida after computers falsely identify him as a felon. [...]
Such misfirings are regrettable, and every measure should be taken to avoid them. [...] The cost to democratic legitimacy of election fraud outweighs the minimal risk that antifraud technology will disenfranchise eligible voters. Virtually every modern discovery that improves life -- from vaccines to automobiles -- carries risks; balancing those risks against the technology's benefits is a skill that privacy advocates seem to lack.
Mac Donald dismisses the DBT/Florida debacle as the case of one eligible voter being disenfranchised, calling it the price of modern anti-fraud technology.
But it wasn't just one voter. The list included three percent of all African American voters in Florida. And while we will never know for certain whether inaccurate purges changed the 2000 election results, the U.S. Commission on Civil Rights estimates 8,000 voters were inaccurately flagged by the faulty list. President Bush's certified margin of victory was 537 votes.
Findings
The state of Florida’s statutorily mandated purge list, compiled by a private firm, was provided to county supervisors of elections with names that were inexact matches. The data provided demonstrated that this list had at least a 14.1 percent error rate.
African Americans had a significantly greater chance of being listed on Florida’s mandated purge list. The probability of names of African Americans appearing on the list in error was significantly greater than the likelihood of the names of whites being erroneously included on the purge list.
The state of Florida’s use of this purge list, combined with the state law that places the burden on voters to remove themselves from the list, resulted in denying countless African Americans the right to vote.
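As a rough illustration of why inexact name matching produces false positives, here is a small Python sketch using the standard library's `difflib`. The names, the 0.8 threshold and the `loose_match` helper are hypothetical, and this is not a description of how the Florida list was actually compiled.

```python
from difflib import SequenceMatcher


def loose_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two names as the same person if they are merely similar (assumed rule)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


# Hypothetical records: a felon on one list, an eligible voter on another.
felon_name = "John Smith"
voter_name = "John Smyth"

flagged_by_loose_rule = loose_match(felon_name, voter_name)
flagged_by_exact_rule = felon_name.lower() == voter_name.lower()

print("loose rule flags the voter:", flagged_by_loose_rule)
print("exact rule flags the voter:", flagged_by_exact_rule)
```

The looser the rule, the more real felons it catches and the more eligible voters it sweeps in with them.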
Two:
In fact, people give away personal information even when they don't have to. In 1998, hundreds of thousands of magazine readers filled out an eight-page, 700-item questionnaire about themselves just because Condé Nast was curious about its subscribers' most intimate medical problems and life-style choices. Americans clearly have a far more relaxed view of privacy than the activists who claim to speak on their behalf.
What Mac Donald conveniently leaves out of this account is that the survey pretended it was anonymous, while in fact, the survey's designer surreptitiously placed a tracking code on the envelope that identified the reader the survey had been mailed to. While Americans may like to fill out surveys, we don't like being misled or lied to by omission.
Of course, that portion of the story doesn't fit with Mac Donald's thesis, so she conveniently neglects to mention it.
And finally, Mac Donald tries to defend the Total Information Awareness project by arguing that O'Harrow neglects to look into all the effort being put into "anonymization technologies," which, though she declines to cite a single example, Mac Donald insists are being pursued as thoroughly as the technologies to surveil Americans. (Of course, in her view, this would have happened regardless of people concerned about privacy and surveillance.)
Let's take for example the Total Information Awareness system that DARPA was working on. Once developed and deployed (by some agency other than DARPA, which is purely a research group), the system would search through almost any database imaginable, including law enforcement, medical, associational, financial, phone, media and Internet records to search for patterns of activities that look like terrorist plans. The goal was to find plots before the deed was done.
Set aside the immense difficulty of distinguishing legitimate activities from terrorist plots and the enormous potential for false positives. Even if the system could work, there's a not-so-minor question of the Fourth Amendment. The system would have placed almost the entirety of Americans' lives under constant surveillance.
The program's directors directed a minuscule amount of their funding to a "privacy appliance." That system would sit between the databases and the central supercomputer algorithms, and would try to add on privacy by anonymizing citizens' identities. So the appliance sitting on a credit card database would send on "234fgxc45f bought a Casio watch" and the one sitting on AT&T's server would send on "234fgxc45f called 457.763.3452" and the Joint Terrorism Task Force database would send on the info that "457.763.3452 is the workplace of a suspected terrorist." Then an analyst would take that info to a judge or simply to their supervisor and get permission to change 234fgxc45f to a real name.
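A rough sketch of what such an appliance might do, in Python. Everything here is assumed for illustration: the keyed hash (HMAC-SHA-256), the shared key, the `pseudonym` helper and the made-up customer. The one requirement the description implies is that every appliance produce the same token for the same person, or the records could not be linked.

```python
import hashlib
import hmac

# Hypothetical secret shared by every appliance so tokens agree across databases.
SHARED_KEY = b"hypothetical-appliance-key"


def pseudonym(identity: str) -> str:
    """Replace a real identity with a short keyed-hash token."""
    mac = hmac.new(SHARED_KEY, identity.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:10]


# What each appliance forwards: a token plus the event, never the name.
credit_card_events = [(pseudonym("Alice Example"), "bought a Casio watch")]
phone_events = [(pseudonym("Alice Example"), "called 457.763.3452")]
flagged_numbers = {"457.763.3452"}  # e.g. from a task force database

# The central system links events by token.
linked = {}
for token, event in credit_card_events + phone_events:
    linked.setdefault(token, []).append(event)

# Tokens whose trail touches a flagged number go to an analyst.
suspicious = [
    token
    for token, events in linked.items()
    if any(number in event for event in events for number in flagged_numbers)
]
print(suspicious)
```

The analyst sees only the token, but the token's whole trail is already linked up before anyone asks a judge for a name.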
That's nice, so far as it goes, and does do something to prevent the creation of a thoroughly indexed central database on Americans, but it ignores one crucial thing: the system is still surveilling every move of American citizens, a blatant violation of the spirit, if not the letter, of the Fourth Amendment.
Here's an analogy.
Suppose law enforcement agents had the keys to all Americans' houses and every day, opened the door and let a dog go in. The officer has no idea what your name is. The dog goes in and sniffs all through the house. If the dog smells drugs or something it thinks might be drugs, it barks. Once the dog barks, the officer calls down to the central station and gets permission to find out who you are and do further searches.
From where I stand, that's an un-American America, just as I think a society in which a computerized version of a drug-detecting dog is sniffing my every purchase, email and phone call is un-American and would have a chilling effect on citizens' participation in politics.
Mac Donald may think that's a fine world to live in, where law enforcement agents unleash a computerized sniffer on your every move.
Of course, in Mac Donald's world, if you have nothing to hide, you have nothing to fear.
But try telling that to the 3,000 or so Denver activists who were spied upon by the Denver police and Joint Terrorism Task Force. O'Harrow chronicles the experience of Quakers who were labeled "criminal extremists," but somehow that story fails to make it into Mac Donald's hit piece.
At one point in her review, Mac Donald also castigates O'Harrow for not doing any reporting on the effects of surveillance. It's the kind of line that will make anyone who has read the book spit out their coffee in comic disbelief. If anything, O'Harrow spends too much time reporting, as the level of detail supporting his narrative is almost overwhelming (and amazingly, though I cover much of the same territory O'Harrow does, I found not a single error of fact in the book).
Moreover, despite Mac Donald's castigation of O'Harrow's book as a "Jeremiad", O'Harrow is eminently fair and far from pretending to have all the answers. In fact, it's clear he wants a fuller debate over the use of personal information, the legitimate uses of surveillance and whether new laws are needed to keep up with the power of new technology.
It would be nice to have that debate, but with writers like Heather Mac Donald being tolerated by the Wall Street Journal, it seems increasingly unlikely that any real, informed debate will happen anytime soon.
That's a shame, and both Mac Donald and the editors who let her purported review go into print owe the country better.
Posted by Ryan Singel at 10:40 PM | TrackBack
January 27, 2005 | Book Fight Club
Dennis Bailey over at his Open Society Paradox blog points out that C-SPAN is featuring his book (also called Open Society Paradox) (tracking url here), touting openness and surveillance over privacy, at 10 am on Sunday morning. Bailey also points out that at 11 pm, C-SPAN is doing a thing on O'Harrow's book, No Place to Hide (perhaps this is the last time I'll mention it -- at least in January).
Bailey writes: "Two diametrically opposed viewpoints, back to back on C-SPAN. Sorry Fox but that's a better idea of fair and balanced."
True that, and it sounds like a way more interesting debate than anything that will be on Meet My Ego or Inside Snarking.
Hopefully, I'll get to Bailey's book before Sunday. That way, he and I can get to some high tech updating of old-style wrassling (remember back when pro-wrestlers didn't use chairs? That kind of wrassling).
Posted by Ryan Singel at 04:00 PM | TrackBack
January 25, 2005 | The Times Discovers O'Harrow
Michiko Kakutani reviewed O'Harrow's No Place To Hide in today's New York Times.
This surveillance state is not a futuristic place conjured in a Philip K. Dick novel or "Matrix"-esque sci-fi thriller. It is post-9/11 America, as described in Robert O'Harrow Jr.'s unnerving new book, "No Place to Hide" - an America where citizens' "right to be let alone," as Justice Louis Brandeis of the Supreme Court once put it, is increasingly imperiled, where more and more components of our daily lives are routinely monitored, recorded and analyzed.
These concerns, of course, are hardly new. Way back in 1964, in "The Naked Society," Vance Packard warned about encroachments on civil liberties and the growing threat to privacy posed by new electronic devices, and in 1971, in "The Assault on Privacy," Arthur R. Miller warned that advances in information technologies had given birth to "a new social virus - 'data-mania.' " The digital revolution of the 1990's, however, exponentially amplified these trends by enabling retailers, marketers and financial institutions to gather and store vast amounts of information about current and potential customers. And as Mr. O'Harrow notes, the terrorist attacks of Sept. 11, 2001, "reignited and reshaped a smoldering debate over the proper use of government power to peer into the lives of ordinary people."
My review of the book for the January 5 edition of Wired News is here.
Update: I changed the Kakutani link to what should be a permanent link using Aaron Swartz's link generator, since the original link will soon disappear into the black hole of the paper's archive, only retrievable by those with LexisNexis accounts or enough desperation to pay $4.95 for old news.
Of course, Swartz's tool has other uses, especially for those frustrated by searching on the NYTimes's website for older articles. But perhaps someday soon, the Times will realize their strategy to make a buck or two from the desperate or stupid (maybe high schoolers writing research papers?) means it is no longer the newspaper of record, at least in the online world.
Posted by Ryan Singel at 10:29 AM | Comments (2) | TrackBack
January 19, 2005 | Book in TV Format
Robert O'Harrow's investigative work for his book "No Place To Hide" (review here) has spurred Mr. Peter Jennings of ABC News (generally my favorite anchor of the big three) to follow up with a short TV version of the book. That will air Thursday on PrimeTime Live.
Thursday, Jan. 20, 2005 -- Peter Jennings Reporting: No Place to Hide Peter Jennings examines the government's effort to harness technology in the name of security, and the price we might pay if we fail to balance security and freedom in the digital age.
It should be on at 10 EST/PST and 9 central, but check your local listings. (I tried that but found that one has to register with one service or the other to find aggregate TV listing times in my neck of the woods. YMMV, but seems pretty stupid.)
Note to San Franciscans: KGO TV's overly script-driven programming schedule shows the program slotted for Thursday, from 10 p.m. to 11 p.m.
If you haven't already bought and devoured O'Harrow's book, then maybe you need to start with the television version, before you go buy the book through this affiliate link.
