AOL file blunder puts privacy at risk

Wired on Friday: Last year search engine Google stood up against a US government request for its users' search data. This week, AOL handed exactly the same class of data to the world by publishing online (and then hastily retracting) a file containing more than 20 million queries collected from over 650,000 American AOL users, writes Danny O'Brien.

The resulting furore may well be a personal tragedy for those whose identities are exposed to public examination and ridicule, but it's more than just private data that is being uncovered: it's our own opinions on how this data should be treated, and whether it should be collected at all.

The US department of justice in Google's case wanted access to a vast amount of information - a million random web addresses from its cache of searched websites, and a week's worth of users' search requests. In the end, a judge declared that the former was acceptable, but the latter "gave the court pause as to whether the search queries themselves may constitute potentially sensitive information".

The court was not the only one pausing. Many were divided over just how revealing search queries might be once stripped of users' names and other personal information. AOL's goof - which occurred after over-eager staff at the company's research division released the files for academic use - lets anyone browse through the searches, organised by numbered (but not explicitly named) AOL users.

The first discovery is that this data is not just bland "anonymised" aggregates, emptied of all personally identifiable information. Names and addresses appear throughout the searches. One of the most heavily protected items of personal data in the US is a person's social security number (SSN), a nine-digit number that (rather foolishly) is often used as an identifier by banks, insurance companies and others.

Bloggers reported more than 180 numbers that look like SSNs in the AOL data leak, some of which are connected with names. Vanity searches (where users appear to be searching for mentions of themselves) occur regularly. Individually anonymous snippets of data - a neighbourhood, a hobby, a school - are spread out over time but are linkable to a single AOL account, allowing a person to be pinned down with disarming ease.

The searches also include the precise time that a search was entered and, in the case of successful searches, the web address of the site the user eventually clicked upon, providing more clues for determined stalkers.

Only a handful of the victims of this "Data Valdez" have been contacted or have come forward, but when more find out what has been released, it may not be pretty. One searcher, bloggers reported, entered a string of queries apparently aimed at finding out how to murder a spouse. Another searcher's queries suggest she wanted to surreptitiously "add breasts" to her husband.

It is the reactions to these searches that are particularly interesting. For the hundreds of comments expressing shock at the invasion of privacy, there are a few who want to know more about those searchers - who feel that the police should get involved and that those numbered individuals should have a name put to them by a criminal investigation. The very fact that we now know that there are people out there making these searches, and that major companies like AOL have their records, makes it clear to many that we may have evidence that could help stop or punish a crime, and that we have an obligation to use that evidence.

It is a temptation that has not escaped law enforcement either. When questioned about the information the department could have obtained from the Google subpoena, one department of justice spokesperson said that if it had seen searches within that file referring to child pornography, it would certainly have taken them seriously.

In the Google case, the department was seeking information for a case entirely unrelated to the investigation of child pornography. And in AOL's case, the release of data was accidental and involved hundreds of thousands of individuals with, as far as we know, no criminal record or suspicion attached to them, until now.

It is not just the invasion of privacy that makes such releases, whether to the public, to the government or to a third party, so dangerous. It's the temptation to violate that privacy. Even the most ardent privacy advocate is drawn to peer through these internet curtains and pore over this data. Why not? All human life is there. There are whole soap opera episodes hidden in these text files, waiting to be teased out.

But where does it end? Currently, every major search engine collects this data. If it is a public service to pull the suspects out of this accidentally public line-up, why should we not send our law enforcement agents through all of our searches? Why have privacy at all, when the benefits of full and mutual exposure are so great?

AOL intended this file to be a test-bed for academics to draw psychological and social conclusions. It may turn out that, by releasing it to all of us, AOL has made us guinea pigs in an experiment on our own future. Do we want this to happen to us, to our families? Simply having this data stored by corporations instead of governments is no protection, and these data leaks will continue to happen as long as the major search engines keep those honey-pots of data.

Or do we want to restrain our worst instincts and, rather than futilely attempting to hold back the tide, insist that these companies do not keep this data at all, preserving all our privacy, once and for all?

Danny O'Brien is activism co-ordinator of the Electronic Frontier Foundation