Debunking the Myth of “Anonymous” Data

Today, almost everything about our lives is digitally recorded and stored somewhere. Each credit card purchase, personal medical diagnosis, and preference about music and books is recorded and then used to predict what we like and dislike, and—ultimately—who we are.

This often happens without our knowledge or consent. Personal information that corporations collect from our online behaviors sells for astonishing profits and incentivizes online actors to collect as much as possible. Every mouse click and screen swipe can be tracked and then sold to ad-tech companies and the data brokers that service them.

In an attempt to justify this pervasive surveillance ecosystem, corporations often claim to de-identify our data. This supposedly removes all personal information (such as a person’s name) from the data point (such as the fact that an unnamed person bought a particular medicine at a particular time and place). Personal data can also be aggregated, whereby data about multiple people is combined with the intention of removing personal identifying information and thereby protecting user privacy.

Sometimes companies say our personal data is “anonymized,” implying a one-way ratchet where it can never be dis-aggregated and re-identified. But this is not possible—anonymous data rarely stays this way. As Professor Matt Blaze, an expert in the field of cryptography and data privacy, succinctly summarized: “something that seems anonymous, more often than not, is not anonymous, even if it’s designed with the best intentions.”

Anonymization…and Re-Identification?

Personal data can be considered on a spectrum of identifiability. At one end is data that can directly identify people, such as a name or state identity number, which can be referred to as “direct identifiers.” Next is information indirectly linked to individuals, like personal phone numbers and email addresses, which some call “indirect identifiers.” After this comes data connected to multiple people, such as a favorite restaurant or movie. At the other end of this spectrum is information that cannot be linked to any specific person, such as aggregated census data, and data that is not directly related to individuals at all, like weather reports.

Data anonymization is often undertaken in two ways. First, some personal identifiers like our names and social security numbers might be deleted. Second, other categories of personal information might be modified, such as obscuring our bank account numbers. For example, the Safe Harbor provision contained within the U.S. Health Insurance Portability and Accountability Act (HIPAA) allows only the first three digits of a ZIP code to be reported in scrubbed data.
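As a rough illustration of those two steps, here is a minimal Python sketch. The record, its field names, and the `scrub` function are all hypothetical, invented for this example; this is not a HIPAA compliance tool, only a picture of what "delete some fields, coarsen others" looks like.

```python
# Hypothetical record; every field name and value here is invented for illustration.
record = {
    "name": "Alice Example",
    "ssn": "000-00-0000",
    "bank_account": "1234567890",
    "zip": "94110",
    "diagnosis": "asthma",
}

DIRECT_IDENTIFIERS = ("name", "ssn")  # assumption: the fields we choose to delete


def scrub(rec):
    """Return a copy with direct identifiers deleted and other fields coarsened."""
    # Step 1: delete direct identifiers outright.
    cleaned = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
    # Step 2: modify other fields. Mask all but the last four account digits,
    # and keep only the first three digits of the ZIP code.
    account = rec["bank_account"]
    cleaned["bank_account"] = "*" * (len(account) - 4) + account[-4:]
    cleaned["zip"] = rec["zip"][:3]
    return cleaned


scrubbed = scrub(record)
```

Note what survives: the diagnosis, a partial ZIP code, and part of the account number all remain in the "scrubbed" record, and any of them can still be combined with other information.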

However, in practice, any attempt at de-identification requires removal not only of your identifiable information, but also of information that can identify you when considered in combination with other information known about you. Here's an example:

First, think about the number of people that share your specific ZIP or postal code.
Next, think about how many of those people also share your birthday.
Now, think about how many people share your exact birthday, ZIP code, and gender.

According to one landmark study, these three characteristics are enough to uniquely identify 87% of the U.S. population. A different study showed that 63% of the U.S. population can be uniquely identified from these three facts.
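The thought experiment above can be sketched in a few lines of Python. The six records below are made up, and `unique_fraction` is a hypothetical helper written for this example; it does not reproduce either study. It only shows how the share of people who stand alone grows as more "harmless" attributes are combined.

```python
from collections import Counter

# Hypothetical toy records: (zip_code, birth_date, gender). Invented for illustration.
records = [
    ("94110", "1985-03-14", "F"),
    ("94110", "1985-03-14", "M"),
    ("94110", "1990-07-02", "F"),
    ("94110", "1971-11-30", "F"),
    ("10001", "1985-03-14", "F"),
    ("10001", "1985-03-14", "F"),
]


def unique_fraction(rows, columns):
    """Share of rows whose values in `columns` are shared with no other row."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


zip_only = unique_fraction(records, [0])         # ZIP code alone
zip_birth = unique_fraction(records, [0, 1])     # ZIP code + birthday
all_three = unique_fraction(records, [0, 1, 2])  # ZIP code + birthday + gender
```

In this toy data set nobody is unique by ZIP code alone, a third of the records are unique once birthday is added, and two thirds are unique with all three attributes.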

We cannot trust corporations to self-regulate. The financial benefit and business usefulness of our personal data often outweigh our privacy and anonymity. In re-obtaining the real identity of the person involved (direct identifier) alongside a person’s preferences (indirect identifier), corporations are able to continue profiting from our most sensitive information. For instance, a website that asks supposedly “anonymous” users for seemingly trivial information about themselves may be able to use that information to make a unique profile for an individual.

Location Surveillance

To understand this system in practice, we can look at location data. This includes the data collected by apps on your mobile device about your whereabouts: from the weekly trips to your local supermarket to your last appointment at a health center, an immigration clinic, or a protest planning meeting. The collection of this location data on our devices is sufficiently precise for law enforcement to place suspects at the scene of a crime, and for juries to convict people on the basis of that evidence. What’s more, whatever personal data is collected by the government can be misused by its employees, stolen by criminals or foreign governments, and used in unpredictable ways by agency leaders for nefarious new purposes. And all too often, such high tech surveillance disparately burdens people of color.

Practically speaking, there is no way to de-identify individual location data since these data points serve as unique personal identifiers of their own. And even when location data is said to have been anonymized, re-identification can be achieved by correlating de-identified data with other publicly available data like voter rolls or information that's sold by data brokers. One study from 2013 found that researchers could uniquely identify 50% of people using only two randomly chosen time and location data points.
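To see why a handful of time and location points can act as an identifier, consider this minimal Python sketch. The device IDs, time slots, and cell names are hypothetical, and `matching_devices` is an illustrative helper, not the method used in the 2013 study. It assumes an observer already knows one or two places a person was at particular times, for example from a public post or another data set.

```python
# Hypothetical pseudonymous traces: device ID -> set of (time slot, area) points.
# All values are invented for illustration.
traces = {
    "device-001": {("Mon 08", "cell-A"), ("Mon 12", "cell-C"), ("Mon 19", "cell-F")},
    "device-002": {("Mon 08", "cell-A"), ("Mon 12", "cell-D"), ("Mon 19", "cell-F")},
    "device-003": {("Mon 08", "cell-B"), ("Mon 12", "cell-C"), ("Mon 19", "cell-G")},
}


def matching_devices(all_traces, known_points):
    """Return the pseudonymous IDs whose trace contains every known point."""
    known = set(known_points)
    return sorted(dev for dev, points in all_traces.items() if known <= points)


# Knowing one point leaves two candidate devices...
one_point = matching_devices(traces, [("Mon 08", "cell-A")])
# ...but a second point narrows it to a single trace.
two_points = matching_devices(traces, [("Mon 08", "cell-A"), ("Mon 12", "cell-C")])
```

Once a single trace is left, the rest of that device's "anonymous" location history belongs to a known person.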

Done right, aggregating location data can work towards preserving our personal rights to privacy by producing non-individualized counts of behaviors instead of detailed timelines of individual location history. For instance, an aggregation might tell you how many people’s phones reported their location as being in a certain city within the last month, but not the exact phone number and other data points that would connect this directly and personally to you. However, there’s often pressure on the experts doing the aggregation to generate granular aggregate data sets that might be more meaningful to a particular decision-maker but which simultaneously expose individuals to an erosion of their personal privacy.
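The tension between coarse and granular aggregates can be shown with a short Python sketch. The pings, place names, and `device_counts` function are hypothetical. The `min_count` threshold for suppressing small groups is our own assumption for illustration, not a rule drawn from any particular law or standard, and suppression alone does not guarantee privacy.

```python
from collections import defaultdict

# Hypothetical location pings: (device_id, city, neighborhood). Invented for illustration.
pings = [
    ("d1", "City A", "North"),
    ("d1", "City A", "North"),
    ("d2", "City A", "North"),
    ("d3", "City A", "South"),
    ("d4", "City B", "East"),
    ("d5", "City A", "North"),
]


def device_counts(rows, key, min_count=3):
    """Count distinct devices per group, dropping groups smaller than min_count."""
    devices = defaultdict(set)
    for row in rows:
        devices[key(row)].add(row[0])  # row[0] is the device ID
    # Assumption: groups with fewer than min_count devices are withheld entirely.
    return {group: len(ids) for group, ids in devices.items() if len(ids) >= min_count}


by_city = device_counts(pings, key=lambda r: r[1])
by_neighborhood = device_counts(pings, key=lambda r: (r[1], r[2]))
```

At the city level, one group of four devices can be reported. At the neighborhood level, two of the three groups contain a single device each, so publishing them would amount to publishing one person's whereabouts.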

Moreover, most third-party location tracking is designed to build profiles of real people. This means that every time a tracker collects a piece of information, it needs something to tie that information to a particular person. This can happen indirectly by correlating collected data with a particular device or browser, which might later correlate to one person or a group of people, such as a household. Trackers can also use artificial identifiers, like mobile ad IDs and cookies, to reach users with targeted messaging. And “anonymous” profiles of personal information can nearly always be linked back to real people, including where they live, what they read, and what they buy.

For data brokers dealing in our personal information, our data can either be useful for their profit-making or truly anonymous, but not both. EFF has long opposed location surveillance programs that can turn our lives into open books for scrutiny by police, surveillance-based advertisers, identity thieves, and stalkers. We’ve also long blown the whistle on phony anonymization.

As a matter of public policy, it is critical that user privacy is not sacrificed in favor of filling the pockets of corporations. And for any data sharing plan, consent is critical: did each person consent to the method of data collection, and did they consent to the particular use? Consent must be specific, informed, opt-in, and voluntary.
