Bad Data “For Good”: How Data Brokers Try to Hide Behind Academic Research

When data broker SafeGraph got caught selling location information on Planned Parenthood visitors, it had a public relations trick up its sleeve. After the company agreed to remove family planning center data from its platforms in response to public outcry, CEO Auren Hoffman tried to flip the narrative: he claimed that his company’s harvesting and sharing of sensitive data was, in fact, an engine for beneficial research on abortion access. He even argued that SafeGraph’s post-scandal removal of the clinic data was the real problem: “Once we decided to take it down, we had hundreds of researchers complain to us about…taking that data away from them.” Of course, when pressed, Hoffman could not name any individual researchers or institutions.

SafeGraph is not alone among location data brokers in trying to “research wash” its privacy-invasive business model and data through academic work. Other shady actors like Veraset, Cuebiq, Spectus, and X-Mode also operate so-called “data for good” programs with academics, and have seized on the pandemic to expand them. These data brokers provide location data to academic researchers across disciplines, with resulting publications appearing in peer-reviewed venues as prestigious as Nature and the Proceedings of the National Academy of Sciences. These companies’ data is so widely used in human mobility research—from epidemic forecasting and emergency response to urban planning and business development—that the literature has progressed to meta-studies comparing, for example, Spectus, X-Mode, and Veraset datasets.

Data brokers variously claim to be bringing “transparency” to tech or “democratizing access to data.” But these data sharing programs are nothing more than data brokers’ attempts to control the narrative around their unpopular and non-consensual business practices. Critical academic research must not become reliant on profit-driven data pipelines that endanger the safety, privacy, and economic opportunities of millions of people without any meaningful consent.

Data Brokers Do Not Provide Opt-In, Anonymous Data

Location data brokers do not come close to meeting human subjects research standards. This starts with the fact that meaningful opt-in consent is consistently missing from their business practices. In fact, Google concluded that SafeGraph’s practices were so out of line that it banned any apps using the company’s code from its Play Store, and both Apple and Google banned X-Mode from their respective app stores.

Data brokers frequently argue that the data they collect is “opt-in” because a user has agreed to share it with an app—even though the overwhelming majority of users have no idea that it’s being sold on the side to data brokers who in turn sell to businesses, governments, and others. Technically, it is true that users have to opt in to sharing location data with, say, a weather app before it will give them localized forecasts. But no reasonable person believes that this constitutes blanket consent for the laundry list of data sharing, selling, and analysis that any number of shadowy third parties are conducting in the background.

On top of being collected and shared without consent, the data feeding into data brokers’ products can easily be linked to identifiable people. The companies claim their data is anonymized, but there’s simply no such thing as anonymous location data. Information about where a person has been is itself enough to re-identify them: one widely cited study from 2013 found that researchers could uniquely characterize 50% of people using only two randomly chosen time and location data points. Data brokers today collect sensitive user data from a wide variety of sources, including hidden tracking in the background of mobile apps. While techniques vary and are often hidden behind layers of non-disclosure agreements (or NDAs), the resulting raw data they collect and process is based on sensitive, individual location traces.
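To make the re-identification point concrete, here is a minimal Python sketch. It is not the method of the 2013 study: the traces are invented toy data, and the names `matching_users` and `unicity` are hypothetical helpers introduced only for illustration. It shows how a handful of known (place, hour) points can narrow an "anonymous" dataset down to a single trace.

```python
import random

def matching_users(traces, known_points):
    """Return the users whose trace contains every known (place, hour) point."""
    known = set(known_points)
    return {user for user, trace in traces.items() if known <= trace}

def unicity(traces, k, rng):
    """Fraction of users singled out by k points drawn at random from their own trace."""
    unique = 0
    for user, trace in traces.items():
        sample = rng.sample(sorted(trace), min(k, len(trace)))
        if matching_users(traces, sample) == {user}:
            unique += 1
    return unique / len(traces)

# Invented toy traces: pseudonymous IDs, each with a set of (place, hour) points.
traces = {
    "a": {("home_1", 8), ("office", 9), ("gym", 18)},
    "b": {("home_2", 8), ("office", 9), ("cafe", 13)},
    "c": {("home_3", 8), ("office", 9), ("gym", 18)},
}

# A shared workplace matches everyone; adding one more point narrows the set.
print(sorted(matching_users(traces, [("office", 9)])))
print(sorted(matching_users(traces, [("office", 9), ("gym", 18)])))
# A single home-like point is enough to single out one trace.
print(sorted(matching_users(traces, [("home_1", 8)])))
print(unicity(traces, 3, random.Random(0)))
```

In this toy data, removing names changes nothing: anyone who knows roughly where a person sleeps or works can find the matching trace, and everything else in that trace comes with it.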

Aggregating location data can sometimes preserve individual privacy, given appropriate parameters that take into account the number of people represented in the data set and its granularity. But no privacy-preserving aggregation protocols can justify the initial collection of location data from people without their voluntary, meaningful opt-in consent, especially when that location data is then exploited for profit and PR spin.
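The "appropriate parameters" mentioned above can be sketched in Python as follows. This is an illustrative assumption, not any broker's or researcher's actual pipeline: the grid size `cell_deg`, the threshold `min_devices`, and the function name `aggregate` are all hypothetical. The sketch coarsens coordinates to grid cells, counts distinct devices per cell and hour, and suppresses any cell with too few devices. It addresses only what is released, and does nothing about how the underlying pings were collected.

```python
import math
from collections import defaultdict

def aggregate(pings, cell_deg=0.01, min_devices=10):
    """Count distinct devices per (grid cell, hour), dropping sparse cells.

    pings: iterable of (device_id, lat, lon, hour) tuples.
    cell_deg: grid cell size in degrees (coarser means less granular).
    min_devices: cells with fewer distinct devices are suppressed.
    """
    devices = defaultdict(set)
    for device_id, lat, lon, hour in pings:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        devices[(cell, hour)].add(device_id)
    return {key: len(ids) for key, ids in devices.items() if len(ids) >= min_devices}

# Invented toy pings: three devices share one cell, one device is alone in another.
pings = [
    ("dev1", 40.7128, -74.0060, 9),
    ("dev1", 40.7125, -74.0065, 9),  # repeat ping, same device: counted once
    ("dev2", 40.7125, -74.0065, 9),
    ("dev3", 40.7121, -74.0069, 9),
    ("dev4", 40.7555, -73.9855, 9),  # lone device: its cell is suppressed
]
result = aggregate(pings, cell_deg=0.01, min_devices=3)
print(result)
```

Raising `min_devices` or `cell_deg` trades detail for protection; a lone device in a fine-grained cell is exactly the kind of record that aggregation is meant to keep out of the output.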

Data brokers’ products are notoriously easy to re-identify, especially when combined with other data sets. And combining datasets is exactly what some academic studies are doing. Published studies have combined data broker location datasets with Census data, with real-time Google Maps traffic estimates, and with local household surveys and state Department of Transportation data. While researchers appear to be simply building the most reliable and comprehensive possible datasets for their work, this kind of merging is also the first step someone would take if they wanted to re-identify the data.

NDAs, NDAs, NDAs

Data brokers are not good sources of information about data brokers, and researchers should be suspicious of any claims they make about the data they provide. As Cracked Labs researcher Wolfie Christl puts it, what data brokers have to offer is “potentially flawed, biased, untrustworthy, or even fraudulent.”

Some researchers incorrectly describe the data they receive from data brokers. For example, one paper describes SafeGraph data as “anonymized human mobility data” or “foot traffic data from opt-in smartphone GPS tracking.” Another describes Spectus as providing “anonymous, privacy-compliant location data” with an “ironclad privacy framework.” Again, this location data is not opt-in, not anonymized, and not privacy-compliant.

Other researchers make internally contradictory claims about location data. One Nature paper characterizes Veraset’s location data as achieving the impossible feat of being both “fine-grained” and “anonymous.” This paper further states it used such specific data points as “anonymized device IDs” and “the timestamps, and precise geographical coordinates of dwelling points” where a device spends more than 5 minutes. Such fine-grained data cannot be anonymous.

A Veraset Data Access Agreement obtained by EFF includes a Publicity Clause, giving Veraset control over how its partners may disclose Veraset’s involvement in publications. This includes Veraset’s prerogative to approve language or remain anonymous as the data source. While the Veraset Agreement we’ve seen was with a municipal government, its suggested language appears in multiple academic publications, which suggests a similar agreement may be in play with academics.

A similar pattern appears in papers using X-Mode data: some use nearly verbatim language to describe the company. They even claim its NDA is a good thing for privacy and security, stating: “All researchers processed and analyzed the data under a non-disclosure agreement and were obligated to not share data further and not to attempt to re-identify data.” But those same NDAs prevent academics, journalists, and others in civil society from understanding data brokers’ business practices, or identifying the web of data aggregators, ad tech exchanges, and mobile apps that their data stores are built on.

All of this should be a red flag for Institutional Review Boards, which review proposed human subjects research and need visibility into whether and how data brokers and their partners actually obtain consent from users. Likewise, academics themselves need to be able to confirm the integrity and provenance of the data on which their work relies.

From Insurance Against Bad Press to Accountable Transparency

Data sharing programs with academics are only the tip of the iceberg. To paper over the dangerous role they play in the online data ecosystem, data brokers forge relationships not only with academic institutions and researchers, but also with government authorities, journalists and reporters, and non-profit organizations.

The question of how to balance data transparency with user privacy is not a new one, and it can’t be left to the Verasets and X-Modes of the world to answer. Academic data sharing programs will continue to function as disingenuous PR operations until companies are subjected to data privacy and transparency requirements. While SafeGraph claims its data could pave the way for impactful research in abortion access, the fact remains that the very same data puts actual abortion seekers, providers, and advocates in danger, especially in the wake of Dobbs. The sensitive data that location data brokers deal in should only be collected and used with specific, informed consent, and subjects must have the right to withdraw that consent at any time. No such consent currently exists.

We need comprehensive federal consumer data privacy legislation to enforce these standards, with a private right of action to empower ordinary people to bring their own lawsuits against data brokers who violate their privacy rights. Moreover, we must pull back the NDAs to allow research investigating these data brokers themselves: their business practices, their partners, how their data can be abused, and how to protect the people whom data brokers are putting in harm’s way.
