When data broker SafeGraph got caught selling location information on Planned Parenthood visitors, it had a public relations trick up its sleeve. After the company agreed to remove family planning center data from its platforms in response to public outcry, CEO Auren Hoffman tried to flip the narrative: he claimed that his company’s harvesting and sharing of sensitive data was, in fact, an engine for beneficial research on abortion access. He even argued that SafeGraph’s post-scandal removal of the clinic data was the real problem: “Once we decided to take it down, we had hundreds of researchers complain to us about…taking that data away from them.” Of course, when pressed, Hoffman could not name any individual researchers or institutions.

SafeGraph is not alone among location data brokers in trying to “research wash” its privacy-invasive business model and data through academic work. Other shady actors like Veraset, Cuebiq, Spectus, and X-Mode also operate so-called “data for good” programs with academics, and have seized on the pandemic to expand them. These data brokers provide location data to academic researchers across disciplines, with resulting publications appearing in peer-reviewed venues as prestigious as Nature and the Proceedings of the National Academy of Sciences. These companies’ data is so widely used in human mobility research—from epidemic forecasting and emergency response to urban planning and business development—that the literature has progressed to meta-studies comparing, for example, Spectus, X-Mode, and Veraset datasets.

Data brokers variously claim to be bringing “transparency” to tech or “democratizing access to data.” But these data sharing programs are nothing more than data brokers’ attempts to control the narrative around their unpopular and non-consensual business practices. Critical academic research must not become reliant on profit-driven data pipelines that endanger the safety, privacy, and economic opportunities of millions of people without any meaningful consent. 

Data Brokers Do Not Provide Opt-In, Anonymous Data

Location data brokers do not come close to meeting human subjects research standards. This starts with the fact that meaningful opt-in consent is consistently missing from their business practices. In fact, Google concluded that SafeGraph’s practices were so out of line that it banned any apps using the company’s code from its Play Store, and both Apple and Google banned X-Mode from their respective app stores. 

Data brokers frequently argue that the data they collect is “opt-in” because a user has agreed to share it with an app—even though the overwhelming majority of users have no idea that it’s being sold on the side to data brokers who in turn sell to businesses, governments, and others. Technically, it is true that users have to opt in to sharing location data with, say, a weather app before it will give them localized forecasts. But no reasonable person believes that this constitutes blanket consent for the laundry list of data sharing, selling, and analysis that any number of shadowy third parties are conducting in the background. 

On top of being collected and shared without consent, the data feeding into data brokers’ products can easily be linked to identifiable people. The companies claim their data is anonymized, but there’s simply no such thing as anonymous location data. Information about where a person has been is itself enough to re-identify them: one widely cited study from 2013 found that researchers could uniquely characterize 50% of people using only two randomly chosen time and location data points. Data brokers today collect sensitive user data from a wide variety of sources, including hidden tracking in the background of mobile apps. While techniques vary and are often hidden behind layers of non-disclosure agreements (or NDAs), the resulting raw data they collect and process is based on sensitive, individual location traces.
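To make that finding concrete, here is a minimal sketch in Python using purely synthetic traces; the grid resolution, device count, and device names are illustrative assumptions, not any broker’s actual data format. With thousands of simulated devices, two coarse time-and-place observations are typically consistent with only a single trace, which is the heart of the re-identification risk.

```python
# Toy illustration with synthetic traces (the grid size, device count, and
# "device_i" naming are assumptions, not any broker's real schema): even two
# coarse (hour, lat, lon) points are usually enough to single out one trace.
import random

random.seed(0)

def make_trace(n_points=20):
    """Generate a synthetic trace of (hour, lat, lon) points on a ~1 km grid."""
    return {
        (random.randrange(24),                # hour of day
         round(40 + random.random(), 2),      # latitude, rounded to ~1 km
         round(-74 + random.random(), 2))     # longitude, rounded to ~1 km
        for _ in range(n_points)
    }

traces = {f"device_{i}": make_trace() for i in range(10_000)}

def candidates(known_points):
    """Count devices whose trace contains every point the attacker knows."""
    return sum(1 for points in traces.values() if known_points <= points)

# An attacker who knows just two points about a target (say, a home and a
# clinic visit) can check how many traces are still consistent with them.
known = set(random.sample(sorted(traces["device_42"]), 2))
print("traces matching both known points:", candidates(known))  # typically 1
```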

Aggregating location data can sometimes preserve individual privacy, given appropriate parameters that take into account the number of people represented in the data set and its granularity. But no privacy-preserving aggregation protocols can justify the initial collection of location data from people without their voluntary, meaningful opt-in consent, especially when that location data is then exploited for profit and PR spin. 
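For a sense of what “appropriate parameters” means in practice, here is a hypothetical sketch of threshold-based suppression, the kind of k-anonymity-style aggregation such release protocols rely on; the field names and the threshold K are assumptions for illustration, not any company’s documented method.

```python
from collections import defaultdict

K = 50  # assumed minimum number of distinct devices per released cell

def aggregate(pings, k=K):
    """Release counts only for (area, hour) cells seen for at least k devices.

    `pings` is an iterable of (device_id, area_id, hour) tuples in a made-up
    format; real pipelines differ, but the suppression logic is the same idea.
    """
    cells = defaultdict(set)
    for device_id, area_id, hour in pings:
        cells[(area_id, hour)].add(device_id)
    # Cells describing fewer than k people are withheld entirely.
    return {cell: len(devices) for cell, devices in cells.items()
            if len(devices) >= k}
```

Note that this step operates on raw, per-device traces that have already been collected: however safe the released table, the protocol says nothing about whether the people in it ever agreed to be tracked.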

Data brokers’ products are notoriously easy to re-identify, especially when combined with other datasets. And combining datasets is exactly what some academic studies are doing. Published studies have combined data broker location datasets with Census data, real-time Google Maps traffic estimates, local household surveys, and state Department of Transportation data. While researchers appear to be simply building the most reliable and comprehensive possible datasets for their work, this kind of merging is also the first step someone would take if they wanted to re-identify the data.
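The mechanics are worth spelling out: a join performed to enrich a mobility dataset is the same operation as a linkage attack. The records and field names below are entirely fabricated, purely to show the shape of that operation.

```python
# Entirely fabricated records, for illustration only: linking an inferred
# overnight dwelling point to a public, address-keyed record by coordinates.
nighttime_dwellings = {"device_42": (40.712, -74.006)}      # from a mobility dataset
address_records = [{"name": "J. Doe", "lat": 40.712, "lon": -74.006}]  # e.g. property rolls

def link(devices, records, precision=3):
    """Match device IDs to named records whose rounded coordinates coincide."""
    by_location = {(round(r["lat"], precision), round(r["lon"], precision)): r["name"]
                   for r in records}
    return {device: by_location.get((round(lat, precision), round(lon, precision)))
            for device, (lat, lon) in devices.items()}

print(link(nighttime_dwellings, address_records))   # {'device_42': 'J. Doe'}
```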

NDAs, NDAs, NDAs

Data brokers are not good sources of information about data brokers, and researchers should be suspicious of any claims they make about the data they provide. As Cracked Labs researcher Wolfie Christl puts it, what data brokers have to offer is “potentially flawed, biased, untrustworthy, or even fraudulent.” 

Some researchers incorrectly describe the data they receive from data brokers. For example, one paper describes SafeGraph data as “anonymized human mobility data” or “foot traffic data from opt-in smartphone GPS tracking.” Another describes Spectus as providing “anonymous, privacy-compliant location data” with an “ironclad privacy framework.” Again, this location data is not opt-in, not anonymized, and not privacy-compliant.

Other researchers make internally contradictory claims about location data. One Nature paper characterizes Veraset’s location data as achieving the impossible feat of being both “fine-grained” and “anonymous.” This paper further states it used such specific data points as “anonymized device IDs” and “the timestamps, and precise geographical coordinates of dwelling points” where a device spends more than 5 minutes. Such fine-grained data cannot be anonymous. 

A Veraset Data Access Agreement obtained by EFF includes a Publicity Clause, giving Veraset control over how its partners may disclose Veraset’s involvement in publications. This includes Veraset’s prerogative to approve language or remain anonymous as the data source. While the Veraset Agreement we’ve seen was with a municipal government, its suggested language appears in multiple academic publications, which suggests a similar agreement may be in play with academics.

A similar pattern appears in papers using X-Mode data: some use nearly verbatim language to describe the company. They even claim its NDA is a good thing for privacy and security, stating: “All researchers processed and analyzed the data under a non-disclosure agreement and were obligated to not share data further and not to attempt to re-identify data.” But those same NDAs prevent academics, journalists, and others in civil society from understanding data brokers’ business practices, or identifying the web of data aggregators, ad tech exchanges, and mobile apps that their data stores are built on.

All of this should be a red flag for Institutional Review Boards, which review proposed human subjects research and need visibility into whether and how data brokers and their partners actually obtain consent from users. Likewise, academics themselves need to be able to confirm the integrity and provenance of the data on which their work relies.

From Insurance Against Bad Press to Accountable Transparency

Data sharing programs with academics are only the tip of the iceberg. To paper over the dangerous role they play in the online data ecosystem, data brokers forge relationships not only with academic institutions and researchers, but also with government authorities, journalists and reporters, and non-profit organizations. 

The question of how to balance data transparency with user privacy is not a new one, and it can’t be left to the Verasets and X-Modes of the world to answer. Academic data sharing programs will continue to function as disingenuous PR operations until companies are subjected to data privacy and transparency requirements. While SafeGraph claims its data could pave the way for impactful research in abortion access, the fact remains that the very same data puts actual abortion seekers, providers, and advocates in danger, especially in the wake of Dobbs. The sensitive location data these brokers deal in should only be collected and used with specific, informed consent, and subjects must have the right to withdraw that consent at any time. No such consent currently exists.

We need comprehensive federal consumer data privacy legislation to enforce these standards, with a private right of action to empower ordinary people to bring their own lawsuits against data brokers who violate their privacy rights. Moreover, we must pull back the NDAs to allow research investigating these data brokers themselves: their business practices, their partners, how their data can be abused, and how to protect the people whom data brokers are putting in harm’s way.