How to Protect Privacy When Aggregating Location Data to Fight COVID-19

As governments, the private sector, NGOs, and others mobilize to fight the COVID-19 pandemic, we’ve seen calls to use location information (typically drawn from GPS and cell tower data) to inform public health efforts. Among the proposed uses of location data, one of the most widely discussed is analyzing aggregated data about which locations people are visiting, whether they are traveling less, and other collective measurements of individuals’ movement. This analysis might be used to inform judgments about the effectiveness of shelter-in-place orders and other social distancing measures. Projects making use of aggregated location data have graded residents of each state on their social distancing and visualized the travel patterns of people returning from spring break. Most recently, Google announced that it would publish ongoing “COVID-19 Community Mobility Reports,” which draw on the company’s store of location data to report on changes at a community level in people’s travel to various locations such as grocery stores, parks, and mass transit stations.

Compared to using individualized location data for contact tracing—as many governments around the world are already doing—deriving public health insights from aggregated location data poses far fewer privacy and other civil liberties risks such as restrictions on freedom of expression and association. However, even “aggregated” location data comes with potential pitfalls. This post discusses those pitfalls and describes some high-level best practices for those who seek to use aggregated location data in the fight against COVID-19.

What Does “Aggregated” Mean?

At the most basic level, there’s a difference between “aggregated” location data and “anonymized” or “deidentified” location data. Practically speaking, there is no way to deidentify individual location data. Information about where a person is and has been is usually enough, by itself, to reidentify them. Someone who travels frequently between a given office building and a single-family home is probably unique in those habits, and therefore identifiable by cross-referencing other readily available sources. One widely cited study from 2013 even found that researchers could uniquely characterize 50% of people using only two randomly chosen time and location data points.
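To make the reidentification risk concrete, here is a minimal Python sketch that measures how many people in a toy data set are singled out by a handful of their own location points. It is not the method used in the 2013 study. The function name `fraction_unique`, the `(place, hour)` encoding, and the sample traces are all assumptions made for illustration.

```python
import random

def fraction_unique(traces, k=2, seed=0):
    """Return the share of people whose k randomly chosen points match nobody else.

    `traces` maps a (hypothetical) person ID to a set of (place, hour) points.
    People with fewer than k points are skipped.
    """
    rng = random.Random(seed)
    unique = 0
    eligible = 0
    for person, points in traces.items():
        if len(points) < k:
            continue
        eligible += 1
        # Pick k of this person's points at random.
        sample = set(rng.sample(sorted(points), k))
        # Find everyone whose trace contains all of the sampled points.
        matches = [p for p, other in traces.items() if sample <= other]
        if matches == [person]:
            unique += 1
    return unique / eligible if eligible else 0.0

# Toy data: everyone shares an office, but each person has a different home.
traces = {
    "a": {("home_a", 8), ("office", 9)},
    "b": {("home_b", 8), ("office", 9)},
    "c": {("home_c", 8), ("office", 9)},
}
print(fraction_unique(traces, k=2))
```

In this toy example, stripping the names from the traces would not help: each home and office pair belongs to exactly one person.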

Aggregating location data to preserve individual privacy, on the other hand, can potentially be useful. Aggregation involves producing counts of behaviors instead of detailed timelines of individual location history. For instance, an aggregation might tell you how many people’s phones reported their location as being in a certain city within the last month. Or it might tell you, for a given area in a city, how many people traveled to that area during each hour in the last month. Whether or not a given scheme for aggregating location data works to improve privacy depends deeply on the details: On what timescale is the data aggregated? How large an area does each count cover? When is a count considered too low and dropped from the data set?
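The three questions above map onto three parameters in the following Python sketch of a simple aggregation. It is only an illustration under stated assumptions: the function name `aggregate_visits`, the square grid measured in degrees, the hourly time bucket, and the default threshold of 10 are all hypothetical choices, not values recommended by this post or used by any particular company.

```python
import math
from collections import Counter

def aggregate_visits(pings, cell_size_deg=0.01, min_count=10):
    """Count distinct devices per (grid cell, hour) and drop low counts.

    `pings` is an iterable of (device_id, latitude, longitude, hour) tuples.
    `cell_size_deg` sets how large an area each count covers.
    `min_count` is the threshold below which a count is dropped.
    """
    seen = set()
    for device_id, lat, lon, hour in pings:
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        # A set ensures each device is counted once per cell and hour.
        seen.add((device_id, cell, hour))
    counts = Counter((cell, hour) for _, cell, hour in seen)
    # Suppress small counts, which could point to a handful of individuals.
    return {key: n for key, n in counts.items() if n >= min_count}

pings = [
    ("a", 37.7749, -122.4194, 9),
    ("a", 37.7748, -122.4195, 9),  # same device, same cell and hour
    ("b", 37.7745, -122.4191, 9),
    ("c", 37.7741, -122.4199, 9),
    ("d", 40.7128, -74.0060, 9),   # alone in its cell, so it is dropped
]
result = aggregate_visits(pings, min_count=3)
print(result)
```

A larger cell, a longer time bucket, or a higher threshold each trade detail for privacy. Note that the output contains no device identifiers at all, only counts.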

For example, Facebook uses differential privacy techniques such as injecting statistical noise into the dataset as part of the methodology of its “Data for Good” project. This project aggregates Facebook users’ location data and shares it with various NGOs, academics, and governments engaged in responding to natural disasters and fighting the spread of disease, including COVID-19.
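As a rough illustration of what “injecting statistical noise” can mean, the Python sketch below adds Laplace noise to each aggregated count, a standard building block of differential privacy. It is not Facebook’s actual methodology, which this post does not describe in detail. The function name `noisy_counts`, the `epsilon` value, and the assumption that each person changes at most one count by at most 1 (a sensitivity of 1) are all assumptions for the example.

```python
import random

def noisy_counts(counts, epsilon=1.0, sensitivity=1.0, rng=None):
    """Add Laplace noise with scale sensitivity/epsilon to each count.

    Assumes each person affects at most one count by at most `sensitivity`.
    A smaller `epsilon` means more noise and stronger privacy.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    noisy = {}
    for key, n in counts.items():
        # The difference of two exponential samples is Laplace-distributed.
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        # Rounding and clamping happen after the noise is added.
        noisy[key] = max(0, round(n + noise))
    return noisy

counts = {"downtown": 120, "park": 45, "transit_hub": 300}
released = noisy_counts(counts, epsilon=0.5, rng=random.Random(42))
print(released)
```

The released numbers stay close to the true ones for large counts, while the presence or absence of any single person is masked by the noise. If one person can appear in many counts, the sensitivity assumption no longer holds and more noise is needed.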

There is no single magic formula for aggregating individual location data such that it provides insights that might be useful for some decisions and yet still cannot be reidentified. Instead, it’s a question of tradeoffs. As a matter of public policy, it is critical that user privacy not be sacrificed when creating aggregated location datasets to inform decisions about COVID-19 or anything else.

How Do We Evaluate the Use of Aggregated Location Data to Fight COVID-19?

Because aggregation reduces the risk of revealing intimate information about individuals’ lives, we are less concerned about this use of location data to fight COVID-19 than about individualized tracking. Of course, the choice of the aggregation parameters generally needs to be made by domain experts. As in the Facebook and Google examples above, these experts will often be working within private companies with proprietary access to the data. Even if they make all the right choices, the public needs to be able to review these choices because the companies are sharing the public’s data. The experts doing the aggregation often face pressure to weaken privacy protections in order to produce a more granular data set, because a particular decision-maker claims that anything less granular would not be meaningful to them. Ideally, companies would also consult outside experts before moving forward with plans to aggregate and share location data. Getting public input on whether a given data-sharing scheme sufficiently preserves privacy can help reduce the bias that such pressure creates.

As a result, companies like Google that produce reports based on aggregated location data from users should release their full methodology as well as information about who these reports are shared with and for what purpose. To the extent they only share certain data with selected “partners,” these groups should agree not to use the data for other purposes or attempt to re-identify individuals whose data is included in the aggregation. And, as Google has already done, companies should pledge to end the use of this data when the need to fight COVID-19 subsides.

For any data sharing plan, consent is critical: Did each person consent to the method of data collection, and did they consent to the use? Consent must be specific, informed, opt-in, and voluntary. Ordinarily, users should have the choice of whether to opt in to every new use of their data, but we recognize that obtaining consent to aggregate previously acquired location data to fight COVID-19 may be difficult to do quickly enough to address the public health need. That’s why it’s especially important that users be able to review and delete their data at any time. The same should be true for anyone who truly consents to the collection of this information. Many entities that hold location information, like data brokers that collect location from ads and hidden tracking in apps, can’t meet these consent standards. Yet many of the uses of aggregated location data that we’ve seen in response to COVID-19 draw from these tainted sources. At the very least, data brokers should not profit from public health insights derived from their stores of location data, including through free advertising. Nor should they be allowed to “COVID wash” their business practices: the existence of these data stores is unethical, and should be addressed with new consumer data privacy laws.

Finally, we should remember that location data collected from smartphones has limitations and biases. Smartphone ownership remains a proxy for relative wealth, even in regions like the United States where 80% of adults have a smartphone. People without smartphones tend to already be marginalized, so making public policy based on aggregate location data can wind up disregarding the needs of those who simply don’t show up in the data, and who may need services the most. Even among the people with smartphones, the seeming authoritativeness and comprehensiveness of large-scale data can cause leaders to reach erroneous conclusions that overlook the needs of people with fewer resources. For example, data showing that people in one region are traveling more than people in another region might not mean, as it first appears, that these people are failing to take social distancing seriously. It might mean, instead, that they live in an underserved area and must therefore travel longer distances for essential services like groceries and pharmacies.

In general, our advice to organizations that consider sharing aggregate location data: Get consent from the users who supply the data. Be cautious about the details. Aggregate on the highest level of generality that will be useful. Share your plans with the public before you release the data. And avoid sharing “deidentified” or “anonymized” location data that is not aggregated—it doesn’t work.
