How Dozens of Companies Know You're Reading About Those NSA Leaks
As news websites around the globe publish story after story about dragnet surveillance, they all have one thing in common: when you visit them, your personal information is broadcast to dozens of companies, many of which can track your browsing habits and many of which are subject to government data requests.
How Does This Happen?
Why Does This Matter?
Each time your browser makes a request, it sends the following information along with it:
- Your IP address and the exact time of the request
- User-Agent string: usually identifies your web browser and its version, your operating system, processor information (32-bit, 64-bit), language settings, and other data
- Referrer: the URL of the page that triggered the request; if that page embeds a Facebook Like button, for example, your browser tells Facebook which page you're viewing
- Other HTTP headers which contain potentially identifying information
- Sometimes tracking cookies
Every company has different practices, but they generally log some or all of this information, perhaps indefinitely.
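To make the list above concrete, here is a minimal Python sketch of what a third-party server could pull out of a single request for an embedded widget. The host widgets.example.net, the page URL, the IP address, and the cookie value are all hypothetical, and the parser is deliberately simplified (no request body, no folded headers); it is an illustration, not any company's actual logging code.

```python
from datetime import datetime, timezone


def parse_request(raw: str, client_ip: str) -> dict:
    """Extract the loggable fields from a raw HTTP request (simplified parser).

    client_ip comes from the TCP connection, not from the request text itself.
    """
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return {
        "ip": client_ip,
        "time": datetime.now(timezone.utc).isoformat(),  # exact time of the request
        "method": method,
        "resource": headers.get("host", "") + path,
        "user_agent": headers.get("user-agent"),
        "referrer": headers.get("referer"),  # HTTP spells the header "Referer"
        "cookie": headers.get("cookie"),  # present only if a cookie was set earlier
    }


# Hypothetical request a browser might send for a "Like"-style widget
# embedded on a news article.
raw = (
    "GET /like HTTP/1.1\r\n"
    "Host: widgets.example.net\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0\r\n"
    "Referer: https://news.example.com/2013/06/nsa-leaks\r\n"
    "Cookie: uid=abc123\r\n"
    "\r\n"
)
record = parse_request(raw, "203.0.113.7")
print(record["referrer"])  # https://news.example.com/2013/06/nsa-leaks
```

Note that the widget operator learns which article you are reading purely from the Referer header; no cooperation from the news site is needed beyond embedding the widget.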
It takes very little information about your web browser to build a unique fingerprint of it. Visit EFF's Panopticlick website to see how unique and trackable your web browser is, even without tracking cookies. You can read more in our Primer on Information Theory and Privacy.
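The arithmetic behind "very little information" can be sketched in a few lines of Python. Observing a value shared by a fraction p of browsers reveals -log2(p) bits, and singling out one person among roughly 7 billion takes about log2(7,000,000,000), or roughly 32.7 bits. The attribute frequencies below are made up for illustration, and adding bits assumes the attributes are independent, which real browser attributes are not; Panopticlick measures actual frequencies instead.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Bits revealed by observing a value shared by this fraction of browsers."""
    return -math.log2(probability)


def fingerprint_bits(frequencies: dict[str, float]) -> float:
    """Total identifying bits, assuming (unrealistically) independent attributes."""
    return sum(surprisal_bits(p) for p in frequencies.values())


# Hypothetical fractions of browsers sharing each observed value.
observed = {
    "user_agent": 1 / 1_500,
    "screen_size": 1 / 20,
    "time_zone": 1 / 10,
    "installed_fonts": 1 / 5_000,
    "plugins": 1 / 3_000,
}

bits = fingerprint_bits(observed)
needed = math.log2(7_000_000_000)  # bits to single out one person among ~7 billion
print(f"{bits:.1f} bits observed, {needed:.1f} bits needed")
```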
Who is Using Third Party Resources?
Here are some examples of prominent news websites that have been reporting on surveillance issues, along with the third party domains they load resources from as of June 2013:
The Guardian, which is hosted at guardian.co.uk and was the first to publish about the recent NSA spying leaks, loads scripts from:
guim.co.uk ajax.googleapis.com criteo.com amazonaws.com optimizely.com facebook.com twitter.com google.com quantserve.com wunderloop.net outbrain.com chartbeat.com
The Washington Post, which is hosted at www.washingtonpost.com and published the first story about PRISM alongside the Guardian, loads scripts from:
troveread.com wpdigital.net doubleclick.net criteo.com omtrdc.net theroot.com slate.com expressnightout.com trove.com ooyala.com adsonar.com mathtag.com spotxchange.com bloomberg.com revsci.net scorecardresearch.com chartbeat.com twitter.com cloudfront.net
The New York Times, which is hosted at nytimes.com, loads scripts from:
nyt.com doubleclick.net krxd.net moatads.com googlesyndication.com typekit.com revsci.net scorecardresearch.com imrworldwide.com chartbeat.com
The Wall Street Journal, which is hosted at online.wsj.com, loads scripts from:
wsj.net msn.com axf8.net peer39.net typekit.net llnwd.net imrworldwide.com facebook.net dowjoneson.com akamai.net doubleclick.net chartbeat.com bluekai.com
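One rough way to compile lists like these is to scan a page's HTML for tags that make the browser fetch something and keep the hosts that don't belong to the site. The Python sketch below does that with the standard library's html.parser; the sample markup and its URLs are invented for illustration, and the tag list is approximate (not every link element triggers a fetch, and scripts can load further resources dynamically, which a static scan misses).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tag/attribute pairs that commonly trigger an automatic fetch (approximate).
RESOURCE_ATTRS = {("script", "src"), ("img", "src"), ("iframe", "src"), ("link", "href")}


class ResourceCollector(HTMLParser):
    """Collect the hostnames of resources referenced by a page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.hosts: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in RESOURCE_ATTRS:
                # Resolve relative URLs against the page before reading the host.
                host = urlparse(urljoin(self.page_url, value)).hostname
                if host:
                    self.hosts.add(host)


def is_first_party(host: str, site_domain: str) -> bool:
    """True if host is the site's own domain or one of its subdomains."""
    return host == site_domain or host.endswith("." + site_domain)


def third_party_hosts(html: str, page_url: str, site_domain: str) -> set[str]:
    collector = ResourceCollector(page_url)
    collector.feed(html)
    return {h for h in collector.hosts if not is_first_party(h, site_domain)}


# Invented sample markup; the paths are placeholders, not real files.
page_url = "https://www.guardian.co.uk/world"
sample = """
<link rel="stylesheet" href="/css/main.css">
<script src="https://ajax.googleapis.com/libs/jquery.min.js"></script>
<img src="https://static.guim.co.uk/logo.png">
<iframe src="https://www.facebook.com/like-widget"></iframe>
"""
print(sorted(third_party_hosts(sample, page_url, "guardian.co.uk")))
```

The site domain is passed in explicitly because guessing it from the hostname is error-prone (the last two labels of www.guardian.co.uk are just "co.uk"). Collapsing hosts like static.guim.co.uk down to the bare domains listed above would need the Public Suffix List.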
All of these websites, by loading third party resources from servers controlled by major providers like Facebook, Google, and others, send information about their visitors to companies subject to US government data requests. The news companies themselves could receive requests for this data directly, but because they voluntarily send it to the same small, centralized group of third parties, those third parties become convenient and attractive targets for collecting visitor information from vast swaths of the web. Once a website sends data to a third party, it no longer has the power to stand up for its users against unconstitutional government requests for that data.
These news websites are not alone. Other websites that send information about all of their visitors to large companies that are subject to US government data requests include CNN, Huffington Post, MSNBC, BBC, Al Jazeera, BoingBoing, Slashdot, WordPress.com, Occupy Wall Street, Internet Defense League, and hundreds of thousands of others.
These sites are, for the most part, not actively trying to diminish their users' privacy. Rather, several converging factors make it commonplace to include third party resources. First, services like Google Analytics are very popular and provide an easy way to do analytics. Second, it's commonly considered good practice for websites to include jQuery and load webfonts from servers run by Google, since these load quickly and reduce the burden on the site's own servers. Finally, including Facebook Like buttons and other social media widgets is one of the best ways for a website to gain social media traction.
It's time for these "best practices" to change, so we can avoid giving the government a one-stop shop for your data. In a future blog post we'll discuss ways that web developers and companies can mitigate the privacy risks of including third party resources on their websites.
We do occasionally allow our website to interact with other services, like social networking, mapping, and video hosting websites. It is our policy not to include third-party resources when users initially load our web pages, but we may dynamically include them later after giving the user a chance to opt-in. If you believe a third-party resource is automatically loading, please let us know so we can address it.
The Importance of a Strong Do Not Track Standard
Given the proliferation of information flowing to third parties, it is critical that we develop a strong Do Not Track (DNT) standard that forbids third parties from collecting and retaining information derived from a user's visit to a website once that user has enabled DNT in her browser. Unfortunately, the W3C Tracking Protection Working Group is working on a standard that is far too watered down, and hence unlikely to offer real privacy protections to users. This leaves users exposed to data collection not only by the companies themselves, but also by the NSA and other agencies that might seek to obtain the information from these third parties. We very much hope that in the next month or so, before the working group winds down, we arrive at a strong Do Not Track standard that helps protect users from this type of abuse.
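A browser with Do Not Track enabled sends the request header "DNT: 1". The Python sketch below shows the strong behavior argued for above from the third party's side: the resource is still served, but nothing derived from the visit is retained. The handler name, log structure, and URLs are hypothetical, and this reflects the policy this post advocates, not any standard the working group adopted.

```python
def should_record(headers: dict[str, str]) -> bool:
    """Return False when the browser sent DNT: 1 (the user enabled Do Not Track)."""
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    return normalized.get("dnt") != "1"


def handle_widget_request(headers: dict[str, str], log: list[dict]) -> str:
    """Hypothetical third party endpoint: serves the widget either way,
    but keeps no record of the visit when DNT is enabled."""
    if should_record(headers):
        lowered = {k.lower(): v for k, v in headers.items()}
        log.append({"referrer": lowered.get("referer"), "user_agent": lowered.get("user-agent")})
    return "<widget markup>"


log: list[dict] = []
handle_widget_request({"Referer": "https://news.example.com/nsa", "DNT": "1"}, log)
handle_widget_request({"Referer": "https://news.example.com/prism"}, log)
print(len(log))  # 1: only the request without DNT was recorded
```

The header is only a request; nothing in the protocol forces a server to honor it, which is why the strength of the standard matters.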
What Can Users Do?
But users need not wait for Do Not Track or any other standard. A good start to protecting yourself from ubiquitous web tracking is to make these 4 Simple Changes to Stop Online Tracking.
If you really want to be in control of exactly which third party requests your browser makes, use RequestPolicy for Firefox. It's a browser extension that blocks all third party resources by default, and then lets you choose which resources you want your browser to load for which websites. But be warned, the problem with third party resources is so prevalent that RequestPolicy will break the layout and functionality of almost every single site until you allow specific third party requests for those sites.
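The default-deny model described above fits in a short Python sketch: requests to the page's own host are allowed, and every other request is blocked unless the user has allowed that destination for that particular site. This is a loose model of the behavior described here, not RequestPolicy's actual code; the class and method names are hypothetical, and a real tool would compare registrable domains using the Public Suffix List rather than exact hosts.

```python
from urllib.parse import urlparse


def matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


class DefaultDenyPolicy:
    """Block cross-site requests unless the user allowed them for that site."""

    def __init__(self) -> None:
        # (origin site domain, destination domain) pairs the user has allowed
        self.rules: set[tuple[str, str]] = set()

    def allow(self, origin: str, destination: str) -> None:
        self.rules.add((origin, destination))

    def permits(self, page_url: str, resource_url: str) -> bool:
        page = urlparse(page_url).hostname or ""
        resource = urlparse(resource_url).hostname or ""
        if resource == page:
            return True  # same host: always allowed in this simplified sketch
        return any(matches(page, o) and matches(resource, d) for o, d in self.rules)


policy = DefaultDenyPolicy()
page = "https://www.guardian.co.uk/world"
print(policy.permits(page, "https://ajax.googleapis.com/jquery.js"))  # False: blocked by default
policy.allow("guardian.co.uk", "googleapis.com")
print(policy.permits(page, "https://ajax.googleapis.com/jquery.js"))  # True: allowed for this site
print(policy.permits("https://www.nytimes.com/", "https://ajax.googleapis.com/jquery.js"))  # False
```

Because rules are keyed by both the site and the destination, allowing Google's servers on one news site does not allow them everywhere, which is exactly why pages break until you whitelist their resources one by one.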
It's unfortunate that protecting privacy on the web requires determined users who are willing to be inconvenienced for privacy. We need to work towards a long-term technical ecosystem that better protects the privacy of who visits which websites. We also need strong privacy laws that protect user data from unconstitutional surveillance, and the transparency necessary to ensure these laws cannot be bypassed in secret.