June 17, 2013 | By Micah Lee

How Dozens of Companies Know You're Reading About Those NSA Leaks

As news websites around the globe are publishing story after story about dragnet surveillance, these news sites all have one thing in common: when you visit these websites, your personal information is broadcast to dozens of companies, many of which have the ability to track your surfing habits, and many of which are subject to government data requests.

How Does This Happen?

When you load a webpage in your browser, the page normally includes many elements that get loaded separately, like images, fonts, CSS files, and JavaScript files. These files can be, and often are, loaded from different domain names hosted by different companies. For example, if a website has a Facebook Like button on it, your browser loads JavaScript and images from Facebook's server to display that Like button, even if the website you're visiting has nothing to do with Facebook.
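To make this concrete, here is a minimal Python sketch that lists the hosts a page asks a browser to contact. Everything in it is illustrative: the function and class names are made up, the sample HTML and the hosts news.example, widgets.social.example, and static.example.net are hypothetical, and comparing exact hostnames is a naive stand-in for a real notion of "third party".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collects the hosts that a page's tags point the browser at."""

    # (tag, attribute) pairs that usually make the browser fetch something.
    # This short list is an assumption; real pages have more ways to load things.
    FETCHING_ATTRS = {("script", "src"), ("img", "src"),
                      ("link", "href"), ("iframe", "src")}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.FETCHING_ATTRS and value:
                # Resolve relative and protocol-relative URLs against the page URL.
                host = urlparse(urljoin(self.page_url, value)).hostname
                if host:
                    self.hosts.add(host)


def third_party_hosts(page_url, html):
    """Return the hosts that differ from the page's own host, sorted."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    own_host = urlparse(page_url).hostname
    return sorted(h for h in collector.hosts if h != own_host)


# Hypothetical page: one local stylesheet, one local image, two outside resources.
SAMPLE_HTML = """
<html><head>
  <link rel="stylesheet" href="/css/site.css">
  <script src="https://widgets.social.example/like.js"></script>
</head><body>
  <img src="https://news.example/logo.png">
  <img src="//static.example.net/pixel.gif">
</body></html>
"""

hosts = third_party_hosts("https://news.example/story", SAMPLE_HTML)
print(hosts)  # ['static.example.net', 'widgets.social.example']
```

Each host in that list receives its own request from your browser, whether or not you ever interact with the widget or image it serves.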

Why Does This Matter?

Each time your browser makes a request it sends the following information with it:

  • Your IP address and the exact time of the request
  • User-Agent string: which normally contains the web browser you're using, your browser's version, your operating system, processor information (32-bit, 64-bit), language settings, and other data
  • Referrer: the URL of the website you're coming from—in the case of the Facebook Like button example, your browser tells Facebook which website you're viewing
  • Other HTTP headers which contain potentially identifying information
  • Sometimes tracking cookies
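As a rough illustration of the list above, the following Python sketch assembles the kind of request a browser might send when it fetches a social widget embedded in a news story. The function name, URLs, User-Agent string, and cookie value are all hypothetical, and a real browser sends more headers than shown here.

```python
from urllib.parse import urlsplit


def build_widget_request(resource_url, referring_page, user_agent, cookie=None):
    """Return the text of a GET request for a third party resource.

    The IP address does not appear in the text: the server learns it from
    the network connection itself, not from a header.
    """
    parts = urlsplit(resource_url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.hostname}",
        f"User-Agent: {user_agent}",
        # HTTP spells this header "Referer", with a single "r" in the middle.
        f"Referer: {referring_page}",
        "Accept-Language: en-US,en;q=0.5",
    ]
    if cookie is not None:
        # Only present if the third party has set a cookie before.
        lines.append(f"Cookie: {cookie}")
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_widget_request(
    resource_url="https://widgets.social.example/like.js",
    referring_page="https://news.example/nsa-story",
    user_agent="Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0",
    cookie="uid=abc123",
)
print(request_text)
```

The Referer line is the one that tells the widget's operator which article you were reading, and the Cookie line is what lets it tie that visit to earlier ones.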

Every company has different practices, but they generally log some or all of this information, perhaps indefinitely.

It takes very little information about your web browser to build a unique fingerprint of it. See EFF's Panopticlick website to see how unique and trackable your web browser is even without the use of tracking cookies. You can read more in our Primer on Information Theory and Privacy.
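The arithmetic behind fingerprinting can be sketched in a few lines of Python. The frequencies below are invented for illustration and are not Panopticlick measurements, the population figure is a rough 2013 estimate, and adding the bits together assumes the attributes are independent of each other, which real browser attributes are not.

```python
import math


def surprisal_bits(fraction_sharing_value):
    """Bits of identifying information revealed by an attribute value
    that this fraction of all browsers shares."""
    return -math.log2(fraction_sharing_value)


# Rough world population in 2013 (assumption for illustration).
WORLD_POPULATION = 7_000_000_000

# Bits needed to single out one person among everyone alive.
bits_to_identify = math.log2(WORLD_POPULATION)

# Hypothetical frequencies: "1 in N browsers has the same value as yours".
observed = {
    "user_agent": 1 / 1500,
    "time_zone": 1 / 24,
    "screen_size": 1 / 50,
}

# Summing is only valid if the attributes are independent (an assumption).
total_bits = sum(surprisal_bits(p) for p in observed.values())

for name, p in observed.items():
    print(f"{name}: {surprisal_bits(p):.1f} bits")
print(f"total: {total_bits:.1f} of about {bits_to_identify:.1f} bits needed")
```

A value shared by half of all browsers reveals one bit, and every halving of the crowd adds another, which is why a handful of moderately rare attributes can go a long way toward a unique fingerprint.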

Who is Using Third Party Resources?

Here are some examples of prominent news websites that have been reporting on surveillance issues and which domain names they load third party resources from as of June 2013:

The Guardian, which is hosted at guardian.co.uk and was the first to publish about the recent NSA spying leaks, loads scripts from:

guim.co.uk ajax.googleapis.com criteo.com amazonaws.com optimizely.com facebook.com twitter.com google.com quantserve.com wunderloop.net outbrain.com chartbeat.com

The Washington Post, which is hosted at www.washingtonpost.com and published the first story about PRISM alongside the Guardian, loads scripts from:

troveread.com wpdigital.net doubleclick.net criteo.com omtrdc.net theroot.com slate.com expressnightout.com trove.com ooyala.com adsonar.com mathtag.com spotxchange.com bloomberg.com revsci.net scorecardresearch.com chartbeat.com twitter.com cloudfront.net

The New York Times, which is hosted at nytimes.com, loads scripts from:

nyt.com doubleclick.net krxd.net moatads.com googlesyndication.com typekit.com revsci.net scorecardresearch.com imrworldwide.com chartbeat.com

The Wall Street Journal, which is hosted at online.wsj.com, loads scripts from:

wsj.net msn.com axf8.net peer39.net typekit.net llnwd.net imrworldwide.com facebook.net dowjoneson.com akamai.net doubleclick.net chartbeat.com bluekai.com

All of these websites, by loading third party resources from servers controlled by major providers like Facebook, Google, and others, are sending information about their visitors to companies subject to US government data requests. While these news companies themselves could directly receive requests for this data, the fact that they voluntarily send this data to the same small, centralized group of third parties makes these third parties convenient and attractive targets to collect visitor information from vast swaths of the web. Once a website sends data to a third party, it no longer has the power to stand up for its users against unconstitutional government requests for that data.
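The overlap is easy to check with the four domain lists above. This Python sketch counts how many of the four sites load scripts from each domain; the variable and function names are made up, and the comparison is by domain name only, so a company that operates several domains is counted once per domain rather than once overall.

```python
from collections import Counter

# Script hosts as listed above for each site, as of June 2013.
SCRIPT_HOSTS = {
    "The Guardian": (
        "guim.co.uk ajax.googleapis.com criteo.com amazonaws.com optimizely.com "
        "facebook.com twitter.com google.com quantserve.com wunderloop.net "
        "outbrain.com chartbeat.com"
    ).split(),
    "The Washington Post": (
        "troveread.com wpdigital.net doubleclick.net criteo.com omtrdc.net "
        "theroot.com slate.com expressnightout.com trove.com ooyala.com "
        "adsonar.com mathtag.com spotxchange.com bloomberg.com revsci.net "
        "scorecardresearch.com chartbeat.com twitter.com cloudfront.net"
    ).split(),
    "The New York Times": (
        "nyt.com doubleclick.net krxd.net moatads.com googlesyndication.com "
        "typekit.com revsci.net scorecardresearch.com imrworldwide.com chartbeat.com"
    ).split(),
    "The Wall Street Journal": (
        "wsj.net msn.com axf8.net peer39.net typekit.net llnwd.net "
        "imrworldwide.com facebook.net dowjoneson.com akamai.net doubleclick.net "
        "chartbeat.com bluekai.com"
    ).split(),
}


def shared_domains(sites, min_sites=2):
    """Return (domain, site_count) pairs for domains used by at least
    min_sites sites, most widely shared first, ties in alphabetical order."""
    counts = Counter(d for domains in sites.values() for d in set(domains))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(domain, n) for domain, n in ranked if n >= min_sites]


shared = shared_domains(SCRIPT_HOSTS)
for domain, n in shared:
    print(f"{domain}: {n} of {len(SCRIPT_HOSTS)} sites")
```

On these lists, chartbeat.com appears on all four sites and doubleclick.net on three, so a single operator can see the same reader moving between competing newspapers.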

These news websites are not alone. Other websites that send information about all of their visitors to large companies that are subject to US government data requests include CNN, Huffington Post, MSNBC, BBC, Al Jazeera, BoingBoing, Slashdot, WordPress.com, Occupy Wall Street, Internet Defense League, and hundreds of thousands of others.

These sites are for the most part not actively attempting to diminish the privacy of their users. Rather, there are several factors that converge that make it commonplace to include third party resources. First, services like Google Analytics are very popular and provide an easy way to do analytics. Second, it's commonly considered a good practice for websites to include jQuery and load webfonts from servers run by Google, since these will load fast and reduce the burden on your servers. Finally, including Facebook Like buttons and other social media widgets on your website is one of the best ways to gain social media traction.

It's time for these "best practices" to change, so we can avoid giving the government a one-stop shop for your data. In a future blog post we'll discuss ways that web developers and companies can mitigate the privacy risks posed by third party resources on their websites.

In contrast to the websites listed above, when you visit eff.org, your browser doesn't make any requests to any third parties. We've long been aware of the privacy dangers inherent in loading third party resources, and our privacy policy states:

We do occasionally allow our website to interact with other services, like social networking, mapping, and video hosting websites. It is our policy not to include third-party resources when users initially load our web pages, but we may dynamically include them later after giving the user a chance to opt-in. If you believe a third-party resource is automatically loading, please let us know so we can address it.

The Importance of a Strong Do Not Track Standard

Given the proliferation of information flowing to third parties, it is critical that we develop a strong Do Not Track (DNT) standard that forbids third parties from collecting and retaining information derived from a user's visit to a website once that user has enabled DNT in her browser. Unfortunately, the W3C Tracking Protection Working Group is working on a standard that is far too watered down, and hence unlikely to offer real privacy protections to users. This leaves users exposed to data collection not only by the companies themselves, but also by the NSA and other agencies who might seek to obtain the information from these third parties. We very much hope that in the next month or so before the working group winds down, we arrive at a strong Do Not Track standard that helps to protect users from this type of abuse.
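Browsers signal the preference by sending the request header `DNT: 1`. What a third party must do on receiving it is exactly what the standard has to settle, so the Python sketch below shows only one possible strong interpretation: retain nothing about the request. The function names and the shape of the log record are hypothetical, and this is not what any draft of the standard specifies.

```python
def dnt_enabled(headers):
    """Return True when the request carries the header `DNT: 1`.

    Header names are case-insensitive in HTTP, so compare in lowercase.
    """
    for name, value in headers.items():
        if name.lower() == "dnt":
            return value.strip() == "1"
    return False


def record_to_retain(headers, client_ip, resource_url):
    """Return what a third party would keep for this request, or None.

    Policy assumed here (one possible reading of a strong standard):
    if the user has enabled DNT, keep nothing at all.
    """
    if dnt_enabled(headers):
        return None
    return {
        "ip": client_ip,
        "url": resource_url,
        "referer": headers.get("Referer"),
        "user_agent": headers.get("User-Agent"),
    }


with_dnt = {"DNT": "1", "Referer": "https://news.example/nsa-story"}
without_dnt = {"Referer": "https://news.example/nsa-story"}

print(record_to_retain(with_dnt, "203.0.113.7", "/like.js"))     # None
print(record_to_retain(without_dnt, "203.0.113.7", "/like.js"))
```

A watered-down standard would amount to letting the first branch still return most of the record, which leaves that data available to anyone who can later demand it.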

What Can Users Do?

But users need not wait for Do Not Track or any standard. A good start to protecting yourself from ubiquitous web tracking is to make these 4 Simple Changes to Stop Online Tracking.

If you really want to be in control of exactly which third party requests your browser makes, use RequestPolicy for Firefox. It's a browser extension that blocks all third party resources by default, and then lets you choose which resources you want your browser to load for which websites. But be warned, the problem with third party resources is so prevalent that RequestPolicy will break the layout and functionality of almost every single site until you allow specific third party requests for those sites.
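The default-deny idea can be sketched in Python as follows. This is not RequestPolicy's actual code or rule format: the class name, method names, and URL paths are hypothetical, and matching on exact hostnames is a simplification.

```python
from urllib.parse import urlparse


class DefaultDenyPolicy:
    """Blocks every cross-site request unless the user has allowed that
    specific (visited site, destination) pair."""

    def __init__(self):
        self.allowed_pairs = set()

    def allow(self, origin_host, destination_host):
        """Record that pages on origin_host may load from destination_host."""
        self.allowed_pairs.add((origin_host, destination_host))

    def permits(self, page_url, resource_url):
        origin = urlparse(page_url).hostname
        destination = urlparse(resource_url).hostname
        if origin == destination:
            return True  # same-site requests are always allowed
        return (origin, destination) in self.allowed_pairs


policy = DefaultDenyPolicy()
# The user decides nytimes.com may load resources from nyt.com, and nothing else.
policy.allow("nytimes.com", "nyt.com")

print(policy.permits("http://nytimes.com/story", "http://nyt.com/app.js"))           # True
print(policy.permits("http://nytimes.com/story", "http://doubleclick.net/ad.js"))    # False
print(policy.permits("http://online.wsj.com/story", "http://nyt.com/app.js"))        # False
```

Because permissions are stored per pair, allowing a domain on one site does not let it follow you to the next, which is also why so many pages look broken until you have approved the resources they depend on.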

It's unfortunate that protecting privacy on the web requires determined users who are willing to be inconvenienced for it. We need to work towards a long-term technical ecosystem that will better protect the privacy of who visits what websites. We also need strong privacy laws that protect user data from unconstitutional surveillance, and the transparency necessary to ensure these laws cannot be bypassed in secret.

Please join our fight against NSA spying.
