This is part 3 of a series of posts on user tracking on the modern web. You can also read part 1 and part 2.
Whenever you visit a web page, your browser sends a "User Agent" header to the website saying precisely which operating system and web browser you are using. This information could help distinguish Internet users from one another because these versions differ, often considerably, from person to person. We recently ran an experiment to see to what extent this information could be used to track people. For instance, if someone deletes their browser cookies, would the User Agent, alone or in combination with some other detail, be unique enough to let a site recognize them and re-create their old cookie?
Our experiment to date has shown that the browser User Agent string usually carries 5-15 bits of identifying information (about 10.5 bits on average). That means that on average, only one person in about 1,500 (2^10.5) will have the same User Agent as you. On its own, that isn't enough to re-create cookies and track people perfectly, but in combination with another detail, such as geolocation to a particular ZIP code or an uncommon browser plugin, the User Agent string becomes a real privacy problem.
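The arithmetic behind these figures is easy to check. The sketch below is illustrative only: the function names are ours, the 8-bit value for a second detail is a hypothetical placeholder rather than a measurement, and adding bits from two details assumes they are statistically independent, which real data only approximates.

```python
import math

def crowd_size(bits: float) -> float:
    """Size of the crowd that shares a trait carrying `bits` bits of information."""
    return 2.0 ** bits

def bits_of(crowd: float) -> float:
    """Bits of identifying information in a trait shared by one person in `crowd`."""
    return math.log2(crowd)

ua_bits = 10.5  # average User Agent figure reported in this post
print(round(crowd_size(ua_bits)))  # 1448

# Hypothetical second detail worth 8 bits (placeholder, not a measured value).
# Bits only add like this if the two details are independent.
other_bits = 8.0
print(round(crowd_size(ua_bits + other_bits)))
```

Each extra bit halves the expected crowd, which is why a modest second detail shrinks the crowd so quickly.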
User Agents: An Example of Browser Characteristics Doubling As Tracking Tools
When we analyze the privacy of web users, we usually focus on user accounts, cookies, and IP addresses, because those are the usual means by which a request to a web server can be associated with other requests and/or linked back to an individual human being, computer, or local network.
Typical advice for improving your privacy as you surf the web might include blocking or deleting cookies (and supercookies), and using proxy servers or tools like Tor to hide your IP address.
It's not intuitively obvious that a User Agent poses a similar risk to a unique tracking cookie. After all, cookies were designed, in part, to help web sites distinguish and recognize individual browsers, and User Agents weren't. And there could be millions of people out there who use the same browser and operating system that you do. But let's examine the matter more closely. A typical User Agent string looks something like this:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
In fact, that was the most common user agent string among browsers visiting the EFF website during the test period: Firefox 3.5.3 running on Windows XP. Notice that the operating system and browser versions are extremely specific and that the User Agent also includes the user's preferred language. There are a lot of things that can vary inside that string, and those variations can be used to distinguish and track people as they browse the Web.
Our Results to Date on User Agent Identifiability
We ran an experiment to measure precisely how identifying the User Agent strings would have been among a 36-hour anonymized sample of requests to the EFF website. The following table shows different classes of browser, with the average and the minimum (best-case) number of identifying bits for User Agents within each class. Each figure is given as a range between two bounds, which are explained in note 1 at the end of this post.
Identifying information in various classes of browsers
|Browser class||Avg. identifying information||Minimum identifying information||Least identifying User Agent|
|Modern Windows Desktops||10.3 - 11.3 bits||4.6 - 5.0 bits||Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)|
|Internet Explorer||13.2 - 13.5 bits||6.3 - 7.2 bits||Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)|
|Firefox||8.6 - 9.4 bits||4.6 - 5.0 bits||Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)|
|Chrome||7.5 - 8.5 bits||5.7 - 6.2 bits||Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.0 (KHTML, like Gecko) Chrome/22.214.171.124 Safari/532.0|
|Linux||11.8 - 13.15 bits||6.6 - 7.9 bits||Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.14) Gecko/2009090216 Ubuntu/9.04 (jaunty) Firefox/3.0.14|
|Ubuntu||9.6 - 11.7 bits||6.6 - 7.8 bits||Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.14) Gecko/2009090216 Ubuntu/9.04 (jaunty) Firefox/3.0.14|
|Debian||13.5 - 15.3 bits||10.5 - 11.7 bits||Mozilla/5.0 (X11; U; Linux i686; en-US; rv:184.108.40.206) Gecko/2009091010 Iceweasel/3.0.6 (Debian-3.0.6-3)|
|Macintosh||8.8 - 9.3 bits||5.8 - 5.8 bits||Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3|
|iPhone||10.8 - 11.3 bits||8.7 - 9.3 bits||Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_1 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7C144 Safari/528.16|
|Blackberry||14.7 - 15.5 bits||12.0 - 12.7 bits||BlackBerry9530/18.104.22.168 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105|
|Android||14.4 - 14.4 bits||12.2 - 12.4 bits||Mozilla/5.0 (Linux; U; Android 1.6; en-us; T-Mobile G1 Build/DRC83) AppleWebKit/528.5+ (KHTML, like Gecko) Version/3.1.2 Mobile Safari/525.20.1|
There are several remarkable facts about this dataset. Overall, it's amazing how identifying User Agent strings are. 10.5 bits is about one-third of the total information required to identify an Internet user.
It's also surprising that platforms like Firefox and Ubuntu, which have lower market penetration, are on average comparable to or even less identifying than Windows and Microsoft Internet Explorer, which have very large userbases and should therefore have larger crowds to hide in. Part of the explanation may be that visitors to the EFF website over-represent the former groups. But it's also clear that a large part of it is that Internet Explorer has a very high level of variation in its User Agent strings, with typical examples looking something like this:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618)
All of the different library and component versions there essentially function as partial tracking tokens.
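To see how many independently varying pieces such a string carries, it helps to split it apart. The following is a minimal sketch, not a full User Agent parser: the function name `comment_tokens` is ours, and it only looks at the first parenthesized section of the string.

```python
import re

def comment_tokens(user_agent: str) -> list[str]:
    """Return the semicolon-separated tokens inside the first parenthesized section."""
    match = re.search(r"\(([^)]*)\)", user_agent)
    if match is None:
        return []
    return [tok.strip() for tok in match.group(1).split(";") if tok.strip()]

ua = ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; "
      ".NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618)")

tokens = comment_tokens(ua)
for token in tokens:
    print(token)
print(len(tokens), "tokens")
```

Every token that can be present, absent, or at a different version on another machine multiplies the number of distinct strings a site might see.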
We've launched a project called Panopticlick to collect a new dataset that extends this analysis from User Agents to the full browser plugin and configuration space. You can use Panopticlick to receive a uniqueness measurement for your own browser, and help EFF's privacy research efforts at the same time!
During September 2009, we took a 36-hour sample of requests to the eff.org web server and anonymized it by hashing the IP address of each request with a random salt and then throwing away the salt. We then calculated the amount of identifying information conveyed by each browser. Identifying information is measured in "bits of entropy", which tell you how small a crowd the information would narrow you down to. Browsers usually convey between 5 and 15 bits of identifying information, about 10.5 bits on average. 10 bits of identifying information would allow you to be picked out of a crowd of 2^10, or 1,024 people. 10.5 bits of information can pick people out of crowds of about 1,448.
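The anonymization step can be sketched as follows. This is an illustration under our own assumptions, not the code used for the experiment: the post does not say which hash function or salt length was used, so SHA-256 and a 32-byte salt are stand-ins, and `make_anonymizer` is a hypothetical helper name.

```python
import hashlib
import secrets

def make_anonymizer():
    """Return a function that maps IP addresses to salted hashes.

    The salt lives only inside the closure. Once the returned function is
    discarded, the salt is gone and the hashes cannot be recomputed.
    """
    salt = secrets.token_bytes(32)  # assumed salt length

    def anonymize(ip: str) -> str:
        # SHA-256 is an assumption; the post does not name the hash function.
        return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()

    return anonymize

anonymize = make_anonymizer()
same = anonymize("192.0.2.1") == anonymize("192.0.2.1")
different = anonymize("192.0.2.1") != anonymize("192.0.2.2")
print(same, different)
```

Within one sample, requests from the same address still map to the same token, so they can be grouped. A fresh salt would produce unrelated tokens, so the tokens cannot be linked to other datasets.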
1. One bound is based on a count in which each hashed IP address is counted for only one request; the other bound is based on treating each hit as a unique browser. In almost all cases, the true amount of identifying information pertaining to the browser should lie between these two values.
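The two bounds in this note can be sketched in a few lines. This is one possible reading of the method, not the original analysis code: the names are ours, the sample data is invented purely for illustration, and "counted for only one request" is interpreted here as keeping the first request seen from each hashed IP address.

```python
import math
from collections import Counter

def surprisal(count: int, total: int) -> float:
    """Bits of identifying information for a value seen `count` times in `total`."""
    return -math.log2(count / total)

def surprisal_bounds(requests, user_agent):
    """Return (low, high) bits for `user_agent`.

    `requests` is a list of (hashed_ip, user_agent) pairs.
    """
    requests = list(requests)
    # Bound A: every hit is treated as a separate browser.
    hits = Counter(ua for _, ua in requests)
    # Bound B: each hashed IP contributes only its first request (our assumption).
    first_per_ip = {}
    for ip, ua in requests:
        first_per_ip.setdefault(ip, ua)
    browsers = Counter(first_per_ip.values())
    if hits[user_agent] == 0 or browsers[user_agent] == 0:
        raise ValueError("user agent not present in the sample under both counts")
    by_hit = surprisal(hits[user_agent], len(requests))
    by_ip = surprisal(browsers[user_agent], len(first_per_ip))
    return min(by_hit, by_ip), max(by_hit, by_ip)

# Invented sample: one address makes five requests with UA-1.
sample = [("a", "UA-1")] * 5 + [("b", "UA-2"), ("c", "UA-3"), ("d", "UA-4")]
low, high = surprisal_bounds(sample, "UA-1")
print(f"{low:.2f} - {high:.2f} bits")
```

A single busy visitor inflates the per-hit count of their User Agent, which is why the two counts can disagree and why the table reports a range.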