Bad News For Reader Privacy: Google News Doesn't Index HTTPS Sites
In the ongoing effort to encrypt the entire web, news sites are an area of special importance. After all, the articles you choose to read can say a lot about you: how closely you're following a political race, for example, can indicate where you stand on sensitive issues, or give clues about personal connections to the people or organizations being covered.
While a few news sites offer their content over secure HTTPS (e.g., partial support by the New York Times), far too many do not offer it at all, much less by default. Our own Deeplinks blog is an exception. Readers can browse our site without leaving a trail of which pages they viewed, a trail that could otherwise be easily picked up and stored by other people on the same wireless network or by the reader's ISP, which could then be compelled to hand over that information to law enforcement or intelligence agencies like the NSA.
News sites should be given lots of encouragement to switch to HTTPS. But unfortunately, that category of sites faces a major disincentive from Google. Google News, a section of the search engine that specifically searches through news sites, does not index articles available only over HTTPS. Google's decision undermines the privacy of readers who use the service.
Google has told us that it opts not to index HTTPS pages in an effort to exclude "non-news content" like login pages. Excluding HTTPS pages may have seemed like a reasonable technical decision at one point, when HTTPS was mostly confined to pages that required logins. But as the encrypted web continues to become more popular, policies like Google's serve as self-fulfilling prophecies: so long as sites are required to maintain an unencrypted version for News search, news sites are discouraged from taking the best steps for reader privacy, because those steps would also cost them traffic.
Why Encrypt News Sites?
While the case for encryption may not be as obvious as it is for private communications like login credentials, payment information, or personal messages, it can be just as important for users accessing public data like news articles.
That's because HTTPS provides privacy protections beyond just encrypting the content you send back and forth. When you visit a web site that uses HTTPS, only you and the site you're contacting know which individual page you're on. In layman's terms, everything "after the slash"—the information which specifies a particular page on a server—is encrypted in transit.
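To make the "after the slash" idea concrete, here is a minimal Python sketch of what a network observer could read for a given URL. It is a simplified model, not a description of any real monitoring tool: the function name `observer_view` and the `news.example` URLs are hypothetical, and the sketch assumes the hostname is still learnable (for instance from the DNS lookup or the connection setup) while everything else travels inside the encrypted connection.

```python
from urllib.parse import urlsplit

def observer_view(url):
    """Return what an on-path observer could read for this URL (simplified model)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # Assumption: only the hostname is visible; path and query are encrypted in transit.
        return {"host": parts.hostname, "path": None, "query": None}
    # Plain HTTP: the whole request, including path and query, is readable.
    return {"host": parts.hostname, "path": parts.path, "query": parts.query}

# Hypothetical article URLs, once over HTTP and once over HTTPS.
print(observer_view("http://news.example/world/nsa-leaks?page=2"))
print(observer_view("https://news.example/world/nsa-leaks?page=2"))
```

In this model the HTTP request exposes the specific article, while the HTTPS request reveals only that some page on `news.example` was visited.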
If, say, The Guardian were to encrypt its site by default, then your ISP, or office sysadmin, or an eavesdropper using the same cafe Wi-Fi, would only know that you had visited some pages on The Guardian. They would be unable to distinguish your visits to articles about the royal baby from visits to articles about the latest NSA leaks.[1] For now, though, The Guardian is only available under HTTP, meaning that your interest in the royal family or illegal surveillance can be easily discovered.
Another benefit for news sites using HTTPS is that it makes keyword censorship, like the Chinese government's persistent attempts to block all online references to Tiananmen Square, much more difficult. An encrypted connection between the reader and the site means that intermediaries, such as state-run ISPs, can't easily search through the content of articles for blocked terms.
Google Can Do Better on Encryption
Google has demonstrated that it understands the value of protecting users with encrypted connections. When users search with the site, or connect to services like Gmail or Google Calendar, they use HTTPS. In fact, Google News itself uses HTTPS by default. Not only that, but services offered by the company use a critical web privacy feature called "forward secrecy" to keep exchanges private even if the company's own key is compromised in the future. Google has generally been a leader among large companies in encouraging and deploying cryptographic solutions to protect users.
It's frustrating and puzzling, then, that Google fails to offer readers the same level of privacy it considers appropriate for its users elsewhere. On sites where it's an option, users of our HTTPS-Everywhere browser extension already benefit from encryption, but we want to encourage news sites to protect the privacy of all of their users. This Google policy is an unfortunate roadblock in the way of that important goal. Google has told us that this issue, after years of being unresolved, is on their radar. We hope to see a solution soon.
- 1. A sophisticated eavesdropper could infer additional information from data like the size of the encrypted files you've requested. A photo-heavy report, for example, might be much larger than a text article. It's even possible that each article on the site could be identified by its unique size. Still, that sort of traffic analysis requires much more effort.
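The size-based inference described in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `candidates_by_size`, the article names, and every byte count below are hypothetical, and real traffic analysis would also have to cope with compression, padding, and shared resources such as images and scripts.

```python
def candidates_by_size(observed_bytes, known_sizes, tolerance=0):
    """Return articles whose known size is within `tolerance` bytes of the observed transfer."""
    return sorted(
        title
        for title, size in known_sizes.items()
        if abs(size - observed_bytes) <= tolerance
    )

# Hypothetical sizes an eavesdropper might have gathered by crawling the site.
known_sizes = {
    "royal-baby-photo-gallery": 2_450_000,
    "nsa-leaks-report": 61_200,
    "election-analysis": 58_900,
}

# An encrypted transfer of roughly 61 KB matches only one article here.
print(candidates_by_size(61_350, known_sizes, tolerance=500))  # ['nsa-leaks-report']
```

The more articles share similar sizes, the longer the candidate list becomes, which is one reason this kind of analysis takes more effort than simply reading unencrypted requests.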