‘Scraping’ Is Just Automated Access, and Everyone Does It

For tech lawyers, one of the hottest questions this year is: can companies use the Computer Fraud and Abuse Act (CFAA)—an imprecise and outdated criminal anti-“hacking” statute intended to target computer break-ins—to block their competitors from accessing publicly available information on their websites? The answer to this question has wide-ranging implications for everyone: it could impact the public’s ability to meaningfully access publicly available information on the open web. This will impede investigative journalism and research. And in a world of algorithms and artificial intelligence, lack of access to data is a barrier to product innovation, and blocking access to data means blocking any chance for meaningful competition.

The CFAA was enacted in 1986, when there were only about 2,000 computers connected to the Internet. The law makes it a crime to access a computer connected to the Internet “without authorization” but fails to explain what this means. It was passed with the aim of outlawing computer break-ins, but has since metastasized in some jurisdictions into a tool to enforce computer use policies, like terms of service, which no one reads.

Efforts to use the CFAA to threaten competitors increased in 2016 following the Ninth Circuit’s poorly reasoned Facebook v. Power Ventures decision. The case involved a dispute between Facebook and a social media aggregator, which Facebook users had voluntarily signed up for. Facebook did not want its users engaging with this service, so it sent Power Ventures a cease and desist letter and tried to block Power Ventures’ IP address. The Ninth Circuit found that Power Ventures had violated the CFAA by continuing to provide its services after it had received the cease and desist letter and had one of its IP addresses blocked.

After the decision was issued, companies—almost immediately—started citing the case in cease and desist letters, demanding that competitors stop using automated methods to access publicly available information on their websites. Some of these disputes have made their way to court, the most high profile of which is hiQ v. LinkedIn, which involves automated access of publicly available LinkedIn data. As law professor Orin Kerr has explained, posting information on the web and then telling someone they are not authorized to access it is “like publishing a newspaper but then forbidding someone to read it.”

The web is the largest, ever-growing data source on the planet. It’s a critical resource for journalists, academics, businesses, and everyday people alike. But meaningful access sometimes requires the assistance of technology to automate and expedite the otherwise tedious process of accessing, collecting, and analyzing public information. This process of using a computer to automatically load and read the pages of a website for later analysis is often referred to as “web scraping.”[1]

As a technical matter, web scraping is simply machine automated web browsing. There is nothing that can be done with a web scraper that cannot be done by a human with a web browser. And it is important to understand that web scraping is a widely used method of interacting with the content on the web: everyone does it—even (and especially) the companies trying to convince courts to punish others for the same behavior.
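To make the point concrete, here is a minimal sketch of machine-automated browsing using only Python's standard library. It is an illustration, not a description of any particular company's tooling: the names `VisibleTextParser`, `extract_text`, `fetch`, the `example-reader/0.1` user agent, and the `SAMPLE_PAGE` markup are all hypothetical. The program requests a page the same way a browser would and then reads the text a human visitor would see on screen.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleTextParser(HTMLParser):
    """Collects the text a browser would display, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text fragments of an HTML document, in page order."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


def fetch(url):
    """Send the same kind of HTTP GET request a browser sends for a public page."""
    # The user agent string is a made-up placeholder for this sketch.
    request = Request(url, headers={"User-Agent": "example-reader/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical public page, inlined so the sketch runs without network access.
SAMPLE_PAGE = """<html><head><style>p {color: red}</style></head>
<body><h1>Jane Doe</h1><p>Data analyst at Example Corp</p>
<script>var x = 1;</script></body></html>"""

if __name__ == "__main__":
    print(extract_text(SAMPLE_PAGE))
```

Calling `extract_text(fetch(url))` on a public page yields the same words a person would read in a browser window; the only difference is that the reading is done by a program.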

Companies use automated web browsing products to gather web data for a wide variety of uses. Some examples from industry include manufacturers tracking the performance ranking of products in the search results of retailer websites, companies monitoring information posted publicly on social media to keep tabs on issues that require customer support, and businesses staying up to date on news stories relevant to their industry across multiple sources. E-commerce businesses use automated web browsing to monitor competitors’ pricing and inventory, and to aggregate information to help manage supply chains. Businesses also use automated web browsers to monitor websites for fraud, perform due diligence checks on their customers and suppliers, and to collect market data to help plan for the future.

These examples are not hypothetical. They come directly from Andrew Fogg, the founder of Import.io, a company that provides software that allows organizations to automatically browse the web, and are based on Import.io’s customers and users. And these examples are not the exception; they are the rule. Gartner recommends that all businesses treat the web as their largest data source and predicts that the ability to compete in the digital economy will depend on the ability to curate and leverage web data. In the words of Gartner VP Doug Laney, “Your company’s biggest database isn’t your . . . internal database. Rather it’s the Web itself.”

Journalists and information aggregators also rely on automated web browsing. The San Francisco Chronicle used automated web browsing to gather data on Airbnb properties in order to assess the impact of Airbnb listings on the San Francisco rental market, and ProPublica used automated web browsing to uncover that Amazon’s pricing algorithm was hiding the best deals from its customers. The Internet Archive’s web crawlers (crawlers are one specialized example of automated web browsing) work to archive as much of the public web as possible for future generations. Indeed, Google’s own web crawlers, which power the search tool most of us rely on every day, are simply web scraping “bots.”

During a recent Ninth Circuit hearing in hiQ v. LinkedIn, LinkedIn tried to analogize the case to United States v. Jones, arguing that hiQ’s use of automated tools to access public information is different “in kind” from manually accessing that same information, just as long-term GPS monitoring of someone’s public movements is different from merely observing someone’s public movements.

The only thing that makes hiQ’s access different is that LinkedIn doesn’t like it. LinkedIn itself acknowledges in its privacy policy that it, too, uses automated tools to “collect public information about you, such as professional-related news and accomplishments” and makes that information available on its own website, unless a user opts out by adjusting their default privacy settings. Question: How does LinkedIn gather that data on its users? Answer: web scraping.

And of course LinkedIn doesn’t like it; it wants to block a competitor’s ability to meaningfully access the information that its users post publicly online. But just because LinkedIn or any other company doesn’t like automated access, that doesn’t mean it should be a crime.

As law professor Michael J. Madison wrote, resolving the debate about the CFAA’s scope “is linked closely to what sort of Internet society has and what sort of Internet society will get in the future.” If courts allow companies to use the CFAA to block automated access by competitors, it will threaten open access to information for everyone.

Some have argued that scraping is what dooms access to public information, because websites will just place their data behind an authentication gate. But it is naïve to think that LinkedIn would put up barriers to access; LinkedIn wants to continue to allow users to make their profiles public so that a web search for a person continues to return a LinkedIn profile among the top results, so that people continue to care about the maintenance of their personal LinkedIn profiles, so that recruiters will continue to pay for access to LinkedIn recruiter products (e.g., specialized search and messaging), and so that companies will continue to pay to post job advertisements on LinkedIn. The default setting for LinkedIn profiles is public for a reason, and LinkedIn wants to keep it that way. It wants to participate in the open web to drive its business but use the CFAA to suppress competitors and avoid accepting the web’s open access norms.

The public is already losing access to information. With the rise of proprietary algorithms and artificial intelligence, both private companies and governments are making high stakes decisions that impact lives with little to no transparency. In this context, it is imperative that courts not take lightly attempts to use the CFAA to limit access to public information on the web.

[1] The term “scraping” comes from a time before APIs, when the only way to build interoperability between computer systems was to “read” the information directly from the screen. Engineers used various terms to describe this technique, including “shredding,” “scraping,” and “reading.” Because the technique was largely only discussed in engineering circles, the choice of terminology was never widely debated. As a result, today many people still use the term “scraping,” instead of something more technically descriptive—like “screen reading” or “web reading.”

An earlier version of this article was first published by the Daily Journal on March 27, 2018.

Related Issues

Coders' Rights Project
