November 17, 2009 | By Fred von Lohmann

Google Books Settlement 2.0: Evaluating Access

This is the second in a series of posts about the proposed Google Book Search settlement.

The Potential Upside: Enhanced Public Access

From the public's point of view, unprecedented public access to books is the chief benefit promised by the revised proposed settlement (aka Settlement 2.0) of the Google Book Search litigation. That's the "upside" against which all the possible "down-sides" will be measured. And when it comes to enhancing public access, the proposed settlement holds great promise. Whether that promise will actually come to pass, however, is harder to predict.

Here's what we know about Google's book scanning efforts so far [revised in light of updated numbers sent by Google Nov. 19]:

  • Google has already scanned more than 12 million books (for comparison, U.S. libraries hold an estimated 42 million titles total).
  • Roughly 50% are in languages other than English, with more than 100 languages represented. (In the revised settlement proposal, however, the parties have tried to exclude most books published in countries other than the US, UK, Australia, or Canada, so some non-English language books may now be excluded.)
  • 2 million are clearly in the public domain (i.e., published pre-1923, government works, etc.).
  • 2 million have been scanned with the explicit permission of copyright owners as part of Google's partner program.
  • That leaves ~7 million scanned volumes that are potentially the subject of the copyright lawsuits and the proposed settlement (given the low rate of copyright renewals for works published between 1923 and 1963, a substantial portion of these 7 million volumes is likely in the public domain, in which case they would fall outside the settlement).

So how much access will the public have to the scanned books that fall within the scope of the settlement (that's the ~7m already scanned, as well as millions more Google will be scanning in the future)? The answer will vary based on their copyright status, what services Google implements, and the expressed wishes of copyright owners:

Out-of-print, in-copyright books: For these books—principally out-of-print books published after 1923—the settlement envisions Google providing access through four principal mechanisms:

  • "Preview Uses" (show up to 20% of the book, for free, in response to search queries);
  • "Consumer Purchase" (permanent, full-text, online access on a book-by-book basis for a fee);
  • "Institutional Subscription" ("all-you-can-eat" full-text online access on a blanket basis through an institution); and
  • "Public Access Service" (at least one free public terminal for public libraries).

All of these "Display Uses" will be enabled by default under the settlement agreement for out-of-print, in-copyright books. This is just a default, however; copyright owners are entitled to change the default by electing to "Remove" or "Exclude" their books from any or all of the Display Uses. Of course, where unclaimed works (books whose copyright owners cannot be located or have not bothered to sign up with the Registry) are concerned, the default will effectively be the rule, which is a good thing for public access to these works.

In-print, in-copyright books: By default "Display Uses" will not be permitted for these books. In other words, if Google scans these books, they will go into the database corpus, but will not be available for Preview, Consumer Purchase, or Institutional Subscription, unless the copyright owner chooses to enable one or more of those uses. In short, no public access unless the copyright owner chooses to allow it.

Google Partner Program books: Under the settlement, copyright owners of both in-print and out-of-print books can elect to pull their books out of Google's database corpus, choosing instead to negotiate a different deal in the Google Partner program, which gives the copyright owner more flexibility to define exactly how the book can be accessed. Some observers anticipate that many, perhaps most, major publishers will take this option and remove their works from the products and services described in the settlement.
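
For readers who find rules easier to follow as logic, here is a minimal Python sketch of the default-and-override scheme described above. The names (DisplayUse, default_display_uses, effective_display_uses) are illustrative assumptions, not terms or mechanisms from the settlement itself, and the sketch does not model the Partner Program option or the Remove/Exclude distinction.

```python
from enum import Enum


class DisplayUse(Enum):
    """The four access mechanisms listed above (labels taken from this post)."""
    PREVIEW = "Preview Uses"
    CONSUMER_PURCHASE = "Consumer Purchase"
    INSTITUTIONAL_SUBSCRIPTION = "Institutional Subscription"
    PUBLIC_ACCESS_SERVICE = "Public Access Service"


def default_display_uses(in_print):
    """Hypothetical helper: all Display Uses on by default for
    out-of-print books, none on by default for in-print books."""
    return set() if in_print else set(DisplayUse)


def effective_display_uses(in_print, excluded=(), enabled=()):
    """Hypothetical helper: start from the default, then apply the
    copyright owner's elections. An unclaimed work has no elections,
    so the default is what the public gets."""
    uses = default_display_uses(in_print)
    uses |= set(enabled)    # owner opts in (relevant for in-print books)
    uses -= set(excluded)   # owner opts out (assumed to win over opt-ins)
    return uses


# An unclaimed out-of-print book keeps every default Display Use.
unclaimed = effective_display_uses(in_print=False)

# An in-print book is invisible unless the owner enables something.
in_print_opt_in = effective_display_uses(
    in_print=True, enabled=[DisplayUse.CONSUMER_PURCHASE]
)
```

The point the sketch makes is the same one made in prose: for unclaimed works nobody is around to change the elections, so the default effectively becomes the rule.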

The Potential: Unprecedented Online Access

Taken together, these features mean that the Google Books project could potentially provide Americans (and only Americans, as the settlement only authorizes Google to offer Display Uses of in-copyright books to U.S. Internet users) with unprecedented instant access to a large collection of books that previously were available only in research university libraries. In particular, like the Internet before it, Google Books could make specialized resources available to people who otherwise might never be able to access them (see, e.g., Google's agreements to digitize U. of Wisconsin's Native American collection and U. of Texas' Benson Latin American collection).

In addition to enabling search and reading, the products and services envisioned by the settlement could also unleash innovative, transformative new uses for the information inside these books. For example, the availability of all these readily citable books could radically expand and transform Wikipedia, which places a premium on citations to neutral sources to validate edits to its pages. Once every Wikipedian can do full-text searches against the research collections of major university libraries, Wikipedia should see a huge expansion of cited contributions.

The proposed settlement also offers the promise of unprecedented access for the visually impaired. It commits Google to offering screen enlargement, read-aloud, and Braille displays ("Accommodated Service") for the Institutional Subscription product. As the National Federation of the Blind and a coalition of other disability rights groups have pointed out, this will make a "historically unprecedented" number of books accessible to the visually impaired.

In addition, under the terms of the settlement, Google may make two copies of the scanned books database ("Research Corpus") available through university libraries for "nonconsumptive" research (i.e., you can use it to develop your new OCR algorithms, but not to extract and compile every paragraph that mentions zombies to create a "Zombies Through The Ages" book). Although use of the Research Corpus will be subject to a number of restrictions that have drawn fire from academics, the creation of such a Research Corpus would nevertheless be an important step forward for access. Programmatic access to a large database of books is likely to open new avenues of scholarly inquiry and unleash new innovations, including better search algorithms, optical character recognition techniques, automated language translation breakthroughs, and other uses that we haven't yet imagined.

The Uncertainty: Empty Promises, Empty Shelves?

But the promise of what the settlement might accomplish is no guarantee of ultimate results.

First, under the settlement copyright owners can pull their books (see Section 3.5, "Right to Remove or Exclude") out of all the products and services envisioned by the settlement, including full-text search and limited "snippet view" access. This is essentially the "take the money and run" option—the copyright owner collects a per-book payment from Google for books already scanned, but then the public gets no online access to these books unless and until the copyright owners negotiate new deals with Google or other online providers. This effectively gives copyright owners a unilateral right to trump fair use, essentially "unpublishing" their books online. Some observers expect that most major publishers will opt to "take the money and run" for both their in-print and out-of-print titles, leaving gaping holes on the virtual shelves of Google Books. If this takes place, then the settlement would only foster access to orphan and unclaimed works. Still good, but far short of full access to every book in the University of Michigan library.

Second, Google is not required to offer all the products and services envisioned in the settlement. The settlement only compels Google to offer the following within 5 years (see Sections 3.7(a), 7.2(e)(i), 7.2(g)(ii)(1)):

  • Consumer Purchase (not clear what percentage of the scanned books must be made available)
  • Institutional Subscription for Higher Education, including Accommodated Service (for at least 85% of books scanned)
  • Public Access Service (for at least 85% of books scanned)
  • Free search services (including Snippet View and Preview, for at least 85% of books scanned)
  • Library links that will help you find a library with a hard copy (for at least 85% of books scanned)

Notably absent from this list is the Research Corpus described above (in side agreements with its library partners, however, Google has made monetary commitments toward building the Research Corpus). And if Google never gets more than 85% of eligible books online, that would represent still more gaps on the virtual shelves.

Third, the public gets only the kinds of access that Google makes available, only through interfaces that Google chooses to expose. And while this level of access is certainly preferable to no access at all, the "One Interface to Rule Them All" approach is likely to impede innovation, which ultimately means less access. It would be preferable if others had access to the underlying book scans, just as Google had access to the World Wide Web when it built its own search engine. (Google will protest that it spent the money to make the scans, and it's unfair to allow competitors to free-ride on its scanning investment. We already posted our answer to that objection.)

And Don't Forget the Down-Sides

So while the settlement does offer the exciting promise of drastically increasing public access to books, it is hard to predict whether that promise will be fulfilled. And even if the promise of access were fulfilled, there are other down-sides to the settlement, which we will take up in our next posts.

