This blog post is part of a series, looking at the public interest internet—the parts of the internet that don’t garner the headlines of Facebook or Google, but quietly provide public goods and useful services without requiring the scale or the business practices of the tech giants. Read our earlier installments.

Last time, we saw how much of the early internet’s content was created by its users—and subsequently purchased by tech companies. By capturing and monopolizing this early data, these companies were able to monetize and scale this work faster than the network of volunteers that first created it for use by everybody. It’s a pattern that has happened many times in the network’s history: call it the enclosure of the digital commons. Despite this familiar story, the older public interest internet has continued to survive side-by-side with the tech giants it spawned: unlikely and unwilling to pull in the big investment dollars that could lead to accelerated growth, but also tough enough to persist in its own ecosystem. Some of these projects you’ve heard of—Wikipedia, or the GNU free software project, for instance. Some, because they fill smaller niches and aren’t visible to the average Internet user, are less well-known. The public interest internet fills the spaces between tech giants like dark matter; invisibly holding the whole digital universe together.

Sometimes, the story of a project’s switch to the commercial model is better known than its continuing existence in the public interest space. The notorious example in our third post was the commercialization of the publicly-built CD Database (CDDB): when a commercial offshoot of this free, user-built database, Gracenote, locked down access, forks like freedb and gnudb continued to offer the service free to its audience of participating CD users.

Gracenote’s co-founder, Steve Scherf, claimed that without commercial investment, CDDB’s free alternatives were doomed to “stagnation”. While alternatives like gnudb have survived, it’s hard to argue that either freedb or gnudb has innovated beyond its original goal of providing and collecting CD track listings. Then again, that’s exactly what they set out to do, and they’ve done it admirably for decades since.

But can innovation and growth take place within the public interest internet? CDDB’s commercialization parlayed its initial market into a variety of other music-based offerings. Their development of these products led to them being purchased, at various points, by AV manufacturer Escient, Sony, Tribune Media, and most recently, Nielsen. Each sale made money for its investors. Can a free alternative likewise build on its beginnings, instead of just preserving them for its original users?

MusicBrainz, a Community-Driven Alternative to Gracenote

Among the CDDB users thrown by its switch to a closed system in the 1990s was Robert Kaye. Kaye was a music lover and, at the time, a coder working on one of the earliest MP3 encoders and players at Xing. Now he and a small staff work full-time on MusicBrainz, a community-driven alternative to Gracenote. (Disclosure: EFF special advisor Cory Doctorow is on the board of MetaBrainz, the non-profit that oversees MusicBrainz.)

“We were using CDDB in our service,” he told me from his home in Barcelona. “Then one day, we received a notice that said you guys need to show our [Escient, CDDB’s first commercial owner] logo when a CD is looked up. This immediately screwed over blind users who were using a text interface of another open source CD player that couldn’t comply with the requirement. And it pissed me off because I’d typed in a hundred or so CDs into that database… so that was my impetus to start the CD index, which was the precursor to MusicBrainz.”

MusicBrainz has continued ever since to offer a CDDB-compatible CD metadata database, free for anyone to use. The bulk of its user-contributed data has been put into the public domain, and supplementary data—such as extra tags added by volunteers—is provided under a non-commercial, attribution license. 

Over time, MusicBrainz has expanded by creating other publicly available, free-to-use databases of music data, often as a fallback for when other projects commercialize and lock down. For instance, Audioscrobbler was an independent system that collected information on what music you’ve listened to (no matter on what platform you heard it), to learn and provide recommendations based on its users’ contributions, but under your control. It was merged into Last.fm, an early Spotify-like streaming service, which was then sold to CBS. When CBS seemed to be neglecting the “scrobbling” community, MusicBrainz created ListenBrainz, which re-implemented features that had been lost over time. The plan, says Kaye, is to create a similarly independent recommendation system. 

While the new giants of Internet music (Spotify, Apple Music, Amazon) have been building closed machine-learning models to data-mine their users and their musical interests, MusicBrainz has been working in the open with Barcelona's Pompeu Fabra University to derive new metadata from the MusicBrainz community’s contributions. Automatic deductions of genre, mood, beats-per-minute and other information are added to the AcousticBrainz database for everyone to use. These algorithms learn from their contributors’ corrections, and the fixes they provide are added to the commonwealth of public data for everyone to benefit from.

MusicBrainz’ aspirations sound in synchrony with the early hopes of the Internet, and after twenty years, they appear to have proven the Internet can support and expand a long-term public good, as opposed to a proprietary, venture capital-driven growth model. But what’s to stop the organization from going the same way as those other projects with their lofty goals? Kaye works full-time on MusicBrainz along with eight other employees: what’s to say that they’re not exclusively profiteering from the wider unpaid community in the same way as larger companies like Google benefit from their users’ contributions?

MusicBrainz has some good old-fashioned pre-Internet institutional protections. It is managed by a 501(c)(3) non-profit, the MetaBrainz Foundation, which places some theoretical constraints on how it might be bought out. Another old Internet value is radical transparency, and the organization has that in spades. All of its financial records, from profit and loss sheets to employment costs to server outlay, as well as its board meeting notes, are published online.

Another factor, says Kaye, is keeping a clear delineation between the work done by MusicBrainz’s paid staff and the work of the MusicBrainz volunteer community. “My team should work on the things that aren’t fun to work on. The volunteers work on the fun things,” he says. When you're running a large web service built on the contributions of a community, there’s no end of volunteers for interesting projects, but, as Kaye notes, “there's an awful lot of things that are simply not fun, right? Our team is focused on doing these things.” It helps that MetaBrainz, the foundation, hires almost exclusively from long-term MusicBrainz community members.

Perhaps MusicBrainz’s biggest defense against its own decline is the software (and data) licenses it uses for its databases and services. In the event of the organization’s separation from the desires of its community, all its composition and output, from its digital assets to its institutional history, are laid out so that the community can clone its structure and create another, near-identical institution closer to its needs. The code is open source; the data is free to use; the radical transparency of the financial structures means that the organization itself can be reconstructed from scratch if need be.

Such forks are painful. Anyone who has recently watched the volunteer staff and community of Freenode, the distributed Internet Relay Chat (IRC) network, part ways with the network’s owner and start again at Libera.chat, will have seen this. Forks can be divisive in a community, and can be reputationally devastating to those who are abandoned by the community they claimed to lead and represent. MusicBrainz staff’s livelihood depends on its users in a way that even the most commercially sensitive corporation does not. 

It’s unlikely that a company would place its future viability so directly in the hands of its users. But it’s this self-imposed sword of Damocles hanging over Rob Kaye and his staff’s heads that fuels the community’s trust in their intentions.

Where Does the Money Come From?

Open licenses, however, can also make it harder for projects to gather funding to persist. Where does MusicBrainz' money come from? If anyone can use their database for free, why don’t all their potential revenue sources do just that, free-riding off the community without ever paying back? Why doesn’t a commercial company reproduce what MusicBrainz does, using the same resources that a community would use to fork the project?

MusicBrainz’s open finances show that, despite those generous licenses, they’re doing fine. The project’s transparency lets us see that it brought in around $400K in revenue in 2020, and had $400K in costs (it experienced a slight loss, but other years have been profitable enough to make this a minor blip). The revenue comes as a combination of small donors and larger sponsors, including giants like Google, who use MusicBrainz’ data and pay for a support contract.

Given that those sponsors could free-ride, how does Kaye get them to pay? He has some unorthodox strategies (most famously, sending a cake to Amazon to get them to honor a three-year-old invoice), but the most common reason seems to be that an open database maintainer that is responsive to a wider community is also easier for commercial concerns to interface with, both technically and contractually. Technologists building out a music tool or service turn to MusicBrainz for the same reason as they might pick an open source project: it’s just easier to slot it into their system without having to jump through authentication hoops or begin negotiations with a sales team. Then, when a company forms around that initial hack, its executives eventually realize that they now have a real dependency on a project with which they have no contractual or financial relationship. A support contract means that they have someone to call up if it goes down; a financial relationship means that it’s less likely to disappear tomorrow.
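
To make "slotting it in" a little more concrete, here is a minimal sketch in Python of what that first hack might look like. It assumes the public MusicBrainz web service at musicbrainz.org/ws/2 and its Lucene-style search syntax. The function names, the placeholder User-Agent string, and the default result limit are our own illustrative choices, not an official client, and the sketch does not escape quotation marks inside artist or album names.

```python
import json
import urllib.parse
import urllib.request

# Root of the public MusicBrainz web service (version 2).
API_ROOT = "https://musicbrainz.org/ws/2"


def build_release_search_url(artist: str, title: str, limit: int = 5) -> str:
    """Build a search URL for releases matching an artist and album title.

    Illustrative helper: it assumes the inputs contain no double quotes.
    """
    query = f'artist:"{artist}" AND release:"{title}"'
    params = urllib.parse.urlencode({"query": query, "fmt": "json", "limit": limit})
    return f"{API_ROOT}/release?{params}"


def search_releases(artist: str, title: str,
                    user_agent: str = "example-app/0.1 (contact@example.org)"):
    """Fetch matching releases. No API key or login is sent with the request.

    The User-Agent value is a placeholder; a real application should
    identify itself honestly and keep its request rate modest.
    """
    request = urllib.request.Request(
        build_release_search_url(artist, title),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response).get("releases", [])
```

Nothing here involves a sales call or a signed contract: a developer would call something like search_releases("Nirvana", "Nevermind") and start building. That convenience is exactly how the dependency described above tends to creep in.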

Again, commercial alternatives may make the same offer, but while a public interest non-profit like MusicBrainz might vanish if it fails its community, or simply runs out of money, those other private companies may well have other reasons to exit their commitments with their customers. When Sony bought Gracenote, it was presumably partly so that they could support their products that used Gracenote’s databases. After Sony sold Gracenote, they ended up terminating their own use of the databases. Sony announced to their valued customers in 2019 that Sony Blu-Ray and Home Theater products would no longer have CD and DVD recognition features. The same thing happened to Sony’s mobile Music app in 2020, which stopped being able to recognize CDs when it was cut off from Gracenote’s service. We can have no insight into these closed, commercial deals, but we can presume that Sony and Gracenote’s new owner could not come to an amicable agreement. 

By contrast, if Sony had used MusicBrainz’ data, they would have been able to carry on regardless. They’d be assured that no competitor would buy out MusicBrainz from under them, or lock their products out of an advertised feature. And even if MusicBrainz the non-profit died, there would be a much better chance that an API-compatible alternative would spring up from the ashes. If it was that important, Sony could have supported the community directly. As it is, Sony paid $260 million for Gracenote. For their CD services, at least, they could have had a more stable service deal with MusicBrainz for $1500 a month.
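
As a rough back-of-the-envelope illustration of that gap, the sketch below uses only the two figures quoted above. It deliberately sets aside that the purchase price bought an entire company rather than just a CD lookup service, so treat it as a sense of scale rather than a like-for-like comparison. The variable names are ours.

```python
# Figures quoted in the post; everything else is simple division.
purchase_price_usd = 260_000_000   # what Sony paid for Gracenote
monthly_fee_usd = 1_500            # the MusicBrainz service deal cited above

months_covered = purchase_price_usd / monthly_fee_usd
years_covered = months_covered / 12

print(f"{months_covered:,.0f} months, or about {years_covered:,.0f} years")
# prints "173,333 months, or about 14,444 years"
```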

Over two decades after the user rebellion that created it, MusicBrainz continues to tick along. Its staff is drawn from music fans around the world, and meets up every year at a conference paid for by the MetaBrainz Foundation. Its contributors know that they can always depend on its data staying free; its paying customers know that they can always depend on its data being usable in their products. MusicBrainz staff can be assured that they won’t be bought up by big tech, and they can see the budget that they have to work with.

It’s not perfect. A transparent non-profit that aspires to internet values can be as flawed as any other. MusicBrainz suffered a reputational hit last year when personal data leaked from its website, for instance. But by continuing to exist, even with such mistakes, and despite multiple economic downturns, it demonstrates that a non-profit, dedicated to the public interest, can thrive without stagnating, or selling its users out.

But, but, but. While it’s good to know public interest services are successful in niche territories like music recognition, what about the parts of the digital world that really seem to need a more democratic, decentralized alternative, yet notoriously lack one? Sites like Facebook, Twitter, and Google have not only built their empires from others’ data, they have locked their customers in, apparently with no escape. Could an alternative, public interest social network be possible? And what would that look like?

We'll cover these in a later part of our series. (For a sneak preview, check out the recorded discussions at “Reimagining the Internet”, from our friends at the Knight First Amendment Institute at Columbia University and the Initiative on Digital Public Infrastructure at the University of Massachusetts, Amherst, which explore in-depth many of the topics we’ve discussed here.)

This is the fourth post in our blog series on the public interest internet. Read more in the series: