In the wake of the coronavirus pandemic, many social media platforms shifted their content moderation policies to rely much more heavily on automated tools. Twitter, Facebook and YouTube all ramped up their machine learning capabilities to review and identify flagged content in efforts to ensure the wellbeing of their content moderation teams and the privacy of their users. Most social media companies rely on workers from the so-called global South to review flagged content, usually under precarious working conditions and without adequate protections from the traumatic effects of their work. While the goal of protecting workers from exposure to these dangers while working from home is certainly legitimate, automated content moderation still poses a major risk to freedom of expression online.

Wary of the negative effects the shift towards more automated content moderation might have on users’ freedom of expression, we called on companies to make sure that this shift would be temporary. We also emphasized that, in these unusual times, the meaningful transparency, notice, and robust appeals processes called for in the Santa Clara Principles remain important.

While human content moderation doesn’t scale and comes with high social costs, it is indispensable. Automated systems are simply not capable of consistently identifying content correctly. Human communication and interactions are complex, and automated tools misunderstand the political, social or interpersonal context of speech all the time. That is why it is crucial that algorithmic content moderation is supervised by human moderators and that users can contest takedowns. As Facebook’s August 2020 transparency report shows, the company’s approach to content moderation during the coronavirus pandemic has been lacking in both human oversight and options for appeals. While the long-term impacts are not clear, we’re highlighting some of the effects of automated content moderation across Facebook and Instagram as detailed in Facebook’s report.

Because this transparency report omits key information, it remains largely impossible to analyze Facebook’s content moderation policies and practices. The transparency report merely shares information about the broad categories into which deleted content falls, and the raw numbers of posts taken down, appealed, and restored. Facebook does not provide any insights on its definitions of complex phenomena like hate speech or how those definitions are operationalized. Facebook is also silent on the materials with which human and machine content moderators are trained, and on how automated tools and human reviewers relate to each other and how both are overseen.

We will continue to fight for real transparency. Without it there cannot be real accountability.

Inconsistent Policies Across Facebook and Instagram

While Facebook and Instagram are meant to share the same set of content policies, there are some notable differences in their respective sections of the report. The report, which lists data for the last two quarters of 2019 and the first two of 2020, does not consistently report the data on the same categories across the two platforms. Similarly, the granularity of data reported for various categories of content differs depending on platform.

More troubling, however, is what seem to be differences in whether users had access to appeal mechanisms. When content is removed on either Facebook or Instagram, people typically have the option to contest takedown decisions. Typically, when the appeals process is initiated, the deleted material is reviewed by a human moderator, and the takedown decision can be reversed and the content reinstated. During the pandemic, however, that option has been seriously limited, with users receiving notification that their appeal may not be considered. According to the transparency report, there were zero appeals on Instagram during the second quarter of 2020 and very few on Facebook.

The Impact of Banning User Appeals

While the company also occasionally restores content of its own accord, user appeals usually account for the vast majority of content that gets reinstated. An example: in Q2, more than 380 thousand posts that allegedly contained terrorist content were removed from Instagram, fewer than in Q1 (440 thousand). While around 8,100 takedowns were appealed by users in Q1, that number plummeted to zero in Q2. Now, looking at the number of posts restored, the impact of the lack of user appeals becomes apparent: during the first quarter, 500 pieces of content were restored after an appeal from a user, compared to the 190 posts that were reinstated without an appeal. In Q2, with no appeal system available to users, merely 70 of the several hundred thousand posts that allegedly contained terrorist content were restored.

Meanwhile, on Facebook, very different numbers are reported for the same category of content. In Q2, Facebook acted on 8.7 million pieces of allegedly terrorist content, and of those, 533 thousand were later restored without a user appeal having triggered the restoration. In comparison, in Q1, when user appeals were available, Facebook deleted 6.3 million pieces of terrorist content. Of those takedowns, 180.1 thousand were appealed, but even more pieces of content (199.2 thousand) were later reinstated. In other words, far fewer posts that allegedly contained terrorist content were restored on Instagram, where users couldn’t appeal takedowns, than on Facebook, where appeals were allowed.
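To make the comparison concrete, the share of removed content that was later restored can be worked out from the rounded figures quoted above. The following is only a rough sketch: the names `FIGURES`, `restore_rate`, and `restore_rates` are our own illustrative choices, the inputs are approximations ("more than 380 thousand," "around" values), and it assumes that "acted on" and "deleted" can both be treated as removals.

```python
# Rounded figures quoted in the text; treat them as approximations.
# Assumption: "acted on" and "deleted" are both counted as removals.
FIGURES = {
    # Instagram Q1 restorations: 500 after appeal + 190 without appeal.
    ("Instagram", "Q1"): {"removed": 440_000, "restored": 500 + 190},
    ("Instagram", "Q2"): {"removed": 380_000, "restored": 70},
    ("Facebook", "Q1"): {"removed": 6_300_000, "restored": 199_200},
    ("Facebook", "Q2"): {"removed": 8_700_000, "restored": 533_000},
}


def restore_rate(removed: int, restored: int) -> float:
    """Return the percentage of removed items that were later restored."""
    if removed <= 0:
        raise ValueError("removed must be positive")
    return 100.0 * restored / removed


def restore_rates(figures: dict) -> dict:
    """Compute the restore rate for every (platform, quarter) entry."""
    return {key: restore_rate(**vals) for key, vals in figures.items()}


if __name__ == "__main__":
    for (platform, quarter), rate in restore_rates(FIGURES).items():
        print(f"{platform} {quarter}: {rate:.3f}% of removals restored")
```

Under these assumptions, the restored share on Instagram falls from roughly 0.16 percent in Q1 to roughly 0.02 percent in Q2, while on Facebook it is about 3 percent in Q1 and about 6 percent in Q2.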

Blunt Content Moderation Measures Can Cause Real Harm

Why does this matter? Often, evidence of human rights violations and war crimes gets caught in the net of automated content moderation, as algorithms have a hard time differentiating between actual “terrorist” content and efforts to record and archive violent events. This negative impact of automated content detection is disproportionately borne by Muslim and Arab communities. The significant differences in how one company enforces its rules relating to terrorist or violent and extremist content across two platforms highlight how difficult it is to deal with the problem of violent content through automated content moderation alone. At the same time, they also underscore the fact that users can’t expect to be treated consistently across different platforms, which may increase problems of self-censorship.

Another example of the shortcomings of automated content removals: in Q2, Instagram removed around half a million images that it considered to fall into the category of child nudity and sexual exploitation. That is a significantly lower number compared to Q1, when Instagram removed about one million images. While Facebook’s report acknowledges that its automated content moderation tools struggle with some types of content, the effects seem especially apparent in this category of content. In Q1, many takedowns of alleged child sexual abuse images (16.2 thousand) were successfully appealed by users, but only 10 pieces of deleted content were restored during the period in which users could not contest takedowns. These discrepancies in content restoration suggest that much more content that has been wrongfully taken down remained deleted, imperiling the freedom of expression of potentially millions of users. They also show the fundamentally important role of appeals in guarding users’ fundamental rights and holding companies accountable for their content moderation policies.

The Santa Clara Principles on Transparency and Accountability in Content Moderation—which are currently undergoing an assessment and evaluation process following an open comment period—offer a set of baseline standards that we believe every company should strive to adopt. Most major platforms endorsed the standards in 2019, but just one—Reddit—has implemented them in full.

Facebook has yet to clarify whether its shift towards more automated content moderation is indeed temporary, or here to stay. Regardless, the company must ensure that user appeals will be reinstated. In the meantime, it is crucial that Facebook allow for as much transparency and public oversight as possible.