Two weeks ago, the Mexican newspaper El Milenio reported on a U.S. Department of Homeland Security (DHS) Office of Operations Coordination and Planning (OPC) initiative to monitor social media sites, blogs, and forums throughout the world. The DHS document behind the report discloses how OPC’s National Operations Center (NOC) plans to begin systematically monitoring publicly available online data, including “information posted by individual account users” on social media.

The NOC monitors, collects, and fuses information from a variety of sources to provide a “real-time snap shot of the [U.S.] nation’s threat environment at any moment.” The NOC also coordinates information sharing to “help deter, detect, and prevent terrorist acts and to manage [U.S.] domestic incidents.” Under the initiative, the NOC has begun systematically monitoring publicly available, user-generated data, both to follow crises in the U.S., such as natural disasters, as they unfold and to corroborate information received through official channels with ‘on-the-ground’ input.

The monitoring program appears to be based on a similar program the NOC used in its Haitian disaster relief efforts, where social media provided vital real-time input that assisted the NOC’s response, recovery, and rebuilding efforts after the 2010 earthquake. The new initiative attempts to leverage similar information sources in assessing and responding to a broader range of crises, including terrorism, cybersecurity incidents, nuclear and other disasters, health epidemics, domestic security, and border threats. While real-time social media sources can be extremely beneficial in disaster relief efforts, the breadth of activities covered by the initiative, as well as the keywords and websites slated for systematic monitoring, raise concerns, and the safeguards built into the initiative may not be sufficient to address them.

The NOC report, entitled “Privacy Impact Assessment of Publicly Available Social Media Monitoring and Situational Awareness Initiative”, reveals that the NOC’s team of data miners is gathering, storing, analyzing, and sharing “de-identified” online information. The sources of information are “members of the public...first responders, press, volunteers, and others” who post publicly available information online. To collect the information, the NOC monitors search terms such as “United Nations”, “law enforcement”, “anthrax”, “Mexico”, “Calderon”, “Colombia”, “marijuana”, “drug war”, “illegal immigrants”, “Yemen”, “pirates”, “tsunami”, “earthquake”, “airport”, “body scanner”, “hacker”, “DDOS”, “cybersecurity”, “2600” and “social media”. The report also contains a list of sites targeted for monitoring, including numerous blogs and news sites, as well as Wikileaks, Technorati, Global Voices Online, Facebook, and Twitter. As the report was released in January 2011, this monitoring may already be taking place.
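The report lists search terms but does not say how the NOC matches them against posts. As a rough illustration of why a list this broad sweeps in ordinary speech, here is a minimal Python sketch of naive, case-insensitive keyword flagging. The term subset, the `matching_terms` helper, and the sample posts are hypothetical and are not drawn from the NOC’s actual tooling.

```python
import re

# Hypothetical subset of the search terms listed in the NOC report (lowercased).
WATCH_TERMS = ["airport", "body scanner", "earthquake", "hacker", "mexico"]


def matching_terms(post, terms=WATCH_TERMS):
    """Return the watch terms that appear in `post` as whole words or phrases.

    Matching is case-insensitive and purely lexical: it has no notion of
    intent, tone, or context, so a joke or a complaint matches as readily
    as a genuine threat.
    """
    text = post.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)]


# Hypothetical posts: the first is a complaint, not a threat, yet it is flagged.
posts = [
    "Flight delayed again, this airport is a joke",
    "Mexico City volunteers needed after the earthquake",
    "Baking bread this weekend",
]

for post in posts:
    hits = matching_terms(post)
    if hits:
        print(f"FLAGGED {hits}: {post}")
```

Only the first two posts are flagged. Nothing in a lexical match distinguishes the frustrated traveller from an actual security threat; that judgment is left to whoever reviews the flagged posts.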

While the monitoring envisioned by the report is broad in scope, the initiative includes a number of safeguards that attempt to address privacy concerns. But these safeguards do not go far enough. Furthermore, while the NOC is attempting to limit the circumstances under which agents may collect or disclose personal data, these limitations apply only to DHS agents operating under this specific initiative. DHS “may use social media for other purposes including...law enforcement, intelligence, and other operations...” Other U.S. government agencies and initiatives operate under different rules and regulations, which are themselves subject to change.

With respect to the safeguards, NOC agents on social networks are prohibited from “post[ing] information, actively seek[ing] to connect..., accept[ing]... invitations to connect, or interact[ing] with others”, which presumably includes responding to messages sent by other users. It is not clear, however, that this prohibition is sustainable given the NOC’s objectives. For example, NOC agents are authorized to “establish user names and passwords to form profiles and follow relevant government, media, and subject matter experts on social media sites.” Yet social networking sites are premised on the concept of “interacting with others,” and the distinction between ‘following’ a user on Twitter and ‘connecting’ with that user is not clear-cut.

Genuine attempts are being made to limit monitoring to publicly available information and to exclude private sources. For example, agents may be prohibited from collecting information found on Facebook profiles that are restricted to “friends only.” However, problems may arise in the more ambiguous “semi-public” spaces emerging in many online venues. If NOC agents are authorized to “follow” a user on Twitter, are they also allowed to “friend” a Facebook (or Google+) user whose profile contains only public “relevant government, media, and subject matter” content? What about information posted by that user’s other connections and visible under the extended “friends of friends” setting? The NOC initiative may find it difficult to navigate such distinctions.

Monitoring purely public online information to assess situational threats can also lead to abuse. During the G20 summit in Toronto, Canada, police used real-time monitoring of on-the-ground social media activity to locate and arrest large numbers of peaceful protesters. As Constable Drummond, a law enforcement officer deeply involved in the Canadian G20 social media surveillance effort, noted:

“...people have a tendency to have tunnel vision when posting things on sites, feeling faceless and untraceable. It is with those postings that we were able to use our talent and use the information posted to our advantage. It allowed our officers to monitor public sites that protestors were using to share information.”

In the lead-up to the G20 summit in Pittsburgh, two individuals were arrested for broadcasting police positions on Twitter in an attempt to help peaceful protesters. In the UK, Paul Chambers, a 27-year-old accountant, was convicted of “menacing” for posting a joke on his Twitter feed that government agents took to be a threat to airport security. Because Chambers used ‘airport’, one of the NOC’s listed search terms, in his joke, it might have come to the NOC’s attention had it been tweeted in the U.S.

The report reminds individuals that if they do not want the NOC to collect their public data, they should not make it public in the first place: “[a]ny information posted publicly can be used by the NOC.” It places the responsibility for protecting privacy on end users, stating that “primary account holder[s] should be able to redress any [privacy] concerns through the third party social media service [and] should consult the privacy policies of the services they subscribe to for more information.” Moreover, DHS considers publication of the report sufficient ‘notice’ to users that their public data may be monitored.

Unfortunately, following this advice is not as simple as it seems. Studies have shown that privacy policies are “hard to read” and “read infrequently”, and that even educated Facebook users who were concerned about privacy had trouble limiting data sharing with third parties. Moreover, these policies are nearly always subject to change. Facebook’s privacy policies have shifted continuously over the years and have eroded privacy by making previously private information publicly available to everyone. Because privacy settings shift constantly, it is not clear that the NOC’s definitions of ‘public’ and ‘private’ align with user expectations.

Once the NOC has identified useful raw online data for DHS, it attempts to “extract only the pertinent, authorized information and put it into a specific web application.” The report explicitly emphasizes that the data extracted from the raw information is to be “free of personal identifiable information”, and it claims that if personal data is collected beyond what is authorized, the NOC will immediately redact it. This “de-identified” information will be shared with federal and state governments when “appropriate”, as well as with the private sector and foreign governments as “otherwise authorized by law.”

This raises concerns, however, because significant research demonstrates that de-identification is not always effective. With enough auxiliary information, individuals can often be “re-identified” by cross-referencing supposedly anonymous data against other available datasets. The specific de-identification techniques the NOC uses deserve broader debate that is open to public scrutiny.
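The report does not describe how the NOC de-identifies data. To illustrate the kind of re-identification this research describes, here is a minimal Python sketch of a linkage attack: records stripped of names can still carry quasi-identifiers (here ZIP code, birth year, and gender), and joining them against another dataset that pairs the same fields with names can single people out. All records, field names, and the `reidentify` helper are hypothetical.

```python
from collections import defaultdict

# Fields left in the "de-identified" data that, combined, can single a person out.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")

# Hypothetical de-identified posts: names and account handles removed.
deidentified_posts = [
    {"zip": "02139", "birth_year": 1984, "gender": "F", "post": "Stuck at the airport all night"},
    {"zip": "10001", "birth_year": 1990, "gender": "M", "post": "Anyone else see the protest downtown?"},
    {"zip": "60601", "birth_year": 1975, "gender": "F", "post": "Power is out on our block"},
]

# Hypothetical auxiliary dataset that pairs names with the same attributes.
auxiliary = [
    {"name": "Person A", "zip": "02139", "birth_year": 1984, "gender": "F"},
    {"name": "Person B", "zip": "02139", "birth_year": 1971, "gender": "M"},
    {"name": "Person C", "zip": "10001", "birth_year": 1990, "gender": "M"},
    {"name": "Person D", "zip": "60601", "birth_year": 1975, "gender": "F"},
    {"name": "Person E", "zip": "60601", "birth_year": 1975, "gender": "F"},
]


def key(record):
    """Tuple of quasi-identifier values used to join the two datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(deidentified, aux):
    """Join on quasi-identifiers; a record is re-identified when exactly one name matches."""
    names_by_key = defaultdict(list)
    for person in aux:
        names_by_key[key(person)].append(person["name"])
    matches = []
    for record in deidentified:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:
            matches.append((candidates[0], record["post"]))
    return matches


for name, post in reidentify(deidentified_posts, auxiliary):
    print(f"{name}: {post}")
```

Two of the three posts are tied back to a single name; the third stays ambiguous only because two people in the auxiliary data share its attributes. Adding more fields or richer auxiliary data tends to shrink those ambiguous groups, which is why removing names alone is a weak guarantee.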

This newly discovered initiative is part of a broader trend of monitoring and using online information in investigative contexts. What should users both inside and outside the U.S. learn from these discoveries? As always, Internet users should think carefully before posting information about themselves on public sites and remember that privacy policies are constantly subject to change. Not only do we know that the government is watching; we now have some clues as to how it is doing it.