Despite igniting controversy over ethical lapses and the threat to civil liberties posed by its tattoo recognition experiments the first time around, the National Institute of Standards and Technology (NIST) recently completed its second major project evaluating software designed to reveal who we are and potentially what we believe based on our body art.

Unsurprisingly, these experiments continue to be problematic.

The latest experiment was called Tatt-E, which is short for “Tattoo Recognition Technology Evaluation.” Using tattoo images collected by state and local law enforcement from incarcerated people, NIST tested algorithms created by the state-backed Chinese Academy of Sciences and by MorphoTrak, a subsidiary of the French corporation Idemia.

According to the Tatt-E results, which were published in October, the best-performing tattoo recognition algorithm, from MorphoTrak, had 67.9% accuracy in matching separate images of the same tattoo to each other on the first try.

NIST further tested the algorithms on 10,000 images downloaded from Flickr users by Singaporean researchers, even though this dataset was not part of the original scope of Tatt-E. These tests showed significantly improved accuracy, as high as 99%.

Tattoo recognition technology is similar to other biometric technologies such as face recognition or iris scanning: an algorithm analyzes an image of a tattoo and attempts to match it to a similar tattoo or image in a database. But unlike other forms of biometrics, tattoos are not only a physical feature but a form of expression, whether it is a cross, a portrait of a family member, or the logo for someone’s favorite band.
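For readers who want a concrete picture of that “extract features, then search a database” structure, here is a minimal sketch using OpenCV’s generic ORB features. The vendors’ actual algorithms are proprietary and undisclosed; the function names and file paths below are hypothetical stand-ins, not anything drawn from Tatt-E.

```python
# Hypothetical sketch: match a probe tattoo image against a small gallery
# using generic local features (ORB). This only illustrates the general
# "analyze an image, then search a database" structure; it is not the
# proprietary matching used by any Tatt-E participant.
import cv2

def orb_descriptors(path):
    """Load an image in grayscale and compute ORB keypoint descriptors."""
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        return None
    orb = cv2.ORB_create(nfeatures=500)
    _, descriptors = orb.detectAndCompute(image, None)
    return descriptors

def best_match(probe_path, gallery_paths):
    """Return the gallery image whose descriptors best match the probe."""
    probe = orb_descriptors(probe_path)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    scores = {}
    for path in gallery_paths:
        candidate = orb_descriptors(path)
        if probe is None or candidate is None:
            continue
        matches = matcher.match(probe, candidate)
        # Lower average descriptor distance means the images look more alike.
        scores[path] = sum(m.distance for m in matches) / max(len(matches), 1)
    return min(scores, key=scores.get) if scores else None
```

The point of the sketch is the pipeline shape, not the accuracy: whatever features a vendor extracts, the system ultimately ranks database entries by similarity and returns the closest candidates.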

Since 2014, the FBI has sponsored NIST’s tattoo recognition project to advance this emerging technology. In 2016, an EFF investigation revealed that NIST had skipped over key ethical oversight processes and privacy protections with its earlier experiments, called Tatt-C, short for the Tattoo Recognition Technology Challenge. That experiment promoted using tattoo recognition technology to investigate people’s beliefs and memberships, including their religion. The more recent Tatt-E, however, did not test for “tattoo similarity”: the ability to match tattoos that are similar in theme and design but belong to different people.

A database of images captured from incarcerated people was provided to third parties, including private corporations and academic institutions, with little regard for the privacy implications. After EFF called out NIST, the agency retroactively altered its presentations and reports, eliminating problematic information and replacing images of inmate tattoos in a “Best Practices” poster with photos of a shirtless researcher with marker drawings all over his body. The agency also pledged to implement new oversight procedures.

However, transparency is still lacking. Last November, EFF filed suit against NIST and the FBI after the agencies failed to provide records in response to our Freedom of Information Act requests. So far, the records we have freed have revealed how the FBI is seeking to develop a mobile app that can recognize the meaning of tattoos, as well as the absurd methods NIST used to adjust its “Best Practices” documents. Our lawsuit continues, as the agencies are still withholding records and have redacted much of what they have produced.

Tatt-E was the latest set of experiments conducted by NIST. Unlike Tatt-C, which involved 19 entities, only two entities chose to participate in Tatt-E, each of which has foreign ties. Both the Chinese Academy of Sciences and MorphoTrak submitted six algorithms for testing against a dataset of tattoo images provided by the Michigan State Police and the Pinellas County Sheriff’s Office in Florida.

MorphoTrak’s algorithms significantly outperformed the Chinese Academy of Sciences’, which may not be surprising since the company’s software has been used with the Michigan State Police’s tattoo database for more than eight years. Its best algorithm returned a positive match within the first 10 images 72.1% of the time, and that number climbed to 84.8% if researchers cropped the source image down to just the tattoo. Accuracy within the first 10 images increased to 95% when researchers used the infrared spectrum. In addition, the Chinese Academy of Sciences’ algorithms performed poorly with tattoos on dark skin, although skin tone did not make much of a difference for MorphoTrak’s software.
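The “first 10 images” figures are rank-10 hit rates: a search counts as a hit if the correct mate appears anywhere in the top ten returned candidates. Here is a rough sketch of how that kind of metric is tallied, using made-up ranked results rather than anything from NIST’s data.

```python
def rank_k_hit_rate(results, k=10):
    """Fraction of searches whose correct mate appears in the top k candidates.

    `results` is a list of (correct_id, ranked_candidate_ids) pairs.
    The data below is hypothetical, for illustration only.
    """
    hits = sum(1 for correct, ranked in results if correct in ranked[:k])
    return hits / len(results)

# Example: three searches where the correct mate ranks 1st, 7th, and 15th.
searches = [
    ("tattoo_A", ["tattoo_A", "x1", "x2"]),
    ("tattoo_B", ["x1", "x2", "x3", "x4", "x5", "x6", "tattoo_B"]),
    ("tattoo_C", ["x%d" % i for i in range(14)] + ["tattoo_C"]),
]
print(rank_k_hit_rate(searches, k=1))   # ~0.33: rank-1, the "first try" figure
print(rank_k_hit_rate(searches, k=10))  # ~0.67: rank-10, the "first 10 images" figure
```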

One of the more concerning flaws in the research is that NIST did not document “false positives”: cases where the software says it has matched two tattoos, but the match turns out to be in error. Although this kind of misidentification has been a perpetual problem with face recognition, the researchers felt that measuring it was not useful to the study. In fact, they suggest that false positives may have “investigative utility in operations.” While they don’t explain exactly what this use case might be, other documents produced by NIST suggest they are likely discussing how similar tattoos on different people could be used to establish connections among their wearers.
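To make the omission concrete, here is a rough sketch of how a false positive rate could be tallied from threshold-based match decisions. The similarity scores and threshold are invented for illustration and do not come from Tatt-E.

```python
def false_positive_rate(scored_pairs, threshold):
    """Share of non-matching pairs the system wrongly declares a match.

    `scored_pairs` is a list of (similarity_score, same_tattoo) tuples;
    the scores and threshold here are illustrative, not NIST's data.
    """
    non_mates = [(score, same) for score, same in scored_pairs if not same]
    wrongly_matched = sum(1 for score, _ in non_mates if score >= threshold)
    return wrongly_matched / len(non_mates) if non_mates else 0.0

pairs = [(0.95, True), (0.80, True), (0.70, False), (0.40, False), (0.65, False)]
print(false_positive_rate(pairs, threshold=0.6))  # 2 of 3 non-mates flagged -> ~0.67
```

Without this kind of accounting, there is no way to know how often the tested software would point investigators at the wrong person.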

While Tatt-E was supposed to be limited to images collected by law enforcement, NIST went a step further and used the Nanyang Technological University Tattoo Database, which was compiled from images taken from Flickr users, for further research. With this dataset, the Chinese Academy of Sciences' algorithms performed better, hitting as high as 99.3% accuracy.

No matter the accuracy in identification, tattoo recognition raises serious concerns for our freedoms. As we’ve already seen, improperly interpreted tattoos have been used to brand people as gang members and fast track them for deportation. EFF urges NIST to make Tatt-E its last experiment with this technology.