EFF Legal Intern Haley Amster contributed to this post.
Update: An earlier version of this post said that ExamSoft has had a security breach. For clarity: security breaches have only been alleged by users, and ProctorU, a partner of ExamSoft, has had a breach.
Over the past year, the use of online proctoring apps has skyrocketed. But while companies have seen upwards of a 500% increase in their usage, legitimate concerns about their invasiveness, potential bias, and efficacy are also on the rise. These concerns even led to a U.S. Senate inquiry letter requesting detailed information from three of the top proctoring companies—Proctorio, ProctorU, and ExamSoft—which combined have proctored at least 30 million tests over the course of the pandemic.1 Unfortunately, the companies mostly dismissed the senators’ concerns, in some cases stretching the truth about how the proctoring apps work, and in other cases downplaying the damage this software inflicts on vulnerable students.
In one instance, though, these criticisms seem to have been effective: ProctorU announced in May that it will no longer sell fully-automated proctoring services. This is a good step toward eliminating some of the issues that have concerned EFF with ProctorU and other proctoring apps. The artificial intelligence used by these tools to detect academic dishonesty has been roundly attacked for its bias and accessibility impacts, and the clear evidence that it leads to significant false positives, particularly for vulnerable students. While this is not a complete solution to the problems that online proctoring creates—the surveillance is, after all, the product—we hope other online proctoring companies will also seriously consider the danger that these automated systems present.
The AI Shell Game
This reckoning has been a long time coming. For years, online proctoring companies have played fast and loose when talking about their ability to “automatically” detect cheating. On the one hand, they’ve advertised their ability to “flag cheating” with artificial intelligence: ProctorU has claimed to offer “fully automated online proctoring”; Proctorio has touted the automated “suspicion ratings” it assigns test takers; and ExamSoft has claimed to use “Advanced A.I. software” to “detect abnormal student behavior that may signal academic dishonesty.” On the other hand, they’ve all been quick to downplay their use of automation, claiming that they don’t make any final decisions—educators do—and pointing out that their more expensive options include live proctors during exams or video review by a company employee afterward, if you really want top-tier service.
Proctoring companies must admit that their products are flawed, and schools must offer students due process and routes for appeal when these tools flag them, regardless of what software is used to make the allegations.
Nowhere was this doublespeak more apparent than in their recent responses to the Senate inquiry. ProctorU “primarily uses human proctoring – live, trained proctors – to assist test-takers throughout a test and monitor the test environment,” the company claimed. Despite this, it has offered an array of automated features for years, such as their entry-level “Record+” which (until now) didn’t rely on human proctors. Proctorio’s “most popular product offering, Automated Proctoring...records raw evidence of potentially-suspicious activity that may indicate breaches in exam integrity.” But don’t worry: “exam administrators have the ability and obligation to independently analyze the data and determine whether an exam integrity violation has occurred and whether or how to respond to it. Our software does not make inaccurate determinations about violations of exam integrity because our software does not make any determinations about breaches of exam integrity.” According to Proctorio’s FAQ, “Proctorio’s software does not perform any type of algorithmic decision making, such as determining if a breach of exam integrity has occurred. All decisions regarding exam integrity are left up to the exam administrator or institution” [emphasis Proctorio’s].
But this blame-shifting has always rung false. Companies can’t both advertise the efficacy of their cheating-detection tools when it suits them, and dodge critics by claiming that the schools are to blame for any problems.
And now, we’ve got receipts: in a telling statistic released by ProctorU in its announcement of the end of its AI-only service, “research by the company has found that only about 10 percent of faculty members review the video” for students who are flagged by the automated tools. (A separate University of Iowa audit they mention found similar results—only 14 percent of faculty members were analyzing the results they received from Proctorio.) This is critical data for understanding why the blame-shifting argument must be seen for what it is: nonsense. “[I]t's unreasonable and unfair if faculty members" are punishing students based on the automated results without also looking at the videos, says a ProctorU spokesperson—but that’s clearly what has been happening, perhaps the majority of the time, resulting in students being punished based on entirely false, automated allegations. This is just one of the many reasons why proctoring companies must admit that their products are flawed, and schools must offer students due process and routes for appeal when these tools flag them, regardless of what software is used to make the allegations.
We are glad to see that ProctorU is ending AI-only proctoring, but it’s disappointing that it took years of offering an automated service—and causing massive distress to students—before doing so. We’ve also yet to see how ProctorU will limit the other harms that the tools cause, from facial recognition bias to data privacy leaks. But this is a good—and important—way for ProctorU to walk the talk after it admitted to the Senate that “humans are simply better than machines alone at identifying intentional misconduct.”
Human Review Leaves Unanswered Questions
Human proctoring isn’t perfect either. It has been criticized for its invasiveness, and for creating an uncomfortable power dynamic where students are surveilled by a stranger in their own homes. And simply requiring human review doesn’t mean students won’t be falsely accused: ExamSoft told the Senate that it relies primarily on human proctors, claiming that video is “reviewed by the proctoring partner’s virtual proctors—trained human invigilators [exam reviewers]—who also flag anomalies,” and that “discrepancies in the findings are reviewed by a second human reviewer,” after which a report is provided to the institution for “final review and determination.”
But that’s the same ExamSoft that proctored the California Bar Exam, in which over one-third of examinees were flagged (over 3,000). After further review, 98% of those flagged were cleared of misconduct, and only 47 test-takers were implicated. Why, if ExamSoft’s human reviewers carefully examined each potential flag, do the results in this case indicate that nearly all of their flags were still false? If the California Bar hadn’t carefully reviewed these allegations, the already-troubling situation, which included significant technical issues such as crashes and problems logging into the site, last-minute updates to instructions, and lengthy tech support wait times, would have been much worse. (Last month, a state auditor’s report revealed that the California State Bar violated state policy when it awarded ExamSoft a new five-year, $4 million contract without evaluating whether it would receive the best value for the money. One has to wonder what, exactly, ExamSoft is offering that’s worth $4 million given this high false-positive rate.)
Unfortunately, additional human review may simply result in teachers and administrators ignoring even more potential false flags, as they further trust the companies to make the decisions for them. We must carefully scrutinize the danger to students whenever schools outsource academic responsibilities to third-party tools, algorithmic or otherwise.
It’s well past time for online proctoring companies to be honest with their users. Each company should release statistics on how many videos are reviewed by humans, at schools or in-house, as well as how many flags are dismissed in each portion of review. This aggregate data would be a first step to understanding the impact of these tools. And the Senate and the Federal Trade Commission should follow up on the claims these companies made in their responses to the senators’ inquiry, which are full of weasel words, misleading descriptions, and other inconsistencies. We’ve outlined our concerns per company below.
- ExamSoft claimed in its response to the Senate that it doesn’t monitor students’ physical environments. But it does keep a recording of your webcam (audio and visual) the entire time you’re being proctored. This recording, with integrated artificial intelligence software, detects, among other things, “student activity” and “background noise.” That sure sounds like environmental monitoring to us.
- ExamSoft omitted from its Senate letter that there have been alleged data security issues. The company’s partner, ProctorU, had a data breach.
- ExamSoft continues to use automated flagging, and conspicuously did not mention disabilities that would lead students to be flagged for cheating, such as stimming. This has already caused a lot of issues for exam-takers with diabetes who have had restrictions on their food availability and insulin use, and have been basically told that a behavior flag is unavoidable.
- The company also claimed that their facial recognition system still allows an exam-taker to proceed with examinations even when there is an issue with identity verification—but users report significant issues with the system recognizing them causing delays and other issues with their exams.
- ProctorU claimed in its response to the Senate that it “prioritizes providing unbiased services,” and its “experienced and trained proctors can distinguish between behavior related to ‘disabilities, muscle conditions, or other traits’” compared with “unusual behavior that may be an attempt to circumvent test rules.” The company does not explain the training proctors receive to make these determinations, or how users can ensure that they are treated fairly when they have concerns about accommodations.
- ProctorU also claims to have received fewer than fifteen complaints related to issues with their facial recognition technology, and claims that it has found no evidence of bias in the facial comparison process it uses to authenticate test-taker identity. This is, to put it mildly, very unlikely.
- ProctorU is currently being sued for violating the Illinois Biometric Information Privacy Act (BIPA), after a data breach affected nearly 500,000 users. The company failed to mention this breach in its response, and while it claims its video files are only kept for up to two years, the lawsuit contends that biometric data from the breach dated back to 2012. There is simply no reason to hold onto biometric data for two years, let alone that eight.
- Aware of face recognition’s well-documented bias, Proctorio has gone out of its way to claim that it doesn’t use it. While this is good news for privacy, it doesn’t negate concerns about bias. The company still uses automation to determine whether a face is in view during exams—what it calls facial detection—which may not compare an exam taker to previous pictures for identification, but still requires, obviously, the ability for the software to match a face in view to an algorithmic model for what a face looks like at various angles. A software researcher has shown that the facial detection model that the company is using “fails to recognize Black faces more than 50 percent of the time.” Separately, Proctorio is facing a lawsuit for misusing the Digital Millennium Copyright Act (DMCA) to force down posts by another security researcher who used snippets of the software’s code in critical commentary online. The company must be more open to criticisms of its automation, and more transparent about its flaws.
- In its response to the Senate, the company claimed that it has “not verified a single instance in which test monitoring was less accurate for a student based on any religious dress, like headscarves they may be wearing, skin tone, gender, hairstyle, or other physical characteristics.” Tell that to the schools who have canceled their contracts due to bias and accessibility issues.
- Lastly, Proctorio continues to promote their automated flagging tools, while dismissing complaints of false-positives by shifting the blame over to schools. As with other online proctoring companies, Proctorio should release statistics on how many videos are reviewed by humans, at schools or in-house, as well as how many flags are dismissed as a result.