Research Tests Face Masks’ Effect on Analytics Accuracy
Print Issue: November 2020
No shirt, no shoes, no mask, no service. While many organizations and facilities changed their policies this year, facial recognition software struggled to maintain accuracy amid the new dress code. According to an interagency report released in July 2020 by the U.S. National Institute of Standards and Technology (NIST), even the best commercial facial recognition algorithms had error rates of about 5 percent when asked to match a person's photo with a version wearing a digitally applied face mask, and most algorithms failed up to 50 percent of the time.
NIST has been performing ongoing tests of facial recognition software and algorithms for decades, so this latest test fits in well with prior research, says Mei Ngan, a computer scientist at NIST and one of the authors of the recent report in the Face Recognition Vendor Test series, "Face Recognition Accuracy with Face Masks Using Pre-COVID-19 Algorithms." The NIST test sought to determine whether pre-pandemic software could cope with face masks, and testing showed that most algorithms would require fine-tuning to reach pre-pandemic accuracy levels in one-to-one verification, the task of accurately determining that two images show the same person.
To leverage NIST’s existing data set of images, including U.S. border crossing photos and travel application photos, researchers applied face mask shapes to the photos digitally to see how the algorithms would cope with matching masked photos to unmasked ones. The researchers used two mask shapes—wide, which mimics a surgical mask, and round, which mimics an N95 mask—in either black or light blue. They also moved the mask shape up and down to determine how nose coverage would affect verification.
Overall, the researchers tested 89 pre-March 2020 algorithms on 6.2 million photos, and every algorithm showed an increased false non-match rate when confronted with masked faces. Using the same data set, the most accurate algorithms failed to authenticate about 0.3 percent of people without masks, but about 5 percent once masks were applied.
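To illustrate the metric behind these figures, the false non-match rate is simply the fraction of genuine (same-person) comparisons that a system wrongly rejects. The sketch below is a minimal, hypothetical example, not NIST's evaluation code; the similarity scores and threshold are invented for illustration.

```python
# Minimal sketch of a false non-match rate (FNMR) calculation.
# All scores and the threshold below are hypothetical, not NIST data.

def false_non_match_rate(mated_scores, threshold):
    """Fraction of genuine (same-person) comparisons whose similarity
    score falls below the acceptance threshold, i.e. wrongly rejected."""
    misses = sum(1 for s in mated_scores if s < threshold)
    return misses / len(mated_scores)

# Genuine-pair similarity scores: unmasked probes vs. digitally masked probes.
unmasked = [0.91, 0.88, 0.95, 0.97, 0.90, 0.93, 0.89, 0.96, 0.94, 0.92]
masked   = [0.81, 0.62, 0.90, 0.75, 0.58, 0.88, 0.66, 0.93, 0.71, 0.85]

threshold = 0.70  # operating point chosen by the system owner
print(false_non_match_rate(unmasked, threshold))  # 0.0
print(false_non_match_rate(masked, threshold))    # 0.3
```

Masks tend to depress genuine-pair scores, so more of them fall under the threshold and the FNMR rises, which is the pattern the NIST test measured at scale.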
“However, many algorithms are much less tolerant: some algorithms that are quite competitive with unmasked faces...fail to authenticate between 20 percent and 50 percent of images,” the report found.
Mask type and coverage made a significant difference in accuracy. Light blue masks caused lower error rates than black masks, and masks that cover more of the face, such as the wide surgical-style shape, produced higher false non-match rates, especially when the mask covered most of the nose.
“This all makes sense—the more of the face that you cover, the less information the algorithms have to extract to do matching with,” Ngan says.
“The amount of nose coverage had an impact on accuracy; we saw about a factor of five decrease in error rates in low masks when compared to high masks,” she adds. “Depending on the application and policies of the environment in which the system is deployed, there could be accuracy reasons to have the user pull their mask down to below the nose for authentication.”
The study had limitations, however. Because the researchers applied masks digitally rather than building a new mask-centric data set (a concession to time and resource constraints), they could not assess how the algorithms would handle the wide variety of shapes, textures, and patterns on real face masks, or how mass-produced masks would sit on different face shapes. The image subjects were also cooperative, Ngan says, meaning that most of the subjects were looking straight into the camera. That is unlike trying to authenticate people on the move—walking toward a doorway, for example, or from broad surveillance footage—which could challenge accuracy rates.
The researchers also acknowledged that the results apply to algorithms provided to NIST before the pandemic struck, and that these algorithms were developed without the expectation that they would be tested on masked face images. Ngan adds, however, that the research is ongoing, and as more developers submit post-COVID-19 algorithms or updates to be tested, the NIST leaderboard of the most accurate systems continues to change.
“We recently updated our face mask leaderboard (on the NIST website) with results from a number of algorithms that were submitted to us after the pandemic began,” she says. “While we don’t have information on whether they were specifically designed with face coverings in mind, it’s pretty clear based on the results that there are some developers that have been looking at the face mask problem. And for some developers, we have seen significant improvement in accuracy on masked faces compared to pre-pandemic algorithms.”
Ngan notes that developers looking to adapt algorithms to face mask use can extract more data points around the periocular region (around the eyes), which is usually unhampered by mask use. Other developers are retraining systems to account for masked images.
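As a rough sketch of what focusing on the periocular region can mean in practice, a system with eye-landmark coordinates (from any face landmark detector) can crop a band around the eyes before feature extraction. The landmark values and padding factor below are hypothetical, not drawn from any specific vendor's approach.

```python
# Hedged sketch: computing a periocular crop box from eye-center
# landmarks. Coordinates and the padding factor are illustrative only.

def periocular_box(left_eye, right_eye, pad=0.6):
    """Return (x0, y0, x1, y1) of a box around both eyes, expanded by
    `pad` times the inter-eye distance on every side."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    d = ((rx - lx) ** 2 + (ry - ly) ** 2) ** 0.5  # inter-eye distance
    x0 = min(lx, rx) - pad * d
    x1 = max(lx, rx) + pad * d
    y0 = min(ly, ry) - pad * d
    y1 = max(ly, ry) + pad * d
    return (int(x0), int(y0), int(x1), int(y1))

# Eye centers at (120, 200) and (220, 200) in pixel coordinates.
print(periocular_box((120, 200), (220, 200)))  # (60, 140, 280, 260)
```

The cropped region excludes the mouth and most of the nose, so features drawn from it are largely unaffected by whether a mask is worn.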
NIST emphasizes that algorithms should be evaluated in the environment where they are deployed to gauge real-world accuracy, putting the onus on system owners and operators to “know your algorithm.” In addition, security leaders can ask their system providers about updates and push for solutions that can handle both masked and unmasked images, Ngan adds.
A new report evaluating “mask-enabled” algorithms submitted to NIST in recent months is expected later in 2020.