Beyond Classification: An Apartheid-Informed Approach to Combating Facial Recognition Bias
My life experience has been one of resistance to being pigeon-holed.
On the phone, my British accent paints one picture of what I might look like. But in Britain, I was classified as Asian. Rarely would a new acquaintance correctly identify my birthplace as Africa. Now, I live in America. It’s complex. I confess that I relish messing with classifiers.
Shortly after we first arrived in the United States, my wife—who would be categorized as white—went to pick up one of our boys from school for the first time. The administration would not let her leave with our son, however, because they said he could not possibly be her child because the color of her skin did not match his.
Humans have evolved shortcuts. Classification is a behavioral expedient tied to our fight-or-flight mechanism. We make snap judgements because we don’t have time for an exhaustive determination of whether someone is a friend or a foe. But our snap judgement, when we see a face, varies in accuracy because of bias—a measure of the variation in accuracy across a range of faces by gender, age, and skin tone among others.
We’ve built artificial intelligence (AI) systems to help us make decisions, perform tasks, and understand the world. We ask these systems to make judgements on our behalf all the time: Is this person at the door a threat? Is it safe for my self-driving car to proceed? Is this person creditworthy?
Bias is human, but we do not want our algorithms to have it. In fact, my hope is that we can build our AI to do better than we as humans are capable of.
A Lifetime of Classification
Born in South Africa during Apartheid, I experienced being classified into a population group. Judgement based on skin color was enshrined in the laws of the land—it determined where we could live, which schools we could attend, whom we could marry, and whether we could vote.
Imagine having your worth determined from a single impression. Everything about you is reduced to a single metric, including your social status, your residency, your domestic rights, and so much more. On a personal level, it’s hurtful. On a systemic level, it’s oppression.
Classification is quite different when self-declared. In the United States especially, it can be a proclamation of pride, heritage, and cultural identity when people assert: “I’m an Italian-American,” or “I’m Asian-American.” It’s far more disturbing to have someone else tell you who you are. Perhaps my hypersensitivity is born from being subjected to Apartheid stereotyping.
When AI Works Well for Some, but Not Others
In 2018, my wife and I attempted to order from a fast-food burger restaurant using an in-store facial recognition kiosk. She was able to breeze right through the process, easily getting scanned into the database, having her reference image saved, ordering, and then paying for her food. My experience, on the other hand, was a disaster. Despite staff troubleshooting, accommodations, and retries, the software did not recognize me as a person. To be invisible to the inanimate terminal was dehumanizing, but not surprising.
The datasets used to train AI models are often too small and lack diversity. By contrast, the top facial recognition models are trained on millions of faces, with a hundred or more images of each individual.
The Labeled Faces in the Wild (LFW) dataset by the University of Massachusetts, used by many modelers, originally contained 13,000 faces and labels scraped from 2007 news articles. Anecdotally, it contained more images of former U.S. President George W. Bush than images of Black women. So, it was not a dataset that represented the diversity of faces found in America, let alone the world. Models trained on LFW showed high bias, meaning a wide spread in accuracy across the spectrum of faces. LFW is also known to contain a percentage of mislabeled images.
Obtaining well-labelled training sets is difficult. Top Chinese facial recognition vendors were given their national visa and passport databases for training. While this gave them high accuracy for faces of that region, their algorithms misrecognize European and African faces at a higher rate, as reported by the U.S. National Institute of Standards and Technology (NIST) in Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects.
As chief technology officer of RealNetworks, I’m responsible for helping shape our computer vision products, including those that use facial recognition (FR). So, naturally, I wanted to make our FR fair as well as accurate.
Building a Low-Bias Face Recognition Model
My experience in South Africa will always inform my life and my work. When I began collaborating on facial recognition software, my approach was open-eyed, with a clear belief that we—as developers—should not classify people in unscientific ways.
If AI and machine learning are to fulfill the promise of providing objective data classification, we need to use scientifically sound training data in our models. So when building our product, we decided not to classify humans by ethnicity or race. Training a model that tries to determine someone’s ethnicity or race from their face is about as valuable as building a model that tries to determine someone’s astrological sign, and similarly unscientific. Ethnicity is a complex mix of heritage and culture, and impossible to categorize by appearance. The notion of “race,” debunked in Victorian times as a biological classification, is in fact a social construct with no scientific basis. Robert Sussman articulates this well in his book, The Myth of Race: The Troubling Persistence of an Unscientific Idea, published in 2014 by Harvard University Press.
Essential Standards for Success
Choosing not to label people by race or ethnicity turns out to actually reduce bias in the FR models.
First, you must source a diverse training set of faces from around the world with strong representation by age, gender, skin tone, and geographic origin. Next, you must train your model to recognize individuals with uniform accuracy across this spectrum of faces. Finally, you need to test rigorously to identify the bias that remains to be reduced.
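As a rough illustration of the testing step, the sketch below computes recognition accuracy for each group in a test set and the gap between the best-served and worst-served group. The group names, the sample outcomes, and the function names are hypothetical, chosen only for illustration; this is not the evaluation procedure used by RealNetworks or NIST.

```python
from collections import defaultdict


def accuracy_by_group(results):
    """Return accuracy per group from (group, correct) pairs.

    `results` is an iterable of tuples: a group label (for example a
    skin-tone or age band) and a bool saying whether the model
    recognized that test face correctly.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in results:
        totals[group] += 1
        if correct:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_gap(per_group):
    """Return the difference between the best and worst group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical test outcomes, invented for this example.
sample_results = [
    ("tone_light", True), ("tone_light", True),
    ("tone_light", True), ("tone_light", True),
    ("tone_dark", True), ("tone_dark", False),
    ("tone_dark", True), ("tone_dark", False),
]

per_group = accuracy_by_group(sample_results)
gap = accuracy_gap(per_group)
print(per_group)
print("accuracy gap:", gap)
```

A gap near zero suggests the model serves the groups in the test set about equally well; a large gap points to where more training data or retraining may be needed.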
Independent, third-party benchmarks and industry standards can ensure facial recognition algorithms perform well. NIST conducts an ongoing battery of tests, known as the Face Recognition Vendor Test (FRVT), to measure the key design characteristics of facial recognition algorithms: accuracy, speed, size, and bias. The tests showed alarming levels of bias in many facial recognition algorithms, including false-positive rates for Asian and African-American faces 10 to 100 times higher than for white faces. This is simply unacceptable.
One change NIST could make to help improve this would be publishing minimum thresholds for specific use cases. These set thresholds would evolve as the technology improves. For example, to qualify for law enforcement surveillance use, a company’s algorithm would need to be below a specific bias threshold set by NIST.
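To make the idea of a per-use-case threshold concrete, here is a minimal sketch that compares false-positive rates across groups and checks the worst-to-best ratio against a limit. The limits, use-case names, counts, and function names are all hypothetical; NIST does not currently publish such thresholds, which is exactly the change proposed above.

```python
def false_positive_rate(false_positives, impostor_comparisons):
    """Fraction of impostor comparisons wrongly accepted as matches."""
    if impostor_comparisons <= 0:
        raise ValueError("impostor_comparisons must be positive")
    return false_positives / impostor_comparisons


def passes_bias_threshold(fpr_by_group, max_ratio):
    """Check that the highest group FPR is at most max_ratio times the lowest."""
    rates = list(fpr_by_group.values())
    lowest, highest = min(rates), max(rates)
    if lowest == 0:
        # With a zero baseline, only an all-zero result can pass.
        return highest == 0
    return highest / lowest <= max_ratio


# Hypothetical limits per use case, invented for this example.
HYPOTHETICAL_MAX_RATIO = {
    "photo_tagging": 100.0,
    "law_enforcement_surveillance": 2.0,
}

# Hypothetical test counts: (false positives, impostor comparisons).
fpr_by_group = {
    "group_a": false_positive_rate(1, 10_000),
    "group_b": false_positive_rate(10, 10_000),
}

for use_case, max_ratio in HYPOTHETICAL_MAX_RATIO.items():
    verdict = passes_bias_threshold(fpr_by_group, max_ratio)
    print(use_case, "qualifies" if verdict else "does not qualify")
```

Using a ratio rather than an absolute difference mirrors the "10 to 100 times higher" framing of the FRVT findings, but other measures of disparity would be equally defensible.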
Facial recognition has come a long way and the technology continues to improve. Companies that wish to deploy their algorithms for public use should submit them to NIST to be part of these ongoing tests and contribute to this evolution. It’s incumbent on developers to respect the dignity of all humanity and design AI models that see all faces equitably.
Reza Rassool is chief technology officer of RealNetworks.