In a paper published on the preprint server arXiv.org, coauthors affiliated with Harvard and Autodesk propose enhancing current facial recognition systems’ ability to identify “gender minority subgroups,” such as individuals in the LGBTQ and non-binary communities. The researchers claim the corpora they created (a “racially balanced” database capturing a subset of LGBTQ people and an “inclusive-gender” database) can mitigate bias in gender classification algorithms. But according to University of Washington AI researcher Os Keyes, who wasn’t involved with the research, the paper appears to conceive of gender in a way that’s not only contradictory but dangerous.
“The researchers go back and forth between treating gender as physiologically and visually modeled in a fixed way and being more flexible and contextual,” Keyes said. “I don’t know the researchers’ backgrounds, but I’m at best skeptical that they ever spoke to trans people about this project.”
Facial recognition is problematic on its face, so much so that the Association for Computing Machinery (ACM) and American Civil Liberties Union (ACLU) continue to call for moratoriums on all forms of the technology. San Francisco, Oakland, Boston, and five other Massachusetts communities have banned the use of facial recognition by local departments. After the first wave of recent Black Lives Matter protests in the U.S., companies including Amazon, IBM, and Microsoft halted or ended the sale of facial recognition products. Benchmarks of major vendors’ systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) have found that facial recognition technology exhibits racial and gender bias and performs poorly on people who don’t conform to a single gender identity. Facial recognition programs can also be wildly inaccurate, misclassifying people upwards of 96% of the time.
In spite of this, the paper’s coauthors — perhaps with the best of intentions — sought to improve the performance of facial recognition systems when they’re applied to transgender and non-binary people. They posit that current facial recognition algorithms are likely to amplify societal gender bias and that the lack of LGBTQ representation in popular benchmark databases leads to a “false sense of progress” on gender classification tasks in machine learning, potentially harming the self-confidence and psychology of those misgendered by the algorithms.
That’s reasonable, according to Keyes, but the researchers’ assumptions about gender are not.
“They settle on treating gender as fixed and modeling non-binary people as a ‘third gender’ category in between men and women, which isn’t what non-binary means at all,” Keyes said. “People can be non-binary and present in very different ways, identify in very different ways, [and] have many different life histories and trajectories and desired forms of treatment.”
Equally problematic is that the researchers cite and draw support from a controversial study implying all gender transformation procedures, including hormone replacement therapy (HRT), cause “significant” facial variations over time, both in shape and texture. Advocacy groups like GLAAD and the Human Rights Campaign have denounced the study as “junk science” that “threatens the safety and privacy of LGBTQ and non-LGBTQ people alike.”
“This junk science … draws on a lot of (frankly, creepy) evolutionary biology and sexology studies that treat queerness as originating in ‘too much’ or ‘not enough’ testosterone in the womb,” Keyes said. “Again, those studies haven’t been validated — they’re attractive because they imply that gay people are too feminine, or lesbians too masculine, and reinforce social stereotypes. Depending on them and endorsing them in a study the authors claim is for mitigating discrimination is absolutely bewildering.”
The first of the researchers’ databases — the “inclusive database” — contains 12,000 images of 168 unique identities, including 29 White males, 25 White females, 23 Asian males, 23 Asian females, 33 Black males, and 35 Black females from different geographic regions, 21 of whom (9% of the database) identify as LGBTQ. The second — the non-binary gender benchmark database — comprises 2,000 headshots of 67 public figures labeled as “non-binary” on Wikipedia.
Keyes takes issue with the second data set, arguing it is non-representative because it’s self-selecting and because of the way appearance tends to be policed in celebrity culture. “People of color, disabled people, poor people need not apply — certainly not as frequently,” they said. “It’s sort of akin to fixing bias against women by adding a data set exclusively of women with pigtails; even if it ‘works,’ it’s probably of little use to anyone who doesn’t fit a very narrow range of appearances.”
The researchers trained several image classification algorithms on a “racially imbalanced” but popular facial image database, the Open University of Israel’s Adience, augmented with images from their own data sets (1,500 images from the inclusive database and 1,019 images from the non-binary database). They then applied various machine learning techniques to mitigate algorithmic bias and boost the models’ accuracy, which they claim enabled the best-performing model to classify non-binary people with 91.97% accuracy.
The results ignore the fact that “trans-inclusive” systems for nonconsensually defining someone’s gender are a contradiction in terms, according to Keyes. “When you have a technology that is built on the idea that how people look determines, rigidly, how you should classify and treat them, there’s absolutely no space for queerness,” they said. “Rather than making gender recognition systems just, or fair, what projects like this really do is provide a veneer of inclusion that serves mostly to legitimize the surveillance systems being built — indeed, it’s of no surprise to me that the authors end by suggesting that if there are problems with their models, they can be fixed by gathering more data, by surveilling more non-binary people.”