Artificial intelligence has a racism problem. Look no further than the bots that go on racist rants, or the facial recognition tech that refuses to see Black people, or discriminatory HR bots that won’t hire people of color. It’s a pernicious issue plaguing the world of neural networks and machine learning that not only strengthens existing biases and racist thinking, but also worsens the effects of racist behavior towards communities of color everywhere.
And when it’s coupled with the existing racism in the medical world, it can be a recipe for disaster.
That’s what’s so concerning about a new study published in The Lancet last week by a team of researchers from MIT and Harvard Medical School, which created an AI that could accurately identify a patient’s self-reported race based on medical images like X-rays alone. As if that wasn’t creepy enough, the researchers behind the model don’t know how it’s reaching its conclusions.
Marzyeh Ghassemi, assistant professor at the MIT Department of Electrical Engineering and Computer Science and co-author of the paper, told The Daily Beast in an email that the project initially grew out of an effort to find out why an AI model was more likely to underdiagnose women and minorities. “We wanted to establish how much of this bias could be removed from the models, which led to us asking how much information about the patient’s self-reported race could be detected from these images,” she said.
To do that, they created a deep learning model trained to view X-rays, CT scans, and mammograms from patients who self-reported their race as Asian, Black, or white. While the images contained no mentions of patient race, the team discovered that the model was able to correctly identify race with roughly 90 percent accuracy—a feat that’s virtually impossible for a human doctor to perform when looking at the same images.
Of course, this poses a number of big, hairy ethical issues with some terrifying implications. For one, research like this could give ammunition to so-called race realists and other conspiracy theorists who peddle pseudoscience purporting that there’s an inherent medical difference between racial groups, even though that is, of course, complete and utter BS.
There’s also the fact that a model like this can be incredibly harmful if rolled out at scale to hospitals and other practices. The medical industry continues to grapple with a grim history of medical racism and resulting malpractice. This has irrevocably shaped the way communities of color interact with (or don’t interact with) the healthcare system. If an AI were introduced that could somehow detect a person’s race based on a simple X-ray, it could further damage that already strained relationship.
To their credit, though, this is not the goal of the study’s authors. In fact, they’re looking to strengthen guardrails to help protect the communities disproportionately impacted by practices like medical racism—particularly when it comes to hospitals and medical providers using neural networks.
“The reason we decided to release this paper is to draw attention to the importance of evaluating, auditing, and regulating medical AI,” Leo Anthony Celi, a principal research scientist at MIT and co-author of the paper, told The Daily Beast. “The FDA doesn’t require that model performance in non-medical settings are reported by subgroups, and commercial AI often doesn’t report subgroup performance either.”
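To make the idea of subgroup reporting concrete, here is a minimal sketch of what such an audit could look like. It is not the researchers’ method: the function name `accuracy_by_subgroup`, the group labels "A" and "B", and the toy records are all invented for illustration. It only shows how a single overall accuracy figure can hide a gap between groups.

```python
from collections import defaultdict


def accuracy_by_subgroup(records):
    """Return accuracy per subgroup.

    records: iterable of (subgroup, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, pred in records:
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy data (invented): 1 = condition present, 0 = absent.
# Group "B" has two missed positives, i.e. underdiagnosis.
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 1, 1),
]

# The headline number looks tolerable on its own...
overall = sum(truth == pred for _, truth, pred in records) / len(records)

# ...but the per-group breakdown shows who the model is failing.
per_group = accuracy_by_subgroup(records)

print("overall:", overall)
print("per group:", per_group)
```

In this made-up example the model is right 75 percent of the time overall, but it is perfect for group A and only as good as a coin flip for group B, which is the kind of gap that goes unnoticed when results aren’t broken out by subgroup.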
However, there’s still the massive, deep-learning elephant in the room: The researchers have no idea how the AI is ascertaining patient race from an X-ray. The opaque nature of the model is disconcerting, but not uncommon when it comes to AI. In fact, scientists have struggled to understand some of the most advanced machine-learning algorithms in the world, and the model from MIT is no exception. Its opacity is all the more troubling given the grim implications of how it could be used and weaponized to harm people of color.
At the heart of the mystery is proxy discrimination, a term for a basic issue with big AI models: They can unintentionally learn to identify race from a proxy, some other variable that correlates with race, even when race itself is never given to them. In the past, for example, we’ve seen home-lending algorithms that disproportionately reject Black and Brown applicants by using their zip code. Because America is so segregated, zip code correlates strongly with race.
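The zip code example can be sketched in a few lines of code. Everything here is hypothetical: the zip codes, the groups "X" and "Y", and the `approve` rule are invented, and real lending models are far more complicated. The point is only that a rule that never looks at group membership can still produce very different outcomes by group when its input is a strong proxy.

```python
from collections import Counter, defaultdict

# Hypothetical applicants (invented data): (zip_code, group).
# The two zip codes are heavily, though not perfectly, segregated.
applicants = [
    ("11111", "X"), ("11111", "X"), ("11111", "X"), ("11111", "Y"),
    ("22222", "Y"), ("22222", "Y"), ("22222", "Y"), ("22222", "X"),
]


def majority_group_by_zip(rows):
    """Map each zip code to its most common group: the proxy lookup."""
    counts = defaultdict(Counter)
    for zip_code, group in rows:
        counts[zip_code][group] += 1
    return {z: c.most_common(1)[0][0] for z, c in counts.items()}


# How well does zip code alone "predict" group membership?
proxy = majority_group_by_zip(applicants)
proxy_accuracy = sum(proxy[z] == g for z, g in applicants) / len(applicants)


def approve(zip_code):
    """Hypothetical lending rule: it sees only the zip code, never the group."""
    return zip_code == "11111"


# Approval rate per group under the zip-only rule.
approval_rate = {}
for group in ("X", "Y"):
    zips = [z for z, g in applicants if g == group]
    approval_rate[group] = sum(approve(z) for z in zips) / len(zips)

print("proxy accuracy:", proxy_accuracy)
print("approval rate by group:", approval_rate)
```

In this toy data, zip code alone guesses the group correctly three times out of four, and the zip-only rule approves 75 percent of group X but just 25 percent of group Y. Removing the race column from the data does nothing to remove that disparity.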
Disconcertingly, while the study’s authors looked at certain proxies that the model could be using to ascertain patients’ race, such as bone density, they couldn’t find the one it was using.
“There were no obvious statistical correlations that humans could be drawing upon,” Brett Karlan, a postdoc researching cognitive science, ethics, and AI at the University of Pittsburgh and who was not involved with the study, told The Daily Beast. “It was just a feature of the opaque network itself—and that’s really scary.”
According to Karlan, the reason it’s scary is simple: We deserve to know how AI—especially when it’s used to manage our physical health—reaches its conclusions. Without that explanation, we don’t know if it’s putting us at risk for harm by way of racist, sexist, and otherwise biased behavior. “You would want to know that an algorithm suggesting a specific diagnostic outcome for you or taking a specific kind of medical treatment was treating you as a member of a racial category,” Karlan explained. “You could ask your doctor why you’re undergoing a specific kind of treatment, but you might not be able to ask your neural network.”
While exactly how the AI reaches its conclusions is still a big question mark, the researchers behind the paper believe that patients’ melanin, the pigment that determines skin color, could be the explanation.
“We hypothesize that the melanin levels in human skin modify very slight patterns in all parts of the frequency spectrum during medical imaging,” Ghassemi said. “This hypothesis cannot be verified without images of the patients’ skin tone being matched to their chest X-rays, which we did not have access to for this study.”
She added that similar medical devices have been known to be poorly calibrated for darker skin and that their work “can be viewed as an additional result in this direction.” So it could simply be a case of the AI picking up on very subtle differences between the X-ray images that can’t be discerned by the human eye. In other words, the team may have simply created a glorified melanin detector. If that’s the case, it’s a proxy we can point to as the cause behind the stunning findings. More research is needed, though, before a firm conclusion can be reached, if one is ever reached at all.
For now, the team plans to unveil similar findings in another study where they discovered an AI was able to identify the race of patients based on clinical notes that had race redacted from them. “Just as with the medical imaging example, we found that human experts are not able to accurately predict patient race from the same redacted clinical notes,” Ghassemi said.
As with the medical imaging AI, it’s clear that proxy discrimination can and will continue to be a pervasive issue within medicine. And it’s one that, unlike an X-ray, we can’t always see through that easily.