Public health must diversify which data ‘count’ in AI algorithms
Last Wednesday, Elon Musk, among other industry leaders, urged artificial intelligence labs to institute a six-month moratorium on AI development, citing growing concern about the technology's wide-ranging consequences. Public health experts must immediately follow suit, before irreparable damage is done to population health.
Despite AI's promise as a public health surveillance and prediction tool, health professionals across sectors have raised wide-ranging concerns about its long-term viability, citing poor model interpretability, insufficient infrastructure, inadequate regulation and data-sharing standards, and, of course, broader privacy and ethical issues. Arguably the most troubling concern, however, is AI's potential to embed bias in its algorithms, bias that could disproportionately harm marginalized populations, who are already underrepresented in scientific research.
Proof of this bias exists. Recent research found racial bias in a commercial algorithm widely used in U.S. hospitals: Black patients were consistently assigned lower risk scores than white patients who were equally sick, in part because the algorithm used health care spending as a proxy for health need, and less is spent on Black patients. Issues like these are reason enough for industry experts to pause before fast-tracking AI into ubiquity.
A common solution industry leaders propose for AI bias in public health is to feed more "representative data" from marginalized populations into AI learning programs. Although an important first step, this solution does not go far enough given the scale of harm AI poses, particularly its capacity to reify structural health inequities among these groups. Critics of AI must think past mere calls for more "representative data" and advocate for diversity of data itself.
As an assistant professor of public health at San Jose State University, situated in the heart of Silicon Valley, I have become all too familiar with how "hierarchy of data" debates in technocratic spaces, such as the National Institutes of Health and the social-innovation health sector, work to privilege certain forms of data over others.
For example, much of the representative data on marginalized communities that counts in machine-learning programs reflects an overreliance on empirically driven predictive models. These models draw on decontextualized, impersonal sources, such as large-scale epidemiological studies and national health surveys, gathered by experts outside the communities they describe. Divorced from the context in which they were collected, such data do not reflect the on-the-ground social, political, economic and health realities of those communities.
The implications of excluding diverse data forms from what "counts" as usable AI currency extend beyond simple methodological oversight. Doing so risks socio-cultural violence by ignoring, or erasing altogether, context-specific cultural histories, indigenous forms of knowledge and other alternative understandings from health diagnostic and intervention approaches, precisely the elements that may be missing from predictions of positive health-seeking behaviors.
Whether or not AI labs heed Musk's missive, the revolution seems inevitable and demands immediate critical reflection. Leaders in public health and allied health sectors must reckon with how best to collect, analyze and evaluate AI health data.
A good place to start is by expanding what "counts" as referenceable data in AI algorithms to include lived narratives (e.g., firsthand accounts from individuals, groups and communities), local and cultural knowledge forms (e.g., indigenous knowledge systems) and community-driven data projects (e.g., community-based research and community-generated solutions) among underrepresented populations, none of which can be reduced to predictive metrics by outside experts.
Including a diversity of data (e.g., methodologies, evaluative tools, paradigms) can yield many benefits: accounting for broader ecological factors that lie outside "risk factors," "behavior change" models and other individual-level metrics; generating new cause-and-effect health rationales beyond the linear scope of dominant predictive models; and developing more egalitarian approaches to science that help dismantle data and knowledge hierarchies. Public health stakeholders should immediately develop an accountability process model that identifies existing data gaps and inequities and builds regulatory mechanisms for more equitable practices.
Although not a panacea for the multitude of potential AI-generated risks, promoting data diversity represents an important course correction on the dangerous road down which fervent AI advocates are taking us. The next few years will be crucial for policymakers and industry experts to make the right decisions about how much influence this technology will have over society's lived health realities. Ultimately, we will have to decide whether this push for scientific "progress" is worth its wide-ranging risks.
Andrew Carter, PhD, MPH, is an assistant professor in the Department of Public Health and Recreation, College of Health and Human Sciences, at San Jose State University.
Copyright 2023 Nexstar Media Inc. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed.