The views expressed by contributors are their own and not the view of The Hill

Flying blind: Data infrastructure needed to fight the next pandemic


Picture a world where a sniffle spurs you to reach into your pocket, pull out your smartphone and enter your symptoms into a self-triage app. In addition to the answers you provide, the app also incorporates data from a synced wearable device. The app suggests consulting a health care professional, and you schedule a telemedicine appointment. Your doctor suggests a diagnostic test, which is delivered to your home. You perform the test and enter the results into your smartphone. The entire process takes place on an open, yet secure, data infrastructure.

Then your health data is reported to your health care provider and is also anonymized and reported to public health authorities. The CDC receives data feeds from throughout the country and uses a data-wrangling program to clean and standardize the format of the data. This process allows the nation’s top epidemiologists to see a real-time snapshot of an emerging infectious disease outbreak, which informs and initiates critical response efforts.

As we enter the second year of the COVID-19 pandemic, with the U.S. death toll well past 500,000, it is time for the government to focus on developing a modern data capture, analysis, and sharing capability. Unfettered access to reliable, up-to-date data was assumed during the COVID-19 crisis, yet we now recognize its absence as perhaps our biggest failing. We must do better. Here are the three areas of greatest need:

Data from digital health technology

Digital health tools already empower individuals to direct their own care; U.S. consumers can now book telemedicine appointments, self-triage with artificial intelligence-powered symptom checkers, and collect health data from their wearables. But what has been done to maximize the public health benefits of digital health technology?

America needs an integrated data infrastructure that allows for collecting and analyzing data generated from digital health tools. So-called “data modernization” cannot come soon enough. Moreover, consumers need privacy and security assurances that will ensure the widespread use of these technologies.

The bioinformatics bottleneck

Bioinformatics is the application of data science to biology, with an emphasis on computational data. A recent article in Nature called for fixing the bioinformatics bottleneck, citing critical applications such as better tracking of the COVID-19 variants that have experts worried about a fourth wave of COVID-19 in the country.

Robust bioinformatics capabilities allow for better tracking of disease transmission and better identification of mutations, both for COVID-19 and for future outbreaks. There are numerous areas for improvement, such as better data sharing, better handling of data uncertainty and open data infrastructures that allow for open-source corrections of data errors. Progress in these areas, combined with increased genomic surveillance, could help us identify emerging or re-emerging threats, and prevent us from being blindsided by variants in future outbreaks.

The need for data wrangling

Data wrangling might be the most important term you have never heard. It describes the process by which raw data, sometimes of questionable quality or in differing formats, is transformed into more polished data that is fit for analysis. Recent surveys report that data scientists spend between 45 percent and 80 percent of their time data wrangling. The challenge is particularly acute for data related to infectious disease outbreaks.

The U.S. public health and health care systems are fragmented: we have a federalist system of federal, state and local public health agencies; dozens of health insurance providers; and a vast and varying network of hospital systems. The country also lacks a universal electronic health record system. This means that data generated by health care providers and public health authorities vary in format, quality, timeliness and more. Integrating clinical digital health data with public health surveillance data remains a major stumbling block. Data wrangling solutions, such as platforms for converting paper records to digital data, could greatly improve the accuracy and speed of these daunting tasks. 
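To make the idea concrete, here is a minimal sketch in Python of what wrangling two differently formatted case feeds might look like. Everything in it is an assumption made for illustration: the feeds, field names, counties and counts are invented, and real public health pipelines are far more involved.

```python
from datetime import datetime

# Hypothetical raw records from two imaginary reporting sources.
# Field names, date formats and values are invented for illustration.
STATE_FEED = [
    {"county": " Fairfax ", "report_date": "03/14/2021", "cases": "12"},
    {"county": "ARLINGTON", "report_date": "03/15/2021", "cases": "n/a"},
]
HOSPITAL_FEED = [
    {"County": "fairfax", "Date": "2021-03-14", "PositiveTests": 5},
]

# Maps each source's field names onto one shared schema.
STATE_MAP = {"county": "county", "date": "report_date", "cases": "cases"}
HOSPITAL_MAP = {"county": "County", "date": "Date", "cases": "PositiveTests"}

DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def parse_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_count(value):
    """Return a non-negative integer count, or None if the value is unusable."""
    try:
        count = int(str(value).strip())
    except ValueError:
        return None
    return count if count >= 0 else None


def normalize(record, field_map):
    """Translate one raw record into the shared schema."""
    return {
        "county": str(record[field_map["county"]]).strip().title(),
        "date": parse_date(str(record[field_map["date"]])),
        "cases": parse_count(record[field_map["cases"]]),
    }


def wrangle(feeds):
    """Split records into clean rows and rejected originals kept for review."""
    clean, rejected = [], []
    for records, field_map in feeds:
        for record in records:
            row = normalize(record, field_map)
            if row["date"] is None or row["cases"] is None:
                rejected.append(record)
            else:
                clean.append(row)
    return clean, rejected


clean_rows, rejected_rows = wrangle(
    [(STATE_FEED, STATE_MAP), (HOSPITAL_FEED, HOSPITAL_MAP)]
)
```

Even in this toy version, most of the code deals with inconsistent names, dates and missing values rather than with analysis. Records that cannot be repaired are set aside for human review instead of being silently dropped, a design choice that matters when the numbers drive outbreak decisions.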

This vision is within reach now. The recently signed American Rescue Plan includes several provisions for expanding health-related IT capabilities and is an excellent start. So, too, is the significant funding directed toward broadband expansion as a key component of the Biden administration’s $2 trillion infrastructure plan. But as these plans come to fruition, we need to be sure we are also developing the data governance and data trust needed to ensure the broad integration and coordination of data inputs, protected by the privacy and security that the American people expect. With our eyes wide open, we need to continue building the data tools so critical for a successful response. COVID-19 has taught us a lot. We cannot afford to fly blind into the next crisis.

Joseph Buccina is a Director at In-Q-Tel’s B.Next, a strategic initiative focused on biotechnology and national security.

Dr. Dan Hanfling is an emergency physician and a national expert on health care system and public health preparedness and response. He is also vice president on the technical staff at In-Q-Tel.