AI offers the potential to revolutionize infectious disease surveillance by improving data speed, accuracy, and completeness. But adoption requires attention to privacy, security, and public health priorities.

AI and Infectious Disease Surveillance

For public health practice, AI is likely to have the most immediate impact on infectious disease surveillance, improving the speed, accuracy, and completeness of data. This is because the business and tech sectors have already developed tools for managing large amounts of health-related data (improving how it’s collected, cleaned, analyzed, and used), and disease surveillance systems are simply repositories of data about the health of a community. The AI tools being used in other sectors, therefore, are readily applicable to tracking the spread of infectious diseases and helping public health practitioners prevent and control outbreaks.

Lessons from NYC: Challenges in Disease Surveillance Systems

When I oversaw infectious diseases for the New York City Health Department, I saw how even the most well-financed and well-staffed health agency in the United States relies on surveillance systems that are slow, resource-intensive, and error-prone. We spent millions of dollars upgrading IT systems, hiring skilled epidemiologists, and working closely with labs, our agency’s information technology team, and specialists from each of the infectious disease programs to automate and improve systems. During COVID-19, huge numbers of people in City government were reassigned to enhance these systems and conduct real-time analyses to improve prevention, testing, tracing, and care. And yet, despite the investments before and during COVID-19, much of the work still depends on human cognitive labor. With AI, we now have the opportunity to make rapid, substantive improvements in infectious disease surveillance beyond what has ever been possible before.

Understanding Infectious Disease Surveillance and Why It Matters

If you were to create a new country, the first priority for your public health agency would be to develop a system to track who lives and dies and what health conditions make people sick and kill them. You would prioritize health conditions based on several factors. How frequently does this disease occur? Does the disease cause people to get severely ill and die, disrupt important economic functions, or harm a population that is considered particularly important (e.g., children, pregnant women)? Are there measures to prevent these infections that are effective, feasible, affordable, and ethical? Does it spread from person to person?

The approach to tracking who lives and dies is known as “vital statistics” (to be covered in a future blog post), and the approach to tracking disease is known as “public health surveillance.” Public health surveillance is defined as the continuous systematic collection, analysis, and interpretation of health-related data needed to plan, implement, and evaluate public health practices.

Because infectious diseases (plague, smallpox, cholera, influenza, HIV, COVID-19) have disrupted economies and societies for all of recorded history, all countries and local jurisdictions maintain a list of viruses, bacteria, fungi, and other pathogens that must be reported to a government health agency (“reportable” or “notifiable” diseases). Agencies use this data to monitor the incidence and burden of infectious disease in the population, detect outbreaks, and measure the impact of programs and policies to control disease.

How Pathogen-Based Surveillance Works in Public Health

Agencies use multiple, complementary approaches to track infectious diseases. For this blog, we will focus on the most well-established and important form of infectious disease surveillance: pathogen-based surveillance, which is the systematic collection of data from clinical microbiology laboratories about specific, high-priority pathogens that have been identified in specimens taken from ill persons. This is also commonly referred to as “laboratory-based surveillance” or “surveillance for laboratory-confirmed disease.”

Public health officials rely on pathogen-based surveillance because it is highly specific and actionable. It ensures that they dedicate their limited time and effort to people they know for sure are sick with a reportable infectious disease (not just a person with ulcers in their genital area, but a person with syphilis), allows them to do further testing to connect cases that may be widely dispersed in time and space (e.g., tracking an E. coli outbreak by testing its genetic sequence), and allows them to detect new infectious disease threats (e.g., a new strain of influenza or COVID-19 variant).

Of course, pathogen-based surveillance is less sensitive than relying on reports from a doctor or a member of the public, because most people with an infectious disease do not have a laboratory test. (When was the last time you had diarrhea and went to a doctor, had a specimen tested, and got a result back with the report of a pathogen?) Epidemiologists now frequently use statistical methods to estimate how many cases are not being tested and reported, known as “nowcasting.”

Pathogen-based surveillance systems typically rely on laws that require clinical laboratories to report any person with a laboratory test confirming a reportable infectious disease. These lab reports are ideally submitted electronically (although paper, fax, and phone remain sadly all too common) and include data elements such as the type of pathogen and the specimen collection date and, ideally, additional data such as the test performed; the name, age, date of birth, gender, and home address of the patient; and the name and address of the provider and facility that ordered the test.

Once this data reaches public health agencies, epidemiologists must manage it using systems that are sometimes automated and, in other situations, require manual data entry and/or coding. The first steps are to verify it is a condition that must be reported from a patient under that agency’s jurisdiction (they live in that state and have a confirmed diagnosis), remove duplicate entries, and resolve inconsistencies. Second, epidemiologists try to augment the data with additional information, such as geographic or population-level demographics, by contacting the medical provider who ordered the test, the lab that tested the specimen, or the patient themselves. Finally, analysts use the data to produce reports, charts, and alerts that inform decision-makers, healthcare providers, and the public.
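To make the first of these steps concrete, here is a minimal sketch in Python of checking jurisdiction and reportability and then removing exact duplicates. The LabReport fields, the REPORTABLE set, and the clean_reports function are all hypothetical names chosen for illustration; real systems use each jurisdiction's legally defined list and much richer records.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LabReport:
    patient_name: str
    date_of_birth: date
    patient_state: str      # state of residence, e.g. "NY"
    pathogen: str
    collection_date: date


# Hypothetical reportable list; real lists are set by each jurisdiction's law.
REPORTABLE = {"salmonella", "treponema pallidum"}


def clean_reports(reports, jurisdiction, reportable=REPORTABLE):
    """Keep in-jurisdiction, reportable results and drop exact duplicates."""
    seen = set()
    kept = []
    for r in reports:
        pathogen = r.pathogen.strip().lower()
        if r.patient_state.strip().upper() != jurisdiction.upper():
            continue  # patient lives elsewhere, so not this agency's case
        if pathogen not in reportable:
            continue  # not a reportable condition
        key = (r.patient_name.strip().lower(), r.date_of_birth,
               pathogen, r.collection_date)
        if key in seen:
            continue  # same person, pathogen, and specimen date already seen
        seen.add(key)
        kept.append(r)
    return kept
```

This only catches duplicates that match exactly after trimming and lowercasing. Misspelled names or conflicting dates would slip through, which is part of why this step still consumes so much staff time.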

AI Applications in Data Collection and Reporting for Infectious Disease

AI tools could help labs and public health agencies automate and standardize the reporting process. Large commercial labs already have some of these functions, but many academic and smaller labs do not.

  • Natural Language Processing (NLP): Labs generate vast amounts of unstructured data in free-text lab reports… AI-powered NLP tools could extract relevant information and convert it into structured formats.
  • Connected diagnostics: Machine learning algorithms could monitor the output of laboratory systems for reportable results and automate steps needed for reporting.
  • Compliance: AI could automate monitoring for missing reports, identify anomalies, and even issue compliance notices if authorized.
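As a rough illustration of turning free text into structured fields, the sketch below uses plain regular expressions as a stand-in for a real NLP model. The pattern list, the "collected on YYYY-MM-DD" phrasing, and the extract_fields function are assumptions made for this example, not a description of any lab's actual report format.

```python
import re
from datetime import datetime

# Hypothetical patterns; a production system would cover far more pathogens.
PATHOGEN_PATTERNS = {
    "Salmonella": re.compile(r"\bsalmonella\b", re.IGNORECASE),
    "Escherichia coli": re.compile(r"\b(e\.?\s*coli|escherichia coli)\b",
                                   re.IGNORECASE),
}
# Assumes the report writes the date as "collected on 2024-03-05" or similar.
DATE_PATTERN = re.compile(r"collected(?:\s+on)?\s*:?\s*(\d{4}-\d{2}-\d{2})",
                          re.IGNORECASE)


def extract_fields(text):
    """Pull a pathogen name and specimen collection date out of free text."""
    pathogen = next(
        (name for name, pat in PATHOGEN_PATTERNS.items() if pat.search(text)),
        None,
    )
    match = DATE_PATTERN.search(text)
    collected = (
        datetime.strptime(match.group(1), "%Y-%m-%d").date() if match else None
    )
    return {"pathogen": pathogen, "collection_date": collected}
```

Keyword matching like this cannot tell "Salmonella isolated" from "no Salmonella isolated." Handling negation and varied phrasing is exactly where a trained language model would be expected to do better than simple rules.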

AI to Improve Infectious Disease Data Cleaning and Integration

Cleaning and integrating data from multiple sources is one of the most tedious steps… AI can help bring these groups together in an automated way to deal with duplicates, validation, and metadata. Machine learning models could identify anomalies or errors and flag them for review.
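One small piece of this, flagging probable duplicates for human review, can be sketched with Python's standard library. The example below compares names with difflib's SequenceMatcher and requires an identical date of birth. The likely_duplicates function and the 0.85 threshold are illustrative assumptions, not validated settings, and a machine learning approach would replace this simple similarity score.

```python
from difflib import SequenceMatcher
from itertools import combinations


def likely_duplicates(records, threshold=0.85):
    """Return index pairs whose DOB matches and whose names are near-identical.

    `records` is a list of (name, date_of_birth) tuples. `threshold` is an
    illustrative cutoff, not a validated one.
    """
    flagged = []
    for (i, (name_a, dob_a)), (j, (name_b, dob_b)) in combinations(
        enumerate(records), 2
    ):
        if dob_a != dob_b:
            continue  # different birth dates: treat as different people
        score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
        if score >= threshold:
            flagged.append((i, j))  # send this pair to a reviewer
    return flagged
```

Comparing every pair is fine for a short list but grows quadratically, so a real system would first group records (for example, by date of birth) before scoring them.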

AI to Augment Data about Infectious Disease Cases

Most lab reports contain sparse data about a patient… AI could link surveillance data with internal and external sources, enabling more detailed demographic data. AI systems could also automate contacting patients or providers to collect follow-up information through surveys, chatbots, or voice systems.

AI to Improve Infectious Disease Data Analysis and Interpretation

  • Nowcasting: Estimating real incidence by combining lab reports with other real-time sources.
  • Forecasting: Running automated infectious disease models to predict outbreaks.
  • Custom Reports: NLP tools could generate tailored summaries for different stakeholders.
  • Trend Detection: Algorithms could identify subtle patterns or emerging disease trends.
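To give a feel for the arithmetic behind one common form of nowcasting, the sketch below adjusts recent counts for reporting delay: if only 40% of a day's reports have typically arrived by the same day, then 40 reports today suggests roughly 100 eventual cases. The nowcast function and the completeness fractions are hypothetical; in practice those fractions are estimated from historical reporting delays and carry real uncertainty.

```python
def nowcast(observed, completeness):
    """Scale recent counts up by the share of reports expected to have arrived.

    observed:     daily case counts, oldest first, most recent day last.
    completeness: completeness[d] is the assumed fraction of a day's eventual
                  reports received d days later; delays beyond the list are
                  treated as fully reported.
    """
    n = len(observed)
    estimates = []
    for i, count in enumerate(observed):
        delay = n - 1 - i  # days elapsed since this day's specimens
        fraction = completeness[delay] if delay < len(completeness) else 1.0
        if not 0 < fraction <= 1:
            raise ValueError("completeness fractions must be in (0, 1]")
        estimates.append(count / fraction)
    return estimates
```

For example, nowcast([100, 100, 90, 40], [0.4, 0.9]) leaves the two oldest days unchanged and scales the last two days up to about 100 each, which shows that an apparent decline can be an artifact of reporting lag.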

Privacy and Security Risks of AI in Disease Surveillance

While HIPAA governs much of healthcare data, infectious disease surveillance is exempt and instead governed by state/local laws. AI systems must adhere strictly to these frameworks, ensuring data protection, anonymization, and authorized access only. AI can also help anonymize data more efficiently to prevent re-identification.
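As one simple illustration of reducing identifiability before data is shared for analysis, the sketch below replaces name and date of birth with a keyed hash and coarsens age into ten-year bands. The record layout and the pseudonymize function are hypothetical. Note that this is pseudonymization rather than true anonymization: anyone holding the key can re-link records, and rare combinations of the remaining fields can still identify a person.

```python
import hashlib
import hmac


def pseudonymize(record, secret_key):
    """Replace direct identifiers with a keyed hash and coarsen age."""
    identity = f"{record['name'].strip().lower()}|{record['date_of_birth']}"
    # A keyed hash (HMAC) gives the same token for the same person, but the
    # token cannot be recomputed by someone who lacks the secret key.
    token = hmac.new(secret_key, identity.encode("utf-8"),
                     hashlib.sha256).hexdigest()
    decade = (record["age"] // 10) * 10
    return {
        "patient_token": token,
        "age_band": f"{decade}-{decade + 9}",
        "pathogen": record["pathogen"],
    }
```

The same patient gets the same token across reports, so analysts can still count unique cases without seeing names.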

The Future of AI in Infectious Disease Surveillance: What’s Realistic?

In the short term, public health agencies need tools that are practical and well validated. Partnerships among experts, developers, and policymakers will be essential to ensure adoption improves surveillance while maintaining public trust.

In a separate blog post, I’ll cover the other most widely used approach to infectious disease surveillance: event-based surveillance.

About the Author: Dr. Jay Varma

Dr. Jay Varma is a physician and public health expert with extensive experience in infectious diseases, outbreak response, and health policy.