Published: September 1, 2025
Public health data systems manage highly sensitive information. Strict governance protects privacy and sustains public trust, and as agencies adopt AI, they must balance innovation with legal, ethical, and community protections.
Table of Contents
- Why public health data systems are governed so strictly
- Opportunities and challenges of AI in public health data systems
- Designing AI systems to protect privacy and confidentiality
- Ensuring equitable implementation
- Technical approaches to track and govern AI systems
- Building the Governance Framework for AI Use in Public Health Agencies
- Balancing Innovation and Protection when Implementing AI in Public Health Practice
Why public health data systems are governed so strictly
Public health agencies exist to protect populations from threats to health. In doing so, they gather, store, analyze, and share vast amounts of sensitive data. These data can reveal deeply personal facts about individuals — their medical diagnoses, genetic information, sexual practices, substance use, housing status, and travel patterns — details that are often unknown even to family members, employers, or close friends. The public’s willingness to disclose such information depends on an expectation that it will be protected diligently and used solely for legitimate public health purposes.
The United States has a complex web of laws, regulations, and ethical norms that reflect this responsibility. Federal laws such as the Health Insurance Portability and Accountability Act (HIPAA) govern much of the identifiable health information handled by healthcare providers and insurers, but many public health surveillance systems fall outside HIPAA's requirements because they operate under legal mandates to collect data without patient consent. In place of HIPAA, public health agencies must adhere to state laws, regulations, and agency policies that impose equally strict, and in some cases stricter, requirements on data security, access control, and permissible use.
State laws typically specify who may access public health databases, under what circumstances, and for what purposes. They often include criminal penalties for unauthorized disclosure. Agencies require staff to undergo regular confidentiality training, sign legal agreements, and work only on secure, agency-controlled networks. Access is granted on a “minimum necessary” basis, with role-based permissions that prevent users from seeing information unrelated to their job functions.
These measures are also essential to maintaining public trust. Communities that have experienced discrimination or abuse (e.g., undocumented immigrants, people living with HIV, or racial and ethnic minorities) may be reluctant to interact with public health agencies if they believe their information could be misused. A single data breach or instance of misuse can undermine years of relationship-building and compromise the effectiveness of entire programs.
The ethical rationale is clear: individuals often have no choice but to share their information with public health authorities when laws require reporting of certain diseases or conditions. The legal requirements create a corresponding ethical obligation for agencies to ensure the information is kept secure, used only for its intended purpose, and never exploited for unrelated goals, whether commercial, political, or otherwise.
Opportunities and challenges of AI in public health data systems
Artificial intelligence has the potential to transform how public health agencies collect, process, and analyze data. AI tools can accelerate data cleaning, detect patterns, forecast trends, and tailor communications, all of which could make public health more efficient and effective. But the introduction of AI into state and local systems also raises new and difficult privacy, ethical, and governance challenges.
Unlike many existing data-processing tools, modern AI systems, particularly machine learning and generative AI, may “learn” from the data they process, creating risks that sensitive information could be embedded in the model’s parameters or inadvertently revealed in responses. Some AI models require large volumes of training data, which could include personally identifiable health information. Others may rely on third-party cloud services, raising questions about where data are stored, who has access, and whether foreign entities could obtain them.
The stakes are especially high for public health agencies because of the legal mandates and ethical expectations I described above. An AI-enabled breach, misuse, or error could damage public trust in the technology and in the agencies themselves. Conversely, careful integration of AI, with strong safeguards and transparent governance, could enhance privacy protections beyond what is possible with current manual or semi-automated systems.
Designing AI systems to protect privacy and confidentiality
For state and local public health agencies, AI implementation must start with the commitment that privacy and confidentiality are non-negotiable. Systems must be designed so that they do not expand access to sensitive data beyond what is legally and operationally justified. This requires a combination of technical, legal, and procedural controls.
- Role-based access controls and least-privilege principles
AI systems must respect the same granular access rules that govern existing public health databases. If a tuberculosis investigator is permitted to see data only for their county, the AI system must not provide them with records from another jurisdiction, even if the system’s broader dataset contains those records. Fine-grained role-based access control (RBAC) should be enforced within the AI architecture, ensuring that queries and outputs are constrained to the user’s authorized scope.
- Segregated environments for sensitive data
Wherever possible, AI models should be trained and operated within secure, closed environments owned or fully controlled by the public health agency. This minimizes the risk of data leaving the agency’s secure network. Public cloud AI services may be appropriate only if they meet strict contractual and technical requirements, including data residency guarantees, encryption, and clear prohibitions on secondary use of data for training unrelated models.
- Data minimization and anonymization
AI tools should process only the minimum data necessary to perform their function. In some cases, this means removing direct identifiers (names, addresses, social security numbers) before data enter the AI system. More advanced privacy-preserving techniques (such as differential privacy, which injects statistical “noise” to prevent re-identification) can be applied to protect individuals while still enabling analysis.
- Secure training and inference pipelines
For AI models trained on sensitive data, the entire training pipeline should be auditable and secured. This includes verifying that training datasets were approved for use, documenting preprocessing steps, and ensuring that model parameters do not store retrievable personal information. Inference, which is the process of running the model to answer a query, must also be logged, with restrictions on how results can be exported or shared.
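Several of the controls in this list can meet in a single query path. The Python sketch below is illustrative only and assumes a hypothetical record layout (a `county` field plus direct identifiers named `name`, `address`, and `ssn`) and a hypothetical `User` type that lists the counties a user may see. It refuses out-of-scope requests, strips direct identifiers before anything reaches a model, and releases aggregate counts through the Laplace mechanism, which adds noise with scale 1/ε to a count whose sensitivity is 1. A production system would rely on the agency's identity and access management and on a vetted differential privacy library rather than Python's `random` module.

```python
import random
from dataclasses import dataclass

# Hypothetical names for direct identifiers; a real system would take these
# from the agency's data dictionary.
DIRECT_IDENTIFIERS = {"name", "address", "ssn"}


@dataclass(frozen=True)
class User:
    """Hypothetical user record: an ID plus the counties the user may see."""
    user_id: str
    counties: frozenset


def scoped_records(user: User, records: list, county: str) -> list:
    """Return one county's records with direct identifiers removed.

    Raises PermissionError when the county is outside the user's authorized
    scope, so out-of-jurisdiction data never reach the model.
    """
    if county not in user.counties:
        raise PermissionError(f"{user.user_id} is not authorized for {county}")
    return [
        {key: value for key, value in rec.items() if key not in DIRECT_IDENTIFIERS}
        for rec in records
        if rec["county"] == county
    ]


def laplace_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count via the Laplace mechanism (sensitivity 1, scale 1/epsilon).

    The difference of two independent exponential draws with mean `scale`
    is Laplace-distributed with that scale.
    """
    scale = 1.0 / epsilon
    # expovariate takes a rate, so a rate of 1/scale gives a mean of `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Usage with made-up records.
records = [
    {"county": "Adams", "name": "A. Doe", "ssn": "000-00-0000", "tb_status": "active"},
    {"county": "Baker", "name": "B. Roe", "ssn": "000-00-0001", "tb_status": "latent"},
]
investigator = User("tb-17", frozenset({"Adams"}))
rows = scoped_records(investigator, records, "Adams")
print(rows)  # [{'county': 'Adams', 'tb_status': 'active'}]
print(laplace_count(len(rows), epsilon=1.0, rng=random.Random(0)))
# scoped_records(investigator, records, "Baker") raises PermissionError
```

Scoping is checked before any data are touched, so a denied request leaves nothing behind to log or leak; smaller values of `epsilon` add more noise and give stronger privacy at the cost of accuracy.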
Ensuring equitable implementation
Public health agencies are expected not only to protect privacy but also to promote equity. AI systems can either support or undermine these goals, depending on how they are designed and deployed.
- Bias detection and mitigation
Training data for AI models often reflect historical inequities. For example, death records, disease surveillance data, and lab reports may undercount certain populations because of barriers to healthcare access, mistrust, or reporting gaps. If these data are used without correction, AI models may produce outputs that reinforce disparities, for example, by underestimating disease burden in marginalized communities. Agencies must incorporate bias audits into model development and retraining, adjusting datasets or algorithms to reduce inequities.
- Inclusive design and community engagement
Equitable AI implementation requires input from the communities whose data are being used. This means involving representatives from diverse racial, ethnic, socioeconomic, and geographic backgrounds in governance committees, testing phases, and ongoing oversight. Community engagement can also help identify cultural or linguistic nuances that AI systems must handle appropriately.
- Transparency in decision-making
Where AI tools influence public health actions, such as allocating resources, triggering investigations, or issuing public alerts, agencies should disclose the role of AI in those decisions. Transparency includes explaining what data were used, how the model works in broad terms, and how outputs are validated by human experts.
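A bias audit can start with something as simple as comparing a model's estimates against an independent reference, group by group. The Python sketch below assumes per-group rates (for example, cases per 100,000) from the model and from a hypothetical reference source less affected by reporting gaps, such as a population survey. The 20% tolerance, the group labels, and the numbers are placeholders, not recommended values or real data.

```python
from dataclasses import dataclass


@dataclass
class GroupAudit:
    group: str
    model_rate: float      # model's estimated rate for the group
    reference_rate: float  # rate from an independent reference source
    ratio: float           # model_rate / reference_rate
    flagged: bool          # True when the model underestimates beyond tolerance


def audit_underestimation(model: dict, reference: dict, tolerance: float = 0.2) -> list:
    """Compare model estimates with reference estimates, group by group.

    Both dicts map group label -> rate. A group is flagged when the model's
    estimate falls more than `tolerance` below the reference. Groups with no
    reference burden are skipped.
    """
    results = []
    for group, ref_rate in sorted(reference.items()):
        if ref_rate <= 0:
            continue  # nothing to compare against
        est = model.get(group, 0.0)  # a missing estimate counts as zero
        ratio = est / ref_rate
        results.append(GroupAudit(group, est, ref_rate, ratio, ratio < 1 - tolerance))
    return results


# Illustrative numbers only, not real surveillance data.
model = {"Group A": 40.0, "Group B": 18.0}
reference = {"Group A": 42.0, "Group B": 30.0}
for row in audit_underestimation(model, reference):
    print(row.group, round(row.ratio, 2), row.flagged)
# Group A 0.95 False
# Group B 0.6 True
```

Here Group B is flagged because the model's estimate sits 40% below the reference, the kind of underestimation described above. A flag is a prompt for human investigation rather than proof of bias, since the reference source can have its own gaps.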
Technical approaches to track and govern AI systems
Robust governance requires that agencies know exactly how AI systems are being used, by whom, and with what data.
- Comprehensive audit logs
Every interaction with an AI system should be logged, including the identity of the user, the date and time, the specific data accessed, the query or task performed, and the output generated. These logs should be reviewed regularly for unusual patterns, such as repeated access to data outside a user’s normal scope.
- Model provenance tracking
Agencies should maintain a record of each AI model in use, including its version history, training data sources, hyperparameters, and known limitations. This “model registry” allows agencies to trace outputs back to the specific model and dataset used, an essential capability if errors or breaches occur.
- Dataset inventories and lineage
Public health agencies should maintain inventories of all datasets used in AI systems, documenting their origin, legal basis for use, and any preprocessing or anonymization applied. Data lineage tools can automatically track how datasets are combined, transformed, and fed into models.
- Access monitoring and anomaly detection
Automated systems can monitor AI usage patterns and detect anomalies, such as an investigator in one program suddenly accessing large volumes of data unrelated to their work. These alerts can trigger human review to determine whether the activity is authorized or suspicious.
- Explainability tools
While some AI models are inherently opaque, agencies should favor approaches that allow for meaningful inspection of how outputs were generated. Explainability tools can highlight which input factors were most influential in a model’s decision, helping to detect bias and improve trust among both staff and the public.
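Audit logs only help if someone, or something, reviews them. The Python sketch below shows a hypothetical log record carrying the fields listed above (user, time, data accessed, query, output) and a simple review pass that flags two patterns named in this list: access to datasets outside a user's normal scope and unusually high daily query volume. The `normal_scope` mapping and the 50-query threshold are assumptions for illustration; a real deployment would derive baselines from historical usage and route every flag to a human reviewer.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit record for one AI interaction."""
    user_id: str
    timestamp: datetime
    dataset: str         # the specific data accessed
    query: str           # the query or task performed
    output_summary: str  # a description of the output generated


def flag_anomalies(entries, normal_scope, max_daily_queries=50):
    """Return (user_id, reason) pairs that should go to human review.

    normal_scope maps user_id -> set of datasets tied to that user's job;
    max_daily_queries is a hypothetical per-user volume threshold.
    """
    flags = []
    for e in entries:
        if e.dataset not in normal_scope.get(e.user_id, set()):
            flags.append((e.user_id, f"out-of-scope dataset: {e.dataset}"))
    # Count queries per user per calendar day.
    daily = Counter((e.user_id, e.timestamp.date()) for e in entries)
    for (user, day), n in sorted(daily.items()):
        if n > max_daily_queries:
            flags.append((user, f"{n} queries on {day}"))
    return flags


# Usage with made-up entries.
now = datetime(2025, 9, 1, 9, 0, tzinfo=timezone.utc)
log = [
    AuditEntry("tb-17", now, "tb_cases", "weekly count by county", "12 rows"),
    AuditEntry("tb-17", now, "hiv_registry", "list cases", "300 rows"),
]
print(flag_anomalies(log, {"tb-17": {"tb_cases"}}))
# [('tb-17', 'out-of-scope dataset: hiv_registry')]
```

The function only raises flags; deciding whether the activity was authorized stays with people, consistent with the human review step described above.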
Building the Governance Framework for AI Use in Public Health Agencies
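Putting these technical controls to work requires institutional structures. Core components of the governance framework include:
- AI governance committees: Multidisciplinary groups that include epidemiologists, IT specialists, legal counsel, ethicists, and community representatives to review proposed AI projects and monitor existing systems.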
- Data sharing agreements: Contracts with AI vendors or other partners that specify permissible uses of data, security requirements, audit rights, and penalties for violations.
- Standard operating procedures: Detailed instructions for staff on how to use AI systems, validate outputs, and report problems.
- Incident response plans: Protocols for investigating and responding to suspected AI-related data breaches or misuse.
Balancing Innovation and Protection when Implementing AI in Public Health Practice
Public health agencies will always place a high value on their legal, ethical, and public trust obligations when evaluating a new technology like AI. An AI system that improves data analysis but erodes trust through privacy lapses or inequitable outcomes will ultimately weaken public health. Conversely, a system built with privacy-by-design principles, strong governance, and community input can both protect individuals and enhance the agency’s ability to fulfill its mission.
AI tools can only be successful when they are paired with human judgment. Public health decisions often involve trade-offs between competing values, such as the urgency of alerting the public about an outbreak versus the risk of stigmatizing a community. AI can inform these decisions with faster, richer analysis, but humans must still weigh the ethical and political consequences.
If state and local public health agencies treat AI not as a black box to be trusted blindly but as a set of tools to be integrated within established legal and ethical frameworks, they can realize its benefits while ensuring privacy, dignity, and equity.

