Published: August 28, 2025
Overfitting / Underfitting
In plain language, overfitting and underfitting are machine learning terms that describe failures in a model's ability to generalize to new data.
Understanding Overfitting and Underfitting
Overfitting occurs when a model learns the training data too well, capturing noise along with the underlying pattern. This results in excellent performance on the training data but poor generalization to new, unseen data. Underfitting, on the other hand, happens when a model is too simple to capture the underlying trend of the data, leading to poor performance both on the training data and new data. The balance between these two extremes is crucial for developing effective, generalizable models.
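This contrast can be seen in a minimal sketch: fitting polynomials of increasing degree to noisy synthetic data (a sine curve, chosen here purely for illustration). A degree-1 fit underfits, a moderate degree fits well, and a very high degree drives training error down while test error worsens.

```python
# A minimal sketch of under- and overfitting using NumPy polynomial fits
# on noisy synthetic data; the degrees and noise level are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# Draw noisy training and test samples from the same underlying curve.
true_fn = lambda x: np.sin(2 * np.pi * x)
x_train = np.sort(rng.uniform(0, 1, 30))
x_test = np.sort(rng.uniform(0, 1, 30))
y_train = true_fn(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = true_fn(x_test) + rng.normal(0, 0.2, x_test.size)

def poly_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for degree in (1, 3, 15):
    train_mse, test_mse = poly_mse(degree)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The degree-15 fit achieves the lowest training error yet chases the noise, while the degree-1 fit performs poorly everywhere; the moderate fit generalizes best.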
Why Overfitting and Underfitting Matter
These concepts are important because the success of machine learning models in public health relies on their ability to make accurate predictions on new, unseen data. Overfitting can lead to models that perform well in testing environments but fail in real-world applications; underfitting can lead to models that are ineffectual from the start. Policymakers and health professionals depend on reliable models to forecast disease outbreaks, allocate resources, and implement timely interventions.
Key Components of Overfitting and Underfitting
- Complexity: Models with too many parameters are prone to overfitting, while those with too few may underfit.
- Data Quality: High noise levels or insufficient data can exacerbate these issues.
- Model Selection: Choosing the right model architecture is critical to achieving a balance.
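One common way to balance these components is to select model complexity on data the model was not trained on. The sketch below, again on illustrative synthetic data, holds out a validation split and picks the polynomial degree with the lowest validation error.

```python
# A minimal sketch of model selection via a held-out validation split:
# choose the polynomial degree whose validation error is lowest.
# The data, split scheme, and degree range are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 60))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

# Hold out every third point for validation; train on the rest.
val_mask = np.arange(x.size) % 3 == 0
x_tr, y_tr = x[~val_mask], y[~val_mask]
x_val, y_val = x[val_mask], y[val_mask]

def val_mse(degree):
    """Train on the training split; return MSE on the validation split."""
    coefs = np.polyfit(x_tr, y_tr, degree)
    return np.mean((np.polyval(coefs, x_val) - y_val) ** 2)

best = min(range(1, 12), key=val_mse)
print("selected degree:", best)
```

The selected degree is neither the simplest nor the most complex option: validation error penalizes both underfitting and overfitting.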
Applications in Machine Learning Models
In public health, machine learning models predict outcomes such as disease spread or patient risk factors. An overfit model could make incorrect predictions about a disease outbreak, causing resources to be misallocated; for example, a model trained on extensive data from one population may fail to predict outcomes in a different demographic. Conversely, a model that underfits might miss key trends, failing to alert health officials to impending issues.
Challenges in Addressing Overfitting/Underfitting
Addressing these issues involves trade-offs, such as simplifying the model or increasing data quality. Regularization techniques, cross-validation, and obtaining more comprehensive data sets are common strategies. However, obtaining high-quality data can be difficult and costly, especially in resource-limited settings. Policymakers must be aware of these constraints when relying on machine learning for public health solutions.
Future Research in Model Generalization
Research continues to explore better algorithms and techniques to mitigate overfitting and underfitting. Advances in deep learning, ensemble methods, and data augmentation show promise; however, these require validation in diverse public health contexts. The future of model generalization lies in developing robust techniques that adapt to various data types and settings, ensuring reliable decision-making in public health.

