Understanding basic descriptive statistics for public health professionals
Jesca Birungi
Biostatistician | Helping healthcare professionals and scientists understand hidden insights in complex healthcare data | Open to PhD and research opportunities in Biostatistics
Descriptive statistics form the foundation of data analysis: they summarize the characteristics of a dataset and prepare the ground for more complex inferential statistics. In this article, we’ll cover the basic descriptive statistics that every public health professional should be familiar with and how they can be applied in the field.
Why do descriptive statistics matter in public health?
Descriptive statistics help summarize large amounts of data, providing a clear picture of trends, patterns, and distributions. This applies whether you're working on epidemiological studies, analyzing clinical data, or assessing community health programs.
What are the key descriptive statistics for public health professionals?
1. Measures of central tendency
Measures of central tendency describe the center or typical value of a dataset. Each provides a single value that summarizes an entire dataset or variable, allowing public health professionals to understand the "typical" or "average" case in a population. The three main measures are:
Mean
The average of all data points: the sum of the values divided by the number of values. It takes every value into account and works well for roughly symmetric (e.g., normally distributed) data. However, the mean can be pulled up or down by extreme outliers (very high or very low values). An example would be the mean number of new COVID-19 cases per day in a population.
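Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_1, \dots, x_n$ are the observed values and $n$ is the number of observations.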
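As a minimal sketch of how an outlier shifts the mean, the example below uses Python's standard `statistics` module. The daily case counts are hypothetical values chosen for illustration, not real surveillance data.

```python
from statistics import mean

# Hypothetical daily counts of new cases over one week (illustrative only).
daily_cases = [20, 22, 19, 21, 23, 20, 22]
mean_cases = mean(daily_cases)

# The same week, but with one extreme day replacing the last value.
daily_cases_with_spike = [20, 22, 19, 21, 23, 20, 120]
mean_with_spike = mean(daily_cases_with_spike)

print(mean_cases)       # 21
print(mean_with_spike)  # 35
```

A single unusual day moves the mean from 21 to 35, even though six of the seven days are unchanged.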
Median
The middle value of a dataset once the values are arranged in ascending order (smallest to largest). With an even number of values, the median is the average of the two middle values. The median is more robust to outliers than the mean and is a better measure of central tendency for skewed data.
For a given data set: 12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12
Ascending Order: 11, 12, 12, 12, 12, 12, 14, 15, 15, 17, 22
There are 11 values, so the middle one is the 6th value in the ordered list. Thus, Median = 12.
Example: the median age of patients admitted to a hospital for treatment.
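The worked example above can be checked in a few lines of Python. This is a small sketch using the standard `statistics` module and the same 11 values. The comparison with the mean is added here for illustration.

```python
from statistics import mean, median

values = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

# Sort the values and pick the middle position by hand.
ordered = sorted(values)
middle_index = len(ordered) // 2       # 11 values -> index 5, the 6th value
median_by_hand = ordered[middle_index]

print(median_by_hand)   # 12
print(median(values))   # 12
print(mean(values))     # 14
```

The few larger values (17 and 22) pull the mean up to 14, while the median stays at 12.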
Mode
The most frequently occurring value in a dataset. It’s helpful when analyzing categorical data or when the data has multiple peaks. There may be no mode or more than one mode, and it does not always provide a clear measure of central tendency.
Example: the most commonly reported health condition in a rural community is the mode of that variable.
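A short sketch of finding the mode of a categorical variable, including the case of more than one mode. It uses `statistics.multimode` (available from Python 3.8). The condition labels are hypothetical.

```python
from statistics import multimode

# Hypothetical diagnoses recorded at a rural clinic (illustrative only).
conditions = ["malaria", "hypertension", "malaria",
              "diabetes", "malaria", "hypertension"]
single_mode = multimode(conditions)

# A second hypothetical sample in which two conditions tie for most frequent.
tied_conditions = ["malaria", "hypertension", "malaria",
                   "hypertension", "diabetes"]
two_modes = multimode(tied_conditions)

print(single_mode)  # ['malaria']
print(two_modes)    # ['malaria', 'hypertension']
```

`multimode` always returns a list, so a tie is reported as several modes rather than hidden.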
2. Measures of dispersion
Measures of dispersion (or variability) describe how widely the data are spread around a central value (e.g., the mean or median). They indicate how consistent or scattered the data points are. These include:
Range
The difference between the maximum and minimum values.
The range is simple to calculate and gives a quick sense of the data spread. However, it is sensitive to outliers and provides no information on how data points are distributed between the extremes.
Example: the range of systolic blood pressure among patients at a clinic (e.g., 140 mmHg - 100 mmHg = 40 mmHg).
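The sketch below computes a range in Python and shows how one extreme reading changes it. The blood pressure readings are hypothetical, chosen so that the first result matches the 40 mmHg example.

```python
# Hypothetical systolic blood pressure readings in mmHg (illustrative only).
systolic = [100, 112, 118, 120, 125, 131, 140]
bp_range = max(systolic) - min(systolic)

# Adding one very high reading changes the range a lot.
systolic_with_outlier = systolic + [190]
bp_range_with_outlier = max(systolic_with_outlier) - min(systolic_with_outlier)

print(bp_range)               # 40
print(bp_range_with_outlier)  # 90
```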
Standard Deviation (SD)
The standard deviation measures the typical distance of individual observations from the mean. Formally, it is the square root of the average squared deviation from the mean. The standard deviation of a population is represented as "σ", and the standard deviation of a sample as "s".
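Formula (population): $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$, where $\mu$ is the population mean and $N$ the population size.
Formula (sample): $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x}$ is the sample mean and $n$ the sample size.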
The standard deviation is easy to interpret because it is expressed in the same units as the data, and it is widely used. However, like the variance, it is sensitive to outliers.
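As an illustration of the difference between "σ" and "s", the sketch below uses `pstdev` (population) and `stdev` (sample) from Python's standard `statistics` module. The values are hypothetical, for example lengths of hospital stay in days.

```python
from statistics import pstdev, stdev

# Hypothetical lengths of hospital stay in days (illustrative only).
stay_days = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat the values as a whole population: divide by N.
sigma = pstdev(stay_days)

# Treat the values as a sample from a larger population: divide by n - 1.
s = stdev(stay_days)

print(round(sigma, 3))  # 2.0
print(round(s, 3))      # 2.138
```

The sample version is slightly larger because it divides by n - 1 rather than N. In most public health studies the data are a sample, so "s" is usually the one reported.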
Variance
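The variance is the square of the standard deviation, i.e., the average squared deviation of the observations from the mean. It is expressed in squared units, which is why the standard deviation is usually preferred for reporting.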
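Formula (population): $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Formula (sample): $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$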
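To make the definition concrete, the sketch below computes the variance step by step from the squared deviations and then takes the square root to recover the standard deviation. The values are hypothetical and the variable names are chosen only for this illustration.

```python
import math

# Hypothetical measurements (illustrative only).
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)

# Step 1: the mean.
mean_value = sum(values) / n

# Step 2: the sum of squared deviations from the mean.
sum_squared_dev = sum((x - mean_value) ** 2 for x in values)

# Step 3: divide by N (population) or by n - 1 (sample).
population_variance = sum_squared_dev / n
sample_variance = sum_squared_dev / (n - 1)

# The standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)

print(mean_value)                 # 5.0
print(sum_squared_dev)            # 32.0
print(population_variance)        # 4.0
print(round(sample_variance, 3))  # 4.571
print(population_sd)              # 2.0
```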
Frequencies and Percentages
When working with categorical data, frequencies (counts) and percentages provide simple yet informative insights. These are normally reported in a table.
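A frequency table can be sketched with `collections.Counter` from the Python standard library. The vaccination statuses below are hypothetical and serve only to show the counting and the percentage calculation.

```python
from collections import Counter

# Hypothetical vaccination status of eight survey respondents (illustrative only).
status = ["vaccinated", "unvaccinated", "vaccinated", "partially vaccinated",
          "vaccinated", "unvaccinated", "vaccinated", "vaccinated"]

counts = Counter(status)
total = len(status)

# One row per category: (category, frequency, percentage), most frequent first.
table = [(category, n, round(100 * n / total, 1))
         for category, n in counts.most_common()]

for category, n, percent in table:
    print(f"{category:<22}{n:>3}{percent:>8.1f}%")
```

Reporting the count alongside the percentage lets readers see how many people each percentage is based on.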
Visualizing descriptive statistics
Data visualization is a powerful way to communicate the insights gained from descriptive statistics.
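A bar chart of category counts is one simple visualization. The sketch below draws a text-only bar chart using just the Python standard library, so it runs without a plotting package. The age groups are hypothetical, and in practice a plotting library would normally be used instead.

```python
from collections import Counter

# Hypothetical age groups of clinic attendees (illustrative only).
age_groups = ["0-14", "15-49", "15-49", "50+", "15-49", "0-14", "50+", "15-49"]

counts = Counter(age_groups)

# One row per category, sorted by label; the bar length equals the count.
chart_lines = [f"{group:<6}{'#' * count} {count}"
               for group, count in sorted(counts.items())]

print("\n".join(chart_lines))
```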
Practical applications of descriptive statistics in public health
Descriptive statistics are the backbone of public health data analysis. They help professionals summarize, visualize, and interpret data, guiding decision-making and the design of interventions. Mastering these basic concepts enables public health professionals to better understand the populations they serve and make data-driven decisions that improve health outcomes.