Understanding basic descriptive statistics for Public health professionals

Descriptive statistics form the foundation of data analysis, offering deep and clear insights into the characteristics of datasets and guiding the way to more complex inferential statistics. In this article, we’ll cover the basic descriptive statistics that every public health professional should be familiar with and how they can be applied in the field.

Why do descriptive statistics matter in Public Health?

Descriptive statistics help summarize large amounts of data, providing a clear picture of trends, patterns, and distributions in the data. Whether you're working on epidemiological studies, analyzing clinical data, or assessing community health programs, descriptive statistics allow you to:

  • Summarize data efficiently
  • Understand the distribution and variability in the data
  • Make comparisons across groups in the data
  • Provide a guide for further inferential analysis

What are the key descriptive statistics for Public Health professionals?



1. Measures of central tendency

Measures of central tendency describe the center or typical value of a dataset. Each provides a single value that summarizes an entire dataset or variable, allowing public health professionals to understand the "typical" or "average" case in a population. The three main measures are:

Mean

The average of all data points: the sum of the values divided by the number of values. It takes every value into account, which makes it a good summary of the overall level of normally distributed data. The mean can, however, be pulled toward extreme outliers (e.g., very high or low values). An example would be the mean number of new COVID-19 cases per day in a population.

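Formula

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

where x_i is the value of each observation and n is the number of observations.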
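The mean is the sum of the values divided by how many values there are. A minimal sketch of that calculation, and of how a single extreme value shifts it, could look like the following. The daily case counts are hypothetical numbers chosen for illustration, not real surveillance data.

```python
from statistics import mean

# Hypothetical daily counts of new cases over one week (illustrative only)
daily_cases = [12, 15, 11, 14, 13, 12, 14]
typical_mean = mean(daily_cases)

# The same week with one extreme day added
with_outlier = daily_cases + [95]
outlier_mean = mean(with_outlier)

print("Mean without the extreme day:", typical_mean)
print("Mean with the extreme day:", outlier_mean)
```

One unusual day is enough to move the mean well away from what most days looked like, which is why the mean is usually reported alongside a measure that resists outliers.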

Median

The middle value of a dataset once it has been arranged in ascending order (small to large). When the number of values is even, the median is the average of the two middle values. The median is more robust to outliers than the mean and provides a better measure of central tendency for skewed data.


For a given data set: 12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12

Ascending Order: 11, 12, 12, 12, 12, 12, 14, 15, 15, 17, 22

There are 11 values, so the middle number is the 6th value in the ordered list: Median = 12

E.g. the median age of patients admitted to a hospital for treatment.
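The worked example above can be checked with a short sketch using Python's standard library. The variable names are illustrative, and the four-value list at the end is an added hypothetical case showing what happens with an even number of values.

```python
from statistics import median

# The data set from the worked example above
values = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

# Manual approach: sort, then take the middle position
ordered = sorted(values)
middle_index = len(ordered) // 2  # 11 values, so index 5 is the 6th value
manual_median = ordered[middle_index]

# The library function gives the same answer
library_median = median(values)

# With an even number of values, the median is the mean of the two middle values
even_median = median([11, 12, 14, 15])

print(manual_median, library_median, even_median)
```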

Mode

The most frequently occurring value in a dataset. It’s helpful when analyzing categorical data or when the data has multiple peaks. There may be no mode or more than one mode, and it does not always provide a clear measure of central tendency.

Example: The most common health condition reported in a rural community is the mode of that variable.
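Because a dataset can have more than one mode, a function that returns every most-frequent value is a safer choice than one that returns a single answer. The sketch below uses `statistics.multimode` from the Python standard library (Python 3.8 or later); the lists of reported conditions are hypothetical.

```python
from statistics import multimode

# Hypothetical conditions recorded at a rural clinic (illustrative only)
conditions = ["malaria", "hypertension", "malaria", "diabetes",
              "hypertension", "malaria"]
single_mode = multimode(conditions)

# A second hypothetical list in which two conditions are tied
tied_conditions = ["malaria", "hypertension", "malaria",
                   "hypertension", "diabetes"]
two_modes = multimode(tied_conditions)

print(single_mode)
print(two_modes)
```

`multimode` always returns a list, so a tie shows up as a list with several entries rather than being silently hidden.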

2. Measures of dispersion

Measures of dispersion (or variability) describe the spread of data around a central value (e.g., the mean or median). They help you understand the degree of variability within a dataset, indicating how consistent or scattered the data points are. These include:

Range

The difference between the maximum and minimum values.

The range is simple to calculate and gives a quick sense of the data spread. However, it is sensitive to outliers and provides no information on how data points are distributed between the extremes.

An example could be the range of systolic blood pressure levels among patients at a clinic (e.g., 140 mmHg - 100 mmHg = 40 mmHg)
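As a small sketch of the same calculation, the readings below are hypothetical values chosen so that the minimum and maximum match the 100 mmHg and 140 mmHg used in the example. The extra reading of 210 mmHg is also invented, to show how one extreme value changes the range.

```python
# Hypothetical systolic blood pressure readings in mmHg (illustrative only)
systolic = [100, 112, 118, 121, 126, 140]
data_range = max(systolic) - min(systolic)

# One extreme reading changes the range sharply
with_outlier = systolic + [210]
outlier_range = max(with_outlier) - min(with_outlier)

print("Range:", data_range, "mmHg")
print("Range with one extreme reading:", outlier_range, "mmHg")
```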

Standard Deviation (SD)

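The standard deviation is often described as the average distance of the individual observations from the mean. More precisely, it is the square root of the average squared deviation from the mean. The standard deviation of the population is represented as "σ". The standard deviation of the sample is represented as "s".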

Formula

s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}

  • s_x stands for the standard deviation of the sample.
  • x_i is the value of each observation in the data set.
  • x bar (\bar{x}) represents the sample mean.
  • n is the total sample size.
  • Σ stands for summation, i.e., the sum of the squared differences (x_i - x bar)^2 over all observations.

The standard deviation is easy to interpret and is widely used. It indicates the typical distance of data points from the mean. However, like the variance, it is sensitive to outliers.
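The sketch below computes the sample standard deviation step by step and compares it with the standard library. It assumes the usual sample form, in which the sum of squared deviations is divided by n - 1, and shows the population form (divided by n) for contrast. The recovery times are hypothetical.

```python
import math
from statistics import mean, pstdev, stdev

# Hypothetical recovery times in days (illustrative only)
recovery_days = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(recovery_days)
x_bar = mean(recovery_days)

# Squared distance of each observation from the mean
squared_deviations = [(x - x_bar) ** 2 for x in recovery_days]

# Sample standard deviation: divide by n - 1, then take the square root
manual_sd = math.sqrt(sum(squared_deviations) / (n - 1))

sample_sd = stdev(recovery_days)       # sample form (n - 1)
population_sd = pstdev(recovery_days)  # population form (n)

print(round(manual_sd, 3), round(sample_sd, 3), round(population_sd, 3))
```

The sample value is slightly larger than the population value because dividing by n - 1 compensates for estimating the mean from the same data.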

Variance

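The variance is the square of the standard deviation, i.e., the average of the squared deviations from the mean. Because it is expressed in squared units (for example, days squared), it is harder to interpret directly than the standard deviation.


Formula

s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}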
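The relationship between variance and standard deviation can be confirmed with a few lines of Python. This is a sketch using hypothetical body weights; `variance` and `stdev` are the sample versions (dividing by n - 1) and `pvariance` is the population version (dividing by n).

```python
import math
from statistics import pvariance, stdev, variance

# Hypothetical body weights in kilograms (illustrative only)
weights_kg = [60, 62, 65, 70, 73]

sample_variance = variance(weights_kg)       # units: kg squared
population_variance = pvariance(weights_kg)  # units: kg squared

# Taking the square root returns to the original units (kg)
sd_from_variance = math.sqrt(sample_variance)

print(sample_variance, population_variance, round(sd_from_variance, 3))
```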

Frequencies and Percentages

When working with categorical data, frequencies (counts) and percentages provide simple yet informative insights. These are normally reported in a table.

  • Frequency distributions are useful for summarizing how often different values or categories occur in a dataset, e.g., the number of individuals in different age groups participating in a smoking cessation program.
  • Percentages express these frequencies as a proportion of the whole.
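A frequency table of this kind can be sketched with `collections.Counter`. The age groups and participants below are hypothetical, invented only to show the counting and percentage steps.

```python
from collections import Counter

# Hypothetical age group of each participant in a program (illustrative only)
age_groups = ["18-29", "30-44", "30-44", "45-59", "18-29",
              "30-44", "60+", "45-59", "30-44", "18-29"]

# Frequency: how many participants fall in each category
counts = Counter(age_groups)
total = sum(counts.values())

# Percentage: each frequency as a share of the whole
percentages = {group: 100 * count / total for group, count in counts.items()}

# Print a simple table, ordered by age group label
for group in sorted(counts):
    print(f"{group:<6} {counts[group]:>3} {percentages[group]:>6.1f}%")
```

Reporting the count and the percentage side by side lets readers judge both the size of each group and its share of the total.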

Visualizing descriptive statistics

Data visualization is a powerful way to communicate the insights gained from descriptive statistics. Common visualizations in public health include:

  • Histograms are useful for displaying the distribution of continuous data.
  • Bar charts are often used to compare frequencies or percentages of different categories.
  • Box plots help visualize the distribution of data, including the median, quartiles, and potential outliers.
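The numbers a box plot displays can be computed before any chart is drawn. The sketch below reuses the data set from the median example and assumes two common conventions that are not stated in this article: quartiles from the "inclusive" method of `statistics.quantiles`, and the 1.5 × IQR rule for flagging potential outliers. Other software may use slightly different quartile definitions.

```python
from statistics import quantiles

# The data set from the median example
values = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

# Quartiles: the box spans q1 to q3, with a line at the median (q2)
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in values if x < lower_fence or x > upper_fence]

summary = {"min": min(values), "q1": q1, "median": q2,
           "q3": q3, "max": max(values)}
print(summary)
print("Potential outliers:", outliers)
```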

Practical applications of descriptive statistics in Public Health

  • In epidemiological studies, descriptive statistics are often the first step in understanding the spread of diseases, identifying patterns, and planning interventions.
  • Public health professionals frequently assess the needs of communities by summarizing data from surveys or health records. Average life expectancy and the proportion of individuals with access to healthcare services are common metrics used to identify areas for improvement in the community.
  • Descriptive statistics can evaluate the success of public health programs. For instance, you might calculate the percentage of patients who completed a treatment program or the standard deviation of their recovery times to assess program consistency.

Descriptive statistics are the backbone of public health data analysis. They help professionals summarize, visualize, and interpret data, guiding decision-making and the design of interventions. Mastering these basic concepts enables public health professionals to better understand the populations they serve and make data-driven decisions that improve health outcomes.

