Understanding basic descriptive statistics for Public health professionals

Jesca Birungi

Biostatistician | helping healthcare professionals and scientists understand hidden insights in complex healthcare data | Open to PHD and research opportunities in Biostatistics

Published: September 30, 2024

Descriptive statistics form the foundation of data analysis, offering deep and clear insights into the characteristics of datasets and guiding the way to more complex inferential statistics. In this article, we’ll cover the basic descriptive statistics that every public health professional should be familiar with and how they can be applied in the field.

Why do descriptive statistics matter in Public Health?

Descriptive statistics help summarize large amounts of data, providing a clear picture of trends, patterns, and distributions in the data. Whether you're working on epidemiological studies, analyzing clinical data, or assessing community health programs, descriptive statistics allow you to:

Summarize data efficiently
Understand the distribution and variability in the data
Make comparisons across groups in the data.
Provide a guide for further inferential analysis

What are the Key descriptive statistics for Public health professionals?

1. Measures of central tendency

Measures of central tendency describe the center or typical value of a dataset. Each provides a single value that summarizes an entire dataset or variable, allowing public health professionals to understand the "typical" or "average" case in a population. The three main measures are:

Mean

The mean is the sum of all data points divided by the number of data points. It takes every value into account, which makes it useful for understanding the overall level of a dataset, and it is most informative for roughly symmetric (e.g., normally distributed) data. However, the mean can be pulled toward extreme outliers (very high or very low values). An example would be the mean number of new COVID-19 cases per day in a population.

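Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ is each observation and $n$ is the number of observations.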
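To make the effect of outliers on the mean concrete, here is a minimal sketch using Python's standard `statistics` module. The daily case counts are hypothetical and chosen only for illustration; the variable names are not from any real surveillance system.

```python
from statistics import mean

# Hypothetical daily counts of new cases over one week (illustrative only).
daily_cases = [20, 22, 19, 21, 23, 20, 22]
typical_mean = mean(daily_cases)

# The same week, but the last day is replaced by one extreme value,
# for example a day on which a backlog of results was reported at once.
with_outlier = daily_cases[:-1] + [120]
skewed_mean = mean(with_outlier)

print(f"Mean without the extreme day: {typical_mean}")
print(f"Mean with the extreme day:    {skewed_mean}")
```

In this made-up week, a single extreme day moves the mean from 21 to 35, even though six of the seven days are unchanged.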

Median

The middle value of a dataset once it is sorted in ascending order (smallest to largest). When the number of observations is even, the median is the average of the two middle values. The median is more robust to outliers than the mean and provides a better measure of central tendency for skewed data.

For a given data set: 12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12

Ascending Order: 11, 12, 12, 12, 12, 12, 14, 15, 15, 17, 22

There are 11 values, so the middle value is the 6th one in the sorted list: Median = 12

E.g. the median age of patients admitted to a hospital for treatment.
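The worked median example can be checked with a short sketch. It uses the same 11 values as the example and Python's standard `statistics` module; the manual indexing shown here only applies when the number of values is odd.

```python
from statistics import mean, median

# The data set from the worked median example.
values = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

# Sort, then pick the middle position. With 11 values, index 5 (the 6th value)
# is the middle. This shortcut is only valid for an odd number of values.
ordered = sorted(values)
manual_median = ordered[len(ordered) // 2]

# statistics.median handles both odd and even sample sizes.
library_median = median(values)
data_mean = mean(values)

print("Sorted values:", ordered)
print("Median:", library_median)
print("Mean:  ", data_mean)
```

For this data set the median is 12 while the mean is 14, because the single large value (22) pulls the mean upward but does not move the middle position.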

Mode

The most frequently occurring value in a dataset. It’s helpful when analyzing categorical data or when the data has multiple peaks. There may be no mode or more than one mode, and it does not always provide a clear measure of central tendency.

Example: the most common health condition recorded in a rural community is the mode of that variable.
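Because the mode works on categories as well as numbers, a small sketch with categorical data may help. The diagnoses below are hypothetical; `statistics.multimode` is used because it returns every tied value rather than a single one.

```python
from statistics import multimode

# Hypothetical primary diagnoses recorded at a rural clinic (illustrative only).
diagnoses = [
    "malaria", "hypertension", "malaria", "diabetes",
    "hypertension", "malaria", "asthma",
]
single_mode = multimode(diagnoses)

# A second hypothetical sample in which two conditions are tied for most common.
tied_diagnoses = ["malaria", "hypertension", "malaria", "hypertension", "asthma"]
tied_modes = multimode(tied_diagnoses)

print("Most common condition(s):", single_mode)
print("Tied sample:", tied_modes)
```

The second sample has two modes, which illustrates why the mode does not always give one clear "typical" value.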

2. Measures of dispersion

Measures of dispersion (or variability) describe the spread of data around a central value (e.g., the mean or median). They help you understand the degree of variability within a dataset, indicating how consistent or scattered the data points are. These include:

Range

The difference between the maximum and minimum values.

The range is simple to calculate and gives a quick sense of the data spread. However, it is sensitive to outliers and provides no information on how data points are distributed between the extremes.

An example could be the range of systolic blood pressure levels among patients at a clinic (e.g., 140 mmHg - 100 mmHg = 40 mmHg)
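A minimal sketch of the range calculation follows. The blood pressure readings are hypothetical, chosen so that the minimum and maximum match the 100 mmHg and 140 mmHg in the example.

```python
# Hypothetical systolic blood pressure readings in mmHg (illustrative only).
systolic = [100, 112, 118, 121, 125, 130, 140]
bp_range = max(systolic) - min(systolic)

# Adding one extreme reading changes the range a lot,
# even though the other seven readings are identical.
with_extreme = systolic + [190]
extreme_range = max(with_extreme) - min(with_extreme)

print(f"Range: {bp_range} mmHg")
print(f"Range with one extreme reading: {extreme_range} mmHg")
```

The range depends only on the two most extreme readings, so one unusual patient is enough to more than double it here.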

Standard Deviation (SD)

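The standard deviation is often described as the typical distance of individual observations from the mean. More precisely, it is the square root of the average squared deviation from the mean. The standard deviation of the population is represented as "σ". The standard deviation of the sample is represented as "s".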

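Formula (sample standard deviation, using the usual $n - 1$ denominator):

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

$s_x$ stands for the standard deviation of the sample.
$x_i$ is the value of each observation in the data set.
$\bar{x}$ (x bar) represents the sample mean.
$n$ is the total sample size.
$\sum$ stands for summation, i.e., the sum of the squared deviations $(x_i - \bar{x})^2$ over all observations.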

The standard deviation is easy to interpret and is widely used. It indicates the typical distance of data points from the mean. However, like the variance, it is sensitive to outliers.
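The calculation can be sketched step by step in Python. This example reuses the 11 values from the median example earlier in the article and assumes the common sample formula with an n - 1 denominator; `statistics.stdev` (sample) and `statistics.pstdev` (population) are shown for comparison.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# The 11 values from the median example earlier in the article.
values = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

# Step 1: the mean (x bar).
x_bar = mean(values)

# Step 2: squared deviation of each observation from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in values]

# Step 3: divide the sum by n - 1 (sample formula), then take the square root.
manual_sample_sd = sqrt(sum(squared_deviations) / (len(values) - 1))

# The library functions: stdev divides by n - 1, pstdev divides by n.
sample_sd = stdev(values)
population_sd = pstdev(values)

print(f"Manual sample SD:  {manual_sample_sd:.3f}")
print(f"statistics.stdev:  {sample_sd:.3f}")
print(f"statistics.pstdev: {population_sd:.3f}")
```

The sample value is slightly larger than the population value because it divides by n - 1 rather than n; the difference shrinks as the sample size grows.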

Variance

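The variance is the square of the standard deviation, i.e., the average of the squared deviations from the mean. Because it is expressed in squared units (e.g., days squared), it is harder to interpret directly than the standard deviation.

Formula (sample variance, using the usual $n - 1$ denominator):

$$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$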
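The relationship between the variance and the standard deviation can be checked with a short sketch. The recovery times are hypothetical, and the sample versions (n - 1 denominator) from Python's `statistics` module are assumed.

```python
from statistics import stdev, variance

# Hypothetical recovery times in days for five patients (illustrative only).
recovery_days = [5, 7, 6, 9, 8]

sample_variance = variance(recovery_days)  # units: days squared
sample_sd = stdev(recovery_days)           # units: days

print(f"Sample variance: {sample_variance} days^2")
print(f"Sample SD:       {sample_sd:.2f} days")
print(f"SD squared:      {sample_sd ** 2:.2f} days^2")
```

Squaring the standard deviation recovers the variance, which is why the two measures carry the same information in different units.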

Frequencies and Percentages

When working with categorical data, frequencies (counts) and percentages provide simple yet informative insights. These are normally reported in a table.

Frequency distributions are useful for summarizing how often different values or categories occur in a dataset e.g. the number of individuals in different age groups participating in a smoking cessation program.
Percentages express these frequencies as a proportion of the whole.
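A frequency table of this kind can be sketched with `collections.Counter` from the standard library. The age groups and participant records below are hypothetical and stand in for a real program dataset.

```python
from collections import Counter

# Hypothetical age group of each participant in a smoking cessation program.
age_groups = [
    "18-29", "30-44", "30-44", "45-59", "18-29",
    "30-44", "60+", "45-59", "30-44", "18-29",
]

# Frequencies: how many participants fall in each category.
counts = Counter(age_groups)
total = sum(counts.values())

# Percentages: each frequency as a share of all participants.
percentages = {group: 100 * n / total for group, n in counts.items()}

# Print a simple frequency table.
print(f"{'Age group':<10}{'n':>4}{'%':>8}")
for group in sorted(counts):
    print(f"{group:<10}{counts[group]:>4}{percentages[group]:>8.1f}")
```

Reporting the count alongside the percentage lets readers judge how much data sits behind each proportion.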

Visualizing descriptive statistics

Data visualization is a powerful way to communicate the insights gained from descriptive statistics. Common visualizations in public health include:

Histograms are useful for displaying the distribution of continuous data.
Bar charts are often used to compare frequencies or percentages of different categories.
Box plots help visualize the distribution of data, including the median, quartiles, and potential outliers.
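As a rough illustration of what a box plot draws, the sketch below computes the quartiles and flags potential outliers without plotting anything. It reuses the 11 values from the median example and assumes the common 1.5 × IQR rule for outliers; quartile conventions differ between software packages, so other tools may report slightly different quartiles for other data sets.

```python
from statistics import quantiles

# The 11 values from the median example earlier in the article.
values = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

# Quartiles: q1 and q3 are the edges of the box, q2 is the median line.
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1  # interquartile range, the height of the box

# Assumed convention: points beyond 1.5 * IQR from the box are potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in values if x < lower_fence or x > upper_fence]

print(f"Min: {min(values)}, Q1: {q1}, Median: {q2}, Q3: {q3}, Max: {max(values)}")
print(f"IQR: {iqr}, fences: ({lower_fence}, {upper_fence})")
print("Potential outliers:", outliers)
```

A plotting library would draw the box from Q1 to Q3, a line at the median, and the flagged value as a separate point.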

Practical applications of descriptive statistics in Public Health

In epidemiological studies, descriptive statistics are often the first step in understanding the spread of diseases, identifying patterns, and planning interventions.
Public health professionals frequently assess the needs of communities by summarizing data from surveys or health records. Average life expectancy and the proportion of individuals with access to healthcare services are common metrics used to identify areas for improvement in the community.
Descriptive statistics can help evaluate the success of public health programs. For instance, you might calculate the percentage of patients who completed a treatment program, or the standard deviation of their recovery times, to assess program consistency.

Descriptive statistics are the backbone of public health data analysis. They help professionals summarize, visualize, and interpret data, guiding decision-making and the design of interventions. Mastering these basic concepts enables public health professionals to better understand the populations they serve and make data-driven decisions that improve health outcomes.
