Chi-Squared Test for Pharmaceutical Industry

Dr. Oualid S.

AI & Marketing Expert | Bridging Business and Science

Published: November 29, 2023

In this comprehensive article, we will delve deep into the fascinating world of the Chi-Squared Distribution. We’ll explore its origins, properties, probability density function (PDF), cumulative distribution function (CDF), and its practical applications in statistical analysis.

One such practical application will take us into the realm of pharmaceutical research, where we’ll join Maya, a data scientist with a mission. Maya is part of a team planning to launch a new dietary supplement aimed at assisting individuals with pre-diabetes in maintaining balance and preventing diabetes. We’ll follow Maya’s journey as she employs the Chi-Squared Distribution to investigate whether there exists a correlation between the severity of pre-diabetes and the effectiveness of the product. Let’s dive in!

Chi-Squared Distribution

We denote the Chi-Squared distribution by χ²(k): the Greek letter chi, squared, followed by the degrees of freedom k in parentheses.

Strangely, it was discovered twice! The first time was by Friedrich Robert Helmert, a German mathematician, in 1875, and the second was by Karl Pearson, an English mathematician and biostatistician, in 1900.

Chi-Squared Distribution. Image source Dr. Walid Soula

The Chi-Squared distribution χ² is related to the standard normal distribution: if a random variable Z has the standard normal distribution, then Z² has the χ² distribution with one degree of freedom.

If Z1, Z2, ..., Zk are independent standard normal random variables, then the sum of their squares, Z1² + Z2² + ... + Zk², has a χ² distribution with k degrees of freedom.

In a way, you can see the Chi-Squared distribution as a squared standard normal distribution: squaring turns the negative values of the standard normal into positive ones, so all the probability mass ends up to the right of zero.
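To make the link with the normal distribution concrete, here is a minimal simulation sketch. The seed, the choice of k = 3 and the sample size are assumptions picked only for illustration. It sums k squared standard normal draws and compares the simulated mean and variance with the theoretical values SciPy reports for χ²(k).

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(seed=0)  # fixed seed, chosen arbitrarily
k = 3                                # degrees of freedom for this illustration
n_samples = 100_000

# Draw k independent standard normals per row, square them, and sum each row
z = rng.standard_normal(size=(n_samples, k))
sum_of_squares = (z ** 2).sum(axis=1)

print("simulated mean:", sum_of_squares.mean(), "theory:", chi2.mean(k))
print("simulated var: ", sum_of_squares.var(), "theory:", chi2.var(k))
```

The simulated values should land close to the theoretical ones, and every simulated value is non-negative because it is a sum of squares.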

Chi-square Distribution. Image source: analystprep

The more degrees of freedom (DF) you have, the more the Chi-Square distribution looks like a normal distribution.

Chi-square Distribution and Normal distribution. Image source: YouTube @EquitableEquations

Parameters

E(X) = k (the degrees of freedom)
Var(X) = 2k
Mode = k - 2 when the degrees of freedom are at least 2, otherwise 0; in short, max(k - 2, 0)
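These three parameters can be checked with SciPy. The sketch below assumes k = 10 purely as an example. The mean and variance come from `chi2.mean` and `chi2.var`, and the mode is located numerically as the grid point where the PDF peaks, so it is only accurate up to the grid spacing.

```python
import numpy as np
from scipy.stats import chi2

k = 10  # degrees of freedom, chosen only for illustration

mean = chi2.mean(k)   # expected value, k
var = chi2.var(k)     # variance, 2k

# Locate the mode numerically as the grid point where the PDF is largest
grid = np.linspace(0, 30, 30001)
mode = grid[np.argmax(chi2.pdf(grid, k))]

print(mean, var, mode)
```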

Probability density function (PDF) of Chi-Squared Distribution

The PDF describes the relative likelihood (the density) of each value of the distribution. For x > 0, the formula is as follows:

f(x; k) = (1 / (2^(k/2) * Γ(k/2))) * x^(k/2 - 1) * e^(-x/2)

1 / (2^(k/2) * Γ(k/2)) is a normalization constant that ensures the total area under the PDF curve is equal to 1
2^(k/2) represents 2 raised to the power of (k/2).
Γ(k/2) is the gamma function evaluated at k/2.
x is the value of the random variable at which we want to evaluate the PDF
k is the degrees of freedom

Don’t worry, you can calculate the PDF with software or with a programming language like Python! Let’s take an example:

Let’s say you are working on a research project for a biology lab involving the measurement of the lifespans of a particular species of insects. You collect a sample of 20 insects and record their lifespans in days. Based on previous studies, you expect the lifespans to follow a chi-squared distribution with 10 degrees of freedom.

You want to evaluate the density at a lifespan of 10 days for a randomly selected insect from this species. Because the distribution is continuous, this value is a density rather than a probability: the probability of a lifespan of exactly 10 days is 0, and probabilities are obtained as areas under the PDF over a range of values.

```python
from scipy.stats import chi2

# Define the degrees of freedom parameter
k = 10

# Value at which to evaluate the density
x = 10

# Evaluate the PDF of the chi-squared distribution at x
pdf = chi2.pdf(x, k)

# Print the value
print(pdf)
```

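Easy, right? The result is about 0.0877 (0.08773368488392541 in full precision).

If you want to do it by hand, you just need to substitute “k” with 10 and “x” with 10 in the formula: f(10; 10) = 10^4 * e^(-5) / (2^5 * Γ(5)) = 10000 * e^(-5) / (32 * 24) ≈ 0.0877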
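The by-hand substitution can also be written out with only the standard library. This is a minimal sketch of the PDF formula, using `math.gamma` for Γ; the variable names are illustrative.

```python
import math

k = 10  # degrees of freedom
x = 10  # value at which the density is evaluated

# Normalization constant 1 / (2^(k/2) * Gamma(k/2))
normalization = 1 / (2 ** (k / 2) * math.gamma(k / 2))

# Full density: normalization * x^(k/2 - 1) * e^(-x/2)
pdf_by_hand = normalization * x ** (k / 2 - 1) * math.exp(-x / 2)

print(pdf_by_hand)
```

The printed value should agree with the `chi2.pdf(10, 10)` result to within floating-point rounding.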

Cumulative distribution function (CDF) of Chi-Squared Distribution

F(x; k) = γ(k/2, x/2) / Γ(k/2)

F(x; k) represents the cumulative probability up to x for a chi-squared random variable with k degrees of freedom.
Γ(a) is the gamma function, and γ(a, x) is the lower incomplete gamma function, which is defined as an integral from zero to a variable upper limit.

Python will make your daily work easy!

```python
from scipy.stats import chi2

# Define the degrees of freedom parameter
k = 10

# Upper limit of the cumulative probability
x = 10

# Calculate the CDF of the chi-squared distribution: P(X <= x)
cdf = chi2.cdf(x, k)

# Print the value
print(cdf)
```
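As a cross-check on the CDF, the sketch below compares `chi2.cdf` with SciPy's `gammainc`, which computes the regularized lower incomplete gamma function, that is, the lower incomplete gamma function already divided by Γ(a). For k = 10 and x = 10, both should give roughly 0.5595.

```python
from scipy.special import gammainc
from scipy.stats import chi2

k = 10  # degrees of freedom
x = 10  # upper limit of the cumulative probability

cdf_scipy = chi2.cdf(x, k)

# Regularized lower incomplete gamma function evaluated at (k/2, x/2)
cdf_gamma = gammainc(k / 2, x / 2)

print(cdf_scipy, cdf_gamma)
```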

Significance testing for variance

It is a statistical test that assesses whether the variance of a population, estimated from a sample, differs significantly from a specific value. It is used with continuous data and assumes that the data are approximately normally distributed. (Comparing the variances of two samples is done with an F-test instead.)

The hypotheses should be formulated like this:

Null hypothesis (H0): The population variance is equal to a specific value (σ²).
Alternative hypothesis (Ha): The population variance is different from σ² (two-sided test), or specifically greater than or less than σ² (one-sided test).

Significance testing for variance formula

χ² = ((n - 1) * s²) / σ²

Significance testing for variance. Dr. Walid Soula

s² is the variance of a sample with size n.
χ² is the test statistic that follows a chi-square distribution with (n - 1) degrees of freedom.
n is the sample size.
σ² is the specific value of the population variance under the null hypothesis.

Let’s take an example to facilitate understanding

A pharmaceutical company found that the historical standard deviation (σ) for drug delivery to wholesalers is 4 minutes. When implementing a new process, the development team started the new delivery process on 26 wholesalers and they reached a standard deviation (s) of 3 minutes.

Should management adopt the new process? Use α = 5%.

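1/ Writing the hypotheses

Null hypothesis (H0): σ = 4 (that is, σ² = 16)
Alternative hypothesis (Ha): σ < 4 (that is, σ² < 16)

2/ Calculate χ²

By substituting the values (n = 26, s = 3, σ = 4) we get:

χ² = ((26 - 1) * 3²) / 4² = 225 / 16 = 14.0625

3/ Let’s look at the table now to find the critical value

Chi_sq_table. Image source Dr Walid Soula

Because the alternative hypothesis is σ < 4, this is a left-tailed test. With 26 - 1 = 25 degrees of freedom and α = 0.05, the critical value (the 5th percentile of the distribution) is 14.611.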

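4/ Result Interpretation

As the calculated value (14.0625) is less than the critical value (14.611), it falls in the left-tail rejection region, so we reject the null hypothesis. The data support the claim that the new process has a smaller standard deviation than the historical 4 minutes, meaning deliveries are more consistent.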
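The whole variance test can be reproduced in a few lines. This is a minimal sketch assuming a left-tailed test as in the hypotheses above; the variable names are illustrative, and the p-value is an addition that does not appear in the worked example.

```python
from scipy.stats import chi2

n = 26          # number of wholesalers in the trial
s = 3.0         # sample standard deviation (minutes)
sigma0 = 4.0    # historical standard deviation under H0 (minutes)
alpha = 0.05    # significance level

df = n - 1
stat = df * s**2 / sigma0**2      # (n - 1) * s^2 / sigma^2
critical = chi2.ppf(alpha, df)    # left-tail critical value (5th percentile)
p_value = chi2.cdf(stat, df)      # P(chi2 <= stat) for a left-tailed test

print(f"statistic = {stat:.4f}")
print(f"critical  = {critical:.3f}")
print(f"p-value   = {p_value:.4f}")
print("reject H0" if stat < critical else "fail to reject H0")
```

Comparing the statistic with the critical value and comparing the p-value with α are equivalent decisions, so either one can be reported.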

Chi-Square Test (Test of independence)

It’s a hypothesis test that is used when you want to determine whether there is a relationship between two categorical variables. It is closely related to the goodness-of-fit test, which uses the same statistic to compare a single categorical variable against an expected distribution.

Categorical variables (also called qualitative variables) can be either ordinal (the categories can be ranked from high to low, for example severity levels) or nominal (the categories cannot be ranked, for example gender or college major).

Let’s take an example to understand

Consider the scenario of Maya, a data scientist in the pharmaceutical industry. Her company is planning to introduce a new product to the market, a dietary supplement that helps people with pre-diabetes to balance themselves and avoid becoming diabetic.

Maya’s objective is to investigate whether there is an association between the severity of pre-diabetes and the effectiveness of the product.

To enhance the product’s relevance to healthcare professionals and its market potential, the company has initiated a clinical study involving 300 participants. These participants are stratified into three distinct severity levels based on their likelihood of transitioning to type 2 diabetes.

As part of the study, the participants have been randomly allocated to one of two groups: Group A will be administered the dietary supplement, while Group B will receive a placebo.

Note: Maya aims for a confidence level of 95% (α = 0.05)

Group A and B of the study. Image source Dr. Walid Soula

1/ The first thing to do is to set up the hypotheses:

H0: There is no association between severity level and treatment effectiveness.
Ha: There is an association between severity level and treatment effectiveness.

2/ Maya needs to calculate the expected frequency of each cell. The formula is as follows:

E = (RT * CT) / N

Expected frequency formula. Image source Dr. Walid Soula

E is the expected frequency
RT is the total of the row that contains the cell
CT is the total of the column that contains the cell
N is the total number of observations

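The first one would be equal to:

E = ((40 + 60) * 90) / (90 + 210) = 30. Each of the three rows has a total of 100, so the expected frequency is 30 for every cell in the first column and (100 * 210) / 300 = 70 for every cell in the second column.

Expected and Observed frequency. Image source Dr. Walid Soula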
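The expected frequencies can be computed for the whole table at once. In this sketch the observed counts are taken from the worked calculations in this article; the rows are the three severity levels, and the column labels are deliberately left out because they are not reproduced in the text.

```python
import numpy as np

# Observed counts: one row per severity level, two columns as in the table
observed = np.array([[40, 60],
                     [30, 70],
                     [20, 80]])

row_totals = observed.sum(axis=1)   # RT for each row
col_totals = observed.sum(axis=0)   # CT for each column
n = observed.sum()                  # N, the total number of observations

# E = (RT * CT) / N for every cell, via an outer product of the totals
expected = np.outer(row_totals, col_totals) / n

print(expected)
```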

3/ Calculate the Chi-squared test statistic :

The formula for the Chi-squared test statistic is as follows:

χ² = Σ (O - E)² / E, where the sum runs over all cells of the table

Chi-Squared test. Image source Dr. Walid Soula

χ² represents the chi-squared test statistic.
O refers to the observed frequency.
E represents the expected frequency.

Let’s start the calculation

(40 - 30)² / 30 = 3.33
(60 - 70)² / 70 = 1.43
(30 - 30)² / 30 = 0
(70 - 70)² / 70 = 0
(20 - 30)² / 30 = 3.33
(80 - 70)² / 70 = 1.43

Sum everything

χ² = 3.33 + 1.43 + 0 + 0 + 3.33 + 1.43 = 9.52
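The cell-by-cell contributions and their sum can be checked as follows. This sketch takes the observed and expected counts from the calculation above and applies (O - E)² / E to every cell.

```python
import numpy as np

observed = np.array([[40, 60],
                     [30, 70],
                     [20, 80]], dtype=float)
expected = np.array([[30, 70],
                     [30, 70],
                     [30, 70]], dtype=float)

# Contribution of each cell: (O - E)^2 / E
contributions = (observed - expected) ** 2 / expected

# The test statistic is the sum of all contributions
stat = contributions.sum()

print(np.round(contributions, 2))
print(round(stat, 2))
```

Summing the unrounded contributions gives the same 9.52 after rounding to two decimals.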

4/ Determine the degrees of freedom (df):

When conducting a Chi-Squared Test, determining the degrees of freedom (df) is crucial. Here, the degrees of freedom are the number of cell counts that can vary freely once the row and column totals are fixed. You can calculate the degrees of freedom using the following formula:

df = (R - 1) * (C - 1)

Degrees of freedom in CST. Image source Dr. Walid Soula

df represents the degrees of freedom.
R is the number of rows
C is the number of columns

df = (3 - 1) * (2 - 1) = 2.

5/ Find the critical value for α = 0.05 and df = 2

Table of the chi-squared distribution. Image source Dr. Walid Soula

You can also use a calculator or software instead of the table; in our case the critical value is 5.991
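Instead of a printed table, the critical value can be obtained from the inverse CDF. A minimal sketch, assuming the 3 x 2 table and α = 0.05 from this example: because the rejection region is the upper tail, the critical value is the (1 - α) quantile.

```python
from scipy.stats import chi2

rows, cols = 3, 2   # shape of the contingency table
alpha = 0.05        # significance level

df = (rows - 1) * (cols - 1)

# Upper-tail critical value: the (1 - alpha) quantile of chi-squared(df)
critical = chi2.ppf(1 - alpha, df)

print(df, round(critical, 3))
```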

6/ Comparing the results :

We will compare the calculated chi-squared value (9.52) to the critical value (5.99)

7/ Interpretation: If the calculated chi-squared value is greater than or equal to the critical value, we reject the null hypothesis (it’s in the rejection region).

In our example, the chi-squared value Maya found (9.52) is greater than the critical value (5.99), so it falls in the rejection region and we reject the null hypothesis. This suggests that there is significant evidence of an association between severity level and treatment effectiveness in the pharmaceutical trial.
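The seven steps can also be run in one call with SciPy's `chi2_contingency`, which returns the statistic, the p-value, the degrees of freedom and the expected frequencies. This sketch reuses the observed counts from the calculation above; the p-value is extra information that the worked example does not report.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: one row per severity level, two columns as in the table
observed = np.array([[40, 60],
                     [30, 70],
                     [20, 80]])

alpha = 0.05
stat, p_value, dof, expected = chi2_contingency(observed)

print(f"chi-squared = {stat:.2f}, df = {dof}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Rejecting when the p-value is below α is the same decision as rejecting when the statistic exceeds the critical value.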

This information is valuable for the pharmaceutical industry as it helps in making decisions regarding the development, marketing, and usage of this dietary supplement.

As I conclude this article, I trust you found the content engaging. In summary, the Chi-square test offers a valuable tool for testing whether two categorical variables are associated. Our exploration delved into a practical example within the pharmaceutical industry with Maya, specifically focusing on dietary supplements. Your thoughts and feedback on this topic are greatly appreciated!

If you found this helpful, consider resharing and follow me, Dr. Oualid Soula, for more content like this.

Join the journey of discovery and stay ahead in the world of data science and AI! Don't miss out on the latest insights and updates: subscribe to the newsletter for free at https://lnkd.in/eNBG5dWm and become part of our growing community!
