Chi-Squared Test for Pharmaceutical Industry
In this comprehensive article, we will delve deep into the fascinating world of the Chi-Squared Distribution. We’ll explore its origins, properties, probability density function (PDF), cumulative distribution function (CDF), and its practical applications in statistical analysis.
One such practical application will take us into the realm of pharmaceutical research, where we’ll join Maya, a data scientist with a mission. Maya is part of a team planning to launch a new dietary supplement aimed at helping individuals with pre-diabetes maintain balance and avoid developing diabetes. We’ll follow Maya’s journey as she uses the Chi-Squared Distribution to investigate whether there is an association between the severity of pre-diabetes and the effectiveness of the product. Let’s dive in!
Chi-Squared Distribution
We denote the Chi-Squared distribution by χ²(k), using the lowercase Greek letter chi (χ); k is the number of degrees of freedom.
Strangely enough, it was discovered twice! First by Friedrich Robert Helmert, a German mathematician, in 1875, and again by Karl Pearson, an English mathematician and biostatistician, in 1900.
The Chi-Squared distribution is closely related to the standard normal distribution: if a random variable Z has the standard normal distribution, then Z² has a χ² distribution with one degree of freedom.
More generally, if Z₁, Z₂, …, Zₖ are independent standard normal random variables, then the sum of their squares has a χ² distribution with k degrees of freedom:
$$Q = \sum_{i=1}^{k} Z_i^2 \sim \chi^2(k)$$
In a sense, the Chi-Squared Distribution is a squared normal distribution: squaring folds the negative values of the standard normal onto the positive side, so a χ² variable is never negative and its distribution is skewed to the right.
As the degrees of freedom grow, the Chi-Squared Distribution becomes more symmetric and looks more and more like a normal distribution.
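To see the sum-of-squares construction in action, here is a minimal simulation sketch, assuming NumPy and SciPy are installed (k = 10, the sample size and the seed are arbitrary illustrative choices). It squares and sums k independent standard normal draws and compares the resulting mean and variance with the theoretical values for χ²(k), which are k and 2k.

```python
import numpy as np
from scipy.stats import chi2

# Sketch: build chi-squared samples as sums of squared standard normal draws.
rng = np.random.default_rng(seed=0)    # fixed seed so the run is reproducible
k = 10                                 # degrees of freedom (illustrative choice)
n_samples = 200_000

z = rng.standard_normal(size=(n_samples, k))   # each row holds k independent N(0, 1) draws
q = (z ** 2).sum(axis=1)                       # Q = Z1^2 + ... + Zk^2, one value per row

# Compare the simulated moments with the theoretical mean k and variance 2k.
print(f"mean:     simulated {q.mean():.3f}, theory {chi2.mean(k):.1f}")
print(f"variance: simulated {q.var():.3f}, theory {chi2.var(k):.1f}")

# Squaring removes the negative half of the normal, so no sample is negative.
print("fraction of negative values:", (q < 0).mean())
```

With 200,000 samples the simulated moments typically land within a few hundredths of 10 and 20.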
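Parameters
The distribution has a single parameter, the degrees of freedom k (a positive integer). Its support is x ≥ 0, its mean is k and its variance is 2k.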
Probability density function (PDF) of Chi-Squared Distribution
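The PDF describes the relative likelihood (the density) of each value x; for a continuous distribution this is not itself a probability. For x > 0 the formula is:
$$f(x; k) = \frac{x^{k/2 - 1}\, e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}$$
where Γ is the gamma function; the density is 0 for x < 0.
Don’t worry: you can calculate the PDF with software or a programming language like Python! Let’s take an example: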
Let’s say you are working on a research project for a biology lab that measures the lifespans of a particular species of insect. You collect a sample of 20 insects and record their lifespans in days. Based on previous studies, you expect the lifespans to follow a chi-squared distribution with 10 degrees of freedom.
You want to evaluate the probability density at a lifespan of 10 days for a randomly selected insect of this species. (For a continuous distribution the probability of a lifespan of exactly 10 days is zero; the density tells you how likely values near 10 days are.)
```python
from scipy.stats import chi2

# Define the degrees of freedom parameter
k = 10
# X-value at which to evaluate the density
x = 10
# Calculate the PDF of the chi-squared distribution at x
pdf = chi2.pdf(x, k)
# Print the value
print(pdf)
```
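Easy, right? The result is 0.08773368488392541, i.e. about 0.0877.
If you want to do it by hand, substitute k = 10 and x = 10 into the PDF formula; since Γ(5) = 4! = 24:
$$f(10; 10) = \frac{10^{4}\, e^{-5}}{2^{5}\, \Gamma(5)} = \frac{10\,000\, e^{-5}}{32 \times 24} \approx 0.0877$$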
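The hand calculation can also be written out as a small function and checked against SciPy. This is a sketch: `chi2_pdf` is a hypothetical helper that implements the PDF formula directly with `math.gamma`, restricted to x > 0.

```python
import math
from scipy.stats import chi2

def chi2_pdf(x: float, k: int) -> float:
    """Chi-squared density for x > 0:
    f(x; k) = x**(k/2 - 1) * exp(-x/2) / (2**(k/2) * Gamma(k/2))
    """
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

by_hand = chi2_pdf(10, 10)    # 10**4 * e**-5 / (32 * 24)
library = chi2.pdf(10, 10)    # SciPy's value for comparison
print(by_hand, library)
```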
Cumulative distribution function (CDF) of Chi-Squared Distribution
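The CDF gives the probability that a χ²(k) random variable is less than or equal to x:
$$F(x; k) = P(X \le x) = \frac{\gamma(k/2,\, x/2)}{\Gamma(k/2)}$$
where γ is the lower incomplete gamma function. Again, Python will make your daily work easy!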
```python
from scipy.stats import chi2

# Define the degrees of freedom parameter
k = 10
# X-value at which to evaluate the CDF
x = 10
# P(X <= x) for X following a chi-squared distribution with k degrees of freedom
cdf = chi2.cdf(x, k)
print(cdf)  # approximately 0.5595
```
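As a cross-check, the CDF can be computed from the regularized lower incomplete gamma function, which SciPy exposes as `scipy.special.gammainc`; for an even number of degrees of freedom there is also an elementary closed form. A minimal sketch for k = 10 and x = 10:

```python
import math
from scipy.special import gammainc  # regularized lower incomplete gamma P(a, x)
from scipy.stats import chi2

k, x = 10, 10

# The chi-squared CDF equals P(k/2, x/2), the regularized lower incomplete gamma function.
via_gamma = gammainc(k / 2, x / 2)

# For even k there is also a closed form: 1 - e^(-x/2) * sum_{j=0}^{k/2-1} (x/2)^j / j!
closed_form = 1 - math.exp(-x / 2) * sum((x / 2) ** j / math.factorial(j) for j in range(k // 2))

print(via_gamma, closed_form, chi2.cdf(x, k))  # all approximately 0.5595
```

The closed form only holds when k is even; `gammainc` works for any k > 0.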
Significance testing for variance
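It is a statistical test that assesses whether the variance of a population differs significantly from a specified value, based on the variance of a sample. It is used with continuous data and assumes the population is normally distributed. (Comparing the variances of two samples calls for an F-test instead.)
The hypotheses are formulated like this:
H0: σ² = σ₀²
H1: σ² ≠ σ₀² (two-tailed), or σ² < σ₀² or σ² > σ₀² for a one-tailed test
The test statistic is:
$$\chi^2 = \frac{(n-1)\, s^2}{\sigma_0^2}$$
where n is the sample size, s² the sample variance and σ₀² the hypothesized variance. Under H0 it follows a χ² distribution with n − 1 degrees of freedom.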
Let’s take an example to facilitate understanding.
A pharmaceutical company knows that the historical standard deviation (σ) of drug delivery times to wholesalers is 4 minutes. The development team then trialled a new delivery process with 26 wholesalers and obtained a sample standard deviation (s) of 3 minutes.
Should management adopt the new process, using a significance level α = 5%?
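1/ Writing the hypotheses
H0: σ = 4 minutes (σ² = 16)
H1: σ < 4 minutes, i.e. the new process reduces variability (left-tailed test)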
2/ Calculate χ²
By substituting n = 26, s = 3 and σ₀ = 4 we get:
$$\chi^2 = \frac{(26-1) \times 3^2}{4^2} = \frac{225}{16} \approx 14.06$$
3/ Find the critical value
Looking at a chi-squared table with df = n − 1 = 25, the value that leaves α = 5% in the left tail is 14.611.
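Instead of a printed table, the critical value can be looked up with SciPy's inverse CDF, `chi2.ppf`. A minimal sketch, assuming a left-tailed alternative (σ < 4 minutes), which is the reading consistent with rejecting H0 when the statistic falls below the critical value; the right-tailed value is shown only for contrast:

```python
from scipy.stats import chi2

alpha = 0.05
n = 26          # wholesalers in the trial
df = n - 1      # degrees of freedom for the variance test

# Left-tailed test: the critical value leaves alpha of the area in the LEFT tail.
critical_left = chi2.ppf(alpha, df)
# For comparison, a right-tailed test would use the upper quantile instead.
critical_right = chi2.ppf(1 - alpha, df)
print(round(critical_left, 3), round(critical_right, 3))
```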
4/ Result Interpretation
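Since the calculated value (14.06) is less than the critical value (14.611), it falls in the rejection region, so we reject the null hypothesis: at the 5% level there is evidence that the new process reduces the variability of delivery times, which supports adopting it.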
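The whole procedure fits in a few lines of Python. The sketch below assumes a left-tailed test of H0: σ = σ₀ against H1: σ < σ₀ and normally distributed delivery times; `variance_test_left` is a hypothetical helper name, not a SciPy function. It also reports a p-value, which leads to the same decision as the critical-value comparison.

```python
from scipy.stats import chi2

def variance_test_left(s: float, sigma0: float, n: int, alpha: float = 0.05):
    """Left-tailed chi-squared test of H0: sigma = sigma0 vs H1: sigma < sigma0.

    Assumes the data come from a normal population.
    Returns the statistic, the critical value, the p-value and whether H0 is rejected.
    """
    df = n - 1
    stat = df * s ** 2 / sigma0 ** 2      # (n - 1) s^2 / sigma0^2
    critical = chi2.ppf(alpha, df)        # lower-tail critical value
    p_value = chi2.cdf(stat, df)          # P(X <= stat) for X ~ chi2(df), left-tailed
    return stat, critical, p_value, stat < critical

stat, critical, p_value, reject = variance_test_left(s=3, sigma0=4, n=26)
print(f"chi2 = {stat:.4f}, critical = {critical:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

For a two-tailed test you would instead compare the statistic with `chi2.ppf(alpha / 2, df)` and `chi2.ppf(1 - alpha / 2, df)`.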
Chi-Square Test of Independence
It’s a hypothesis test used to determine whether there is a relationship (an association) between two categorical variables.
Categorical variables (also called qualitative variables) can be either ordinal (the categories can be ranked from high to low) or nominal (the categories cannot be ranked), for example gender or college major.
Let’s take an example to understand
Consider the scenario of Maya, a data scientist in the pharmaceutical industry. Her company is planning to introduce a new product to the market: a dietary supplement that helps people with pre-diabetes stay balanced and avoid becoming diabetic.
Maya’s objective is to investigate whether there is an association between the severity of pre-diabetes and the effectiveness of the product.
To enhance the product’s relevance to healthcare professionals and its market potential, the company has initiated a clinical study involving 300 participants. These participants are stratified into three distinct severity levels based on their likelihood of transitioning to type 2 diabetes.
As part of the study, the participants have been randomly allocated to one of two groups: Group A will be administered the dietary supplement, while Group B will receive a placebo.
Note: Maya aims for a confidence level of 95%, i.e. a significance level α = 0.05.
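1/ First, set up the hypotheses:
H0: severity level and treatment effectiveness are independent (no association).
H1: severity level and treatment effectiveness are associated.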
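2/ Maya needs to calculate the expected frequency of each cell, assuming H0 is true. The formula is:
$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$$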
For the first cell:
E = ((40 + 60) × 90) / (90 + 210) = (100 × 90) / 300 = 30. The other cells are calculated in the same way.
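Computing expected frequencies by hand gets tedious for larger tables, so here is a NumPy sketch that computes every cell at once from the row and column totals. The observed counts are hypothetical and illustrative only, chosen so that the first severity level totals 40 + 60 = 100 and the column totals are 90 and 210, as in the calculation above.

```python
import numpy as np

# Hypothetical observed counts: 3 severity levels (rows) x 2 outcome columns.
observed = np.array([
    [40, 60],
    [30, 70],
    [20, 80],
])

row_totals = observed.sum(axis=1)   # one total per severity level
col_totals = observed.sum(axis=0)   # one total per outcome column
grand_total = observed.sum()

# E_ij = (row total_i * column total_j) / grand total, computed for every cell at once
expected = np.outer(row_totals, col_totals) / grand_total
print(expected)
```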
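3/ Calculate the Chi-squared test statistic:
The formula for the Chi-squared test statistic is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E the expected frequency of each cell.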
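Let’s start the calculation: compute (O − E)²/E for each of the six cells (3 severity levels × 2 columns).
Summing all the contributions gives χ² = 9.52.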
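The same summation can be sketched in NumPy: each cell contributes (O − E)²/E and the statistic is the total. The observed counts below are hypothetical (illustrative only), with a first-row total of 100 and column totals of 90 and 210 matching step 2; with these counts the statistic comes out at about 9.52.

```python
import numpy as np

# Hypothetical observed counts: rows = severity levels, columns = two outcome categories.
observed = np.array([[40, 60], [30, 70], [20, 80]], dtype=float)

# Expected counts under independence: row total * column total / grand total.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Each cell contributes (O - E)^2 / E; the test statistic is the sum over all cells.
contributions = (observed - expected) ** 2 / expected
chi_sq = contributions.sum()
print(contributions.round(3))
print(round(chi_sq, 2))
```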
4/ Determine the degrees of freedom (df):
When conducting a Chi-Squared Test, determining the degrees of freedom (df) is crucial. For a test of independence, df is the number of cells in the table that can vary freely once the row and column totals are fixed. It is calculated as:
df = (number of rows − 1) × (number of columns − 1)
Here, with 3 severity levels and 2 columns, df = (3 − 1) × (2 − 1) = 2.
5/ Find the critical value for α = 0.05 and df = 2
You can read it from a chi-squared table or use a calculator; in our case the critical value is 5.991.
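SciPy can replace the table lookup, and it can also turn the statistic into a p-value through the survival function `chi2.sf`. A minimal sketch, assuming the statistic of 9.52 and the 3 × 2 layout used in this example:

```python
from scipy.stats import chi2

alpha = 0.05
n_rows, n_cols = 3, 2                   # 3 severity levels x 2 outcome columns
df = (n_rows - 1) * (n_cols - 1)        # degrees of freedom = 2

critical = chi2.ppf(1 - alpha, df)      # right-tail critical value for alpha = 0.05
p_value = chi2.sf(9.52, df)             # P(X >= 9.52) for X ~ chi2(2)
print(f"critical value = {critical:.3f}, p-value = {p_value:.4f}")
```

For df = 2 the survival function reduces to e^(−x/2), so the p-value is e^(−4.76) ≈ 0.0086, below α = 0.05 and consistent with the critical-value comparison.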
6/ Comparing the results:
We will compare the calculated chi-squared value (9.52) to the critical value (5.99)
7/ Interpretation: If the calculated chi-squared value is greater than or equal to the critical value, we reject the null hypothesis (it’s in the rejection region).
In our example, Maya’s chi-squared value (9.52) is greater than the critical value (5.99), so it falls in the rejection region and we reject the null hypothesis. This suggests that there is significant evidence of an association between severity level and treatment effectiveness in the pharmaceutical trial.
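In practice the whole test of independence can be run with `scipy.stats.chi2_contingency`, which returns the statistic, the p-value, the degrees of freedom and the expected frequencies. The sketch below uses hypothetical counts (illustrative only) whose margins match the totals in step 2; substitute the real observed table to reproduce the analysis.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts: rows = severity levels, columns = two outcome categories.
observed = np.array([[40, 60], [30, 70], [20, 80]])

# correction=False disables Yates' continuity correction (SciPy only applies it when dof == 1).
stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}, dof = {dof}")
print(expected)
```

In recent SciPy versions the function returns a result object, which still unpacks into these four values.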
This information is valuable for the pharmaceutical industry, as it helps in making decisions about the development, marketing, and use of the dietary supplement.
As I conclude this article, I trust you found the content engaging. In summary, the Chi-squared test of independence is a valuable tool for testing whether two categorical variables are associated. Our exploration delved into a practical example within the pharmaceutical industry with Maya, focusing on a dietary supplement. Your thoughts and feedback on this topic are greatly appreciated!
If you found this helpful, consider resharing and follow me, Dr. Oualid Soula, for more content like this.
Join the journey of discovery and stay ahead in the world of data science and AI! Don't miss out on the latest insights and updates: subscribe to the newsletter for free at https://lnkd.in/eNBG5dWm and become part of our growing community!