Chi-Squared Test for Pharmaceutical Industry
Maya, Data Scientist

In this article, we look at the Chi-Squared distribution: its origins, properties, probability density function (PDF), cumulative distribution function (CDF), and practical applications in statistical analysis.

One such application takes us into pharmaceutical research, where we join Maya, a data scientist. Maya is part of a team planning to launch a new dietary supplement intended to help people with pre-diabetes stay in balance and avoid progressing to diabetes. We’ll follow Maya as she uses a Chi-Squared test to investigate whether there is an association between the severity of pre-diabetes and the effectiveness of the product. Let’s dive in!

Chi-Squared Distribution

We denote the Chi-Squared distribution by χ²(k), where χ is the lowercase Greek letter chi and k is the number of degrees of freedom.

Curiously, it was discovered twice! The first time was by Friedrich Robert Helmert, a German mathematician, in 1875, and the second time was by Karl Pearson, an English mathematician and biostatistician, in 1900.

Chi-Squared Distribution. Image source Dr. Walid Soula

The Chi-Squared distribution χ² is related to the standard normal distribution: if a random variable Z has the standard normal distribution, then Z² has the χ² distribution with one degree of freedom.

If Z1, Z2, …, Zk are independent standard normal random variables, then the sum of their squares, Z1² + Z2² + … + Zk², has a χ² distribution with k degrees of freedom.

In a sense, the Chi-Squared distribution is a standard normal distribution squared: squaring sends the negative values of the standard normal distribution to the positive side, so a χ² variable is never negative.
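The relationship above can be checked with a small simulation. This is only a sketch, assuming NumPy and SciPy are installed; the seed, k = 3, and the number of draws are arbitrary choices of ours, not values from the article.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
k = 3                            # degrees of freedom (illustrative choice)
n_draws = 200_000                # number of simulated values (illustrative choice)

# Draw k independent standard normals per row, square them, and sum across each row
z = rng.standard_normal(size=(n_draws, k))
sum_of_squares = (z ** 2).sum(axis=1)

# Compare a few empirical quantiles with the theoretical chi-squared quantiles
probs = np.array([0.25, 0.50, 0.75, 0.95])
empirical_q = np.quantile(sum_of_squares, probs)
theoretical_q = chi2.ppf(probs, df=k)

print(np.round(empirical_q, 2))
print(np.round(theoretical_q, 2))
```

The two printed rows should be close to each other, and every simulated value is non-negative because it is a sum of squares.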

Chi-square Distribution. Image source: analystprep

The more degrees of freedom you have, the more the Chi-Squared distribution looks like a normal distribution (with mean k and variance 2k).

Chi-square Distribution and Normal distribution. Image source: YouTube @EquitableEquations
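One rough way to see this convergence numerically is to compare the Chi-Squared PDF with the PDF of a normal distribution that has the same mean and variance. The sketch below assumes SciPy is available; the helper `max_pdf_gap`, the grid, and the three k values are our own illustrative choices.

```python
import numpy as np
from scipy.stats import chi2, norm

def max_pdf_gap(k):
    """Largest absolute gap between the chi-squared(k) PDF and the PDF of a
    normal distribution with the same mean (k) and variance (2k)."""
    x = np.linspace(0.0, 3.0 * k + 20.0, 5000)   # grid wide enough for the k values below
    gap = np.abs(chi2.pdf(x, k) - norm.pdf(x, loc=k, scale=np.sqrt(2 * k)))
    return float(gap.max())

results = {}
for k in (5, 20, 100):
    # Skewness is 0 for a perfectly symmetric curve such as the normal
    _, _, skew, _ = chi2.stats(k, moments="mvsk")
    results[k] = (float(skew), max_pdf_gap(k))
    print(k, round(results[k][0], 3), round(results[k][1], 4))
```

Both the skewness and the gap shrink as k grows, which is what the picture suggests.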

Parameters

  • E(X) = k (the degrees of freedom)
  • Var(X) = 2k
  • Mode = k - 2 when the degrees of freedom are at least 2; otherwise the mode is 0
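These three properties can be checked quickly with SciPy. This is a sketch under our own assumptions: k = 10 is an arbitrary choice, and the mode is located numerically on a grid rather than taken from a built-in function.

```python
import numpy as np
from scipy.stats import chi2

k = 10  # degrees of freedom (illustrative choice)

# Mean and variance as reported by SciPy
mean, var = chi2.stats(k, moments="mv")

# Locate the mode numerically: the x value where the PDF is highest on a fine grid
x = np.linspace(0.0, 40.0, 40_001)        # step of 0.001
mode_numeric = float(x[np.argmax(chi2.pdf(x, k))])

print(float(mean), float(var), round(mode_numeric, 3))
```

For k = 10 the printed values should match k, 2k, and k - 2 respectively.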

Probability density function (PDF) of Chi-Squared Distribution

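The PDF gives the probability density at a specific value x of the distribution. Because the variable is continuous, this is a density rather than the probability of that exact value. For x > 0, the formula is as follows:

f(x; k) = [1 / (2^(k/2) * Γ(k/2))] * x^(k/2 - 1) * e^(-x/2)

Probability density function (PDF) of Chi-Squared Distribution. Image source Dr. Walid Soula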

  • 1 / (2^(k/2) * Γ(k/2)) is a normalization constant that ensures the total area under the PDF curve is equal to 1
  • 2^(k/2) represents 2 raised to the power of (k/2).
  • Γ(k/2) is the gamma function evaluated at k/2.
  • x is the value (x > 0) at which we want to evaluate the PDF.

Don’t worry, you can calculate the PDF with software or with a programming language like Python! Let’s take an example:

Let’s say you are working on a research project for a biology lab that involves measuring the lifespans of a particular species of insect. You collect a sample of 20 insects and record their lifespans in days. Based on previous studies, you expect the lifespans to follow a chi-squared distribution with 10 degrees of freedom.

You want to calculate the probability density at a lifespan of 10 days for a randomly selected insect from this species.

from scipy.stats import chi2

# Degrees of freedom
k = 10

# Value at which to evaluate the density
x = 10

# Evaluate the PDF of the chi-squared distribution at x
pdf = chi2.pdf(x, k)

# Print the value
print(pdf)

Easy, right? The result is 0.08773368488392541.

If you want to do it by hand, you just need to substitute 10 for both “k” and “x” in the formula.
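The by-hand substitution can itself be written as a few lines of Python. This sketch uses the standard Chi-Squared density, x^(k/2 - 1) * e^(-x/2) divided by 2^(k/2) * Γ(k/2); the function name `chi2_pdf_by_hand` is ours, and SciPy is used only to cross-check the result.

```python
import math
from scipy.stats import chi2

def chi2_pdf_by_hand(x, k):
    """Chi-squared density for x > 0:
    x^(k/2 - 1) * e^(-x/2) / (2^(k/2) * Gamma(k/2))."""
    normalization = 1.0 / (2 ** (k / 2) * math.gamma(k / 2))
    return normalization * x ** (k / 2 - 1) * math.exp(-x / 2)

by_hand = chi2_pdf_by_hand(10, 10)
from_scipy = chi2.pdf(10, 10)

print(by_hand, from_scipy)   # both are approximately 0.0877
```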

Cumulative distribution function (CDF) of Chi-Squared Distribution

F(x; k) = γ(k/2, x/2) / Γ(k/2)

Cumulative distribution function (CDF) of Chi-Squared Distribution. Image source Dr. Walid Soula

  • F(x; k) represents the cumulative probability up to x, P(X ≤ x), for a chi-squared random variable X with k degrees of freedom.
  • Γ(a) is the gamma function, and γ(a, x) is the lower incomplete gamma function, which is defined as an integral from zero to a variable upper limit x.

Python will make your daily work easy!

from scipy.stats import chi2

# Degrees of freedom
k = 10

# Value up to which probability is accumulated
x = 10

# Evaluate the CDF of the chi-squared distribution at x
cdf = chi2.cdf(x, k)

# Print the value
print(cdf)
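To connect the code with the gamma functions in the definition, here is a sketch that rebuilds the same CDF value from `scipy.special.gammainc`, which computes the regularized lower incomplete gamma function (the lower incomplete gamma function already divided by the gamma function). The variable names are ours.

```python
from scipy.special import gammainc
from scipy.stats import chi2

k, x = 10, 10

# Regularized lower incomplete gamma function evaluated at (k/2, x/2)
cdf_from_gamma = float(gammainc(k / 2, x / 2))
cdf_from_scipy = float(chi2.cdf(x, k))

print(cdf_from_gamma, cdf_from_scipy)   # both are approximately 0.5595

# The survival function gives the complementary probability P(X > x)
upper_tail = float(chi2.sf(x, k))
print(upper_tail)
```

So under this distribution, roughly 56% of insects would have a lifespan of 10 days or less.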

Significance testing for variance

It is a statistical test that assesses whether the variance of a population, as estimated from a sample, is significantly different from a specific value. It is used with continuous data and assumes the data are approximately normally distributed.

The hypotheses should be formulated like this:

  • Null hypothesis (H0): The population variance is equal to a specific value (σ²).
  • Alternative hypothesis (Ha): The population variance is greater than or less than σ², depending on the question being asked.

Significance testing for variance formula

χ² = (n - 1) * s² / σ²

Significance testing for variance. Dr. Walid Soula

  • s² is the variance of a sample of size n.
  • χ² is the test statistic, which follows a chi-squared distribution with (n - 1) degrees of freedom.
  • n is the sample size.
  • σ² is the specific value of the population variance under the null hypothesis.

Let’s take an example to make this concrete.

A pharmaceutical company found that the historical standard deviation (σ) of drug delivery times to wholesalers is 4 minutes. When implementing a new process, the development team tried the new delivery process on 26 wholesalers and observed a standard deviation (s) of 3 minutes.

Should management adopt the new process, with α = 5%?

1/ Writing the hypotheses

  • Null hypothesis (H0): σ = 4
  • Alternative hypothesis (Ha): σ < 4

2/ Calculate χ²

By substituting the values (n = 26, s = 3, σ = 4) we get:

  • χ² = (26 - 1) * 3² / 4² = 14.0625

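3/ Let’s look at the table now to find the critical value

Chi-squared table. Image source Dr. Walid Soula

4/ Result Interpretation

Because the alternative hypothesis is σ < 4, this is a lower-tail test: for 25 degrees of freedom and α = 0.05, the lower-tail critical value is about 14.61. As the calculated value (14.0625) is less than the critical value, we reject the null hypothesis, so the data support the claim that the new process reduces the variability of delivery times.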
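The four steps of this example can be reproduced in a few lines. This is a sketch assuming SciPy is available; the variable names are ours, and the p-value is an extra we compute for reference, not something the example reports.

```python
from scipy.stats import chi2

n = 26            # number of wholesalers in the sample
s = 3.0           # sample standard deviation (minutes)
sigma0 = 4.0      # historical standard deviation under H0 (minutes)
alpha = 0.05

df = n - 1
chi2_stat = df * s**2 / sigma0**2           # (n - 1) * s^2 / sigma0^2
critical = float(chi2.ppf(alpha, df))       # lower-tail critical value, since Ha is sigma < 4
p_value = float(chi2.cdf(chi2_stat, df))    # chance of a statistic this small or smaller under H0

print(chi2_stat)                    # 14.0625
print(round(critical, 3))           # 14.611
print(bool(chi2_stat < critical))   # True, so H0 is rejected
```

Note the direction of the comparison: because the alternative is "less than", we reject when the statistic falls below the lower-tail critical value.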

Chi-Square Test (Test of independence)

It’s a hypothesis test that is used when you want to determine whether there is a relationship between two categorical variables.

Categorical variables (also called qualitative variables) can be either ordinal (the categories can be ranked from high to low) or nominal (the categories cannot be ranked); examples include gender, college major, and so on.

Let’s take an example to understand

Consider the scenario of Maya, a data scientist in the pharmaceutical industry. Her company is planning to introduce a new product to the market: a dietary supplement that helps people with pre-diabetes stay in balance and avoid becoming diabetic.

Maya’s objective is to investigate whether there is an association between the severity of pre-diabetes and the effectiveness of the product.

To enhance the product’s relevance to healthcare professionals and its market potential, the company has initiated a clinical study involving 300 participants. These participants are stratified into three distinct severity levels based on their likelihood of transitioning to type 2 diabetes.

As part of the study, the participants have been randomly allocated to one of two groups: Group A will be administered the dietary supplement, while Group B will receive a placebo.

Note: Maya aims for a confidence level of 95% (α = 0.05).

Group A and B of the study. Image source Dr. Walid Soula

1/ The first thing to do is to set up the hypotheses:

  • H0: There is no association between severity level and treatment effectiveness.
  • Ha: There is an association between severity level and treatment effectiveness.

2/ Maya needs to calculate the expected frequencies; the formula is as follows:

E = (RT * CT) / N

Expected frequency formula. Image source Dr. Walid Soula

  • E is the expected frequency
  • RT is the row total
  • CT is the column total
  • N is the total number of observations

The first one would be equal to:

E = ((40 + 60) * 90) / (90 + 210) = 30. The same formula gives 30 for every cell in the first column and 70 for every cell in the second column.

Expected and Observed frequency. Image source Dr. Walid Soula
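The expected frequencies can also be computed for the whole table at once. This sketch assumes the observed counts used in the article’s calculation (40/60, 30/70, 20/80), with one row per severity level; the original column labels are not recoverable here, so the columns are left unnamed.

```python
import numpy as np

# Observed counts: one row per severity level, two columns
# (column labels are an open assumption; only the counts come from the article)
observed = np.array([[40, 60],
                     [30, 70],
                     [20, 80]])

row_totals = observed.sum(axis=1)    # [100, 100, 100]
col_totals = observed.sum(axis=0)    # [ 90, 210]
n_total = observed.sum()             # 300

# E[i, j] = row_total[i] * col_total[j] / N, computed for every cell at once
expected = np.outer(row_totals, col_totals) / n_total

print(expected)
```

Every row comes out as 30 and 70, matching the hand calculation.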

3/ Calculate the Chi-squared test statistic :

The formula for the Chi-squared test statistic is as follows:

χ² = Σ (O - E)² / E

Chi-Squared test. Image source Dr. Walid Soula

  • χ² represents the chi-squared test statistic.
  • O refers to the observed frequency.
  • E represents the expected frequency.
  • Σ means that we sum over all cells of the table.

Let’s start the calculation, one cell at a time:

  • (40 - 30)² / 30 = 3.33
  • (60 - 70)² / 70 = 1.43
  • (30 - 30)² / 30 = 0
  • (70 - 70)² / 70 = 0
  • (20 - 30)² / 30 = 3.33
  • (80 - 70)² / 70 = 1.43

Sum everything

  • χ² = 3.33 + 1.43 + 0 + 0 + 3.33 + 1.43 = 9.52
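The same cell-by-cell calculation can be sketched with NumPy. The observed and expected counts below are the ones used in the calculation above; the variable names are ours.

```python
import numpy as np

observed = np.array([[40, 60],
                     [30, 70],
                     [20, 80]], dtype=float)
expected = np.array([[30, 70],
                     [30, 70],
                     [30, 70]], dtype=float)

# Contribution of each cell: (O - E)^2 / E
contributions = (observed - expected) ** 2 / expected

# The test statistic is the sum of all cell contributions
chi2_stat = float(contributions.sum())

print(np.round(contributions, 2))
print(round(chi2_stat, 2))   # 9.52
```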

4/ Determine the degrees of freedom (df):

When conducting a Chi-Squared Test, determining the degrees of freedom (df) is crucial. Degrees of freedom represent the number of independent pieces of information available for estimating the distribution parameters. You can calculate the degrees of freedom using the following formula:

df = (R - 1) * (C - 1)

Degrees of freedom in CST. Image source Dr. Walid Soula

  • df represents the degrees of freedom.
  • R is the number of rows
  • C is the number of columns

df = (3 - 1) * (2 - 1) = 2.

5/ Find the critical value for α = 0.05 and df = 2

Table of the chi-squared distribution. Image source Dr. Walid Soula

You can also use a calculator; in our case the critical value is 5.991.
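Instead of a printed table or a calculator, the critical value can be looked up with SciPy’s inverse CDF. This is a sketch with our own variable names; the table dimensions are the 3 rows and 2 columns from the example.

```python
from scipy.stats import chi2

n_rows, n_cols = 3, 2
df = (n_rows - 1) * (n_cols - 1)

alpha = 0.05
# Upper-tail critical value: the point that leaves alpha of the probability to its right
critical = float(chi2.ppf(1 - alpha, df))

print(df, round(critical, 3))   # 2 5.991
```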

6/ Comparing the results :

We will compare the calculated chi-squared value (9.52) to the critical value (5.99).

7/ Interpretation: If the calculated chi-squared value is greater than or equal to the critical value, we reject the null hypothesis (the value is in the rejection region).

In our example, the chi-squared value Maya found (9.52) is greater than the critical value (5.99), so it falls in the rejection region and we reject the null hypothesis. This means there is statistically significant evidence, at the 5% level, of an association between severity level and treatment effectiveness in the pharmaceutical trial.
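For comparison, SciPy can run all of these steps in one call with `scipy.stats.chi2_contingency`. This is a sketch rather than Maya’s actual workflow; it uses the observed counts from the calculation above and passes `correction=False` so that no continuity correction is applied.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: one row per severity level, two columns
observed = np.array([[40, 60],
                     [30, 70],
                     [20, 80]])

# Returns the test statistic, the p-value, the degrees of freedom,
# and the table of expected frequencies
stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(round(float(stat), 2), dof, round(float(p_value), 4))
print(bool(p_value < 0.05))   # True, so H0 is rejected at the 5% level
```

The p-value route and the critical-value route always agree: a statistic above the critical value corresponds to a p-value below α.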

This information is valuable for the pharmaceutical industry, as it helps in making decisions regarding the development, marketing, and usage of the dietary supplement.

As I conclude this article, I trust you found the content engaging. In summary, the Chi-square test offers a valuable tool for comparing two categorical variables. Our exploration delved into a practical example within the pharmaceutical industry with Maya, specifically focusing on dietary supplements. Your thoughts and feedback on this topic are greatly appreciated!


If you found this helpful, consider resharing and following me, Dr. Oualid Soula, for more content like this.
