Why use t-stat?
I hope you’ve read my article on Confidence Intervals. If not, it’s better to read that one before proceeding.
When you calculate a confidence interval, you have two choices: use the z-stat or the t-stat. Most often, because you don’t know the standard deviation of the population, you approximate it with the standard deviation of the sample (S). In that case, you’re told to use the t-stat. But why?
The calculation of a confidence interval rests on the assumption that the sampling distribution is normal.
But what if it is not?
In a normal distribution, about 95% of the values lie within 2 standard deviations of the mean. Extreme values have very little chance of occurring. Can we still say that once we replace the population standard deviation with S? No, there is extra uncertainty, and this uncertainty grows when the sample size is small (n < 30).
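You can see the heavier tails directly. The sketch below (assuming scipy is available) compares the chance of landing more than 3 standard deviations above the center under the normal distribution versus under a t-distribution with only 5 degrees of freedom:

```python
from scipy import stats

# Upper-tail probability P(X > 3) under each distribution.
# The t-distribution with few degrees of freedom assigns far more
# probability to extreme values than the normal does.
p_normal = stats.norm.sf(3)      # roughly 0.0013
p_t5 = stats.t.sf(3, df=5)       # roughly 0.015, about 10x larger

print(f"Normal tail:  {p_normal:.5f}")
print(f"t (df=5) tail: {p_t5:.5f}")
```

The exact degrees of freedom (5 here) is just an illustrative choice; the gap shrinks as the degrees of freedom grow.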
To account for this uncertainty, we assume a t-distribution instead. We are now saying that extreme values have a greater chance of occurring than under the normal distribution.
As a result, when you calculate confidence intervals using the t-stat, the band will be wider: at the same confidence level, the t-distribution covers more of the x-axis.
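Here is a minimal sketch of that widening, computed both ways on a small made-up sample (the numbers are purely illustrative, not from any real dataset):

```python
import numpy as np
from scipy import stats

sample = np.array([4.1, 5.3, 4.8, 5.9, 4.4, 5.1, 4.7, 5.5])  # hypothetical data
n = len(sample)
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)   # standard error, using S (ddof=1)

# Critical values for a 95% confidence interval
z_crit = stats.norm.ppf(0.975)          # ~1.96
t_crit = stats.t.ppf(0.975, df=n - 1)   # larger, since df = n - 1 = 7 is small

z_interval = (mean - z_crit * sem, mean + z_crit * sem)
t_interval = (mean - t_crit * sem, mean + t_crit * sem)

print(f"z interval: ({z_interval[0]:.3f}, {z_interval[1]:.3f})")
print(f"t interval: ({t_interval[0]:.3f}, {t_interval[1]:.3f})")  # wider band
```

Both intervals are centered on the same sample mean; only the multiplier on the standard error changes, so the t interval is strictly wider.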
But this divergence between the t-distribution and the normal distribution shrinks as the sample size increases.
So when n > 30, you can calculate confidence intervals using the z-stat alone.
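The convergence above is easy to verify: as the degrees of freedom grow, the t critical value approaches the z critical value of about 1.96.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)  # ~1.96
print(f"z critical value: {z_crit:.4f}")

# The 97.5th-percentile t critical value drifts down toward 1.96
# as the degrees of freedom (n - 1) increase.
for df in (5, 10, 30, 100, 1000):
    print(f"df={df:5d}: t critical value = {stats.t.ppf(0.975, df):.4f}")
```

By df = 30 the t value is already within about 0.1 of 1.96, which is why n > 30 is the usual rule of thumb.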
You might ask: in the world of big data, when do we ever have a sample size of fewer than 30? Why even learn about the t-distribution?
Many ML algorithms use t-distributions internally. For example, t-SNE (a dimensionality reduction technique) models pairwise similarities in the low-dimensional space with a Student’s t-distribution. So there are applications beyond simple inference.