Why use t-stat?

I hope you’ve read my article on Confidence Intervals. If not, it’s better to read that before proceeding.


When you are calculating confidence intervals, you have two choices: use the z-stat or the t-stat. Most often, because you don’t know the standard deviation of the population, you’ll approximate it with the standard deviation of the sample (S). Then you’ll be asked to use the t-stat. But why?
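
Here is a minimal sketch of the t-stat calculation, with made-up sample values and scipy assumed as the dependency; in practice you would plug in your own data:

    import numpy as np
    from scipy import stats

    # Hypothetical sample; the population standard deviation is unknown
    sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])
    n = len(sample)
    mean = sample.mean()
    sem = stats.sem(sample)                  # S / sqrt(n), with S computed using ddof = 1

    t_crit = stats.t.ppf(0.975, df=n - 1)    # two-sided 95% -> 0.975 quantile
    print("95% CI (t-stat):", (mean - t_crit * sem, mean + t_crit * sem))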


The calculation of a confidence interval rests on the assumption that the sampling distribution is a normal distribution.

But what if it is not?

In a normal distribution, about 95% of the values lie within 2 standard deviations of the mean, so extreme values have very little chance of occurring. Can we say the same thing about other distributions? No, there is uncertainty. This uncertainty is even greater when the sample size is small (< 30).

To account for this uncertainty, we assume a t-distribution instead. We are now saying that extreme values have a greater chance of occurring than under the normal distribution.
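
A quick way to see this (a sketch assuming scipy and an arbitrary choice of 5 degrees of freedom) is to compare the probability of landing more than 2 standard deviations out under each distribution:

    from scipy import stats

    tail_normal = 2 * stats.norm.sf(2)   # P(|Z| > 2) under N(0, 1), about 0.046
    tail_t = 2 * stats.t.sf(2, df=5)     # P(|T| > 2) with 5 degrees of freedom, about 0.10
    print(tail_normal, tail_t)           # the t-distribution puts more mass in the tails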

As a result, when you calculate confidence intervals using the t-stat, the band will be wider, because at the same confidence level the t-distribution covers more of the x-axis.
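
For example (a sketch assuming scipy and an illustrative sample size of 10), the 95% critical values differ noticeably:

    from scipy import stats

    n = 10
    z_crit = stats.norm.ppf(0.975)          # about 1.96
    t_crit = stats.t.ppf(0.975, df=n - 1)   # about 2.26 for df = 9
    print(z_crit, t_crit)
    # The half-width of the interval is (critical value) * S / sqrt(n),
    # so the larger t critical value gives a wider band.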

But this divergence between the t-distribution and the normal distribution reduces with increasing sample size:

https://byjus.com/question-answer/what-happens-to-the-t-distribution-as-the-sample-size-increases/

So when n > 30, you can calculate confidence intervals using the z-stat alone.
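
A small sketch (again assuming scipy) shows the t critical value approaching the z critical value of about 1.96 as the sample size grows:

    from scipy import stats

    for n in (5, 10, 30, 100, 1000):
        t_crit = stats.t.ppf(0.975, df=n - 1)
        print(n, round(t_crit, 3))   # 2.776, 2.262, 2.045, 1.984, 1.962
    # Compare with stats.norm.ppf(0.975), which is about 1.960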

You might ask: in the world of big data, when do we even have a sample size of less than 30? Why even learn about the t-distribution?

A lot of ML applications use t-distributions in their algorithms; t-SNE dimensionality reduction is one example. So there are applications beyond simple inference.
