Statistics and Probability :-

ANALYSING CATEGORICAL DATA:-

Categorical data analysis is the analysis of data where the response variable has been grouped into a set of mutually exclusive ordered (such as age group) or unordered (such as eye color) categories.
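A common first step with categorical data is a frequency table: count each category and convert the counts to proportions. Below is a minimal sketch using Python's standard `collections.Counter`. The eye-colour observations are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical eye-colour observations (illustrative data, not from a real study).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

counts = Counter(eye_colors)   # absolute frequency of each category
total = sum(counts.values())   # number of observations

# Relative frequency = count / total, listed from most to least common category.
for color, n in counts.most_common():
    print(f"{color:6s} {n}  {n / total:.3f}")
```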

Summarizing quantitative data:-

A population data set contains all members of a specified group (the entire list of possible data values).

Example: The population may be "ALL people living in the US."

A sample data set contains a part, or a subset, of a population. The size of a sample is always less than the size of the population from which it is taken.

Example: The sample may be "SOME people living in the US."

Mean, median and mode

We’ll start by describing the mean, median and mode. Suppose you have the following numbers: 23, 29, 20, 32, 23, 21, 33, 25

The mean of the data is the arithmetic average: the sum of the numbers divided by how many there are, (23+29+20+32+23+21+33+25)/8 = 206/8 = 25.75.

The median is the middle value once the numbers are sorted. We have 8 numbers (an even count), so the median is the average of the 4th and 5th values. In ascending order the numbers are 20, 21, 23, 23, 25, 29, 32, 33, so the median is (23+25)/2 = 24.

Even though 24 is not present in our data set, it is our median.

The mode is the value that occurs most often in the dataset, which here is 23 (it appears twice).
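Python's standard `statistics` module provides all three measures. Here is a short sketch on the same eight numbers:

```python
import statistics

data = [23, 29, 20, 32, 23, 21, 33, 25]

mean = statistics.mean(data)      # sum / count = 206 / 8
median = statistics.median(data)  # even count: average of the 4th and 5th sorted values
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # 25.75 24.0 23
```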


Interquartile range:

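It’s a measure of the spread of the data points. We calculate the difference between the middle of the second (upper) half of the sorted data and the middle of the first (lower) half.

Suppose we have a dataset consisting of the following numbers: 4, 4, 10, 11, 15, 7, 14, 12, 6

Arranging the values in ascending order: 4, 4, 6, 7, 10, 11, 12, 14, 15

The median is the 5th value, 10. Leaving it out of both halves, the lower half 4, 4, 6, 7 has middle (4+6)/2 = 5 (the first quartile, Q1). The upper half 11, 12, 14, 15 has middle (12+14)/2 = 13 (the third quartile, Q3). The interquartile range is 13 - 5 = 8.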
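Below is a minimal sketch of the "middle of each half" method for the dataset above. It assumes the convention that, for an odd count, the overall median is excluded from both halves. Other quartile rules exist; for example, the default method of `statistics.quantiles` in Python may give slightly different quartiles. The function name `iqr_by_halves` is illustrative.

```python
import statistics

def iqr_by_halves(values):
    """Return (Q1, Q3, IQR) using the median of each half of the sorted data.

    For an odd count, the overall median is left out of both halves
    (one common textbook convention; software may use other quartile rules).
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]                # values below the median position
    upper = xs[half + len(xs) % 2:]  # values above the median position
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q1, q3, q3 - q1

q1, q3, iqr = iqr_by_halves([4, 4, 10, 11, 15, 7, 14, 12, 6])
print(q1, q3, iqr)  # 5.0 13.0 8.0
```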

Probability:- Whenever we’re unsure about the outcome of an event, we can talk about the probabilities of certain outcomes: how likely they are. The analysis of events governed by probability is called statistics.

Formula to Calculate Probability

The formula of the probability of an event is:


P(A) = n(A)/n(S)

Where,

  • P(A) is the probability of an event “A”
  • n(A) is the number of favourable outcomes
  • n(S) is the total number of outcomes in the sample space

Note: Here, the favourable outcome means the outcome of interest. The formula assumes that every outcome in the sample space is equally likely.
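Here is a small sketch of P(A) = n(A)/n(S), counting outcomes directly. It assumes a fair six-sided die and takes A = "the roll is even". `Fraction` keeps the result exact.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (all outcomes equally likely).
sample_space = [1, 2, 3, 4, 5, 6]

# Event A: the roll is even. These are the favourable outcomes.
favourable = [s for s in sample_space if s % 2 == 0]

p_a = Fraction(len(favourable), len(sample_space))  # n(A) / n(S)
print(p_a)  # 1/2
```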

Random variables:- A random variable is a variable whose value is unknown, or more precisely a function that assigns a numerical value to each of an experiment's outcomes. Random variables are usually denoted by capital letters and are classified as discrete (taking specific, countable values) or continuous (taking any value within a continuous range). Random variables are often used in econometric and regression analysis to describe statistical relationships among variables.
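To make the "function on outcomes" idea concrete, here is a sketch using an assumed experiment: rolling two fair dice. The random variable X maps each outcome to the sum of the dice, and its distribution follows from counting outcomes. X is discrete, because it can only take the values 2 through 12.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Experiment: roll two fair dice. Each outcome is an ordered pair (d1, d2).
outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def X(outcome):
    """Random variable: assigns the sum of the two dice to each outcome."""
    return sum(outcome)

# Probability mass function of the discrete random variable X.
counts = Counter(X(o) for o in outcomes)
pmf = {value: Fraction(n, len(outcomes)) for value, n in sorted(counts.items())}

print(pmf[7])  # 1/6, the most likely sum
```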

KEY TAKEAWAYS

  • A random variable is a variable whose value is unknown or a function that assigns values to each of an experiment's outcomes.
  • A random variable can be either discrete (having specific values) or continuous (any value in a continuous range).
  • The use of random variables is most common in probability and statistics, where they are used to quantify outcomes of random occurrences.
  • Risk analysts use random variables to estimate the probability of an adverse event occurring.

Significance tests (hypothesis testing):- Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is better supported by the sample data. When we say that a finding is statistically significant, that judgment comes from a hypothesis test.

Steps in Testing for Statistical Significance

1) State the Research Hypothesis

2) State the Null Hypothesis

3) Select a probability of error level (alpha level)

4) Select and compute the test for statistical significance

5) Interpret the results
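The five steps can be traced through one small example. This sketch assumes hypothetical data: a coin lands heads 60 times in 100 flips. It runs an exact two-sided binomial test with only the standard library and an alpha level of 0.05. For a null proportion of 0.5, the distribution is symmetric, so doubling the smaller tail gives the two-sided p-value.

```python
from math import comb

# Hypothetical data: a coin lands heads 60 times in 100 flips.
n, heads = 100, 60

# 1) Research hypothesis: the coin is biased (P(heads) != 0.5).
# 2) Null hypothesis: the coin is fair (P(heads) = 0.5).
p0 = 0.5
# 3) Probability of error level (alpha).
alpha = 0.05

# 4) Exact binomial test: probability of each head count under the null hypothesis.
def binom_pmf(k):
    return comb(n, k) * p0**k * (1 - p0) ** (n - k)

upper = sum(binom_pmf(k) for k in range(heads, n + 1))  # P(X >= 60)
lower = sum(binom_pmf(k) for k in range(0, heads + 1))  # P(X <= 60)
p_value = min(1.0, 2 * min(upper, lower))  # two-sided; valid because p0 = 0.5 is symmetric

# 5) Interpret the result.
reject = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```

At the 0.05 level, 60 heads out of 100 is not quite enough to reject the fair-coin hypothesis. If SciPy is available, `scipy.stats.binomtest(60, 100, 0.5)` runs a comparable exact test.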

Inference for categorical data (chi-square tests) :- There are two types of chi-square tests. Both use the chi-square statistic and distribution for different purposes:

  • A chi-square goodness of fit test determines whether sample data match a hypothesized population distribution.
  • A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether the distributions of categorical variables differ from each other.
  • A very small chi-square test statistic means that your observed data fit your expected data extremely well. For a test of independence, where the expected counts assume no relationship, this means the data show no evidence of a relationship.
  • A very large chi-square test statistic means that the observed data do not fit the expected data well. For a test of independence, this is evidence that a relationship exists.
  • A chi-square statistic is one way to show a relationship between two categorical variables. In statistics, there are two types of variables: numerical and non-numerical (categorical) variables. The chi-square statistic is a single number that tells you how much difference exists between your observed counts and the counts you would expect if there were no relationship at all in the population.

Advanced regression (inference and transforming):- We usually rely on statistical software to identify point estimates and standard errors for the parameters of a regression line. After verifying that the conditions for fitting a line hold, we can use the t-distribution methods learned earlier to create confidence intervals for regression parameters or to evaluate hypothesis tests.

LINEAR VS NON LINEAR REGRESSION:

Many people think that the difference between linear and nonlinear regression is that linear regression involves lines and nonlinear regression involves curves. This is partly true, and if you want a loose definition for the difference, you can probably stop right there. However, linear equations can sometimes produce curves. To understand why, look at the form of the linear regression equation:

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

The equation must be linear in the parameters β, but a predictor can itself be a curved function of a variable (for example, X2 = X1²). This is why linear regression can, surprisingly, produce curves.

Nonlinear regression uses regression equations that are nonlinear in the parameters, which take the form:

Y = f(X, β) + ε

Where,

  • X = a vector of predictors,
  • β = a vector of parameters,
  • f(·) = a known regression function,
  • ε = an error term.
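For comparison, here is an assumed nonlinear model, f(X, β) = b0·exp(b1·X). It cannot be rewritten as linear in b1. The sketch below uses a deliberately crude fitting method, so it is not what statistical software does. For each candidate b1 on a grid, the best b0 has a closed form, and the code keeps the b1 with the smallest sum of squared errors. Real packages use iterative algorithms such as Gauss-Newton or Levenberg-Marquardt; `scipy.optimize.curve_fit` is one example. The data and the name `fit_exponential` are hypothetical.

```python
import math

# Hypothetical noise-free data generated from y = 2 * exp(0.5 * x).
x = [0, 1, 2, 3, 4]
y = [2 * math.exp(0.5 * xi) for xi in x]

def fit_exponential(x, y, b1_grid):
    """Least-squares fit of y = b0 * exp(b1 * x) by grid search over b1.

    For a fixed b1 the model is linear in b0, so b0 = sum(g*y) / sum(g*g);
    the (b0, b1) pair with the smallest sum of squared errors is returned.
    """
    best = None
    for b1 in b1_grid:
        g = [math.exp(b1 * xi) for xi in x]
        b0 = sum(gi * yi for gi, yi in zip(g, y)) / sum(gi * gi for gi in g)
        sse = sum((yi - b0 * gi) ** 2 for gi, yi in zip(g, y))
        if best is None or sse < best[2]:
            best = (b0, b1, sse)
    return best

b0, b1, sse = fit_exponential(x, y, [i / 100 for i in range(101)])  # b1 in 0.00..1.00
print(round(b0, 3), b1)  # 2.0 0.5
```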

Combinations and Permutations

Another type of counting question is when you have a given number of objects, you want to choose some (or all) of them, and you want to know how many ways there are to do this. For example, a teacher with a class of 30 students wants 5 of them to do a presentation, and she wants to know how many ways this could happen. These types of questions have to do with combinations and permutations. The difference between combinations and permutations is whether the order in which you choose the objects matters.

  • A teacher choosing a group to make a presentation is a combination problem, because order does not matter.
  • A teacher choosing 1st-, 2nd-, and 3rd-place winners in a science fair is a permutation problem, because the order does matter. (1st place and 2nd place are different outcomes.)
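Python 3.8+ provides `math.comb` and `math.perm` for these counts. The sketch uses the class of 30 from the example. For the science fair, it assumes 30 entrants, which is a hypothetical number because the text gives none.

```python
import math

# 5 presenters chosen from a class of 30: order does not matter -> combination.
groups = math.comb(30, 5)   # 30! / (5! * 25!)

# 1st, 2nd and 3rd place chosen from 30 entrants (hypothetical count):
# order matters -> permutation.
podiums = math.perm(30, 3)  # 30 * 29 * 28

print(groups, podiums)  # 142506 24360
```

The two counts are linked. Each unordered group of 3 can be arranged in 3! orders, so perm(30, 3) = comb(30, 3) × 3!.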

Analysis of variance (ANOVA) :- Suppose we have several treatment samples (say, three) and want to know how different they are from one another. A technique that compares the samples on the basis of their means is called analysis of variance (ANOVA). ANOVA is a statistical technique used to check whether the means of two or more groups are significantly different from each other. It assesses the impact of one or more factors by comparing the means of different samples.

There are two main types: one-way and two-way. Two-way tests can be with or without replication.

  • One-way ANOVA between groups: used when you want to test two or more groups to see if there’s a difference between them.
  • Two way ANOVA without replication: used when you have one group and you’re double-testing that same group. For example, you’re testing one set of individuals before and after they take a medication to see if it works or not.
  • Two way ANOVA with replication: Two groups, and the members of those groups are doing more than one thing. For example, two groups of patients from different hospitals trying two different therapies.

One Way ANOVA

A one-way ANOVA compares the means of two or more independent (unrelated) groups using the F-distribution. The null hypothesis for the test is that all the group means are equal. Therefore, a significant result means that at least one group mean differs from the others.
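Below is a minimal sketch of how the one-way ANOVA F statistic is built, using hypothetical measurements from three groups. F is the between-group mean square divided by the within-group mean square, with k - 1 and n - k degrees of freedom. If SciPy is available, `scipy.stats.f_oneway(*groups.values())` should return the same F statistic together with a p-value.

```python
import statistics

# Hypothetical measurements from three independent groups.
groups = {
    "A": [1, 2, 3],
    "B": [2, 3, 4],
    "C": [5, 6, 7],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = statistics.fmean(all_values)
k, n = len(groups), len(all_values)

# Between-group sum of squares: how far each group mean sits from the grand mean.
ssb = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups.values())
# Within-group sum of squares: spread of observations around their own group mean.
ssw = sum((v - statistics.fmean(g)) ** 2 for g in groups.values() for v in g)

msb = ssb / (k - 1)  # df_between = k - 1
msw = ssw / (n - k)  # df_within  = n - k
f_stat = msb / msw
print(round(ssb, 6), round(ssw, 6), round(f_stat, 6))  # 26.0 6.0 13.0
```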

Limitations of the One Way ANOVA

A one-way ANOVA will tell you that at least two groups were different from each other, but it won’t tell you which groups were different. If your test returns a significant F-statistic, you may need to run a post hoc test (like the Least Significant Difference test) to tell you exactly which groups had a difference in means.

Two Way ANOVA

A Two Way ANOVA is an extension of the One Way ANOVA. With a One Way, you have one independent variable affecting a dependent variable. With a Two Way ANOVA, there are two independent variables. Use a two way ANOVA when you have one measurement variable (i.e. a quantitative variable) and two nominal variables. In other words, if your experiment has a quantitative outcome and you have two categorical explanatory variables, a two way ANOVA is appropriate.

For example, you might want to find out if there is an interaction between income and gender for anxiety level at job interviews. The anxiety level is the outcome, or the variable that can be measured. Gender and Income are the two categorical variables. These categorical variables are also the independent variables, which are called factors in a Two Way ANOVA.

Know how sums of squares relate to Analysis of Variance:

Total Variation = Explained Variation + Unexplained Variation.






More articles by Mohsin khan
