Statistics and Probability :-

Mohsin khan

DATA SCIENTIST DATA ANALYSIS |BUSINESS ANALYSIS

Published: November 2, 2020

ANALYSING CATEGORICAL DATA:-

Categorical data analysis is the analysis of data where the response variable has been grouped into a set of mutually exclusive ordered (such as age group) or unordered (such as eye color) categories.

Summarizing quantitative data:-

A population data set contains all members of a specified group (the entire list of possible data values).

Example: The population may be "ALL people living in the US."

A sample data set contains a part, or a subset, of a population. The size of a sample is always less than the size of the population from which it is taken.

Example: The sample may be "SOME people living in the US."

Mean, median and mode

We'll start by describing the mean, median and mode. Suppose you have the following numbers: 23,29,20,32,23,21,33,25

The mean of the data is the arithmetic average of the numbers, that is, their sum divided by their count: (23+29+20+32+23+21+33+25)/8 = 206/8 = 25.75

The median is the middle value of the observations once they are sorted. Here we have 8 numbers, an even count, so the median is the average of the 4th and 5th numbers. Arranging the numbers in ascending order we have 20,21,23,23,25,29,32,33. Our median is (23+25)/2 = 24.

Even though 24 is not present in our data set, it is our median.

Mode is the value that occurs most often in the dataset, which here is 23 (it appears twice).
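
The three summaries can be checked with Python's standard `statistics` module. This is only a small sketch using the eight numbers from the example; the variable names are illustrative.

```python
from statistics import mean, median, mode

data = [23, 29, 20, 32, 23, 21, 33, 25]

data_mean = mean(data)      # sum of the values divided by their count
data_median = median(data)  # average of the two middle values for an even count
data_mode = mode(data)      # most frequently occurring value

print(data_mean, data_median, data_mode)
```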

Interquartile range:

It's a measure of the spread of the data points. After sorting the data, we calculate the difference between the middle of the second half (the third quartile) and the middle of the first half (the first quartile).

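Suppose we have a dataset consisting of the following numbers: 4,4,10,11,15,7,14,12,6

Arranging the values in the dataset in ascending order: 4,4,6,7,10,11,12,14,15

The median is 10. The first half (4,4,6,7) has the middle value (4+6)/2 = 5, and the second half (11,12,14,15) has the middle value (12+14)/2 = 13, so the interquartile range is 13 - 5 = 8.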
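
A minimal sketch of the interquartile range for this dataset, assuming the "middle of each half" rule described above, with the overall median left out of both halves when the count is odd. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median

data = [4, 4, 10, 11, 15, 7, 14, 12, 6]
ordered = sorted(data)
half = len(ordered) // 2

lower_half = ordered[:half]    # values before the median position
upper_half = ordered[-half:]   # values after the median position

q1 = median(lower_half)        # middle of the first half
q3 = median(upper_half)        # middle of the second half
iqr = q3 - q1

print(q1, q3, iqr)
```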

Probability:- Whenever we're unsure about the outcome of an event, we can talk about the probabilities of certain outcomes, that is, how likely they are. The analysis of events governed by probability is called statistics.

Formula to Calculate Probability

The formula for the probability of an event is:

Probability of an event = (Number of favourable outcomes) / (Total number of outcomes)

Or,

P(A) = n(A)/n(S)

Where,

P(A) is the probability of an event "A"
n(A) is the number of favourable outcomes
n(S) is the total number of outcomes in the sample space

Note: Here, the favourable outcome means the outcome of interest.
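
As an illustration of P(A) = n(A)/n(S), the sketch below uses a hypothetical example that is not from the article: rolling one fair die and asking for an even number. It assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                      # S: outcomes of one die roll
event_even = {x for x in sample_space if x % 2 == 0}   # A: the roll is even

def probability(event, space):
    """P(A) = n(A) / n(S), assuming equally likely outcomes."""
    return Fraction(len(event & space), len(space))

p_even = probability(event_even, sample_space)
print(p_even)
```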

Random variables:- A random variable is a variable whose value is unknown or a function that assigns values to each of an experiment's outcomes. Random variables are often designated by letters and can be classified as discrete, which are variables that have specific, countable values, or continuous, which are variables that can have any value within a continuous range. Random variables are often used in econometric or regression analysis to determine statistical relationships among variables.

KEY TAKEAWAYS

A random variable is a variable whose value is unknown or a function that assigns values to each of an experiment's outcomes.
A random variable can be either discrete (having specific values) or continuous (any value in a continuous range).
The use of random variables is most common in probability and statistics, where they are used to quantify outcomes of random occurrences.
Risk analysts use random variables to estimate the probability of an adverse event occurring.
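
To make the "function that assigns values to each outcome" idea concrete, here is a small sketch with a hypothetical experiment: tossing a fair coin twice. The random variable is the number of heads, which is discrete.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

def number_of_heads(outcome):
    """The random variable: maps an outcome to a number."""
    return outcome.count("H")

counts = Counter(number_of_heads(o) for o in outcomes)
distribution = {value: Fraction(n, len(outcomes)) for value, n in counts.items()}

print(distribution)
```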

Significance tests (hypothesis testing):- Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. When we say that a finding is statistically significant, it's thanks to a hypothesis test.

Steps in Testing for Statistical Significance

1) State the Research Hypothesis

2) State the Null Hypothesis

3) Select a probability of error level (alpha level)

4) Select and compute the test for statistical significance

5) Interpret the results
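
The five steps can be sketched with a hypothetical example that is not from the article: a coin is flipped 100 times and lands heads 60 times. The research hypothesis is that the coin is biased, the null hypothesis is that it is fair, and the alpha level is 0.05. The test used here, an exact two-sided binomial test written with the standard library, is one possible choice among many.

```python
from math import comb

n_flips = 100    # hypothetical sample size
n_heads = 60     # hypothetical observed number of heads
p_null = 0.5     # step 2: null hypothesis, the coin is fair
alpha = 0.05     # step 3: chosen probability of error

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Step 4: two-sided exact p-value, the total probability of all outcomes
# that are no more likely than the observed one under the null hypothesis.
observed_pmf = binomial_pmf(n_heads, n_flips, p_null)
p_value = sum(
    binomial_pmf(k, n_flips, p_null)
    for k in range(n_flips + 1)
    if binomial_pmf(k, n_flips, p_null) <= observed_pmf * (1 + 1e-9)
)

# Step 5: interpret the result
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)
```

With these made-up numbers the p-value is slightly above 0.05, so the null hypothesis of a fair coin is not rejected at that level.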

Inference for categorical data (chi-square tests) :- There are two types of chi-square tests. Both use the chi-square statistic and distribution for different purposes:

A chi-square goodness of fit test determines whether sample data match the distribution expected for a population.
A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether the distributions of categorical variables differ from one another.
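A very small chi-square test statistic means that your observed data fit your expected data extremely well. In a test for independence, where the expected counts assume no relationship, this means there is no evidence of a relationship between the variables.
A very large chi-square test statistic means that the observed data do not fit the expected data well. In a test for independence, this is evidence that the variables are related.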
A chi-square statistic is one way to show a relationship between two categorical variables. In statistics, there are two types of variables: numerical and non-numerical (categorical) variables. The chi-squared statistic is a single number that tells you how much difference exists between your observed counts and the counts you would expect if there were no relationship at all in the population.
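
A minimal sketch of the chi-square statistic for a test of independence, using a hypothetical 2x2 contingency table (the counts are invented for illustration). Expected counts are the ones implied by "no relationship": row total times column total divided by the grand total.

```python
# Hypothetical observed counts: rows are one categorical variable,
# columns are the other.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Counts expected if the two variables were unrelated
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

critical_value = 3.841  # chi-square critical value for df = 1 at alpha = 0.05
evidence_of_relationship = chi_square > critical_value
print(round(chi_square, 3), degrees_of_freedom, evidence_of_relationship)
```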

Advanced regression (inference and transforming):- We usually rely on statistical software to identify point estimates and standard errors for parameters of a regression line. After verifying conditions hold for fitting a line, we can use the methods learned earlier for the t-distribution to create confidence intervals for regression parameters or to evaluate hypothesis tests.

LINEAR VS NON LINEAR REGRESSION:

Many people think that the difference between linear and nonlinear regression is that linear regression involves lines and nonlinear regression involves curves. This is partly true, and if you want a loose definition for the difference, you can probably stop right there. However, linear equations can sometimes produce curves. In order to understand why, you need to take a look at the form of the linear regression equation: it is linear in its parameters, not necessarily in its predictors.

Linear regression can, surprisingly, produce curves.
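
One way to see this is to fit a model that is linear in its parameters but uses a transformed predictor. The sketch below is an illustration under assumptions, not the article's own method: the data are hypothetical points lying on y = 1 + 2x², and the helper `least_squares_line` is written here for the example. The fit is an ordinary straight-line least squares fit against z = x², yet plotted against x it traces a curve.

```python
def least_squares_line(z, y):
    """Ordinary least squares for y = intercept + slope * z."""
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    s_zy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_mean) ** 2 for zi in z)
    slope = s_zy / s_zz
    intercept = y_mean - slope * z_mean
    return intercept, slope

x = [-2, -1, 0, 1, 2]
y = [9, 3, 1, 3, 9]            # hypothetical data on y = 1 + 2 * x**2

z = [xi ** 2 for xi in x]      # transformed predictor
intercept, slope = least_squares_line(z, y)

# The fitted values follow a parabola in x, although the model
# y = intercept + slope * z is linear in its two parameters.
fitted = [intercept + slope * xi ** 2 for xi in x]
print(intercept, slope, fitted)
```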

Nonlinear regression uses nonlinear regression equations, which take the form:

Y = f(X, β) + ε

X = a vector of p predictors,
β = a vector of k parameters,
f(·) = a known regression function,
ε = an error term.

Combinations and Permutations

Another type of counting question is when you have a given number of objects, you want to choose some (or all) of them, and you want to know how many ways there are to do this. For example, a teacher with a class of 30 students wants 5 of them to do a presentation, and she wants to know how many ways this could happen. These types of questions have to do with combinations and permutations. The difference between combinations and permutations is whether the order in which you choose the objects matters.

A teacher choosing a group to make a presentation is a combination problem, because order does not matter.
A teacher choosing 1st-, 2nd-, and 3rd-place winners in a science fair is a permutation problem, because the order does matter. (1st place and 2nd place are different outcomes.)
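
Both counts are available in Python's `math` module (Python 3.8 or later). The class of 30 and the group of 5 come from the example above; the number of science fair entrants is not given in the text, so 30 is assumed here purely for illustration.

```python
from math import comb, perm

class_size = 30
presenters = 5
# Combination: order does not matter
presentation_groups = comb(class_size, presenters)

entrants = 30   # assumed number of science fair entrants (hypothetical)
places = 3
# Permutation: order matters (1st, 2nd and 3rd are different outcomes)
podium_orders = perm(entrants, places)

print(presentation_groups, podium_orders)
```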

Analysis of variance (ANOVA) :- Suppose we have several samples, for example three treatment groups, and want to know how different they are from one another. A statistical technique that compares such samples on the basis of their means is called ANOVA. Analysis of variance (ANOVA) is a statistical technique that is used to check if the means of two or more groups are significantly different from each other. ANOVA checks the impact of one or more factors by comparing the means of different samples.

There are two main types: one-way and two-way. Two-way tests can be with or without replication.

One-way ANOVA between groups: used when you want to test two or more groups to see if there's a difference between them.
Two way ANOVA without replication: used when you have one group and you're double-testing that same group. For example, you're testing one set of individuals before and after they take a medication to see if it works or not.
Two way ANOVA with replication: Two groups, and the members of those groups are doing more than one thing. For example, two groups of patients from different hospitals trying two different therapies.

One Way ANOVA

A one way ANOVA is used to compare the means of two or more independent (unrelated) groups using the F-distribution. The null hypothesis for the test is that all the group means are equal. Therefore, a significant result means that at least two of the means are unequal.

Examples of when to use a one way ANOVA

Limitations of the One Way ANOVA

A one way ANOVA will tell you that at least two groups were different from each other. But it won't tell you which groups were different. If your test returns a significant F-statistic, you may need to run a post hoc test (like the Least Significant Difference test) to tell you exactly which groups had a difference in means.

Two Way ANOVA

A Two Way ANOVA is an extension of the One Way ANOVA. With a One Way, you have one independent variable affecting a dependent variable. With a Two Way ANOVA, there are two independent variables. Use a two way ANOVA when you have one measurement variable (i.e. a quantitative variable) and two nominal variables. In other words, if your experiment has a quantitative outcome and you have two categorical explanatory variables, a two way ANOVA is appropriate.

For example, you might want to find out if there is an interaction between income and gender for anxiety level at job interviews. The anxiety level is the outcome, or the variable that can be measured. Gender and Income are the two categorical variables. These categorical variables are also the independent variables, which are called factors in a Two Way ANOVA.

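It also helps to know how sums of squares relate to analysis of variance. In a one way ANOVA, the explained variation is the variation between the group means and the unexplained variation is the variation within the groups: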

Total Variation = Explained Variation + Unexplained Variation.
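
The decomposition can be verified numerically for a one way ANOVA. The sketch below uses three hypothetical treatment groups (the values are invented), computes the between-group (explained), within-group (unexplained) and total sums of squares, and forms the F statistic from them.

```python
# Hypothetical measurements for three treatment groups
groups = {
    "treatment_a": [4, 5, 6],
    "treatment_b": [6, 7, 8],
    "treatment_c": [8, 9, 10],
}

def mean(values):
    return sum(values) / len(values)

all_values = [v for values in groups.values() for v in values]
grand_mean = mean(all_values)

# Explained variation: group means around the grand mean
ss_between = sum(
    len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
)
# Unexplained variation: observations around their own group mean
ss_within = sum(
    (v - mean(values)) ** 2 for values in groups.values() for v in values
)
# Total variation: observations around the grand mean
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_statistic = (ss_between / df_between) / (ss_within / df_within)

print(ss_between, ss_within, ss_total, f_statistic)
```

For these made-up data the F statistic is well above 5.14, the approximate 5% critical value of the F-distribution with 2 and 6 degrees of freedom, so the group means would be judged significantly different.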
