Basics of Statistics

  • Statistics is a branch of science that deals with the collection, presentation, analysis, and interpretation of data, and with drawing inferences (predictions) from masses of data.
  • In modern times, hardly a single day of our lives passes without relying on information technology and statistics.

Population and Sample

  • A population is the entire group of individuals about which we want information.
  • A sample is a part of the population that we examine in order to gather information about the population.

Population vs Sample

Surveys

There are two kinds of surveys: censuses and sample surveys.

  • A census collects information from all units of the population.
  • A sample survey collects information from only a portion of the units in the population.


Parameter and Estimator

  • A parameter is a characteristic of the population. For example:

population average (mean)

population proportion

population total

  • An estimator is a formula by which an estimate of the parameter is calculated from the sample.

Parameter vs Estimator

  • We can get an idea about the population mean from the sample mean.
  • Therefore, the sample mean (x̄) is an estimator, or statistic, of the population mean.
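The difference between a parameter and an estimator can be made concrete with a small simulation. The sketch below is written in Python and everything in it is illustrative. The population of 1,000 heights is simulated, not real data. The 175 cm threshold and the helper name `estimates_from_sample` are hypothetical. The same three formulas are applied twice: once to the whole population, which gives the parameters, and once to a simple random sample, which gives the estimates.

```python
import random
import statistics


def estimates_from_sample(sample, population_size, threshold):
    """Apply estimator formulas to a sample (hypothetical helper for illustration)."""
    n = len(sample)
    x_bar = statistics.mean(sample)                 # sample mean estimates the population mean
    p_hat = sum(x > threshold for x in sample) / n  # sample proportion estimates the population proportion
    total_hat = population_size * x_bar             # N * x̄ estimates the population total
    return x_bar, p_hat, total_hat


# Hypothetical population of 1,000 heights in cm (simulated, not real data)
random.seed(1)
population = [random.gauss(165, 8) for _ in range(1000)]
N = len(population)

# Parameters: computed from every unit of the population
mu = statistics.mean(population)
p = sum(x > 175 for x in population) / N
total = sum(population)

# Estimates: computed from a simple random sample of 50 units
sample = random.sample(population, 50)
x_bar, p_hat, total_hat = estimates_from_sample(sample, N, 175)

print(f"mean:       parameter = {mu:.1f}, estimate = {x_bar:.1f}")
print(f"proportion: parameter = {p:.3f}, estimate = {p_hat:.3f}")
print(f"total:      parameter = {total:.0f}, estimate = {total_hat:.0f}")
```

In practice the parameters are unknown. Only the estimates can be computed, and they vary from one sample to the next.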

Individuals and Variables

  • An individual is a person or object about which data are collected, also known as the experimental unit or sampling unit.

  • A variable is any characteristic of an individual. A variable can take different values for different individuals.
  • We observe or measure the value of the variable of interest for each individual.

Some examples of variables are:

1. Today’s temperature.

2. The type of crime committed by people in jail.

3. The number of daily accidents on a certain road.

4. A person’s religion, gender, political affiliation, etc.

5. Height, weight, lipid profile, etc. of individuals in an institution.


Types of variables

Qualitative (Categorical)

  • A qualitative variable places an individual into one of several groups or categories.

For example,

a person’s religion

gender

smoking status, etc.


Quantitative (Numerical)

  • A quantitative variable takes numerical values for which arithmetic operations, such as adding or averaging, make sense.

For example,

weight

height

age, etc.
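The practical difference between the two types lies in how each is summarized. A qualitative variable is summarized by counting individuals per category. A quantitative variable supports arithmetic such as the mean. The short Python sketch below shows this. The four records are hypothetical and made up for illustration.

```python
from collections import Counter
import statistics

# Hypothetical records: gender and smoker are qualitative; age and weight (kg) are quantitative
records = [
    {"gender": "F", "smoker": "no",  "age": 34, "weight": 61.5},
    {"gender": "M", "smoker": "yes", "age": 45, "weight": 82.0},
    {"gender": "F", "smoker": "no",  "age": 29, "weight": 58.3},
    {"gender": "M", "smoker": "no",  "age": 52, "weight": 77.4},
]

# Qualitative variables: count individuals in each category (averaging labels is meaningless)
gender_counts = Counter(r["gender"] for r in records)
smoker_counts = Counter(r["smoker"] for r in records)

# Quantitative variables: arithmetic such as the mean makes sense
mean_age = statistics.mean(r["age"] for r in records)
mean_weight = statistics.mean(r["weight"] for r in records)

print(gender_counts, smoker_counts)
print(f"mean age = {mean_age}, mean weight = {mean_weight:.1f} kg")
```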



Scales of Measurement

  • There are four scales of measurement: nominal, ordinal, interval, and ratio. SPSS classifies them into three measurement levels: nominal, ordinal, and scale, where "scale" covers both the interval and ratio scales.
  • There are different ways to represent data graphically:

  1. Bar Diagram: Used to represent categorical data with rectangular bars. The length of each bar is proportional to the value it represents. Suitable for comparing different categories.
  2. Pie Chart: Circular chart divided into slices to illustrate numerical proportions. Each slice represents a category's contribution to the whole. Best for showing relative percentages.
  3. Histogram: Used to represent the distribution of numerical data. It divides the data into bins and displays the frequency of data points in each bin. Useful for understanding the distribution and spread of continuous data.
  4. Stemplot (Stem-and-Leaf Plot): Displays quantitative data by splitting each value into a "stem" and a "leaf." It retains the original data values while showing the shape of the distribution. Useful for small datasets.
  5. Boxplot (Box-and-Whisker Plot): Summarizes the distribution of a dataset by displaying its minimum, first quartile, median, third quartile, and maximum. Useful for identifying outliers and comparing distributions.
  6. Scatter Plot: Displays the relationship between two quantitative variables. Each point represents an observation. Useful for identifying correlations and patterns between variables.
  7. Q-Q Plot (Quantile-Quantile Plot): Compares the quantiles of a dataset to the quantiles of a theoretical distribution (e.g., normal distribution). Used to assess if a dataset follows a particular distribution.
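The histogram and the stemplot described above can both be produced as plain text. The Python sketch below assumes the following, none of which comes from the original text. The exam scores are hypothetical. The bins are equal-width and half-open. The stems are the tens digits of non-negative integers. The function names are illustrative.

```python
def histogram_counts(data, start, width, bins):
    """Count values in equal-width, half-open bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * bins
    for x in data:
        k = int((x - start) // width)   # index of the bin that contains x
        if 0 <= k < bins:               # values outside the chosen range are ignored
            counts[k] += 1
    return counts


def stemplot(data):
    """Stem-and-leaf lines for non-negative integers: stem = tens, leaf = units digit."""
    stems = {}
    for x in sorted(data):
        stems.setdefault(x // 10, []).append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # keep empty stems so gaps stay visible
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


# Hypothetical exam scores (illustrative only)
scores = [52, 58, 61, 64, 67, 67, 70, 72, 75, 78, 81, 83, 88, 94]

counts = histogram_counts(scores, start=50, width=10, bins=5)
for k, c in enumerate(counts):
    low = 50 + 10 * k
    print(f"{low}-{low + 9}: {'*' * c}")

print("\n".join(stemplot(scores)))
```

The stemplot keeps every original value (each leaf is one score), while the histogram only keeps the count per bin. This is why the stemplot is suited to small datasets.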

Categorical variables - Bar Diagram & Pie Chart
Quantitative variables - Histograms
Quantitative variables - Stem Plots & Box Plots
Quantitative variables - Scatter Plot
Quantitative variables - Q-Q Plot
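A boxplot is drawn from the five-number summary: minimum, first quartile, median, third quartile, and maximum. The Python sketch below computes it. Quartiles here use the "median of each half" rule. That is an assumption on my part: statistical packages use several different quartile conventions, so a given tool may report slightly different Q1 and Q3. The data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'median of each half' quartile rule."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]                # values below the median position
    upper = xs[half + n % 2:]        # values above it (the middle value is excluded when n is odd)
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


scores = [52, 58, 61, 64, 67, 67, 70, 72, 75, 78, 81, 83, 88, 94]  # hypothetical data
print(five_number_summary(scores))
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.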


Bar Diagram vs Pie Chart vs Histogram


Stemplot vs Boxplot vs Scatter Plot vs Q-Q Plot
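The points of a normal Q-Q plot can be computed without any plotting library. Each sorted observation is paired with the quantile a normal distribution would place at the same rank. The Python sketch below assumes plotting positions (i - 0.5)/n, which is one common convention among several. It also assumes a normal fitted with the sample's own mean and standard deviation, so both axes share the same units. The measurements are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(data):
    """Pair each sorted observation with the matching quantile of a fitted normal distribution."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # normal curve with the sample's mean and SD
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


# Hypothetical measurements; plot x = theoretical quantile, y = observed value
for theoretical, observed in normal_qq_points([4.1, 5.0, 5.2, 5.9, 6.3, 7.8]):
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

If the data roughly follow a normal distribution, the pairs lie close to the line y = x. Systematic curvature away from that line suggests skewness or heavy tails.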


Examining the Distribution of Quantitative Data

To examine the distribution of quantitative data, we look at:

● The overall pattern of the graph

● Deviations from the overall pattern

● Shape of the data

● Center of the data

● Spread of the data (variation)

● Outliers
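Center and spread are usually described with a few standard numerical summaries. The Python sketch below computes them. The mean and median measure center. The range, sample variance, and sample standard deviation (divisor n - 1) measure spread. The data values are hypothetical.

```python
import statistics


def center_and_spread(data):
    """Common numerical summaries of center and spread for a list of numbers."""
    return {
        "mean": statistics.mean(data),          # center; pulled toward extreme values
        "median": statistics.median(data),      # center; resistant to extreme values
        "range": max(data) - min(data),         # spread: largest minus smallest
        "variance": statistics.variance(data),  # spread: sample variance (divisor n - 1)
        "std_dev": statistics.stdev(data),      # spread: square root of the sample variance
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical values
print(center_and_spread(data))
print(center_and_spread(data + [60]))  # one extreme value moves the mean far more than the median
```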

The shape of the data

Symmetric
Asymmetric
Bimodal
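Symmetry can be summarized numerically with a skewness coefficient. The value is near 0 for symmetric data, positive for a long right tail, and negative for a long left tail. The Python sketch below uses the moment coefficient g1 = m3 / m2^1.5, which is one of several textbook definitions and my choice, not the source's. Bimodality is a different property. It is best spotted in a histogram, and the last example shows a bimodal dataset whose skewness is exactly 0. All datasets are hypothetical.

```python
import statistics


def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (one of several definitions)."""
    n = len(data)
    m = statistics.fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment (must be > 0)
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))            # symmetric
print(skewness([1, 1, 1, 2, 2, 3, 10]))     # asymmetric, long right tail
print(skewness([1, 1, 1, 2, 8, 9, 9, 9]))   # bimodal, yet skewness is 0
```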


Outliers

Outliers are values that fall outside the overall pattern of the dataset. Such values:

● May occur naturally

● May be due to an error in recording

● May be due to an error in measurement

● May come from an observational unit that is fundamentally different from the rest
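A common way to flag candidate outliers is the 1.5 × IQR rule of thumb. This rule is not stated in the text above. It flags any value below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. The Python sketch below applies it. Quartiles come from `statistics.quantiles` with `method="inclusive"`, and other quartile conventions can shift the fences slightly. The readings are hypothetical. A flagged value still has to be investigated to decide which of the causes listed above applies.

```python
import statistics


def iqr_outliers(data):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical daily readings; 100 might be a recording error or a genuinely unusual day
readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 10, 100]
print(iqr_outliers(readings))   # [100]
```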


Observational Study vs Designed Experiment


Sampling Error vs Non-Sampling Error

  • Sampling Error: This type of error arises because the estimate is based on a sample rather than the entire population. It is the difference between the sample statistic and the actual population parameter. Sampling error can be reduced by increasing the sample size.
  • Non-Sampling Error: Encompasses all other types of errors that can affect the accuracy of survey results. These errors can occur during data collection, processing, or analysis and are not related to the act of sampling itself. Examples include measurement errors, data entry errors, nonresponse errors, and biases in survey design.
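The claim that sampling error shrinks as the sample grows can be checked by simulation. The Python sketch below uses a simulated population, not real data, and a fixed seed. It repeatedly draws simple random samples of each size and averages the absolute gap between the sample mean and the population mean. It models only sampling error. A non-sampling error such as a miscalibrated instrument would affect every sample alike, and it would not shrink as n grows.

```python
import random
import statistics


def average_sampling_error(population, n, repeats, rng):
    """Average |sample mean - population mean| over repeated simple random samples of size n."""
    mu = statistics.fmean(population)                        # the parameter
    errors = [abs(statistics.fmean(rng.sample(population, n)) - mu)
              for _ in range(repeats)]                       # one sampling error per sample
    return statistics.fmean(errors)


rng = random.Random(42)                                   # fixed seed for reproducibility
population = [rng.gauss(50, 10) for _ in range(10_000)]  # simulated, not real, data

for n in (10, 100, 1000):
    err = average_sampling_error(population, n, 200, rng)
    print(f"n = {n:>4}: average sampling error = {err:.3f}")
```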



  • Both types of errors can significantly impact the accuracy and reliability of research findings, so understanding and addressing them is crucial in survey and experimental research.

Sampling Methods


Probability Sampling vs Non-Probability Sampling

  • Probability Sampling: A sampling method in which every member of the population has a known, non-zero chance of being selected. Because it relies on random selection, it tends to produce samples that represent the population, and it allows results to be generalized to the population with measurable precision.
  • Non-Probability Sampling: A sampling method where some members of the population have no chance of being selected. This method does not rely on random selection, making it less likely to produce a representative sample. It is often used when probability sampling is not feasible.



Both methods have their appropriate applications depending on the research objectives, available resources, and the need for precision and generalizability.

Probability Sampling Methods

  1. Simple Random Sampling (SRS): Every member of the population has an equal chance of being selected; more precisely, every possible sample of the chosen size is equally likely. Because selection is purely random, the method avoids selection bias.
  2. Systematic Sampling (SYS): Selects members from an ordered list of the population at a regular interval after a random starting point. This method is easier to implement than SRS and usually gives a comparably representative sample, provided the list has no periodic pattern that lines up with the interval.
  3. Cluster Sampling: The population is divided into clusters, and a random sample of clusters is selected. All members of the selected clusters are then included in the sample. This method is cost-effective for geographically dispersed populations.
  4. Stratified Random Sampling (STR): The population is divided into strata (subgroups) based on a specific characteristic. A random sample is then taken from each stratum. This method ensures that all subgroups are adequately represented.
  5. Probability Proportional to Size (PPS) Sampling: The probability of selecting a unit is proportional to its size. Larger units have a higher chance of being included in the sample, which is useful in ensuring that larger entities are adequately represented.
  6. Double Sampling: Also known as two-phase sampling, it involves taking a preliminary sample to gather information, which is then used to inform a second, more detailed sampling phase. This method improves the efficiency and accuracy of the final sample.
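The first five designs can be sketched with Python's standard `random` module. The sketch below makes several assumptions of its own. The population of 100 numbered units, the strata, the villages, and the farm sizes are all hypothetical. The sampling interval is taken as k = N // n. Cluster sampling is one-stage. PPS is drawn with replacement, which is the simplest PPS scheme and only one of several. Double sampling is omitted because its second phase depends on what the first phase measures.

```python
import random


def simple_random_sample(units, n, rng):
    # SRS without replacement: every subset of size n is equally likely
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    # Sampling interval k = N // n, random start in [0, k), then every k-th unit
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]


def stratified_sample(strata, sizes, rng):
    # strata: stratum name -> list of units; sizes: stratum name -> sample size
    # An independent SRS is drawn inside every stratum
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


def cluster_sample(clusters, m, rng):
    # One-stage cluster sampling: pick m whole clusters at random, keep all their units
    chosen = rng.sample(sorted(clusters), m)
    return [unit for name in chosen for unit in clusters[name]]


def pps_sample(units, sizes, n, rng):
    # PPS with replacement: each draw picks a unit with probability size / total size
    return rng.choices(units, weights=sizes, k=n)


rng = random.Random(2024)          # fixed seed so the illustration is reproducible
households = list(range(1, 101))   # hypothetical population of 100 numbered units

print("SRS:       ", sorted(simple_random_sample(households, 10, rng)))
print("Systematic:", systematic_sample(households, 10, rng))
print("Stratified:", stratified_sample(
    {"urban": households[:60], "rural": households[60:]}, {"urban": 6, "rural": 4}, rng))
print("Cluster:   ", cluster_sample(
    {"village A": [1, 2, 3], "village B": [4, 5], "village C": [6, 7, 8, 9]}, 2, rng))
print("PPS:       ", pps_sample(["farm 1", "farm 2", "farm 3"], [10, 30, 60], 5, rng))
```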

Probability Sampling


Non-Probability Sampling Designs

  1. Convenience Sampling: Samples are taken from a group that is conveniently accessible to the researcher. This method is quick, easy, and cost-effective, but it may not be representative of the population.
  2. Judgment Sampling: Also known as purposive sampling, this method involves the researcher using their expertise to select the sample that they believe is most representative of the population. This method relies on the judgment of the researcher and may introduce bias.
  3. Quota Sampling: The population is divided into subgroups (quotas) and samples are taken from each subgroup to meet a predefined number. This method ensures that specific subgroups are represented, but the selection within each subgroup is non-random.
  4. Snowball Sampling: Existing study subjects recruit future subjects from among their acquaintances. This method is useful for reaching hard-to-access or hidden populations but may introduce bias as the sample is not randomly selected.
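Three of these designs are mechanical enough to sketch in Python: convenience, quota, and snowball sampling. Judgment sampling rests on the researcher's expertise and has no simple algorithmic form. The sketch below assumes two things. People reach the researcher in a fixed arrival order, and snowball recruitment proceeds breadth-first through a hypothetical acquaintance network. All names and data are made up. No random number generator appears anywhere, which is exactly what makes these designs non-probability methods.

```python
from collections import deque


def convenience_sample(arrivals, n):
    # Take whoever is easiest to reach: here, simply the first n people who turn up
    return arrivals[:n]


def quota_sample(arrivals, group_of, quotas):
    # Fill a fixed quota per group in arrival order; selection within a group is not random
    filled = {g: [] for g in quotas}
    for person in arrivals:
        g = group_of(person)
        if g in filled and len(filled[g]) < quotas[g]:
            filled[g].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break
    return filled


def snowball_sample(seeds, contacts, target):
    # Start from seed subjects; each recruited subject refers their contacts (breadth-first)
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target:
        person = queue.popleft()
        sample.append(person)
        for friend in contacts.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return sample


# Hypothetical data: people in the order they reached the researcher, with gender
arrivals = [("Ann", "F"), ("Bob", "M"), ("Cat", "F"), ("Dan", "M"), ("Eve", "F")]
print(convenience_sample(arrivals, 3))
print(quota_sample(arrivals, group_of=lambda p: p[1], quotas={"F": 2, "M": 1}))

# Hypothetical acquaintance network for a hard-to-reach group
contacts = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"]}
print(snowball_sample(["A"], contacts, target=4))
```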

Convenience Sampling
Judgment Sampling
Quota Sampling
Snowball Sampling


Reference: Alison Diploma Courses


