What Is Sampling, and Why Is Resampling Required?
Amit Patriwala
Enterprise Solution Architect | Leading Data-Driven Innovation with AI & Cloud
As data scientists, we do everything we can to assess data accuracy, uncover hidden patterns, and map relationships in the data, right?
What is Sampling?
Data sampling is a statistical analysis technique used to select, manipulate and analyze a representative subset of data points to identify patterns and trends in the larger data set being examined.
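To make the idea concrete, here is a minimal sketch of simple random sampling using only Python's standard library. The "population" is synthetic data invented for illustration, and the sample size of 200 is an arbitrary assumption, not a recommendation.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 synthetic values (made up for illustration)
population = [random.gauss(50, 10) for _ in range(10_000)]

# Simple random sample of 200 points, drawn without replacement
sample = random.sample(population, k=200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

If the sample is representative, the two means should be close, even though the sample holds only 2% of the data.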
What is Resampling?
Once we have a data sample, it can be used to estimate the population parameter. The problem is that we only have a single estimate of the population parameter, with little idea of the variability or uncertainty in the estimate. One way to address this is by estimating the population parameter multiple times from our data sample. This is called resampling.
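One common way to do this is the bootstrap: draw many new samples, with replacement, from the one sample we have, and recompute the estimate each time. The sketch below assumes the parameter of interest is the mean. The data values, the helper name `bootstrap_means`, and the choice of 1,000 resamples are all illustrative assumptions.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Re-estimate the mean from resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # same size, with replacement
        means.append(statistics.mean(resample))
    return means

# Hypothetical data sample (values made up for illustration)
data = [12.1, 9.8, 11.4, 10.3, 13.0, 8.7, 10.9, 11.8, 9.5, 12.4]

means = bootstrap_means(data)
sorted_means = sorted(means)

# Spread of the re-estimated means approximates the standard error
std_error = statistics.stdev(means)

# Rough 95% percentile interval from the bootstrap distribution
lower = sorted_means[int(0.025 * len(sorted_means))]
upper = sorted_means[int(0.975 * len(sorted_means)) - 1]

print(f"sample mean:    {statistics.mean(data):.2f}")
print(f"standard error: {std_error:.2f}")
print(f"95% interval:   ({lower:.2f}, {upper:.2f})")
```

Instead of a single number, we now have a whole distribution of estimates, and its spread gives a sense of how uncertain the original estimate is.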
Why Is Resampling Required?
Resampling is an economical way to reuse a single data sample, both to improve the accuracy of an estimate of a population parameter and to quantify the uncertainty in that estimate.
Resampling is used in any of these cases:
1) Estimating the accuracy of sample statistics by using subsets of the available data (jackknifing) or by drawing randomly with replacement from a set of data points (bootstrapping)
2) Exchanging labels on data points when performing significance tests (permutation tests)
3) Validating models by using random subsets (bootstrapping, cross-validation)
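Cases 2 and 3 above can be sketched in a few lines of standard-library Python. The first function is a simple two-sided permutation test on the difference of group means, and the second yields train/test index splits for k-fold cross-validation. The function names, the group values, and the defaults (5,000 permutations, 5 folds) are hypothetical choices made for illustration.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Approximate two-sided p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to both counts so the estimate is never exactly zero
    return (extreme + 1) / (n_permutations + 1)

def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx

# Hypothetical measurements for two groups (values made up)
group_a = [10.2, 9.8, 11.1, 10.5, 9.9, 10.7]
group_b = [12.9, 13.4, 12.1, 13.8, 12.6, 13.1]

p_value = permutation_test(group_a, group_b)
print(f"permutation p-value: {p_value:.4f}")

splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print(f"train={sorted(train_idx)} test={sorted(test_idx)}")
```

A small p-value suggests the observed difference would be unlikely if the group labels did not matter. In the cross-validation splits, every data point lands in the test set exactly once, so a model can be scored on data it was not trained on.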
Happy Learning!!