Three Approaches to Preparing Datasets
Sravanthi kuruva
Why do you use fit_transform on the train set, but just transform on the test set?
This is a continuation of the previous post.
1. Why should we not put 100% of the data into the train set?
Ans:- If you train your model on a training set and test it on that same set, of course your model will do well! You want to evaluate the performance of your model on a set of data it has never seen before.
Example:-
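To make this concrete, here is a minimal sketch in plain Python. The "model" is a made-up memorizer (`fit_memorizer` and `accuracy` are hypothetical helpers written for this illustration, not from any library) that stores every training row and guesses 0 for anything it has not seen. The data is a toy assumption too.

```python
# Toy "model" that only memorizes its training rows (hypothetical, for illustration).
def fit_memorizer(rows):
    return {x: y for x, y in rows}

def accuracy(model, rows, default=0):
    # Unseen inputs fall back to the default guess.
    correct = sum(1 for x, y in rows if model.get(x, default) == y)
    return correct / len(rows)

# Assumed toy data: the label is 1 for odd x and 0 for even x.
data = [(x, x % 2) for x in range(100)]
train_rows, test_rows = data[:70], data[70:]

model = fit_memorizer(train_rows)
train_acc = accuracy(model, train_rows)  # scored on rows the model has already seen
test_acc = accuracy(model, test_rows)    # scored on rows the model has never seen

print(train_acc)  # 1.0
print(test_acc)   # 0.5
```

The score on the training rows looks perfect, but it only tells us the model remembered its own data. The score on the held-out rows is the honest one.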
2. i) Keep 70 to 80% of the data in the train set and the remaining 20 to 30% in the test set.
Ans:-
Example:- Suppose my dataset consists of 10,000 rows and contains 30 patterns.
Train --> 70% --> 7,000 rows --> 25 patterns
Test --> 30% --> 3,000 rows --> 10 patterns. (Here 5 patterns also appear in the train set and 5 are unknown to the model, so we can observe how the model performs on those unknown patterns.)
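The arithmetic in this example can be checked with a small sketch. The dataset below is an assumption built only to reproduce the numbers above: each row is just a pattern id from 0 to 29, laid out so that an in-order 70/30 split gives 25 patterns in train and 10 in test.

```python
# Assumed layout: the first 7,000 rows cycle through patterns 0..24,
# the last 3,000 rows cycle through patterns 20..29.
rows = [i % 25 for i in range(7000)] + [20 + i % 10 for i in range(3000)]

# Split in the original order (no shuffling), 70% train / 30% test.
cut = len(rows) * 70 // 100
train, test = rows[:cut], rows[cut:]

train_patterns = set(train)
test_patterns = set(test)
common = train_patterns & test_patterns   # patterns seen in both sets
unseen = test_patterns - train_patterns   # patterns the model never trained on

print(len(train), len(train_patterns))  # 7000 25
print(len(test), len(test_patterns))    # 3000 10
print(len(common), len(unseen))         # 5 5
```

The 25 train patterns plus the 5 unseen test patterns add up to the 30 patterns in the full dataset.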
ii) Shuffle the data, then keep 70 to 80% in the train set and the remaining 20 to 30% in the test set. Of the two methods, when should each be applied, and which one should we choose?
Example:- Suppose my dataset consists of 10,000 rows and contains 30 patterns.
Shuffle data --> The reason to shuffle is that the dataset may contain consecutive repeats, such as 10,10,20,20,500,500,500... for numerical values, or dog,dog,dog,cat,cat,d
Train --> 70% --> 7,000 rows --> 25 patterns
Test --> 30% --> 3,000 rows --> 10 patterns. (Here 5 patterns also appear in the train set and 5 are unknown to the model, so we can observe how the model performs on those unknown patterns.)
This helps us detect overfitting, because we can test the model's performance on unknown patterns.
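Here is a rough sketch of why shuffling matters when the rows arrive sorted. The labels, the 50/50 class mix, the seed, and the `split_70_30` helper are all assumptions made for illustration; only Python's standard `random` module is used.

```python
import random

# Assumed dataset sorted by label, similar to dog,dog,dog,...,cat,cat,cat
labels = ["dog"] * 50 + ["cat"] * 50

def split_70_30(rows):
    # Hypothetical helper: first 70% goes to train, the rest to test.
    cut = len(rows) * 70 // 100
    return rows[:cut], rows[cut:]

# Without shuffling, the test set is just the tail of the sorted data.
train_plain, test_plain = split_70_30(labels)

# With shuffling: copy first so the original order stays untouched.
shuffled = labels[:]
random.Random(42).shuffle(shuffled)  # fixed seed so the split is repeatable
train_shuf, test_shuf = split_70_30(shuffled)

print(set(test_plain))  # {'cat'} : the test set contains only one label
print(set(test_shuf))   # the shuffled test set mixes the labels
```

Without shuffling, the test set here holds nothing but cats, so it says little about how the model handles dogs. If you use scikit-learn, its `train_test_split` function shuffles by default before splitting.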