Why Split Data?
G Muralidhar
GenAI Specialist | AI & Business Strategist | Productivity Coach | 20+ years Experience
We split data so we can check how well the model works on unseen data (the test set).
This shows whether the model has merely "memorized" the training data or can generalize to new situations.
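The difference between memorizing and generalizing can be sketched in a few lines of Python. This is only an illustration: the study hours and scores are made-up values, and `memorizer`, `fit_line`, and `line_model` are hypothetical helper names, not part of any library.

```python
# Hypothetical data: (study hours, exam score). Values are made up for illustration.
train = [(1, 35), (2, 45), (3, 55), (5, 75)]
test = [(4, 65), (6, 85)]  # students the model has never seen

# A "memorizing" model: it stores every training answer in a lookup table.
memory = dict(train)

def memorizer(hours):
    # Returns the stored score, or None when these hours were never seen.
    return memory.get(hours)

# A "generalizing" model: fit a straight line, score = slope * hours + intercept,
# using ordinary least squares on the training data only.
def fit_line(data):
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(train)

def line_model(hours):
    return slope * hours + intercept

# The memorizer is perfect on training data but has no answer for test students.
for hours, score in test:
    print(hours, "actual:", score,
          "memorizer:", memorizer(hours),
          "line model:", line_model(hours))
```

The memorizer looks flawless if we only check it on the data it has already seen. Only the held-out test students reveal that it cannot handle new inputs.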
How to Split Data
Usually, data is divided into two parts:
· Training Set: 70–80% of the data (for learning).
· Test Set: 20–30% of the data (for testing).
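As a rough sketch of the arithmetic, the percentages above translate into row counts as follows. The function name `split_sizes` is hypothetical, and rounding the test size to the nearest whole row is an assumption made here; libraries may round differently.

```python
def split_sizes(n_samples, test_fraction=0.2):
    """Return (n_train, n_test) for a given test fraction."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    # Assumption: round the test size to the nearest whole row.
    n_test = round(n_samples * test_fraction)
    n_train = n_samples - n_test
    return n_train, n_test

print(split_sizes(10, 0.2))   # (8, 2)
print(split_sizes(100, 0.3))  # (70, 30)
```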
Example: Predicting Exam Scores
Suppose we have data for 10 students, showing:
· Study hours (input/independent variable).
· Exam scores (output/dependent variable).
Splitting the Data
If we split this data into 80% training and 20% testing:
· Training Set (8 students): Use this to train the model.
· Test Set (2 students): Use this to evaluate the model.
Exercise:
1. Purpose of Splitting: Why is it necessary to split data into training and test sets in machine learning? Use an example to support your answer.
2. Avoiding Overfitting: How does splitting data into training and test sets help prevent overfitting, and why is this important for building a reliable model?
3. Testing Generalization: How does the test set help us evaluate whether the model can generalize to new, unseen data?
Example: Split the Data
How to Split Data in Python
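One minimal way to do this uses only Python's standard library. The original code for this section is not available, so treat this as a sketch under assumptions: the student data is hypothetical, and `split_data` is an illustrative helper name rather than a library function.

```python
import random

# Hypothetical data for 10 students: (study hours, exam score).
students = [
    (1, 35), (2, 45), (3, 50), (4, 60), (5, 65),
    (6, 70), (7, 80), (8, 85), (9, 90), (10, 95),
]

def split_data(data, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data, then cut it into training and test sets."""
    rows = list(data)                   # copy so the original order is untouched
    random.Random(seed).shuffle(rows)   # fixed seed makes the split repeatable
    n_test = round(len(rows) * test_fraction)
    test_set = rows[:n_test]
    train_set = rows[n_test:]
    return train_set, test_set

train_set, test_set = split_data(students, test_fraction=0.2)

print("Training set size:", len(train_set))  # 8
print("Test set size:", len(test_set))       # 2
print("Training set:", train_set)
print("Test set:", test_set)
```

Shuffling before cutting matters because the data here is sorted by study hours. Without it, the test set would contain only the students who studied the most. In practice, many people use scikit-learn's `train_test_split` from `sklearn.model_selection` (for example with `test_size=0.2` and `random_state=42`), which performs the same kind of shuffled split.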
Note:
This is meant to be the simplest and easiest explanation of AI and Machine Learning. Many resources are too technical, which limits their reach. If this article makes machine learning easier to understand, please share it with others who might benefit. Your likes and shares help spread these insights. Thank you for reading!