Why Split Data?

We split data so that we can check how well the model works on unseen data (the test set).

This lets us confirm that the model hasn't just "memorized" the training data and can generalize to new situations.


How to Split Data

Usually, data is divided into two parts:

  • Training Set: 70–80% of the data (for learning).
  • Test Set: 20–30% of the data (for testing).
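To see what these percentages mean in actual row counts, here is a minimal sketch. The function name `split_sizes` and the rule of rounding the test size (with at least one test row) are assumptions made for illustration, not a fixed standard.

```python
def split_sizes(n_samples, test_fraction=0.2):
    """Return (n_train, n_test) for a dataset with n_samples rows.

    Assumption: the test size is rounded to the nearest whole row,
    and at least one row is always kept for testing.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, round(n_samples * test_fraction))
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(10, 0.2))   # (8, 2)   -> 10 rows, 20% held out
print(split_sizes(100, 0.3))  # (70, 30) -> 100 rows, 30% held out
```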

Example: Predicting Exam Scores

Suppose we have data for 10 students, showing:

  • Study hours (input/independent variable).
  • Exam scores (output/dependent variable).


Splitting the Data

If we split this data into 80% training and 20% testing:

  • Training Set (8 students): Use this to train the model.
  • Test Set (2 students): Use this to evaluate the model.

Output

  • Training Data: Used by the model to learn patterns.
  • Test Data: Used to check how well the model predicts scores.
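The idea of learning from the training data and then checking on the test data can be sketched in a few lines. Everything here is an assumption for illustration: the study hours and exam scores are made-up numbers, the rows are assumed to be in random order already, and the "model" is a simple straight line fitted by least squares.

```python
# Hypothetical data for 10 students: (study hours, exam score).
# These values are invented for illustration only.
students = [
    (5, 62), (2, 41), (9, 88), (1, 35), (7, 75),
    (4, 56), (10, 94), (3, 49), (6, 68), (8, 80),
]

# Assumption: rows are already in random order, so the first 8
# become the training set and the last 2 become the test set.
train_set = students[:8]
test_set = students[8:]


def fit_line(points):
    """Fit score = slope * hours + intercept using least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(points, slope, intercept):
    """Average absolute gap between predicted and actual scores."""
    errors = [abs(y - (slope * x + intercept)) for x, y in points]
    return sum(errors) / len(errors)


# The model only ever sees the training set while learning.
slope, intercept = fit_line(train_set)

# The test set is used only afterwards, to check the predictions.
print("Training error:", round(mean_abs_error(train_set, slope, intercept), 2))
print("Test error:", round(mean_abs_error(test_set, slope, intercept), 2))
```

If the test error is close to the training error, the model is probably generalizing. If the test error is much larger, the model may have memorized the training data.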


Exercise:

1. Purpose of Splitting: Why is it necessary to split data into training and test sets in machine learning? Use an example to support your answer.

2. Avoiding Overfitting: How does splitting data into training and test sets help prevent overfitting, and why is this important for building a reliable model?

3. Testing Generalization: How does the test set help us evaluate whether the model can generalize to new, unseen data?

Example: Split the Data

How to Split Data in Python
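Here is one possible way to do a random split using only the Python standard library. The function name `simple_train_test_split`, its parameters, and the sample data are all hypothetical and made up for this sketch. In practice, many people use `train_test_split` from `sklearn.model_selection` in scikit-learn, which offers similar `test_size` and `random_state` options.

```python
import random


def simple_train_test_split(features, labels, test_fraction=0.2, seed=42):
    """Shuffle the rows and split them into training and test parts.

    Hypothetical helper for illustration. Returns four lists:
    X_train, X_test, y_train, y_test.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")

    # Shuffle row positions (not the data itself) so that each
    # input stays paired with its own output.
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)

    n_test = max(1, round(len(indices) * test_fraction))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]

    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Made-up study hours and exam scores for 10 students.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [35, 41, 49, 56, 62, 68, 75, 80, 88, 94]

X_train, X_test, y_train, y_test = simple_train_test_split(hours, scores)

print("Training rows:", len(X_train))  # 8
print("Test rows:", len(X_test))       # 2
```

Fixing the `seed` means the same students land in the test set every time the code runs, which makes results easier to compare.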







G Muralidhar
