
Why Split Data?

G Muralidhar

GenAI Specialist | AI & Business Strategist | Productivity Coach | 20+ years Experience

Published: December 28, 2024


We split data to check how well a model works on data it did not see during training (the test set).

This lets us confirm that the model hasn't just "memorized" the training data but can generalize to new situations.
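Here is a toy illustration of the "memorize" problem. It uses hypothetical numbers, and the `MemorizingModel` class is made up for this sketch. The "model" simply stores every training example in a lookup table. It scores perfectly on the data it has already seen, but it has no answer for a new input. Only held-out data reveals this.

```python
# Hypothetical training data: (study hours, exam score) pairs.
train = [(1, 52), (2, 58), (3, 65), (4, 70), (5, 76)]
# Hypothetical unseen data that the model never sees during training.
test = [(2.5, 61), (6, 82)]


class MemorizingModel:
    """A hypothetical 'model' that only memorizes exact training inputs."""

    def fit(self, data):
        self.table = dict(data)  # hours -> score lookup table
        return self

    def predict(self, hours):
        # Returns None for any input that was not in the training data.
        return self.table.get(hours)


def exact_match_rate(model, data):
    """Fraction of examples whose prediction equals the true score."""
    hits = sum(1 for x, y in data if model.predict(x) == y)
    return hits / len(data)


model = MemorizingModel().fit(train)
print("train:", exact_match_rate(model, train))  # 1.0: looks perfect
print("test: ", exact_match_rate(model, test))   # 0.0: cannot generalize
```

Judged on training data alone, this model looks flawless. The test set is what shows that it has learned nothing general.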

How to Split Data

Usually, data is divided into two parts:

· Training Set: 70–80% of the data (for learning).

· Test Set: 20–30% of the data (for testing).
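Turning these percentages into row counts is simple arithmetic. This minimal sketch uses a hypothetical helper, `split_sizes`. It assumes the test size is rounded to the nearest whole row and that the training set gets the remaining rows. Real libraries may round slightly differently.

```python
def split_sizes(n_rows, test_fraction):
    """Return (n_train, n_test) for a test fraction such as 0.2 (20%)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = round(n_rows * test_fraction)  # rounding rule is an assumption
    return n_rows - n_test, n_test


print(split_sizes(10, 0.2))    # (8, 2): 80% / 20%
print(split_sizes(1000, 0.3))  # (700, 300): 70% / 30%
```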

Example: Predicting Exam Scores

Suppose we have data for 10 students, showing:

· Study hours (input/independent variable).

· Exam scores (output/dependent variable).

Splitting the Data

If we split this data into 80% training and 20% testing:

· Training Set (8 students): Use this to train the model.

· Test Set (2 students): Use this to evaluate the model.


Output

Training Data: Used by the model to learn patterns.
Test Data: Used to check how well the model predicts scores.
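The sketch below makes these two roles concrete, using made-up numbers rather than the article's dataset. It fits a straight line `score = a + b * hours` by ordinary least squares on the training students only. It then measures the mean absolute error on the test students, whom the model never saw. The straight-line model and the mean-absolute-error metric are assumptions chosen for illustration.

```python
# Hypothetical data: (study hours, exam score). Not from the original article.
train = [(1, 50), (2, 55), (3, 61), (4, 64), (5, 70), (6, 74), (7, 81), (8, 85)]
test = [(2.5, 58), (6.5, 79)]


def fit_line(data):
    """Ordinary least squares for score = a + b * hours."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_absolute_error(a, b, data):
    """Average absolute gap between predicted and true scores."""
    return sum(abs((a + b * x) - y) for x, y in data) / len(data)


a, b = fit_line(train)  # learn the pattern from training data only
print(f"fitted line: score = {a:.2f} + {b:.2f} * hours")  # 45.00 + 5.00 * hours
print("train MAE:", mean_absolute_error(a, b, train))     # 0.5
print("test MAE: ", mean_absolute_error(a, b, test))      # 1.0
```

The test error is somewhat higher than the training error here. That is common, and it is why the test set, not the training set, gives the more honest estimate of how the model will do on new students.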

Exercises:

1. Purpose of Splitting: Why is it necessary to split data into training and test sets in machine learning? Use an example to support your answer.

2. Avoiding Overfitting: How does splitting data into training and test sets help us detect and avoid overfitting, and why is this important for building a reliable model?

3. Testing Generalization: How does the test set help us evaluate whether the model can generalize to new, unseen data?

Example: split the data

How to split data in python
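A minimal way to do this in plain Python is sketched below, using 10 hypothetical students. The helper `split_train_test` and the data are made up for illustration. The function shuffles the row indices with a fixed seed, so the split is reproducible. It then slices the same indices out of both lists, so each student's study hours stay paired with that student's score. In practice, scikit-learn's `train_test_split` from `sklearn.model_selection` (for example `train_test_split(X, y, test_size=0.2, random_state=42)`) performs this kind of shuffled split. Its exact rounding and shuffling will differ from this sketch.

```python
import random

# Hypothetical data for 10 students (not from the original article).
hours = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]             # input / independent variable
scores = [50, 54, 57, 61, 66, 70, 74, 80, 84, 90]  # output / dependent variable


def split_train_test(X, y, test_fraction=0.2, seed=42):
    """Shuffle row indices, then split X and y using the same indices."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = split_train_test(hours, scores)
print(len(X_train), len(X_test))  # 8 2
```

Shuffling before slicing matters when rows are stored in some order, as they are here (sorted by study hours). Taking the last two rows without shuffling would put only the highest-scoring students in the test set.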

Previous Chapter: What are Training Set and Test Set?

Index of All Chapters

Next Chapter: What are Features in Machine Learning?

Note:

The world's first, simplest, and easiest explanation of AI and Machine Learning. Many resources are too technical, which limits their reach. If this article makes machine learning easier to understand, please share it with others who might benefit. Your likes and shares help spread these insights. Thank you for reading!

AI Insights

