Why should you not put too much stress on this value in ML?

Indrajit S.

Senior Data Scientist @ Citi | GenAI | Kaggle Competition Expert | PHD research scholar in Data Science

Published: September 1, 2022

What is a seed?

A seed in machine learning is the initial state of a pseudo-random number generator. If you use the same seed, you get exactly the same sequence of numbers.
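
A minimal sketch of this behaviour using Python's standard-library `random` module (chosen here only for illustration; the article does not name a specific generator). Two generators created with the same seed produce identical sequences. The helper `draw` is hypothetical.

```python
import random


def draw(seed: int, n: int = 5) -> list:
    """Return n pseudo-random integers from a generator initialised with `seed` (hypothetical helper)."""
    # A dedicated Random instance keeps this independent of the module-level global generator.
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n)]


first = draw(42)
second = draw(42)
print(first == second)  # True: the same seed reproduces the same sequence
print(draw(42, n=3) == first[:3])  # True: a seed fixes the whole stream, so prefixes match too
```

Changing the seed gives a different (but equally reproducible) sequence, which is why the specific value matters far less than keeping it fixed.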

Because the sequence is fully determined by the seed, whether you are making a train/test split, generating a NumPy array from some random distribution, or even fitting an ML model, setting the seed gives you the same results every time.
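
A sketch with NumPy's `default_rng`, assuming a small synthetic regression dataset: rerunning both the data generation and the split with the same seeds reproduces them exactly. `make_data` and `train_test_split_seeded` are hypothetical helpers written for this illustration; in scikit-learn, the same role is played by the `random_state` argument of `train_test_split`.

```python
import numpy as np


def make_data(seed: int, n: int = 100):
    """Generate a toy regression dataset from a seeded generator (hypothetical data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y


def train_test_split_seeded(X, y, test_size: float = 0.2, seed: int = 0):
    """Shuffle row indices with a seeded generator, then cut off a test set (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


X1, y1 = make_data(seed=0)
X2, y2 = make_data(seed=0)
print(np.array_equal(X1, X2))  # True: same seed, same array

split_a = train_test_split_seeded(X1, y1, seed=42)
split_b = train_test_split_seeded(X2, y2, seed=42)
print(all(np.array_equal(a, b) for a, b in zip(split_a, split_b)))  # True: same split
```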

The seed is used for reproducibility.
We use a seed in many places, but the purpose is always the same: reproducibility.
Once the train and test data are ready, we train the model and validate it, iterating until it neither underfits nor overfits. During this process we tune hyperparameters, and while doing so the random parts of the pipeline (such as the data split) must stay fixed. That way, any change in model performance can be attributed to the hyperparameter we changed and not to a change in the seed.
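
A minimal NumPy sketch of this idea, assuming ridge regression on synthetic data with the regularisation strength `alpha` as the hyperparameter being tuned (an illustrative choice, not the article's). The split seed `SPLIT_SEED` is fixed once, so every candidate `alpha` is scored on the same validation rows and differences in MSE come only from `alpha`. All helper names are hypothetical.

```python
import numpy as np

SPLIT_SEED = 42  # fixed once; only the hyperparameter changes between runs


def make_data(n: int = 200, seed: int = 0):
    """Synthetic regression data (hypothetical)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    y = X @ np.array([2.0, -1.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=n)
    return X, y


def split(X, y, seed: int, val_frac: float = 0.25):
    """Seeded train/validation split."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_val = int(len(X) * val_frac)
    val, train = idx[:n_val], idx[n_val:]
    return X[train], X[val], y[train], y[val]


def fit_ridge(X, y, alpha: float):
    # Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def val_mse(alpha: float, X, y) -> float:
    X_tr, X_val, y_tr, y_val = split(X, y, seed=SPLIT_SEED)  # same split on every call
    w = fit_ridge(X_tr, y_tr, alpha)
    return float(np.mean((X_val @ w - y_val) ** 2))


X, y = make_data()
results = {alpha: val_mse(alpha, X, y) for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]}
for alpha, score in results.items():
    print(f"alpha={alpha:>6}: validation MSE={score:.4f}")
best_alpha = min(results, key=results.get)
print("best alpha:", best_alpha)
```

If `split` were called with a fresh random seed for each `alpha`, part of the difference between scores would come from the split itself, and the comparison would no longer isolate the hyperparameter.
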
If a model reaches very high accuracy with one specific seed but not with other seeds, the model is not actually good: the score reflects a lucky split of the data rather than real predictive power.
So we use cross-validation to overcome this, training and testing the model on different subsets of the data.
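
A sketch of k-fold cross-validation written directly in NumPy, assuming ridge regression on synthetic data with `alpha` as the hyperparameter (an illustrative setup; helper names are hypothetical). Reporting the spread of the fold scores, and re-running with a few different shuffle seeds, makes a score that only looks good under one particular seed easy to spot. In scikit-learn, `KFold` and `cross_val_score` cover the same ground.

```python
import numpy as np


def make_data(n: int = 200, seed: int = 0):
    """Synthetic regression data (hypothetical)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    y = X @ np.array([2.0, -1.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=n)
    return X, y


def fit_ridge(X, y, alpha: float):
    # Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def k_fold_mse(X, y, alpha: float, k: int = 5, seed: int = 0) -> np.ndarray:
    """Return one validation MSE per fold; each row is used for validation exactly once."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = fit_ridge(X[train_idx], y[train_idx], alpha)
        scores.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return np.array(scores)


X, y = make_data()
fold_scores = k_fold_mse(X, y, alpha=1.0)
print("per-fold MSE:", np.round(fold_scores, 4))
print(f"mean={fold_scores.mean():.4f}  std={fold_scores.std():.4f}")

# Repeating CV with a few different shuffle seeds shows how much of the
# score depends on the seed rather than on the model.
means = [k_fold_mse(X, y, alpha=1.0, seed=s).mean() for s in range(5)]
print("mean MSE per shuffle seed:", np.round(means, 4))
```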

Comment from Indrajit S.: https://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker%27s_Guide_to_the_Galaxy#Answer_to_the_Ultimate_Question_of_Life.2C_the_Universe_and_Everything_.2842.29


