Topic: Repeatability vs. Reproducibility in Experiments

Amit Goswami

Director/Head of Engineering At UnitedHealth Group/Optum @Top 5th Fortune 500 company | Shaping the Future of Product Innovation | Technology Leader| I Turn Ideas Into Tangible Successes

Published: November 6, 2023

This article discusses the difference between reproducibility and repeatability in scientific research experiments, and why it matters. It also relates their importance to Machine Learning experiment tracking, project and model deployments, and operations.

Repeatability means that, having obtained a result from an experiment, you can run the same experiment again with the same setup and get exactly the same result.

Reproducibility is a measure of whether the same result can be attained by a different team using the same artifacts.

The reproducibility crisis, especially in research and development, refers to scientists being unable to reproduce a particular experimental result in the lab. However, this is a separate concern and outside the scope of this article.

End-to-end Machine Learning development and deployment is complex, and given the non-deterministic nature of the subject, it becomes important to plan how to integrate it reliably with business processes and products. Machine Learning has deep roots in statistics and probability, and sometimes a slight change in the order of the input data can change the output of an ML program. So let's dive in!

What is reproducibility in Machine Learning?

In simple terms, the same input should produce the same output across multiple runs. Even though Machine Learning is non-deterministic by nature, Machine Learning code should be reproducible, so that every run does not give a different result. For example, we commonly use scikit-learn's train_test_split, which randomly shuffles the dataset before splitting it. If the seed (its random_state parameter) is not set, every run produces a different train/test split.
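A minimal sketch of what a seeded split does, written with only Python's standard library so it runs without scikit-learn. The helper `seeded_train_test_split` is hypothetical and is not scikit-learn's implementation; with scikit-learn itself, the equivalent is passing a fixed `random_state`, e.g. `train_test_split(X, y, test_size=0.2, random_state=42)`.

```python
import random


def seeded_train_test_split(rows, test_size=0.2, seed=None):
    """Hypothetical helper: shuffle a copy of `rows`, then cut it into train/test.

    With seed=None the generator is seeded from OS entropy (or the clock),
    so every run produces a different split; a fixed seed makes it repeatable.
    """
    rng = random.Random(seed)  # dedicated generator, independent of global state
    shuffled = list(rows)      # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


data = list(range(10))
train_a, test_a = seeded_train_test_split(data, test_size=0.2, seed=42)
train_b, test_b = seeded_train_test_split(data, test_size=0.2, seed=42)
print(train_a == train_b and test_a == test_b)  # True: same seed, same split
```

Calling the helper with `seed=None` twice will, in general, give two different splits, which is exactly the behavior the paragraph above warns about.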

What Is a Seed, and Why Does Defining One Help Reproducibility in Machine Learning Algorithms?

Machine Learning algorithms introduce randomness through random numbers, and a seed is what makes that randomness reproducible. Random numbers are of two types: pseudorandom numbers and true random numbers. Pseudorandom numbers appear to be random but are not truly random. Typically, pseudorandom numbers are generated from a seed value (provided by the user), which is passed to an algorithm that uses the value to generate a new number.
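As a quick illustration of the seed's role, Python's built-in `random` module (a Mersenne Twister pseudorandom generator) replays exactly the same sequence whenever it is started from the same seed. The helper name `first_draws` is just for this example.

```python
import random


def first_draws(seed, n=3):
    """Return the first n floats produced by a generator started from `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


run_1 = first_draws(43)
run_2 = first_draws(43)
run_3 = first_draws(44)
print(run_1 == run_2)  # True: same seed, same "random" sequence
print(run_1 == run_3)  # False: a different seed starts a different sequence
```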

For example, let's say we use the following simple equation to generate a series of random numbers:

$R = (387 \times S + 217) \bmod 953$

Where:

R is the random number to be produced, S is the seed value for R, and mod gives the remainder after dividing by 953. Let's start with a seed value (S) of 43.

$R = (387 \times 43 + 217) \bmod 953 = 16858 \bmod 953$

R = 657 (First Random Number)

To produce the second random number, we then insert 657 as S back into the equation:

$R = (387 \times 657 + 217) \bmod 953 = 254476 \bmod 953$

R = 25 (Second Random Number)

If the seed value (S) is the same, the sequence of "random" numbers produced by the algorithm will be exactly the same every time. This means that if you know the equation and the seed value, you can predict the entire sequence of "random" numbers. This process can be repeated as many times as needed, generating an apparently random series of numbers. The numbers seem random to us, but they are actually produced by a deterministic algorithm that creates number sequences that only look random.
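The recurrence in this example is a toy linear congruential generator. A minimal Python sketch, assuming the article's constants (multiplier 387, increment 217, modulus 953) and treating the operation as a remainder (modulo), which is what yields 657 and 25, reproduces the two values computed by hand and shows that restarting from the same seed replays the same sequence. The class name is hypothetical, and with a modulus this small the sequence repeats after at most 953 values, so it is for illustration only.

```python
class LinearCongruentialGenerator:
    """Toy LCG: next_state = (a * state + c) mod m, using the article's constants."""

    def __init__(self, seed, a=387, c=217, m=953):
        self.state = seed
        self.a, self.c, self.m = a, c, m

    def next_value(self):
        # Feed the previous output back in as the new seed S
        self.state = (self.a * self.state + self.c) % self.m
        return self.state


gen = LinearCongruentialGenerator(seed=43)
print([gen.next_value() for _ in range(2)])  # [657, 25]

# Restarting from the same seed reproduces the sequence exactly
first = LinearCongruentialGenerator(seed=43)
second = LinearCongruentialGenerator(seed=43)
print([first.next_value() for _ in range(5)] == [second.next_value() for _ in range(5)])  # True
```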

“Using a pseudorandom number generator ensures that we are able to replicate our results and, in this particular case, to generate the exact same train-test split from the data corpus.”

Why Does It Matter?

Selecting a random sample for a scientific study with pseudorandom numbers allows others to replicate your results by using the same seed value.

In video games, being able to trigger the same "random" events is very useful when the game is being tested.
In applications that require encryption, using unpredictable random numbers (true random numbers, or a cryptographically secure generator seeded from them) is particularly important, because it helps ensure that data remains protected.
Similarly, online gambling companies need a very high level of confidence that results are produced by a truly random process, in everything from blackjack (how the cards are shuffled) to roulette (where the ball lands) and poker machines; otherwise they risk someone reverse-engineering the algorithm.
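The distinction between replayable and unpredictable randomness can be sketched with Python's standard library: a seeded `random.Random` replays the same card shuffle every time (handy for testing), while the `secrets` module draws from the operating system's cryptographically secure source and cannot be seeded, which makes it the appropriate choice for keys, tokens, and similar security-sensitive values. The deck and seed below are illustrative only.

```python
import random
import secrets

deck = list(range(52))  # illustrative deck: cards numbered 0..51

# Replayable: two generators with the same seed produce the same shuffle
shuffle_a = deck[:]
random.Random(2023).shuffle(shuffle_a)
shuffle_b = deck[:]
random.Random(2023).shuffle(shuffle_b)
print(shuffle_a == shuffle_b)  # True

# Unpredictable: secrets has no seed, so these values cannot be replayed
token = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
card = secrets.choice(deck)
print(len(token), card in deck)  # 32 True
```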

Why Is Reproducibility Needed in Machine Learning Architecture?

Reproducibility is the ability to duplicate a Machine Learning model exactly: given the same raw data as input, it should return the same result. We don't generally deploy Machine Learning algorithms on their own; we deploy the entire ML pipeline, so we need to make sure that every single step of the ML pipeline is reproducible.
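One common first step toward a reproducible pipeline is to seed every random number generator it touches in one place, at the start of a run. A minimal sketch, assuming a Python pipeline that may use the standard `random` module and NumPy; the helper `set_global_seed` is hypothetical. Frameworks such as PyTorch have their own seeding calls (for example `torch.manual_seed`), and seeding alone does not guarantee identical results when, for example, parallel or GPU operations are non-deterministic.

```python
import os
import random


def set_global_seed(seed: int) -> None:
    """Hypothetical helper: seed the generators a Python ML pipeline commonly uses."""
    random.seed(seed)  # Python's global `random` module
    # Hash randomization is fixed only for interpreters started after this
    # is set (e.g. worker subprocesses), not for the current process.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's legacy global generator
    except ImportError:
        pass  # NumPy not installed: nothing else to seed


set_global_seed(42)
first_run = [random.random() for _ in range(3)]
set_global_seed(42)
second_run = [random.random() for _ in range(3)]
print(first_run == second_run)  # True
```

For new NumPy code, passing an explicit generator created with `np.random.default_rng(seed)` through the pipeline is generally preferred over relying on the global state.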

Every step in the Machine Learning pipeline should be reproducible, including data gathering, feature creation, model building and deployment, so versioning the code, the data, and the exact infrastructure environment with the right configurations is critical.

This becomes the basis of Machine Learning Operations (MLOps)!
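To make the versioning idea concrete, a run can record exactly what it used: the seed, the code version, a fingerprint of the training data, the configuration, and the runtime environment. The sketch below assumes the caller supplies the code version (for example a Git commit hash) and the raw training data as bytes; `build_run_manifest`, the commit string, and the sample data are hypothetical, and real MLOps setups usually delegate this to dedicated experiment-tracking and data-versioning tools.

```python
import hashlib
import json
import platform


def build_run_manifest(seed, code_version, training_data: bytes, config: dict) -> dict:
    """Hypothetical helper: collect what is needed to reproduce a training run."""
    return {
        "seed": seed,
        "code_version": code_version,  # e.g. a Git commit hash supplied by the caller
        "data_sha256": hashlib.sha256(training_data).hexdigest(),  # changes if the data changes
        "config": config,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


raw = b"feature,label\n0.5,1\n0.1,0\n"  # illustrative data
manifest = build_run_manifest(42, "abc1234", raw, {"test_size": 0.2})  # hypothetical commit
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Storing this manifest alongside each trained model makes it possible to check later whether two runs used the same seed, code, data, and environment before comparing their results.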
