Data Optimization Techniques in Machine Learning



Curve fitting is one of the more challenging parts of machine learning, not least because of how strongly the quality of the fit shapes the end result. It may not pose a problem on relatively simple datasets with a few features, but in more complicated projects an improper fit becomes much more likely.

Suppose we have collected sensor, motor, and joint data from the problem domain, consisting of example inputs and their corresponding outputs.

The x-axis is the independent variable or the input to the function.

The y-axis is the dependent variable or the output of the function.


We don’t know the form of the function that maps examples of inputs to outputs, but we suspect that we can approximate the function with a standard function form.

Curve fitting involves first defining the functional form of the mapping function (also called the basis function or objective function), then searching for the parameters of the function that result in the minimum error.

Error is calculated by taking observations from the domain, passing their inputs through the candidate mapping function to calculate outputs, and comparing those calculated outputs to the observed outputs.

Once fit, we can use the mapping function to interpolate or extrapolate new points in the domain. It is common to run a sequence of input values through the mapping function to calculate a sequence of outputs, then create a line plot of the result to show how output varies with input and how well the line fits the observed points.
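
To make this workflow concrete, here is a minimal sketch using SciPy's curve_fit. The dataset and the straight-line mapping function are illustrative assumptions, not from the article; the steps (define the form, search for parameters, measure error, interpolate) are the ones described above.

```python
import numpy as np
from scipy.optimize import curve_fit

def mapping(x, a, b):
    """Candidate mapping function: a straight line y = a*x + b (assumed form)."""
    return a * x + b

# Observed (input, output) pairs from the problem domain (made up here).
x_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_obs = np.array([0.8, 2.1, 2.9, 4.2, 5.1, 5.8])

# Search for the parameters that minimize the squared error.
params, _ = curve_fit(mapping, x_obs, y_obs)
a, b = params

# Error: compare calculated outputs to observed outputs.
mse = np.mean((y_obs - mapping(x_obs, a, b)) ** 2)
print(f"a={a:.3f}, b={b:.3f}, MSE={mse:.4f}")

# Once fit, run a sequence of inputs through the mapping function
# to interpolate (or, beyond the observed range, extrapolate) new points.
x_new = np.linspace(x_obs.min(), x_obs.max(), 50)
y_new = mapping(x_new, *params)
```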

The key to curve fitting is the form of the mapping function.


Four Scenarios

All curve fitting (for machine learning, at least) can be separated into four categories based on the a priori knowledge about the problem at hand:


  1. Completely known. If f(x) is known, there is no fitting problem to solve: the function can be applied directly, without any guessing, and all future data will fall neatly onto the curve.
  2. Unknown, but the structure is known. The form of the curve is known, for example a straight line, but its parameters are not.
  3. Unknown, but can be guessed. Sometimes we know nothing in advance, but in two-dimensional data the layout is often simple enough that we can make a reasonable assumption about what the curve should be.
  4. Unknown. The model function f(x) is completely unknown, there are no guesses to be made, and the parameters are a mystery.

A polynomial regression model obtained by adding squared terms to the objective function.

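A sketch of how such a model might be produced, assuming synthetic data and a hand-picked quadratic form (both are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def objective(x, a, b, c):
    # Linear term plus an added squared term.
    return a * x + b * x ** 2 + c

# Synthetic observations (assumed; the article's dataset is not given).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 40)
y = 0.5 * x + 1.2 * x ** 2 + rng.normal(0, 0.5, x.size)

popt, _ = curve_fit(objective, x, y)
print("fitted coefficients (a, b, c):", popt)
```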

A fifth-degree polynomial fit to the data.

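For a higher-degree fit like this one, NumPy's polynomial fitting is the usual shortcut. A sketch on stand-in data (the real dataset is not given):

```python
import numpy as np

# Stand-in observations with some noise (assumed).
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 60)
y = np.sin(x) + rng.normal(0, 0.2, x.size)

coeffs = np.polyfit(x, y, deg=5)   # least-squares fifth-degree fit
poly = np.poly1d(coeffs)           # callable polynomial
y_fit = poly(x)                    # outputs along the fitted curve
```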

Curve fitting with sine functions.

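A sine-shaped mapping function is nonlinear in its parameters, so the optimizer benefits from a starting guess. A sketch, with the functional form, data, and initial guess p0 all assumed for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def sine(x, amp, freq, phase, offset):
    return amp * np.sin(freq * x + phase) + offset

# Synthetic periodic observations (assumed).
rng = np.random.default_rng(2)
x = np.linspace(0, 4 * np.pi, 80)
y = 2.0 * np.sin(x + 0.3) + 0.5 + rng.normal(0, 0.2, x.size)

# Nonlinear fits usually need a reasonable starting point to converge.
p0 = [1.0, 1.0, 0.0, 0.0]
popt, _ = curve_fit(sine, x, y, p0=p0)
print("fitted (amp, freq, phase, offset):", popt)
```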



Underfitting and Overfitting

First, curve fitting is an optimization problem. Each time, the goal is to find a curve that properly matches the dataset. There are two ways of doing it improperly: underfitting and overfitting.

Underfitting is the easier of the two to grasp. It happens whenever the function fails to capture the complexity of the data's distribution in, say, a scatter plot. These cases are easiest to visualize in two dimensions, but curve fitting often has to be done in more.

The problem with underfitting is quite clear. A model with such a curve will make erroneous predictions because it attempts to simplify everything to a significant degree. It might, for example, capture just a few data points out of dozens.

Overfitting is a bit more complicated. Intuitively, it may seem desirable to maximize a model's accuracy by fitting the curve to the data perfectly. In practice, an overfitted model produces numerous errors when tested on new data.

There are many ways to understand why overfitting is an issue. One is to think of any dataset as incomplete. Unless you acquire every possible data point, there will be unseen points that follow a similar, but not identical, distribution. An overfitted model has learned the observed patterns so well that it expects future data to repeat them exactly.

In the end, one might think of overfitting as bringing the model closer to determinism instead of leaving it stochastic. Proper fit is somewhere in between underfitting and overfitting.
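
One common way to see both failure modes is to vary the polynomial degree and compare error on held-out data. The sketch below uses synthetic data and arbitrary degrees chosen purely for illustration; typically the lowest degree underfits (high error everywhere) while the highest fits the training points closely but does worse on the held-out ones.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic noisy observations of a smooth curve (assumed).
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

# Hold out every other point to expose overfitting.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

for degree in (1, 4, 15):  # underfit, reasonable fit, likely overfit
    poly = Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((poly(x_train) - y_train) ** 2)
    test_mse = np.mean((poly(x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```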
