machine learning: intuition of Gaussian processes


This article talks about the Gaussian process and Gaussian process regression. We begin with the intuitive assumption that if data points are close to each other, then their heights should also be close. With this, we can draw prior samples at test inputs and draw posterior samples from the conditional distribution. The conditional distribution is obtained from the multivariate Gaussian theorem applied to the joint distribution of the stacked vector. The following expands on this by referencing slides from the machine learning lecture taught by Professor Nando, whose video lectures can be found on YouTube.

Now we're going to assume the X's are given, say {x_1, x_2, x_3}, and we want to model the f(x)'s. Moreover, we assume the f's are jointly Gaussian, so we have a vector (f_1, f_2, f_3) with zero mean and a covariance matrix that captures the relationship between the three points in the diagram. For example, x_1 and x_2 should be more correlated than x_1 and x_3 because they are nearby, and thus k_12 and k_21 have a larger value (0.7) than k_13 or k_31.

There are many ways to measure the similarity of x_i and x_j; one possible choice, which we use here, is the squared exponential kernel. It can serve as a similarity function because it goes to zero when the distance between x_i and x_j is very large and equals 1 when the two points coincide. We use this function to fill in the covariance matrix that describes the cloud of points, or at least describes their heights. We don't need to describe the x-axis because X is given; we only use X to construct K, the covariance matrix.
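As a small sketch of that idea (the exact kernel code and length scale on the slides may differ; the function name and the three example inputs below are my own), the squared exponential kernel and the resulting covariance matrix can be written as:

import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0):
    # Squared exponential (RBF) kernel: 1 when two points coincide, decays to 0 as they move apart.
    sqdist = np.sum(a**2, axis=1).reshape(-1, 1) + np.sum(b**2, axis=1) - 2 * (a @ b.T)
    return np.exp(-0.5 * sqdist / length_scale**2)

X = np.array([[-1.0], [-0.9], [1.0]])   # x_1 and x_2 are close; x_3 is far away
K = sq_exp_kernel(X, X)                 # 3x3 covariance matrix over the heights (f_1, f_2, f_3)
print(np.round(K, 2))                   # the (x_1, x_2) entry is close to 1, the (x_1, x_3) entry is small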

As shown in the following slide, we now add a new point x* between x_2 and x_3, but we don't know the height (i.e., f*) of this new point. So where could the height of the new point plausibly be? The shaded triangle seems a more reasonable region than the other choices in the diagram, because if the x-axis distance between x* and its nearest point x_3 is small, we expect the distance between f* and the heights of its nearest points to be small as well. This is called smoothness in machine learning: a small variation along the x-axis should produce only a small variation along the y-axis.

Here we assume f* also comes from a Gaussian distribution with zero mean, because we assume the test data comes from the same distribution as the training data. If we have a joint multivariate Gaussian over (f, f*) and we want the conditional p(f* | f), we can use the multivariate Gaussian theorem to obtain the mean mu* and variance sigma*.
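In standard GP notation (my notation here, not necessarily the slide's: K is the covariance among the training inputs, K_* between training and test inputs, and K_** among the test inputs), the joint distribution is

\begin{bmatrix} f \\ f_* \end{bmatrix}
\sim
\mathcal{N}\!\left( \mathbf{0},\;
\begin{bmatrix} K & K_* \\ K_*^\top & K_{**} \end{bmatrix} \right)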

In other words, for any point from the test set, I can predict the mean and variance. If I take a large number of test points, I can plot a beautiful line. In the following diagram, we predict the mean and variance, and thus the confidence, for the red dots, which are test points. Intuitively, the confidence is high where the (training) data is, so the uncertainty is low there. Where we don't have data, say at test points far from the training points, I cannot be too confident in the prediction.

Now what I have is a function that takes x* as input and gives me the mean and variance as output. The Gaussian process is a distribution over functions, because the mean and variance are themselves functions of x. If I have those two functions, I can evaluate them on a very fine-grained grid of points.

The following Python code snippet creates the prior functions. The first thing I do is create a test dataset X_(1:N), where the N points are very close to each other; then I assume a zero mean vector mu and compute the covariance matrix K using the kernel function. We draw 10 vectors of standard normal samples and obtain the prior functions as the product of the Cholesky factor L and those samples, where the lower-triangular L satisfies L L^T = K. As shown in the diagram on the slide above, each color is a single prior function obtained in this way.
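A minimal sketch of that step, reusing the sq_exp_kernel above (the grid size, the jitter term, and the variable names are my own choices, not necessarily those on the slide):

n = 50
Xtest = np.linspace(-5, 5, n).reshape(-1, 1)          # N test points packed closely together
K_ss = sq_exp_kernel(Xtest, Xtest)                    # covariance of the heights at the test points
L = np.linalg.cholesky(K_ss + 1e-6 * np.eye(n))       # small jitter keeps the factorization stable
# Each column is one prior function: Cov(L z) = L L^T = K_ss when z is standard normal.
f_prior = L @ np.random.normal(size=(n, 10))          # 10 prior sample functions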

Here we only need the multivariate Gaussian theorem, which gives the posterior mean and variance of the conditional distribution p(f* | f) from the distribution of the joint vector (f, f*).
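Written out in the same notation as above (for a single test point the covariance Sigma_* reduces to the scalar variance sigma_*^2), the theorem gives

f_* \mid f \;\sim\; \mathcal{N}(\mu_*, \Sigma_*), \qquad
\mu_* = K_*^\top K^{-1} f, \qquad
\Sigma_* = K_{**} - K_*^\top K^{-1} K_* .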

In other words, given that I have a training set D, I combine the data with my prior functions. The prior, as the slides above show, is specified via the covariance: I want the function to be smooth, because the similarity matrix assumes that if two points are close by, their heights are also close by.

And if I draw functions from the conditional Gaussian, I get the bottom-right figure. Remember that I can evaluate the conditional Gaussian p(f* | f) at any point, and for all those points I can compute the mean and the variance. Then I get those beautiful plots where the training data basically squishes the uncertainty, grabbing those functions and tying them down. Note that in the bottom-right picture, each color is one vector (f_*1, f_*2, ...), and as the number of f_* points along the x-axis grows very large, the vector becomes a smooth curve, in other words a function.
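Continuing the sketch above (the training inputs and the sin heights below are invented purely for illustration), drawing posterior sample functions might look like:

X = np.array([[-4.0], [-3.0], [-1.0], [0.0], [2.0]])   # made-up training inputs
f = np.sin(X)                                          # made-up noiseless training heights

K = sq_exp_kernel(X, X)
K_s = sq_exp_kernel(X, Xtest)                          # cross-covariance between training and test points
K_inv = np.linalg.inv(K + 1e-6 * np.eye(len(X)))

mu_star = K_s.T @ K_inv @ f                            # posterior mean at every test point
Sigma_star = K_ss - K_s.T @ K_inv @ K_s                # posterior covariance at the test points

L_post = np.linalg.cholesky(Sigma_star + 1e-6 * np.eye(n))
f_post = mu_star + L_post @ np.random.normal(size=(n, 10))   # 10 posterior sample functions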

With this, we have noiseless Gaussian process regression as defined in the following two slides.

There are cases where we cannot observe the f's exactly; the observations carry some error. This is the case of noisy Gaussian process regression, with an extra noise term epsilon that is normally distributed. As the following slide shows, having Gaussian noise epsilon only involves a minor modification of the algorithm: the addition of a diagonal term sigma^2 I to the covariance, and everything else stays the same. When you plot this, however, you see that even where you have data, as shown in the blue frame in the following diagram, the uncertainty no longer collapses to zero, because we assume the observations are noisy.
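With noisy observations y = f + epsilon, epsilon ~ N(0, sigma^2 I), the predictive equations only change by that diagonal term (same notation as above):

\mu_* = K_*^\top \left( K + \sigma^2 I \right)^{-1} y, \qquad
\Sigma_* = K_{**} - K_*^\top \left( K + \sigma^2 I \right)^{-1} K_* .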

For the noisy GP, we can rewrite the mean of f* as a sum of basis functions. So if you look at the mean of the Gaussian process, what you're doing is fitting an RBF model: you're fitting a non-linear function using a combination of basis functions, where you place one basis function at each data point. Now we know that putting the basis functions where the data is is the principled thing to do.
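Concretely (again in the notation used above), the posterior mean is a weighted sum of kernel basis functions centered at the training inputs:

\mu(x_*) = \sum_{i=1}^{N} \alpha_i \, k(x_i, x_*), \qquad
\alpha = \left( K + \sigma^2 I \right)^{-1} y .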

Chen Yang

