Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on Radial Basis Neural Networks. Let’s get started:
The What:
Radial Basis Function Networks (RBFNs) differ from traditional feedforward neural networks (FFNNs) in their hidden-layer activation functions: where FFNNs use sigmoidal or ReLU activations, RBFNs use radial basis functions (RBFs) in the hidden layer.
An RBF has a center and a radius that determine how strongly an input influences the function's output. The activation of each neuron in the hidden layer of an RBFN is based on the similarity of the input to the center of that neuron's RBF.
Note: Typically, there is only one hidden layer (with several neurons) in an RBFN.
Use Cases:
Here are some use cases where RBF networks have an advantage over traditional FFNNs:
- Function Approximation: RBF networks are used for function approximation tasks such as interpolation and extrapolation, where they are often faster to train and more accurate than other types of neural networks.
- If we have data points for temperature at 9:00 AM and 10:00 AM, interpolation can be used to estimate the temperature at 9:30 AM from those two values (see the short SciPy sketch after this list).
- If we have data points for temperature from Monday to Friday, extrapolation can be used to estimate the temperature on Saturday or Sunday from the previous days' values.
- Pattern Recognition: RBF networks are used for pattern recognition tasks such as image classification, object recognition, handwritten digit recognition and face recognition. Their key advantage in these tasks is the ability to separate patterns with nonlinear decision boundaries using radial basis functions.
- Time Series Prediction: The ability of RBF networks to approximate nonlinear relationships between inputs and outputs makes them particularly effective in time series prediction tasks such as forecasting stock prices or weather patterns.
- Control Systems: RBF networks can learn the underlying nonlinear dynamics of a system and can be used to control systems such as robots and autonomous vehicles.
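As a minimal sketch of the interpolation use case (the times and temperatures below are made-up values), SciPy's RBFInterpolator fits an RBF model to hourly readings and estimates the value at 9:30 AM:

import numpy as np
from scipy.interpolate import RBFInterpolator

hours = np.array([[9.0], [10.0], [11.0], [12.0]])  # sample times, shape (n, 1)
temps = np.array([18.2, 19.6, 21.1, 22.4])         # observed temperatures

# Fit a Gaussian RBF interpolant; epsilon is the kernel width parameter
interp = RBFInterpolator(hours, temps, kernel='gaussian', epsilon=1.0)
print(interp(np.array([[9.5]])))  # estimate between the 9:00 and 10:00 readings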
Types of Radial Basis Functions (RBFs):
Most commonly used RBFs are:
Gaussian RBF uses a Gaussian distribution as its activation function. It is the most commonly used RBF due to its simplicity and effectiveness.
- It takes the form of a bell-shaped curve that peaks at the center point and tapers off symmetrically on either side. The width of the curve is determined by a single parameter called the "width" or "sigma".
- A width that is too large can lead to under-fitting, while a width that is too small can lead to overfitting.
- It is commonly used for function approximation and pattern recognition tasks and can be applied to both linearly and non-linearly separable datasets.
The output of the j’th RBF neuron for the input vector x_i is:
h_j(x_i) = exp(−||x_i − c_j||² / (2σ²))
where:
- x_i is the i’th input vector
- c_j is the center of the j’th RBF neuron
- σ is the width parameter of the RBF function
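A one-neuron sketch of this formula in NumPy (the input, center and width are assumed toy values):

import numpy as np

def gaussian_rbf(x, c, sigma):
    # h_j(x) = exp(-||x - c||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - c) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 2.0])   # input vector x_i
c = np.array([0.5, 1.5])   # neuron center c_j
print(gaussian_rbf(x, c, sigma=1.0))  # near 1 close to the center, decays with distance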
Multiquadric RBF can be used for the same applications as the Gaussian RBF.
- It has non-zero values over the entire input domain, which means that it can capture global patterns in data.
- It is sensitive to the value of the spread parameter c. If c is set too small, the RBF will be very narrow and highly sensitive to small changes in the input data. If c is set too large, the RBF will be very wide and provide a very coarse approximation to the underlying function.
- It is a good choice for function approximation and regression problems where a smooth approximation of the data is desired.
- It is useful for clustering and classification problems, particularly when the data is high-dimensional and the underlying structure is not well understood.
- It is not the best choice for problems where the data is noisy or contains outliers, as it is sensitive to the distance between data points and may be influenced by those outliers.
Its form is φ(r) = √(r² + c²), where:
- r is the Euclidean distance between the input data point and the center of the RBF
- c is a user-defined parameter known as the spread parameter
Inverse Multiquadric RBF is preferred over the Multiquadric RBF in certain situations due to its faster rate of decay as the distance from the center increases, which leads to better generalization and less susceptibility to overfitting. It is often used for function approximation, interpolation, and data smoothing applications.
However, the Multiquadric RBF is preferred in cases where a smoother, more gradual decay is desired, which can provide a better fit to the underlying data distribution.
Its form is φ(r) = 1 / √(r² + c²), where:
- r is the Euclidean distance between the input data point and the center of the RBF
- c is a positive constant known as the RBF width or shape parameter
Thin Plate Spline RBF is particularly useful in problems where the number of training samples is small compared to the dimensionality of the input space.
- It is often used for interpolation, extrapolation, surface reconstruction and image warping in applications such as computer graphics, computer vision and shape interpolation.
Its form is φ(r) = r²·ln(r), where r is the Euclidean distance between the input data point and the center of the RBF.
Polyharmonic Spline RBF is often used for function approximation and image processing.
Its form is φ(r) = r^k for odd k, and φ(r) = r^k·ln(r) for even k, where r is the Euclidean distance between the input point and the center of the RBF.
The parameter k determines the order of the spline, which affects the smoothness of the function.
- Higher values of k result in a smoother function, but can lead to overfitting if the value is too large.
- Even values of k ensure that the function is smooth and has continuous derivatives up to order k−2.
- Odd values of k result in a less smooth function with discontinuous derivatives, which can lead to numerical instability and poor performance.
Cauchy RBF is often used for regression, anomaly detection and classification.
- Its support is unbounded: it takes non-zero values even for inputs that are far from the center.
- It has heavy tails, meaning that it decays slowly as the distance from the center increases. This property can be useful when there are outliers or noise in the data.
- Use it for regression problems where there may be outliers or noise in the data; its heavy-tailed nature makes it less sensitive to such points than faster-decaying RBFs.
- Use it for anomaly detection, where it can model the normal behavior of a system and detect deviations from that behavior.
- Avoid it in high-dimensional settings, where it suffers from the curse of dimensionality.
A common form is φ(r) = 1 / (1 + (r/α)²), where:
- α is a hyperparameter that controls the width of the basis function
- r is the Euclidean distance between the input and the center of the RBF
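For reference, here is a minimal NumPy sketch of the RBFs above, each written as a function of the distance r (the Cauchy parameterization matches the common form given above and is an assumption, not a universal convention):

import numpy as np

def gaussian(r, sigma=1.0):
    return np.exp(-r**2 / (2 * sigma**2))

def multiquadric(r, c=1.0):
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r**2 + c**2)

def thin_plate_spline(r):
    r = np.maximum(r, 1e-12)  # avoid log(0) at the center; the limit there is 0
    return r**2 * np.log(r)

def polyharmonic(r, k=2):
    # r^k for odd k; r^k * ln(r) for even k
    if k % 2 == 1:
        return r**k
    r = np.maximum(r, 1e-12)
    return r**k * np.log(r)

def cauchy(r, alpha=1.0):
    return 1.0 / (1.0 + (r / alpha)**2)

# Quick evaluation of two of the kernels on a few sample distances
r = np.linspace(0.0, 3.0, 4)
print(gaussian(r), cauchy(r))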
The How:
The first step in constructing a Gaussian RBF neural network is to choose the centers of the radial basis functions. There are several methods for doing this, but one common approach is to use k-means clustering (check our edition on k-means clustering here) to identify representative points in the input space.
- x_1, x_2, …, x_n be the inputs to an RBF network of m neurons, denoted as the vector x
- c_1, c_2, …, c_m be the centers of the radial basis functions, and σ be the width parameter of the Gaussian functions
- y be the output of the network
- w_i be the weight of the i’th neuron in the hidden layer
- h_i be the output of the i’th neuron in the hidden layer
Then each hidden neuron computes h_i = exp(−||x − c_i||² / (2σ²)), and the network output is y = w_1·h_1 + w_2·h_2 + … + w_m·h_m.
The weights between the hidden and output layers are typically learned using back-propagation. We already did a comprehensive coverage of the back-propagation algorithm in a previous edition; check it here.
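Putting the steps together, here is a minimal end-to-end sketch on made-up 1-D data. It uses scikit-learn's KMeans for the centers and, as a simple stand-in for the back-propagation training described above, solves for the output weights with linear least squares:

import numpy as np
from sklearn.cluster import KMeans

# Toy regression problem: y = sin(x) plus noise (assumed data for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

m, sigma = 10, 0.5  # number of RBF neurons and their width

# Step 1: choose the centers c_1..c_m with k-means
centers = KMeans(n_clusters=m, n_init=10, random_state=0).fit(X).cluster_centers_

# Step 2: hidden-layer outputs h_i = exp(-||x - c_i||^2 / (2 sigma^2))
def hidden(A):
    d2 = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

# Step 3: output weights w_i, here via a closed-form least-squares solve
w, *_ = np.linalg.lstsq(hidden(X), y, rcond=None)

# Predict at a few test points; the output should approximate sin(x)
X_test = np.linspace(0, 2 * np.pi, 5)[:, None]
print(hidden(X_test) @ w)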
Parameters for Optimization:
In an RBF neural network, there are several parameters and hyper-parameters that must be set before the network can be trained and used to make predictions. Here is a list of the important ones:
- Centers of radial basis functions: c_1, c_2, …, c_k
- Width parameter of the Gaussian functions: sigma (σ)
- Weights between the hidden and output layers: w_i
- Number of radial basis functions: k
- Learning rate for the optimization algorithm: alpha (α)
- Regularization parameter to prevent overfitting: lambda (λ)
- Number of training iterations:?T
- Loss function to be optimized: L
- If the number of radial basis functions is too small, the network may not capture the complexity of the input-output mapping; if it is too large, the network may overfit the training data.
- If the learning rate is too high, the optimization algorithm may overshoot the minimum of the loss function; if it is too low, the algorithm may converge too slowly.
- Techniques such as grid search, random search, and Bayesian optimization can automate the process of hyperparameter tuning (a small grid-search sketch follows).
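A self-contained grid-search sketch over the number of neurons k and the width σ, reusing the toy sine data from the earlier example with a held-out validation split (all values are assumptions for illustration):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def val_mse(k, sigma):
    # Fit an RBF network (k-means centers + least-squares weights), score on validation
    centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_tr).cluster_centers_
    phi = lambda A: np.exp(-((A[:, None, :] - centers[None, :, :]) ** 2).sum(2) / (2 * sigma**2))
    w, *_ = np.linalg.lstsq(phi(X_tr), y_tr, rcond=None)
    return np.mean((phi(X_va) @ w - y_va) ** 2)

best = min((val_mse(k, s), k, s) for k in (5, 10, 20) for s in (0.2, 0.5, 1.0))
print("best validation MSE, k, sigma:", best)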
The Why:
Reasons to use RBF Neural Networks:
- Suitable for a wide range of applications where the dataset has highly non-linear relationships.
- Effective at interpolating data, meaning they can accurately estimate output values for inputs that fall between the training examples. This makes them useful in applications where continuous outputs are desired.
- Do not require explicit feature engineering, as they are able to learn features from the input data automatically.
- Robust to noise in the input data, as the use of radial basis functions allows them to focus on the important features of the data and ignore noisy ones.
- Can be trained faster than other types of neural networks, as they only require estimating the centers of the radial basis functions and the weights between the hidden and output layers.
The Why Not:
Reasons to not use RBF Neural Networks:
- Can be more complex to implement and tune than other types of neural networks, as they require tuning of RBF-specific parameters as well.
- Less scalable than other types of neural networks.
- Prone to overfitting if the number of radial basis functions is too large, which leads to poor generalization on unseen data.
- Less interpretable than other types of ML models, as the meaning of the radial basis functions and weights may not be immediately clear.
Time for you to support:
- Reply to this email with your question
- Forward/Share to a friend who can benefit from this
- Chat on Substack with BxD (here)
- Engage with BxD on LinkedIn (here)
In the next edition, we will cover Recurrent Neural Networks.
Let us know your feedback!