Optimization, Design of Experiments & Deep Learning Training
I have been interested in AI applications in mechanical engineering for some time, especially those related to image processing, object classification, and identification, with applications in autonomous vehicles.
Courses offered by MathWorks helped me a lot to understand and apply concepts related to the use, creation, and training of various types of networks.
An interesting point is that during the network training phase, tools such as design of experiments (DOE) and optimization are quite useful for maximizing, for example, the accuracy of the network, and they do so more reliably than trial and error alone.
Next, I will explain how some MATLAB functions can be applied to improve the learning accuracy of a neural network.
The first step was to identify the 3 factors that determine how high the percentage of network learning is (0 to 100%).
The factors are A, B, C
A= Max Epoch
B= Learning rate
C= Validation Frequency
Then generate the experimental runs for a Box-Behnken design in coded (normalized) variables [-1, 0, +1]:
CodedValue = bbdesign(3)
The first column is for factor A, the second column is for factor B, and the third column is for factor C. The values of the factors are:
Factor A: 15 to 40 (no units).
Factor B: 15 to 50 (no units).
Factor C: 10 to 50 (no units).
The next step is to randomize the order of the runs, convert the coded design values to real-world units, and perform the experiment in the order specified.
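The randomization and unit conversion described above could be sketched as follows. This is only a minimal sketch: the names runorder, bounds, halfRange and RealValue are illustrative assumptions, and the factor ranges are the ones in the bounds array used later in this article.

```matlab
% Minimal sketch with assumed variable names.
% Requires Statistics and Machine Learning Toolbox (bbdesign) and R2016b or later.
CodedValue = bbdesign(3);                     % 15 runs x 3 factors, coded -1, 0, +1
runorder   = randperm(size(CodedValue,1))';   % random order in which to execute the runs

% Assumed factor ranges [min max], one row per factor (A, B, C)
bounds = [15 40; 15 50; 10 50];

% Map coded values to real-world units: -1 -> min, 0 -> midpoint, +1 -> max
halfRange = (bounds(:,2) - bounds(:,1))' / 2;             % 1 x 3 row vector
RealValue = bounds(:,1)' + (CodedValue + 1) .* halfRange; % 15 x 3 matrix

% List the runs sorted by execution order
disp(sortrows([runorder RealValue]))
```

Each row of RealValue holds the training options for one run; the first column of the displayed matrix is the position of that run in the randomized sequence.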
The results of the experiments, executed in the random order generated, are shown in the following table:
Next, store the results in the TestResult array:
TestResult = [98.21 97.02 93.45 93.45 97.62 96.43 97.62 93.45 97.02 96.43 97.62 93.45 97.62 97.62 97.62]';
Next, the design values and the response are displayed and stored.
Display designed values & response
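One way to store the design values and the response is a table, which is also the input that fitlm expects later on. The sketch below is an assumption about how the Expmt table could be built: the predictor names M, L, V and the response name Airflow are taken from the fitlm formula used in this article, and the sketch assumes each entry of TestResult belongs to the same row of CodedValue.

```matlab
% Minimal sketch: build the experiment table (assumed layout).
% Requires Statistics and Machine Learning Toolbox (bbdesign).
CodedValue = bbdesign(3);   % coded design, 15 runs x 3 factors
TestResult = [98.21 97.02 93.45 93.45 97.62 96.43 97.62 93.45 ...
              97.02 96.43 97.62 93.45 97.62 97.62 97.62]';   % accuracy (%)

% One column per coded factor plus the response column
Expmt = table(CodedValue(:,1), CodedValue(:,2), CodedValue(:,3), TestResult, ...
    'VariableNames', {'M','L','V','Airflow'});

disp(Expmt)
```

Keeping the factors in coded units makes the fitted coefficients directly comparable, since every factor then spans the same range of -1 to +1.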
A quick look at the data in the results table shows a certain trend, but it does not seem decisive, so I carried out an analysis using optimization tools to get as close as possible to at least 99% accuracy in the network learning.
The next step is to propose a function from which the optimal values of the parameters A, B & C can be found.
A second-degree function could be enough to model the phenomenon. The proposed function is:
AC = b0+(b1*M)+(b2*L)+(b3*V)+(b4*M*L)+(b5*M*V)+(b6*L*V)+(b7*M^2)+(b8*L^2)+(b9*V^2)
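Where AC is the accuracy, bi is the coefficient for term i, and M, L, and V are the coded values of Max Epoch, Learning rate, and Validation Frequency. Estimate the coefficients of this model using the fitlm function from Statistics and Machine Learning Toolbox. In the formula, the response column of the Expmt table is named Airflow; it holds the accuracy AC: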
mdl = fitlm(Expmt,'Airflow~ M*L*V-M:L:V+M^2+L^2+V^2');
Then display the magnitudes of the coefficients (for normalized values) in a bar chart.
figure()
h = bar(mdl.Coefficients.Estimate(2:10));
set(h,'facecolor',[0.8 0.8 0.9])
legend('Coefficient')
set(gcf,'units','normalized','position',[0.05 0.4 0.35 0.4])
set(gca,'xticklabel',mdl.CoefficientNames(2:10))
ylabel('Accuracy (%)')
xlabel('Normalized Coefficient')
title('Quadratic Model Coefficients')
Then type mdl.Coefficients to see the values of the coefficients of the proposed function. mdl.Rsquared.Ordinary helps to evaluate how well the proposed function fits the results of the experiments (% accuracy).
The closer the value of mdl.Rsquared.Ordinary is to 1, the better the fit between the data and the proposed function.
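To make the meaning of R-squared concrete, the sketch below recomputes it by hand from the residuals and compares it with the value reported by fitlm. It is a minimal sketch under stated assumptions: it refits the second-degree model with the 'quadratic' model specification instead of the formula string, it assumes TestResult is in the same row order as the design, and the names res, SSres, SStot and R2 are illustrative.

```matlab
% Minimal sketch: R-squared by hand vs. the value reported by fitlm.
% Requires Statistics and Machine Learning Toolbox (bbdesign, fitlm).
CodedValue = bbdesign(3);
TestResult = [98.21 97.02 93.45 93.45 97.62 96.43 97.62 93.45 ...
              97.02 96.43 97.62 93.45 97.62 97.62 97.62]';

% 'quadratic' = intercept, linear, pairwise interaction and squared terms
mdl = fitlm(CodedValue, TestResult, 'quadratic');

res   = mdl.Residuals.Raw;                          % measured minus fitted
SSres = sum(res.^2);                                % unexplained variation
SStot = sum((TestResult - mean(TestResult)).^2);    % total variation
R2    = 1 - SSres/SStot;

fprintf('R^2 by hand: %.4f, from fitlm: %.4f\n', R2, mdl.Rsquared.Ordinary);
fprintf('Adjusted R^2: %.4f\n', mdl.Rsquared.Adjusted);
```

With only 15 runs and 10 coefficients, the adjusted value is worth reading alongside the ordinary one, because it penalizes terms that add little explanatory power.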
The previous results show that the proposed function fits the experimental data well, but ideally the value of Rsquared would be at least 0.98, so a deeper analysis is needed.
For example, response surface plots generated with the plotSlice function help to quantify the effects of the factors on AC and also serve as a graphical method to identify the values that maximize learning accuracy.
The following graphs show the gap between the proposed model and the data, visible as the space between the red curves and the green curve. They also show a relationship that is non-linear but relatively close to linear, so the next step is to check the fit of a linear model. If the value of Rsquared remains below 0.98, it is necessary to test a cubic (third-degree) model.
None of the following linear models meets the criterion of Rsquared >= 0.98. Therefore, I tested a model of degree 3.
Linear model proposal (1)
Linear model proposal (2)
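A comparison of this kind can be scripted by fitting several model specifications in a loop and checking each R-squared against the 0.98 criterion. The sketch below is an illustration only: the choice of 'linear', 'interactions' and 'quadratic' is an assumption and may not match the two linear proposals shown above, and the names specs, R2 and meets are hypothetical.

```matlab
% Minimal sketch: compare R-squared across nested model specifications.
% Requires Statistics and Machine Learning Toolbox (bbdesign, fitlm).
CodedValue = bbdesign(3);
TestResult = [98.21 97.02 93.45 93.45 97.62 96.43 97.62 93.45 ...
              97.02 96.43 97.62 93.45 97.62 97.62 97.62]';

specs = {'linear', 'interactions', 'quadratic'};   % assumed candidate models
R2    = zeros(1, numel(specs));

for k = 1:numel(specs)
    mdl_k = fitlm(CodedValue, TestResult, specs{k});
    R2(k) = mdl_k.Rsquared.Ordinary;
    fprintf('%-13s R^2 = %.4f\n', specs{k}, R2(k));
end

meets = R2 >= 0.98;   % logical flag per model: criterion met or not
```

Because each specification contains all the terms of the previous one, the ordinary R-squared can only stay the same or increase from one model to the next.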
The proposed third-degree model appears below. It fits the experimental data perfectly, and the response surface plots show an exact overlap, so it is possible to proceed to optimizing the values of the factors M, L & V.
The best option to optimize the third-degree function is to generate a custom objective function, which is shown below.
function F = root2d(x)
% Negative of the fitted third-degree accuracy model, x = coded [M L V]
F = -(97.62 - 0.745*x(1) - 0.595*x(2) - 1.19*x(3) ...
    + 0.2975*x(1)*x(2) - 0.745*x(1)*x(3) - 0.895*x(2)*x(3) ...
    - 0.96875*x(1)^2 - 1.1187*x(2)^2 - 0.37125*x(3)^2 ...
    + 0.2975*x(1)^2*x(2) - 1.3375*x(1)*x(2)^2 - 0.15*x(1)^2*x(3) ...
    + 0*x(2)^2*x(3) + 0*x(1)*x(3)^2 + 0*x(2)*x(3)^2 ...
    + 0*x(1)^3 + 0*x(2)^3 + 0*x(3)^3);
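Before optimizing, it can be worth checking that the hard-coded coefficients reproduce the measurements. The sketch below evaluates the third-degree model at each Box-Behnken design point and compares it with TestResult. The names acc, pred and maxErr are illustrative, the zero-coefficient terms are left out, and the sketch assumes the entries of TestResult are in the same row order as the output of bbdesign(3).

```matlab
% Minimal sketch: compare the third-degree model with the measured accuracies.
% Requires Statistics and Machine Learning Toolbox (bbdesign).
acc = @(x) 97.62 - 0.745*x(1) - 0.595*x(2) - 1.19*x(3) ...
    + 0.2975*x(1)*x(2) - 0.745*x(1)*x(3) - 0.895*x(2)*x(3) ...
    - 0.96875*x(1)^2 - 1.1187*x(2)^2 - 0.37125*x(3)^2 ...
    + 0.2975*x(1)^2*x(2) - 1.3375*x(1)*x(2)^2 - 0.15*x(1)^2*x(3);

CodedValue = bbdesign(3);
TestResult = [98.21 97.02 93.45 93.45 97.62 96.43 97.62 93.45 ...
              97.02 96.43 97.62 93.45 97.62 97.62 97.62]';

% Evaluate the model at every design point
pred = zeros(size(TestResult));
for i = 1:size(CodedValue,1)
    pred(i) = acc(CodedValue(i,:));
end

maxErr = max(abs(pred - TestResult));   % largest model-vs-measurement gap
fprintf('Largest deviation: %.4f percentage points\n', maxErr);
```

A deviation that is only rounding-sized supports using these coefficients in the objective function; a larger one would point to a transcription error.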
Minimizing the negative accuracy using fmincon is the same as maximizing the original objective function. The constraints are the upper and lower limits tested (in coded values). Set the starting point to the center of the experimental design matrix.
lb = [-1 -1 -1]; % Lower bound
ub = [1 1 1];    % Upper bound
x0 = [0 0 0];    % Starting point
[optfactors,fval] = fmincon(@root2d,x0,[],[],[],[],lb,ub,[]); % Invoke the solver
Then convert the results to a maximization problem and real-world units:
maxval = -fval;
maxloc = (optfactors + 1)';
bounds = [15 40;15 50;10 50];
maxloc=bounds(:,1)+maxloc .* ((bounds(:,2) - bounds(:,1))/2);
disp('Optimal Values:')
disp({'Max Epoch','Learning Rate','Frequency','Accuracy'})
disp([maxloc' maxval])
Therefore, the optimal values for the maximum potential learning accuracy, considering only the factors related to the training options, are:
Max Epoch = 27.6273
Learning Rate = 34.8419
Frequency = 10
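Because a third-degree surface can have more than one local optimum, a coarse grid search is a simple cross-check on a solver result obtained from a single starting point. The sketch below is not part of the original workflow: it evaluates the same third-degree model (zero-coefficient terms omitted) on a grid of coded values and converts the best grid point to real-world units. The 0.05 grid step and the names acc, g, codedBest and realBest are assumptions.

```matlab
% Minimal sketch: brute-force cross-check of the optimum on a coded grid.
% Base MATLAB only (R2016b or later for implicit expansion).
acc = @(x1,x2,x3) 97.62 - 0.745*x1 - 0.595*x2 - 1.19*x3 ...
    + 0.2975*x1.*x2 - 0.745*x1.*x3 - 0.895*x2.*x3 ...
    - 0.96875*x1.^2 - 1.1187*x2.^2 - 0.37125*x3.^2 ...
    + 0.2975*x1.^2.*x2 - 1.3375*x1.*x2.^2 - 0.15*x1.^2.*x3;

g = linspace(-1, 1, 41);            % coded levels in steps of 0.05
[X1, X2, X3] = ndgrid(g, g, g);     % every combination of the three factors
A = acc(X1, X2, X3);                % predicted accuracy on the whole grid

[gridMax, idx] = max(A(:));         % best grid point
codedBest = [X1(idx) X2(idx) X3(idx)];

% Convert coded values to real-world units with the same bounds as above
bounds   = [15 40; 15 50; 10 50];
realBest = bounds(:,1)' + (codedBest + 1) .* (bounds(:,2) - bounds(:,1))' / 2;

disp([realBest gridMax])   % Max Epoch, Learning Rate, Frequency, Accuracy
```

If the best grid point lies close to the fmincon result, that is some evidence the solver did not stop at a local optimum; a large disagreement would suggest rerunning fmincon from other starting points.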
The following graph holds Max Epoch constant at 27.6273 and varies Frequency and Learning Rate.
The following graph holds Learning Rate constant at 34.84 and varies Max Epoch and Frequency.
The following graph holds Frequency constant at 10 and varies Max Epoch and Learning Rate.
All of the above is a sample of how useful design of experiments and optimization can be in any field, in this case during the training phase of a network.