"Deep Learning from Scratch" review
New Year's Eve is a particularly productive time for research and development. This time I took on deep learning.
There are so many self-proclaimed AI experts nowadays, but hardly anyone understands the math. At university we used to learn physics from first principles. The O'Reilly book Deep Learning from Scratch is not free from errors and omissions, but in my view it takes the right approach. I am far from finishing it, but already in the first chapter it became clear that the mysterious backpropagation is just the chain rule used to compute the gradients for gradient descent. Applying differential calculus for the first time in 20 years is heartwarming.
By the end of the second chapter I had built a two-layer neural network to solve a very practical and pressing problem: predict, by the middle of the year, the EBIT for the current financial year from the known monthly revenues for January through June. This prediction is needed to "negotiate" the yearly corporate tax prepayments with the tax authority in July.
To the six revenues I also added the inflation rate as a seventh feature. The model is a two-layer network: the input is multiplied by a 7x7 weight matrix W1 and the biases B1 (1x7) are added, the result is passed through the non-linear sigmoid function, and the output of that layer is then reduced to a scalar by multiplying with a second weight matrix W2 (7x1) and adding the bias B2 (1x1), i.e. P = sigmoid(X·W1 + B1)·W2 + B2:
import numpy as np
from numpy import ndarray

def sigmoid(x: ndarray) -> ndarray:
    return 1 / (1 + np.exp(-x))

def predict(X_data: ndarray, weights: dict[str, ndarray]) -> float:
    # hidden layer (sigmoid), then reduce to a single scalar in the output layer
    N1 = np.dot(X_data, weights['W1']) + weights['B1']
    O1 = sigmoid(N1)
    P = np.dot(O1, weights['W2']) + weights['B2']
    return P.item()
Lesson learned: the features in the training set should be of roughly the same order of magnitude; otherwise gradient descent quickly diverges instead of converging.
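In the script below I handle this crudely by dividing everything by one common scaling factor. A more general alternative is z-score standardization; the helper below is my own sketch of that idea, not part of the book's listing:

from numpy import ndarray

def standardize(X: ndarray) -> ndarray:
    # rescale each column to zero mean and unit variance
    return (X - X.mean(axis=0)) / X.std(axis=0)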
import numpy as np
from numpy import ndarray

def forward_loss(X_batch: ndarray,
                 y_batch: ndarray,
                 weights: dict[str, ndarray]) -> tuple[float, dict[str, ndarray]]:
    # sanity checks on the shapes
    assert X_batch.shape[0] == y_batch.shape[0]
    assert X_batch.shape[1] == weights['W1'].shape[0]
    assert weights['B1'].shape[0] == 1
    assert weights['B2'].shape[0] == weights['B2'].shape[1] == 1
    # hidden layer
    M1 = np.dot(X_batch, weights['W1'])
    N1 = M1 + weights['B1']
    O1 = sigmoid(N1)
    # output layer
    M2 = np.dot(O1, weights['W2'])
    P = M2 + weights['B2']
    # mean squared error
    loss = np.mean(np.power(y_batch - P, 2))
    # keep all intermediate results for backpropagation
    forward_info: dict[str, ndarray] = {
        'X': X_batch,
        'y': y_batch,
        'M1': M1,
        'N1': N1,
        'O1': O1,
        'M2': M2,
        'P': P
    }
    return loss, forward_info
def loss_gradients(forward_info: dict[str, ndarray],
                   weights: dict[str, ndarray]) -> dict[str, ndarray]:
    # derivative of the loss with respect to the prediction
    dLdP = -2 * (forward_info['y'] - forward_info['P'])
    # output layer: gradients for B2 and W2
    dPdB2 = np.ones_like(weights['B2'])
    dLdB2 = (dLdP * dPdB2).sum(axis=0)
    dPdM2 = np.ones_like(forward_info['M2'])
    dLdM2 = dLdP * dPdM2
    dM2dW2 = np.transpose(forward_info['O1'], axes=(1, 0))
    dLdW2 = np.dot(dM2dW2, dLdM2)
    # propagate back through the sigmoid into the hidden layer
    dM2dO1 = np.transpose(weights['W2'], axes=(1, 0))
    dLdO1 = np.dot(dLdM2, dM2dO1)
    dO1dN1 = sigmoid(forward_info['N1']) * (1 - sigmoid(forward_info['N1']))
    dLdN1 = dLdO1 * dO1dN1
    # hidden layer: gradients for B1 and W1
    dN1dB1 = np.ones_like(weights['B1'])
    dLdB1 = (dLdN1 * dN1dB1).sum(axis=0)
    dN1dM1 = np.ones_like(forward_info['M1'])
    dLdM1 = dLdN1 * dN1dM1
    dM1dW1 = np.transpose(forward_info['X'], axes=(1, 0))
    dLdW1 = np.dot(dM1dW1, dLdM1)
    loss_gradients: dict[str, ndarray] = {
        'W2': dLdW2,
        'B2': dLdB2,
        'W1': dLdW1,
        'B1': dLdB1
    }
    return loss_gradients
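To convince myself that the hand-derived gradients are correct, a finite-difference check is handy. One caveat: loss_gradients drops the 1/N factor that the mean in forward_loss introduces (the difference is simply absorbed into the learning rate), so the analytic values come out batch-size times larger than the numerical estimates. The helper below is my own sketch, not part of the book's listing, and its name and the step size eps are arbitrary:

def numerical_gradient(X_batch: ndarray, y_batch: ndarray,
                       weights: dict[str, ndarray],
                       key: str, i: int, j: int,
                       eps: float = 1e-6) -> float:
    # central-difference estimate of d(loss)/d(weights[key][i, j])
    w_plus = {k: v.copy() for k, v in weights.items()}
    w_minus = {k: v.copy() for k, v in weights.items()}
    w_plus[key][i, j] += eps
    w_minus[key][i, j] -= eps
    loss_plus, _ = forward_loss(X_batch, y_batch, w_plus)
    loss_minus, _ = forward_loss(X_batch, y_batch, w_minus)
    return (loss_plus - loss_minus) / (2 * eps)

For example, loss_gradients(forward_loss(X_batch, y_batch, weights)[1], weights)['W1'][0, 0] should agree closely with X_batch.shape[0] * numerical_gradient(X_batch, y_batch, weights, 'W1', 0, 0).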
def train(X_batch: ndarray,
          y_batch: ndarray,
          learning_rate: float = 0.1) -> tuple[float, dict[str, ndarray]]:
    assert X_batch.shape[0] == y_batch.shape[0]
    # shuffle the rows (with a single full batch this does not change the summed gradients)
    permutation = np.random.permutation(X_batch.shape[0])
    X_train = X_batch[permutation]
    y_train = y_batch[permutation]
    # forward and backward pass; 'weights' is the global dict initialized below, updated in place
    loss, forward_info = forward_loss(X_train, y_train, weights)
    loss_grads = loss_gradients(forward_info, weights)
    # gradient descent step
    for key in weights.keys():
        weights[key] -= learning_rate * loss_grads[key]
    return loss, weights
def init_weights(input_size: int, hidden_size: int) -> dict[str, ndarray]:
    # random initialization of all weights and biases
    weights: dict[str, ndarray] = {}
    weights['W1'] = np.random.randn(input_size, hidden_size)
    weights['B1'] = np.random.randn(1, hidden_size)
    weights['W2'] = np.random.randn(hidden_size, 1)
    weights['B2'] = np.random.randn(1, 1)
    return weights
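Since init_weights draws from np.random.randn, every run starts from a different point and ends in a slightly different optimum. Fixing the NumPy seed right before initialization makes the experiments repeatable; this line is my own addition (the value 42 is arbitrary), not part of the book's listing:

np.random.seed(42)  # arbitrary seed, only to make runs reproducible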
scaling = 10000.0
# columns: monthly revenues Jan Feb Mar Apr May Jun, plus the consumer price index (VPI, base year 2020) * 100
X_batch = np.array(
    [
        [9944, 14794, 13152, 20178, 18125, 14435, 102.8 * 100],
        [16703, 16423, 23350, 17141, 22658, 23456, 111.6 * 100],
        [8835, 17689, 25607, 20914, 23648, 17342, 120.3 * 100],
        [16081, 20815, 19549, 21915, 16515, 17444, 124.4 * 100]
    ]) / scaling
# targets: the full-year revenues of the same years
y_batch = np.array(
    [
        [175406],
        [211045],
        [214053],
        [219846]
    ]) / scaling
# hidden layer as wide as the input: 7 neurons
weights = init_weights(X_batch.shape[1], X_batch.shape[1])
epoch: int = 1
loss = scaling  # dummy starting value, large enough to enter the loop
while epoch < 10000 and loss > 0.0001:
    loss, weights = train(X_batch, y_batch, 0.001)
    print("Epoch: {0}, loss {1}".format(epoch, loss))
    epoch += 1
print("Weights:", weights)
# test the model: fill in the six monthly revenues and the CPI value
input = np.array([...])
prediction = predict(input / scaling, weights)
print("RESULT:", prediction * scaling)
Analysing the trained weights (in the variant where the second layer is just one "neuron") shows that February and March are the most significant, most characteristic months. Indeed, in January economic activity is at its lowest and the revenue fluctuates a lot.
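One rough way to read this off the trained network is to push the absolute weight magnitudes through both layers; this is a back-of-the-envelope heuristic of my own (it ignores the varying slope of the sigmoid), not something from the book:

# combined absolute weight per input feature, shape (7, 1)
importance = np.abs(weights['W1']) @ np.abs(weights['W2'])
for name, value in zip(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'VPI'], importance.ravel()):
    print(f"{name}: {value:.3f}")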
Adding the number of FTEs working for the company as a feature led to an interesting result: doubling the workforce while keeping the monthly revenues unchanged brought the predicted FY revenue down, which is wrong from both a practical and a theoretical perspective.
The next step will be to add the expenses (salaries, administrative costs and depreciation) and predict the EBIT, given that salaries in Austria are tied to inflation.