"Deep Learning from Scratch" review
by Seth Weidman

"Deep Learning from Scratch" review

New Year's Eve is a particularly productive time for research and development. This time I took on deep learning.

There are so many self-proclaimed AI experts nowadays, but hardly anyone understands the math. At university, we used to learn physics from first principles. The book Deep Learning from Scratch from O'Reilly is not free of errors and omissions, but in my view it takes the right approach. I am far from finished with the book, but already in the first chapter it became clear that the mysterious backpropagation is just the chain rule in the service of the gradient descent method. Applying differential calculus for the first time in 20 years is heartwarming.
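
As a minimal illustration of that idea (my own sketch, not an example from the book): fitting a single weight w of the model p = w * x by gradient descent, with the gradient of the squared error coming straight from the chain rule.

# Minimal sketch (not from the book): one weight, one data point,
# squared-error loss L = (w*x - y)^2, gradient dL/dw = 2*(w*x - y)*x.
x, y = 3.0, 12.0                    # a hypothetical training example
w = 0.0                             # initial weight
learning_rate = 0.01

for step in range(200):
    p = w * x                       # forward pass
    grad = 2 * (p - y) * x          # backward pass: chain rule
    w -= learning_rate * grad       # gradient descent update

print(w)                            # approaches y / x = 4.0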

By the end of the second chapter I had built a two-layer neural network to solve a very practical and pressing problem: by the middle of the year, predict the EBIT for the current financial year, given the known monthly revenues for January to June. This prediction is needed to "negotiate" the yearly corporate tax prepayments with the tax authority in July.

To the revenues I also added the inflation rate as a 7th input feature. The model is a two-layer network: the input is multiplied by a (7x7) matrix of weights W1, the "biases" B1 (1x7) are added, and the result is passed through the non-linear "sigmoid" function


sigmoid(x) = 1 / (1 + exp(-x))

which is then reduced to a scalar by a matrix multiplication with a second set of weights W2 (7x1) and the addition of the "bias" B2 (1x1):

import numpy as np
from numpy import ndarray

def sigmoid(x: ndarray) -> ndarray:
    return 1 / (1 + np.exp(-x))

def predict(X_data: ndarray, weights: dict[str, ndarray]) -> float:
    # forward pass only: hidden layer (affine + sigmoid), then a linear output neuron
    N1 = np.dot(X_data, weights['W1']) + weights['B1']
    O1 = sigmoid(N1)
    P  = np.dot(O1, weights['W2']) + weights['B2']
    return P.item()

Lesson learned: the variables in the training set should be of the same order of magnitude; otherwise the loss quickly diverges instead of converging.
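
The script below simply divides all figures by one common scaling constant. A more general alternative, shown here only as a sketch of common practice (not what my script uses), is to standardize each column to zero mean and unit variance:

import numpy as np

def standardize(X: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    # column-wise z-score scaling: subtract the mean, divide by the standard deviation
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std

# the same mean and std must then be applied to any new input before predicting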

import numpy as np
from numpy import ndarray

def forward_loss(X_batch: ndarray,
                 y_batch: ndarray,
                 weights: dict[str, ndarray]) -> tuple[float, dict[str, ndarray]]:

    assert X_batch.shape[0] == y_batch.shape[0]
    assert X_batch.shape[1] == weights['W1'].shape[0]
    assert weights['B1'].shape[0] == 1
    assert weights['B2'].shape[0] == weights['B2'].shape[1] == 1

    # forward pass: hidden layer (affine transform + sigmoid), then linear output
    M1 = np.dot(X_batch, weights['W1'])
    N1 = M1 + weights['B1']
    O1 = sigmoid(N1)
    M2 = np.dot(O1, weights['W2'])
    P  = M2 + weights['B2']
    # mean squared error over the batch
    loss = np.mean(np.power(y_batch - P, 2))

    forward_info: dict[str, ndarray] = \
        {'X': X_batch,
         'y': y_batch,
         'M1': M1,
         'N1': N1,
         'O1': O1,
         'M2': M2,
         'P': P
         }
    return loss, forward_info

def loss_gradients(forward_info: dict[str, ndarray],
                   weights: dict[str, ndarray]) -> dict[str, ndarray]:
    # backward pass: chain rule, starting from the loss w.r.t. the prediction P
    dLdP = -2 * (forward_info['y'] - forward_info['P'])
    dPdB2 = np.ones_like(weights['B2'])
    dLdB2 = (dLdP * dPdB2).sum(axis=0)

    dPdM2 = np.ones_like(forward_info['M2'])
    dLdM2 = dLdP * dPdM2
    dM2dW2 = np.transpose(forward_info['O1'], axes=(1,0))
    dLdW2 = np.dot(dM2dW2, dLdM2)

    dM2dO1 = np.transpose(weights['W2'], axes=(1,0))
    dLdO1  = np.dot(dLdM2, dM2dO1)
    dO1dN1 = sigmoid(forward_info['N1']) * (1 - sigmoid(forward_info['N1']))
    dLdN1  = dLdO1 * dO1dN1
    dN1dB1 = np.ones_like(weights['B1'])
    dLdB1  = (dLdN1 * dN1dB1).sum(axis=0)

    dN1dM1 = np.ones_like(forward_info['M1'])
    dLdM1  = dLdN1 * dN1dM1
    dM1dW1 = np.transpose(forward_info['X'], axes=(1,0))
    dLdW1  = np.dot(dM1dW1,dLdM1)

    grads: dict[str, ndarray] = {
        'W2': dLdW2,
        'B2': dLdB2,
        'W1': dLdW1,
        'B1': dLdB1
    }
    return grads
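
To convince myself that these hand-derived gradients are correct, a finite-difference check helps (my own addition, reusing forward_loss above). One subtlety: dLdP above omits the 1/batch_size factor of np.mean, so the analytic values are gradients of the sum of squared errors; the check below multiplies the numerical slope by the batch size to compare like with like.

def grad_check(X: ndarray, y: ndarray, weights: dict[str, ndarray],
               key: str, i: int, j: int, eps: float = 1e-6) -> float:
    # central-difference estimate of d(sum of squared errors)/d(weights[key][i, j]);
    # forward_loss returns the mean, hence the factor X.shape[0]
    w_plus  = {k: v.copy() for k, v in weights.items()}
    w_minus = {k: v.copy() for k, v in weights.items()}
    w_plus[key][i, j]  += eps
    w_minus[key][i, j] -= eps
    loss_plus,  _ = forward_loss(X, y, w_plus)
    loss_minus, _ = forward_loss(X, y, w_minus)
    return X.shape[0] * (loss_plus - loss_minus) / (2 * eps)

# example usage (once X_batch, y_batch and weights are defined below):
# _, info = forward_loss(X_batch, y_batch, weights)
# print(grad_check(X_batch, y_batch, weights, 'W1', 0, 0),
#       loss_gradients(info, weights)['W1'][0, 0])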

def train(X_batch: ndarray,
          y_batch: ndarray,
          learning_rate = 0.1) -> tuple[float, dict[str, ndarray]]:

    assert X_batch.shape[0] == y_batch.shape[0]

    # Apply the permutation to shuffle data
    permutation = np.random.permutation(X_batch.shape[0])
    X_train = X_batch[permutation]
    y_train = y_batch[permutation]

    # forward and backward pass; `weights` is the module-level dict defined below
    loss, forward_info = forward_loss(X_train, y_train, weights)
    loss_grads = loss_gradients(forward_info, weights)
    # gradient descent step: update every weight matrix in place
    for key in weights.keys():
        weights[key] -= learning_rate * loss_grads[key]

    return loss, weights

def init_weights(input_size: int, hidden_size: int) -> dict[str, ndarray]:
    weights: dict[str, ndarray] = {}
    weights['W1'] = np.random.randn(input_size, hidden_size)
    weights['B1'] = np.random.randn(1, hidden_size)
    weights['W2'] = np.random.randn(hidden_size, 1)
    weights['B2'] = np.random.randn(1, 1)
    return weights

scaling = 10000.0
# columns: Jan, Feb, Mar, Apr, May, Jun, VPI 2020 * 10^2 (consumer price index)
X_batch = np.array(
    [
        [9944,	14794,	13152,	20178,	18125,	14435,	102.8*100],
        [16703,	16423,	23350,	17141,	22658,	23456,	111.6*100],
        [8835,	17689,	25607,	20914,	23648,	17342,	120.3*100],
        [16081,	20815,	19549,	21915,	16515,	17444,	124.4*100] 
    ])/scaling
y_batch = np.array(
    [
        [175406],
        [211045],
        [214053],
        [219846]
    ])/scaling

weights = init_weights(X_batch.shape[1], X_batch.shape[1])
epoch: int = 1
loss = scaling   # any value above the stopping threshold, to enter the loop
while epoch < 10000 and loss > 0.0001:
    loss, weights = train(X_batch, y_batch, 0.001)
    print("Epoch: {0}, loss {1}".format(epoch, loss))
    epoch += 1

print("Weights:", weights)

# test the model: supply six monthly revenues plus the VPI value (7 features)
input = np.array([...])
prediction = predict(input/scaling, weights)
print("RESULT:", prediction * scaling)

Analysing the weights, with the second layer being just one "neuron", shows that February and March are the most significant/characteristic months. Indeed, in January economic activity is at its lowest and the revenue fluctuates a lot.
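
A rough way to quantify this, ignoring the sigmoid non-linearity (a heuristic of mine, not something the book suggests), is to collapse the two weight matrices into a single 7-element vector and rank the inputs by magnitude:

# rough per-input influence: W1 @ W2 collapses both layers, ignoring the sigmoid
influence = np.dot(weights['W1'], weights['W2']).flatten()
labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'VPI']
for label, value in sorted(zip(labels, influence), key=lambda t: -abs(t[1])):
    print(f"{label}: {value:+.3f}")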

Adding the number of FTEs working for the company led to an interesting result: doubling the workforce while keeping the same monthly revenues brought the predicted full-year revenue down, which is wrong from both a practical and a theoretical perspective.

The next step will be to add the expenses, to account for salaries, administrative costs and depreciation, and to predict the EBIT, given that salaries in Austria are indexed to inflation.

