"Deep Learning from Scratch" review
New Year's Eve is a particularly productive time for research and development. This time I took on deep learning.
There are so many self-proclaimed AI experts nowadays, but hardly anyone understands the math. At university we used to learn physics from first principles. The O'Reilly book Deep Learning from Scratch is not free from errors and omissions, but in my view it takes the right approach. I am far from finishing it, but already in the first chapter it became clear that the mysterious backpropagation is just the chain rule used to compute the gradients for gradient descent. Applying differential calculus for the first time in 20 years is heartwarming.
By the end of the second chapter I had built a two-layer neural network to solve a very practical and pressing problem: predict, by the middle of the year, the EBIT for the current financial year from the known monthly revenues for January through June. This prediction is needed to "negotiate" the yearly corporate tax prepayments with the tax authority in July.
To the six revenues I also added the inflation rate as a seventh feature. The model is a two-layer network: the input is multiplied by a 7x7 weight matrix W1 and the biases B1 (1x7) are added, the result is passed through the non-linear sigmoid function, and the output of that layer is then reduced to a scalar by multiplying with a second weight matrix W2 (7x1) and adding the bias B2 (1x1), i.e. P = sigmoid(X·W1 + B1)·W2 + B2:
import numpy as np
from numpy import ndarray

def sigmoid(x: ndarray) -> ndarray:
    return 1 / (1 + np.exp(-x))

def predict(X_data: ndarray, weights: dict[str, ndarray]) -> float:
    # hidden layer (sigmoid), then reduce to a single scalar in the output layer
    N1 = np.dot(X_data, weights['W1']) + weights['B1']
    O1 = sigmoid(N1)
    P = np.dot(O1, weights['W2']) + weights['B2']
    return P.item()
Lesson learned: the features in the training set should be of roughly the same order of magnitude; otherwise gradient descent quickly diverges instead of converging.
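In the script below I handle this crudely by dividing everything by one common scaling factor. A more general alternative is z-score standardization; the helper below is my own sketch of that idea, not part of the book's listing:

from numpy import ndarray

def standardize(X: ndarray) -> ndarray:
    # rescale each column to zero mean and unit variance
    return (X - X.mean(axis=0)) / X.std(axis=0)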
import numpy as np
from numpy import ndarray

def forward_loss(X_batch: ndarray,
                 y_batch: ndarray,
                 weights: dict[str, ndarray]) -> tuple[float, dict[str, ndarray]]:
    # sanity checks on the shapes
    assert X_batch.shape[0] == y_batch.shape[0]
    assert X_batch.shape[1] == weights['W1'].shape[0]
    assert weights['B1'].shape[0] == 1
    assert weights['B2'].shape[0] == weights['B2'].shape[1] == 1
    # hidden layer
    M1 = np.dot(X_batch, weights['W1'])
    N1 = M1 + weights['B1']
    O1 = sigmoid(N1)
    # output layer
    M2 = np.dot(O1, weights['W2'])
    P = M2 + weights['B2']
    # mean squared error
    loss = np.mean(np.power(y_batch - P, 2))
    # keep all intermediate results for backpropagation
    forward_info: dict[str, ndarray] = {
        'X': X_batch,
        'y': y_batch,
        'M1': M1,
        'N1': N1,
        'O1': O1,
        'M2': M2,
        'P': P
    }
    return loss, forward_info
def loss_gradients(forward_info: dict[str, ndarray],
                   weights: dict[str, ndarray]) -> dict[str, ndarray]:
    # derivative of the loss with respect to the prediction
    dLdP = -2 * (forward_info['y'] - forward_info['P'])
    # output layer: gradients for B2 and W2
    dPdB2 = np.ones_like(weights['B2'])
    dLdB2 = (dLdP * dPdB2).sum(axis=0)
    dPdM2 = np.ones_like(forward_info['M2'])
    dLdM2 = dLdP * dPdM2
    dM2dW2 = np.transpose(forward_info['O1'], axes=(1, 0))
    dLdW2 = np.dot(dM2dW2, dLdM2)
    # propagate back through the sigmoid into the hidden layer
    dM2dO1 = np.transpose(weights['W2'], axes=(1, 0))
    dLdO1 = np.dot(dLdM2, dM2dO1)
    dO1dN1 = sigmoid(forward_info['N1']) * (1 - sigmoid(forward_info['N1']))
    dLdN1 = dLdO1 * dO1dN1
    # hidden layer: gradients for B1 and W1
    dN1dB1 = np.ones_like(weights['B1'])
    dLdB1 = (dLdN1 * dN1dB1).sum(axis=0)
    dN1dM1 = np.ones_like(forward_info['M1'])
    dLdM1 = dLdN1 * dN1dM1
    dM1dW1 = np.transpose(forward_info['X'], axes=(1, 0))
    dLdW1 = np.dot(dM1dW1, dLdM1)
    loss_gradients: dict[str, ndarray] = {
        'W2': dLdW2,
        'B2': dLdB2,
        'W1': dLdW1,
        'B1': dLdB1
    }
    return loss_gradients
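To convince myself that the hand-derived gradients are correct, a finite-difference check is handy. One caveat: loss_gradients drops the 1/N factor that the mean in forward_loss introduces (the difference is simply absorbed into the learning rate), so the analytic values come out batch-size times larger than the numerical estimates. The helper below is my own sketch, not part of the book's listing, and its name and the step size eps are arbitrary:

def numerical_gradient(X_batch: ndarray, y_batch: ndarray,
                       weights: dict[str, ndarray],
                       key: str, i: int, j: int,
                       eps: float = 1e-6) -> float:
    # central-difference estimate of d(loss)/d(weights[key][i, j])
    w_plus = {k: v.copy() for k, v in weights.items()}
    w_minus = {k: v.copy() for k, v in weights.items()}
    w_plus[key][i, j] += eps
    w_minus[key][i, j] -= eps
    loss_plus, _ = forward_loss(X_batch, y_batch, w_plus)
    loss_minus, _ = forward_loss(X_batch, y_batch, w_minus)
    return (loss_plus - loss_minus) / (2 * eps)

For example, loss_gradients(forward_loss(X_batch, y_batch, weights)[1], weights)['W1'][0, 0] should agree closely with X_batch.shape[0] * numerical_gradient(X_batch, y_batch, weights, 'W1', 0, 0).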
def train(X_batch: ndarray,
          y_batch: ndarray,
          learning_rate: float = 0.1) -> tuple[float, dict[str, ndarray]]:
    assert X_batch.shape[0] == y_batch.shape[0]
    # shuffle the rows (with a single full batch this does not change the summed gradients)
    permutation = np.random.permutation(X_batch.shape[0])
    X_train = X_batch[permutation]
    y_train = y_batch[permutation]
    # forward and backward pass; 'weights' is the global dict initialized below, updated in place
    loss, forward_info = forward_loss(X_train, y_train, weights)
    loss_grads = loss_gradients(forward_info, weights)
    # gradient descent step
    for key in weights.keys():
        weights[key] -= learning_rate * loss_grads[key]
    return loss, weights
def init_weights(input_size: int, hidden_size: int) -> dict[str, ndarray]:
    # random initialization of all weights and biases
    weights: dict[str, ndarray] = {}
    weights['W1'] = np.random.randn(input_size, hidden_size)
    weights['B1'] = np.random.randn(1, hidden_size)
    weights['W2'] = np.random.randn(hidden_size, 1)
    weights['B2'] = np.random.randn(1, 1)
    return weights
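Since init_weights draws from np.random.randn, every run starts from a different point and ends in a slightly different optimum. Fixing the NumPy seed right before initialization makes the experiments repeatable; this line is my own addition (the value 42 is arbitrary), not part of the book's listing:

np.random.seed(42)  # arbitrary seed, only to make runs reproducible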
scaling = 10000.0
# columns: monthly revenues Jan Feb Mar Apr May Jun, plus the consumer price index (VPI, base year 2020) * 100
X_batch = np.array(
    [
        [9944, 14794, 13152, 20178, 18125, 14435, 102.8 * 100],
        [16703, 16423, 23350, 17141, 22658, 23456, 111.6 * 100],
        [8835, 17689, 25607, 20914, 23648, 17342, 120.3 * 100],
        [16081, 20815, 19549, 21915, 16515, 17444, 124.4 * 100]
    ]) / scaling
# targets: the full-year revenues of the same years
y_batch = np.array(
    [
        [175406],
        [211045],
        [214053],
        [219846]
    ]) / scaling
# hidden layer as wide as the input: 7 neurons
weights = init_weights(X_batch.shape[1], X_batch.shape[1])
epoch: int = 1
loss = scaling  # dummy starting value, large enough to enter the loop
while epoch < 10000 and loss > 0.0001:
    loss, weights = train(X_batch, y_batch, 0.001)
    print("Epoch: {0}, loss {1}".format(epoch, loss))
    epoch += 1
print("Weights:", weights)
# test the model: fill in the six monthly revenues and the CPI value
input = np.array([...])
prediction = predict(input / scaling, weights)
print("RESULT:", prediction * scaling)
Analysing the trained weights (in the variant where the second layer is just one "neuron") shows that February and March are the most significant, most characteristic months. Indeed, in January economic activity is at its lowest and the revenue fluctuates a lot.
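One rough way to read this off the trained network is to push the absolute weight magnitudes through both layers; this is a back-of-the-envelope heuristic of my own (it ignores the varying slope of the sigmoid), not something from the book:

# combined absolute weight per input feature, shape (7, 1)
importance = np.abs(weights['W1']) @ np.abs(weights['W2'])
for name, value in zip(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'VPI'], importance.ravel()):
    print(f"{name}: {value:.3f}")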
Adding the number of FTEs working for the company as a feature led to an interesting result: doubling the workforce while keeping the monthly revenues unchanged brought the predicted FY revenue down, which is wrong from both a practical and a theoretical perspective.
The next step will be to add the expenses (salaries, administrative costs and depreciation) and predict the EBIT, given that salaries in Austria are tied to inflation.