Building a Neural Network from Scratch: What I Learned
Intro
Have you ever scanned a document into a PDF and then tried to search it?
Typically a machine cannot read the content of that document because it now "looks like an image" rather than text. Reading the document as you intended was a hard problem for machines until neural networks entered the scene. We can now train complex models that understand these documents and essentially reinterpret the image back into machine-readable text. What's more, the same concepts apply to a broad variety of tasks, including anomaly detection, natural language generation, and time series forecasting; the technology is so broadly applicable that there are use cases for it almost anywhere you look. Let's dig into how neural networks actually work, so that you, the reader, come away with a deeper understanding of what powers the massive models we interact with, directly or indirectly, more and more every day.
Rooted in Mathematics and Statistics
Neural networks - are they Computer Science? Tangentially, yes: at the surface level, we leverage the massive compute infrastructure we as a society have already built (CPUs, GPUs, and TPUs), and we do that through code. But more to the point, neural networks are a form of applied statistics. This fact certainly surprised me as I learned more about them. A neural network is a series of linear and non-linear transformations applied to a given data input (or batch of inputs); as data flows through the network, it is multiplied by the network's weights and passed through non-linear activation functions. Let's take an example where we classify images into one of several classes, an example we'll come back to throughout this post:
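To make this concrete, here is a minimal PyTorch sketch of such a classifier; the layer sizes and batch shape are illustrative assumptions, not a prescribed architecture:

```python
import torch
import torch.nn as nn

# A minimal image classifier: flatten -> linear -> non-linearity -> linear -> class scores.
# Sizes are illustrative (e.g., 28x28 grayscale inputs, 10 classes).
model = nn.Sequential(
    nn.Flatten(),                # image pixels become one long vector
    nn.Linear(28 * 28, 128),     # linear transformation: multiply by a weight matrix
    nn.ReLU(),                   # non-linear transformation
    nn.Linear(128, 10),          # one score (logit) per class
)

x = torch.randn(32, 1, 28, 28)   # a batch of 32 random "images"
logits = model(x)                # data flows through the network
probs = logits.softmax(dim=1)    # softmax turns scores into class probabilities
print(probs.shape)               # torch.Size([32, 10])
```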
This is a typical example of how a network flows. Networks work on the statistical principle of Maximum Likelihood Estimation (MLE): given all of the examples in the training dataset, our best prediction for the label of a novel example is, loosely, to "take the average of what we've seen." In practice it is more involved than taking averages, since there are complex non-linearities at work, but this is a good mental model to start with.
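To make the MLE connection concrete, here is the standard formulation (not specific to this post): the weights $\theta$ are chosen to maximize the likelihood of the training labels, which is equivalent to minimizing the negative log-likelihood - for classification, exactly the cross-entropy loss.

$$\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{N} p(y_i \mid x_i; \theta) = \arg\min_{\theta} \left( -\sum_{i=1}^{N} \log p(y_i \mid x_i; \theta) \right)$$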
Back-propagation is where the learning happens - this process seemed like magic to me before I understood the math. Back-propagation is computational math at its finest: the process consists of repeatedly and recursively taking partial derivatives of a massively multivariate function by leveraging the chain rule. The chain rule allows us to decompose functions of functions one segment at a time - a compositional approach that modern frameworks like PyTorch exploit in their computation graphs for optimized, automated differentiation. Going back to our example of classifying images, let's look at the math.
Mathematically, a network can be represented as one long equation:
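For illustration (the exact form here is an assumption), take a two-layer classifier with a softmax output and cross-entropy loss:

$$L = -\sum_{c} y_c \log \hat{y}_c, \qquad \hat{y} = \mathrm{softmax}\!\left(W_2\,\sigma(W_1 x + b_1) + b_2\right)$$

where $x$ is the input image flattened to a vector, $W_1, b_1, W_2, b_2$ are the network's weights and biases, $\sigma$ is a non-linearity, and $y$ is the one-hot true label.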
To compute how each weight should change, we take the partial derivative of the loss with respect to that weight:
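Using the two-layer example above and writing the intermediate quantities as $z_1 = W_1 x + b_1$, $a_1 = \sigma(z_1)$, and $z_2 = W_2 a_1 + b_2$, the chain rule decomposes the derivative for $W_1$ into a product of local derivatives:

$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_2} \cdot \frac{\partial z_2}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} \cdot \frac{\partial z_1}{\partial W_1}$$

Each factor is a simple, local derivative; back-propagation computes and multiplies them, starting from the loss and working backwards.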
We can continue this process step by step backwards through the network to compute the change in loss with respect to each parameter in the network. Computing the actual derivatives is beyond the scope of this blog post, so I will leave that as an exercise for the reader.
The process of back-propagation generates a set of gradients - one for each tunable parameter, or weight, in the network. The gradient points, along each parameter dimension, in the direction of steepest increase of the loss. We now know how much the loss will change, and in what direction, when we change each parameter in the network - and larger models can have billions of parameters, so this is great knowledge to have. This information is essential for nudging the model toward its objective: we typically step exactly opposite to the way the gradient points. Next time through the network, we'll be just a little closer to the outcome that we desire.
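In symbols, this is the standard gradient descent update:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)$$

where $\eta$ is the learning rate and the minus sign is the "exactly opposite" step described above.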
Your Choice - Train Models... or Skip Training
Training models doesn't have to be a frustrating process of trial and error. Although iterative improvement is an unavoidable (and often enjoyable) part of the model deployment process, you can leverage managed services and pre-built domain expertise to make things easier on your Data Science team. Google Cloud has distilled the best of our AI knowledge and learnings into our Data Science experimentation platform, Vertex AI. We've made complex model architectures available (Tabular Wide & Deep, Tabular for TabNet, Tabular for AutoML; more info on these here), combined visualizations of the training and deployment processes with an opinionated MLOps pipeline, and pre-trained a solid set of foundation models, including the multimodal Gemini 1.5 Pro, that should serve a broad variety of use cases well.
Taking it a step further, Google's AutoML technology does what the name implies - it automates ML model building. AutoML can perform classification, forecasting, and regression on tabular data; it can also operate on images, text, and video; and it serves edge deployments with AutoML Edge. Give it a training budget and a little labeled data, and it will find the best model architecture and hyperparameters for you.
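As a sketch of how little code this takes, here is what an AutoML tabular training run might look like with the google-cloud-aiplatform Python SDK; the project, bucket, and column names are placeholder assumptions, and you should check the current SDK documentation for exact signatures:

```python
from google.cloud import aiplatform

# Placeholder project and location - substitute your own.
aiplatform.init(project="my-project", location="us-central1")

# Create a tabular dataset from a CSV in Cloud Storage (hypothetical path).
dataset = aiplatform.TabularDataset.create(
    display_name="my-tabular-data",
    gcs_source="gs://my-bucket/training-data.csv",
)

# Define an AutoML training job for classification.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="automl-classifier",
    optimization_prediction_type="classification",
)

# AutoML searches architectures and hyperparameters within this budget.
model = job.run(
    dataset=dataset,
    target_column="label",             # assumed label column name
    budget_milli_node_hours=1000,      # 1 node-hour training budget
)
```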
Use a method above to build a model from scratch to learn the ins and outs, then scale your knowledge with Google Cloud technologies and let us worry about the details.
Or - don't even worry about training a model at all. Use our managed offerings to leverage prebuilt assets and get going on the business innovation right away.
Conclusion
Neural networks underlie LLMs and many other AI applications today - they are widely applicable, and they build powerful internal representations of the data without the operator having to provide much beyond the raw data itself. Research will continue to rapidly advance everything from self-driving cars to reconstructing barely-visible images of distant galaxies; this stuff isn't going away anytime soon! Given this, it's in your best interest, reader, to skill up in this area and understand how these networks work, so that you can come to your next AI conversation fully informed of the capabilities.
My parting call to action to you: I challenge you to train your own neural network, whether from scratch or using existing libraries, to learn more about how they work. Use this PyTorch tutorial to get started.
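If you want a taste before diving into the tutorial, here is a minimal from-scratch sketch, using only NumPy, that ties together the forward pass, back-propagation, and gradient descent discussed above; the XOR toy problem, layer sizes, and learning rate are all arbitrary illustrative choices:

```python
import numpy as np

# A tiny two-layer network trained from scratch on a toy problem (XOR).
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: linear -> non-linear -> linear -> sigmoid.
    z1 = X @ W1 + b1
    a1 = np.tanh(z1)
    y_hat = sigmoid(a1 @ W2 + b2)

    # Backward pass: chain rule, layer by layer.
    # For sigmoid output + binary cross-entropy, dL/dz2 simplifies to (y_hat - y).
    dz2 = (y_hat - y) / len(X)
    dW2, db2 = a1.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - a1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    # Gradient descent: step opposite to the gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(y_hat.round(3))  # should approach [[0], [1], [1], [0]]
```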