Building a Neural Network from Scratch: What I Learned
Intro
Have you ever scanned a document into a PDF and then tried to search it?
Typically a machine cannot read the content of that document because it now "looks like an image" rather than text. Reading the document as you intended was a hard problem for machines until neural networks entered the scene. We can now train complex models that understand these documents and essentially reinterpret the image back into machine-readable text. What's more, the same concepts apply to a broad variety of tasks, including anomaly detection, natural language generation, and time series forecasting; the technology is so broadly applicable that there are use cases for it almost anywhere you look. Let's dig into how neural networks actually work, so that you, the reader, come away with a deeper understanding of what powers the massive models we interact with, directly or indirectly, more and more every day.
Rooted in Mathematics and Statistics
Neural networks - are they Computer Science? Tangentially, yes: at the surface level, we leverage the massive compute infrastructure we as a society have already built (CPUs, GPUs, and TPUs), and we do that through code. But more to the point, neural networks are a form of applied statistics. This fact certainly surprised me as I learned more about them. A neural network is a series of linear and non-linear transformations applied to a given data input (or batch of inputs); as data flows through the network, it is multiplied by the network's weights and passed through non-linear activation functions. Let's take an example where we classify images into one of several classes, an example we'll come back to throughout this post:
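To make this concrete, here is a minimal PyTorch sketch of such a classifier; the layer sizes and batch shape are illustrative assumptions, not a prescribed architecture:

```python
import torch
import torch.nn as nn

# A minimal image classifier: flatten -> linear -> non-linearity -> linear -> class scores.
# Sizes are illustrative (e.g., 28x28 grayscale inputs, 10 classes).
model = nn.Sequential(
    nn.Flatten(),                # image pixels become one long vector
    nn.Linear(28 * 28, 128),     # linear transformation: multiply by a weight matrix
    nn.ReLU(),                   # non-linear transformation
    nn.Linear(128, 10),          # one score (logit) per class
)

x = torch.randn(32, 1, 28, 28)   # a batch of 32 random "images"
logits = model(x)                # data flows through the network
probs = logits.softmax(dim=1)    # softmax turns scores into class probabilities
print(probs.shape)               # torch.Size([32, 10])
```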
This is a typical example of how a network flows. Networks work on the statistical principle of Maximum Likelihood Estimation (MLE): given all of the examples in the training dataset, our best prediction for the label of a novel example is, loosely, to "take the average of what we've seen." In practice it is more involved than taking averages, since there are complex non-linearities at work, but this is a good mental model to start with.
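To make the MLE connection concrete, here is the standard formulation (not specific to this post): the weights $\theta$ are chosen to maximize the likelihood of the training labels, which is equivalent to minimizing the negative log-likelihood - for classification, exactly the cross-entropy loss.

$$\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{N} p(y_i \mid x_i; \theta) = \arg\min_{\theta} \left( -\sum_{i=1}^{N} \log p(y_i \mid x_i; \theta) \right)$$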
Back-propagation is where the learning happens - this process seemed like magic to me before I understood the math. Back-propagation is computational math at its finest: the process consists of repeatedly and recursively taking partial derivatives of a massively multivariate function by leveraging the chain rule. The chain rule allows us to decompose functions of functions one segment at a time - a compositional approach that modern frameworks like PyTorch exploit in their computation graphs for optimized, automated differentiation. Going back to our example of classifying images, let's look at the math.
Mathematically, a network can be represented as one long equation:
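For illustration (the exact form here is an assumption), take a two-layer classifier with a softmax output and cross-entropy loss:

$$L = -\sum_{c} y_c \log \hat{y}_c, \qquad \hat{y} = \mathrm{softmax}\!\left(W_2\,\sigma(W_1 x + b_1) + b_2\right)$$

where $x$ is the input image flattened to a vector, $W_1, b_1, W_2, b_2$ are the network's weights and biases, $\sigma$ is a non-linearity, and $y$ is the one-hot true label.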
To compute how each weight should change, we take the partial derivative of the loss with respect to that weight:
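Using the two-layer example above and writing the intermediate quantities as $z_1 = W_1 x + b_1$, $a_1 = \sigma(z_1)$, and $z_2 = W_2 a_1 + b_2$, the chain rule decomposes the derivative for $W_1$ into a product of local derivatives:

$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_2} \cdot \frac{\partial z_2}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} \cdot \frac{\partial z_1}{\partial W_1}$$

Each factor is a simple, local derivative; back-propagation computes and multiplies them, starting from the loss and working backwards.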
We can continue this process step by step backwards through the network to compute the change in loss with respect to each parameter in the network. Computing the actual derivatives is beyond the scope of this blog post, so I will leave that as an exercise for the reader.
The process of back-propagation generates a set of gradients - one for each tunable parameter, or weight, in the network. The gradient points, along each parameter dimension, in the direction of steepest increase of the loss. We now know how much the loss will change, and in what direction, when we change each parameter in the network - and larger models can have billions of parameters, so this is great knowledge to have. This information is essential for nudging the model toward its objective: we typically step exactly opposite to the way the gradient points. Next time through the network, we'll be just a little closer to the outcome that we desire.
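In symbols, this is the standard gradient descent update:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)$$

where $\eta$ is the learning rate and the minus sign is the "exactly opposite" step described above.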
Your Choice - Train Models... or Skip Training
Training models doesn't have to be a frustrating process of trial and error. Although iterative improvement is an unavoidable (and often enjoyable) part of the model deployment process, you can leverage managed services and pre-built domain expertise to make things easier on your Data Science team. Google Cloud has distilled the best of our AI knowledge and learnings into our Data Science experimentation platform, Vertex AI. We've made complex model architectures available (Tabular Wide & Deep, Tabular for TabNet, Tabular for AutoML; more info on these here), combined visualizations of the training and deployment processes with an opinionated MLOps pipeline, and pre-trained a solid set of foundation models, including the multimodal Gemini 1.5 Pro, that should serve a broad variety of use cases well.
Taking it a step further, Google's AutoML technology does what the name implies - it automates ML model building. AutoML can perform classification, forecasting, and regression on tabular data; it can also operate on images, text, and video; and it serves edge deployments with AutoML Edge. Give it a training budget and a little labeled data, and it will find the best model architecture and hyperparameters for you.
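As a sketch of how little code this takes, here is what an AutoML tabular training run might look like with the google-cloud-aiplatform Python SDK; the project, bucket, and column names are placeholder assumptions, and you should check the current SDK documentation for exact signatures:

```python
from google.cloud import aiplatform

# Placeholder project and location - substitute your own.
aiplatform.init(project="my-project", location="us-central1")

# Create a tabular dataset from a CSV in Cloud Storage (hypothetical path).
dataset = aiplatform.TabularDataset.create(
    display_name="my-tabular-data",
    gcs_source="gs://my-bucket/training-data.csv",
)

# Define an AutoML training job for classification.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="automl-classifier",
    optimization_prediction_type="classification",
)

# AutoML searches architectures and hyperparameters within this budget.
model = job.run(
    dataset=dataset,
    target_column="label",             # assumed label column name
    budget_milli_node_hours=1000,      # 1 node-hour training budget
)
```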
Use a method above to build a model from scratch to learn the ins and outs, then scale your knowledge with Google Cloud technologies and let us worry about the details.
Or - don't even worry about training a model at all. Use our managed offerings to leverage prebuilt assets and get going on the business innovation right away.
Conclusion
Neural networks underlie LLMs and many other AI applications today - they are widely applicable, and they build powerful internal representations of the data without the operator having to provide much beyond the raw data itself. Research will continue to rapidly advance everything from self-driving cars to reconstructing barely-visible images of distant galaxies; this stuff isn't going away anytime soon! Given this, it's in your best interest, reader, to skill up in this area and understand how these networks work, so that you can come to your next AI conversation fully informed of the capabilities.
My parting call to action to you: I challenge you to train your own neural network, whether from scratch or using existing libraries, to learn more about how they work. Use this PyTorch tutorial to get started.
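If you want a taste before diving into the tutorial, here is a minimal from-scratch sketch, using only NumPy, that ties together the forward pass, back-propagation, and gradient descent discussed above; the XOR toy problem, layer sizes, and learning rate are all arbitrary illustrative choices:

```python
import numpy as np

# A tiny two-layer network trained from scratch on a toy problem (XOR).
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: linear -> non-linear -> linear -> sigmoid.
    z1 = X @ W1 + b1
    a1 = np.tanh(z1)
    y_hat = sigmoid(a1 @ W2 + b2)

    # Backward pass: chain rule, layer by layer.
    # For sigmoid output + binary cross-entropy, dL/dz2 simplifies to (y_hat - y).
    dz2 = (y_hat - y) / len(X)
    dW2, db2 = a1.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - a1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    # Gradient descent: step opposite to the gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(y_hat.round(3))  # should approach [[0], [1], [1], [0]]
```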