Deciphering handwritten numbers with a Neural Network
Harpreet Singh
I finally got around to building a neural network (NN) using multilayer perceptrons (MLP) to recognise handwritten numerals. This is a classic beginner's NN problem. In this blog, I will focus on comparing the predictions of the NN under different parameters.
The Problem/Data
The MNIST database was used for the problem. MNIST contains about 70,000 images of handwritten digits. The data is well prepared: each image is gray-scale, sits in a 28x28 matrix, and is centred in that matrix, which makes this a "relatively" easy problem to tackle.
As you can see from the serif on the 1, the digits vary slightly, and classifying them correctly is the gist of the problem. The NN views each image as a grayscale bitmap, i.e. pixel values from 0-255.
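As a concrete starting point, here is a minimal sketch of how the MNIST data can be loaded and prepared with Keras. The scaling of pixel values to 0-1 and the exact import path are my assumptions for illustration, not details from the original write-up:

```python
from tensorflow import keras

# Load the ~70,000 MNIST images (60,000 train / 10,000 test), each a 28x28 grayscale bitmap
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Flatten each 28x28 image into a row of 784 values and scale pixel values 0-255 down to 0-1
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# One-hot encode the labels 0-9 for the 10-node softmax output
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
```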
The NN Architecture
I have built a 2-layer NN that takes 784 (28x28) input nodes. The 28x28 image is "flattened" into a single row of 784 values because MLPs cannot handle multidimensional input. The NN outputs 10 nodes (0-9). The final activation function is a softmax, which gives us a probability of the digit being 0, 1 ... 9. The architecture looks like the following:
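In Keras, that architecture might look roughly like the sketch below. The hidden-layer size (512), its ReLU activation, and the optimiser and loss passed to compile are my assumptions for illustration, since the write-up does not pin them down:

```python
from tensorflow import keras

# A 2-layer MLP: one hidden layer plus the 10-node softmax output layer
model = keras.Sequential([
    keras.layers.Dense(512, activation="relu", input_shape=(784,)),  # hidden layer size is an assumption
    keras.layers.Dense(10, activation="softmax"),                    # one probability per digit 0-9
])

# Optimiser and loss chosen for illustration; these are the hyper-parameters tuned in the experiments below
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```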
The Impact of the hyper-parameters
I ran about 10 experiments to see the impact of the hyper-parameters on the prediction capabilities of the NN.
The boxes in green show the parameters that were tuned.
The accuracy w/o training is the accuracy of the untrained network. As you can see, without training most NNs performed close to 10%; that is, the NN might as well be guessing the answer (with 10 digits, 0-9, there is a 1/10 probability of guessing right).
The prediction accuracy on the test data is how the NN performed on unseen data after being trained on the training data. This is the number we are after.
The validation error is how close the NN is to the actual answer on every run (or epoch). This is a key metric to watch if you want to make sure that the model is not over-fitting. I think of overfitting as the student who gets the paper before the exam: extremely well prepared for that exam, but in trouble once he goes out into the real world.
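Here is a short sketch of how these three numbers can be obtained, assuming the Keras model above; the batch size, epoch count, and 20% validation split are illustrative choices, not the exact experimental settings:

```python
# Accuracy without training: the untrained network should land near 10% (random guessing)
untrained_loss, untrained_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"accuracy w/o training: {untrained_acc:.2%}")

# Train while holding out part of the training data to track the validation error per epoch
history = model.fit(x_train, y_train,
                    batch_size=128,        # illustrative batch size
                    epochs=10,             # illustrative epoch count
                    validation_split=0.2,  # 20% of training data held out for validation
                    verbose=2)

# Validation loss per epoch: if this starts rising while training loss keeps falling, the model is over-fitting
print(history.history["val_loss"])

# Prediction accuracy on the test data: the number we are after
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"prediction accuracy on test data: {test_acc:.2%}")
```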
Interesting observations:
- Increasing the batch size, i.e. the number of rows of data the NN digests at a time, increases the speed of learning.
- Changing the activation function has a sizeable impact on the accuracy of the NN. Sigmoids are out of favour, and we can see that the NN lost about 1 percent accuracy with them.
- My hypothesis that I could increase the accuracy of the NN if I increased the number of nodes or the depth of the NN was disproven. I am not quite sure why.
- Changing the optimiser has the biggest impact on the performance of a NN. This isn't surprising at all. Optimisers are the functions that perform the gradient descent to converge on a solution, and gradient descent is what makes a NN work. Choosing an inefficient optimiser is going to nuke your results (see the sketch after this list for how these swaps are made).
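For reference, swapping the activation function or the optimiser in the Keras model is a one-line change when the model is built and compiled. The specific choices below (sigmoid vs. ReLU, SGD vs. RMSprop) are illustrative, not the exact configurations from my experiments:

```python
from tensorflow import keras

# Variant with sigmoid activations in the hidden layer (this kind of change cost about 1% accuracy in my runs)
sigmoid_model = keras.Sequential([
    keras.layers.Dense(512, activation="sigmoid", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])

# Swapping the optimiser: plain SGD vs. an adaptive optimiser such as RMSprop
sigmoid_model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
# ...or, with an adaptive optimiser instead:
# sigmoid_model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
```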
Summary
Finding the right hyper-parameters for a NN is key to its performance, and finding them is based on experiments rather than on theory.
I am starting to enjoy building NNs with libraries such as Keras. Building a simpler NN in plain Python was like getting a root canal :-) and I don't think I could have tackled the MNIST data in plain Python.
I am beginning to be dazzled by NNs. If you had told me a week earlier that I could sit down and write a program that understands handwritten numbers and predicts what they are, I wouldn't have believed you. And here I started off by saying this problem was "relatively" easy to solve. Amazing!
Disclaimer: Most of the source code was provided by Udacity - I filled in the key NN architecture. Here is the complete source from Udacity.