Chapter 2: Transformer architecture simplified: Neural Networks.

Continuing on from my first article:

https://www.dhirubhai.net/pulse/transformer-architecture-simplified-sort-kai-bergin-abo1e/?trackingId=qxGQ00kETbiXi8BPLZN9XQ%3D%3D

I'll now try and explain the second motor of the transformer architecture: The neural network. The neural network tackles a huge challenge in our goal to create fluent and frictionless interaction between humans and computers: "Human mess".

I think we can all agree that we are a generally messy species. We don't really operate with our surroundings in a structured way, and we certainly don't have anything resembling the logic gates or pathways of a circuit printed on silicon. A computer, on the other hand, is built on a solid foundation of mathematics, machine code and programming languages.

For a computer to interact successfully with humans it needs to be able to do two things: understand the subtleties of human language, and find structure and patterns in our "out of context" noise.

Here's a wonderful series of films on how a computer works, if you want to dig into how differently a computer is structured. If you have a month or two to kill, it's highly recommended:

For humans, it's a lot more complicated. Unfortunately, you might have to reserve a lifetime of study and still be fine with the fact that you'll die without finding any answers. However, for people who enjoy the journey more than the destination, this is a great series of philosophers and scientists talking about the concept of complexity and consciousness.

Whether it's by accident or design, we humans just don't "do" structure very well, and this is exactly where neural networks shine: they are amazing tools for dealing with the incredible amounts of unstructured data we produce.

Side note before we continue: for anyone who might think that a "neural" network actually works like a human brain: no, they do not, and to my knowledge no one working in AI or neurology thinks they do.

Let's start with an everyday challenge to show how a neural network untangles our mess.

The challenge: Tagging family members in an online photo album #mum and #dad.

(Facial recognition by any other name).

This is a really handy feature for organizing all those thousands of photos we take every day, so let's kick things off.

Here’s a photo of my favorite pensioners:

Taken in a gorgeous national park in Western Australia.

So: this picture was taken on a new iPhone. It's high resolution, HDR, and there are a lot of objects in it (Australian national parks aren't very structured places). There are millions of pixels and colors, and to make life even more difficult for our neural network, there are two people it's never seen before.

The network starts the journey by asking for some human help. It asks you, the user, to identify and tag a set of photos of faces that it thinks line up with the people in the photo.

Once you have done that, the neural network has an input (the photo) and an outcome (photos + #mum and #dad). Now all it has to do is try to learn the rules and representations that made that outcome possible.

If you are technical, here are some great videos on this subject:

https://youtu.be/HGwBXDKFk9I?si=KC6dZE75wCwRWpSu (Intro maths of a CNN)

https://youtu.be/N_W4EYtsa10?si=yfpK_peYXb6148mt (Python face recognition walkthrough)

Before we continue, we need to define the basics, otherwise the next steps won't make any sense at all. There are three layers of abstraction that we need to go through before we get to neural networks.

François Chollet's fantastic book "Deep Learning with Python" gives us this diagram as a starting point:


1. Artificial Intelligence:

We’ll start in 1956 with John McCarthy using the term “AI” at a conference.

I always think of this as a term from an American 1950s sci-fi novel or film. That was also the general vibe in the second half of the '50s: AI embodies a combination of techno-optimism + we won the war + space flight + aliens from Hollywood. Now it's "an umbrella term for computer software that mimics human cognition in order to perform complex tasks and learn from them".

Some examples of 50s sci-fi

This is just my humble opinion, but intelligence is a concept we barely understand; humans being able to fluently interact with a computer is revolutionary enough.

2. Machine Learning

The second step in the puzzle is machine learning. The first key characteristic of machine learning is that it's all about data; the second is that it isn't programming, it's training. And what do we train? We train a model to meaningfully transform our data and to become adaptable by learning rules, patterns and representations from that data.

So What’s a Representation? It’s just a way of looking at data.

For example, a personal budget for the month of January could be represented as a table in an Excel sheet, it could be represented graphically as a pie chart, or it could be represented in audio form:
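To make that concrete, here's a tiny sketch in Python (my own toy numbers, nothing from a real budget app) showing the same January budget in two representations: a table of absolute amounts, and the percentage shares a pie chart would draw.

```python
# One set of data, two representations.
budget = {"rent": 900, "groceries": 350, "transport": 120, "fun": 130}

# Representation 1: a table of absolute amounts.
for category, amount in budget.items():
    print(f"{category:<10} {amount:>6}")

# Representation 2: percentage shares (what a pie chart would show).
total = sum(budget.values())
shares = {c: round(100 * a / total, 1) for c, a in budget.items()}
print(shares)  # {'rent': 60.0, 'groceries': 23.3, 'transport': 8.0, 'fun': 8.7}
```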

Machine learning is the part of the puzzle where we see patterns in data: correlations, probabilities. If you've worked with a data team or large amounts of data for work, study or just for kicks, chances are you've used machine learning. It's also great for sorting through huge chunks of "human mess" and finding patterns that would be impossible for us to discover.

3. Deep Learning (Deep Neural Network):

If you want to look at the basic math, these YouTube films are legendary.

If you don't, well let's get started :) This is how a deep neural network is drawn in thousands of textbooks:

First up, we have three kinds of layers:

  1. Input Layer: The layer that receives the input data.
  2. Hidden Layer: This can be one or many layers and they perform computations on the data. By doing this each layer learns to transform its input data into a slightly more abstract and composite representation.
  3. Output Layer: This holds the final result of the prediction. It's where a successful classification is read out, or where backpropagation begins if the network got it wrong.

Neurons: The dots. These are the little processing units of our network; they make the calculations and help decide whether a piece of data is going to move forward into the next layer.

Weights: The Lines. These determine the strength or direction of the influence one neuron has on another.

The word “Deep” refers to the depth of the network: the number of successive layers that information passes through.

And the word “Learning” is a combination of two processes:

  1. Activation: Information is allowed to pass forward into the next layer of the network. The information is considered valuable and is allowed to influence the final result of the network.
  2. Backpropagation: Weight (line) adjustment based on the error in the output. The network calculates its error by comparing the expected output with the output it actually produced. It then goes backwards (from output back to input) and adjusts the weights to try and reduce the margin of error.
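If you like seeing that as code, here's a minimal sketch (a toy example of my own, not the photo app's actual internals): a tiny NumPy network with one hidden layer that runs the forward pass (activation), measures its error, and then walks backwards adjusting the weights (backpropagation), over and over.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny network: 4 inputs -> 3 hidden neurons -> 1 output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # weights are the "lines"
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(1, 4))   # one input example
y = np.array([[1.0]])         # the outcome we tagged by hand

lr = 0.5                      # learning rate: how big each tweak is
for step in range(100):
    # Forward pass (activation): information flows input -> hidden -> output.
    h = sigmoid(x @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # How far off was the network?
    error = y_hat - y

    # Backward pass (backpropagation): from output back to input,
    # nudging every weight to shrink the error.
    grad_out = error * y_hat * (1 - y_hat)
    grad_hidden = (grad_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ grad_out)
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * (x.T @ grad_hidden)
    b1 -= lr * grad_hidden.sum(axis=0)

print(float(error ** 2))  # the squared error shrinks with every loop
```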

Got it? Okay, now let's go back to our photo and try to use the network to recognise the faces.

Layer 1: we start by flattening every pixel around the faces in the photo into a series of numbers on a grid. (I'm going to skip bounding boxes and CNNs, not the news channel, as this is a non-technical, high-level example.)

Ever seen one of those pixel-art coloring books that managers use to avoid burnout? That's kind of how the network wants to see the image. We go from a photo to pixels to numbers.

What are those numbers based on? It could be a lot of things, but let's say for now that it's a shade of grey: something between black and white.

0 is white or nothing

0.1 is light grey

0.2 is grey

0.3 is darker grey

Bla bla bla

1 is black

Once we have the numbers, we stack them vertically and we officially have our first layer.

(Example: MNIST. It's not our use case, but it gives you an idea of that first step.)
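As a rough sketch of that first step (assuming the Pillow and NumPy libraries, and a hypothetical file name), this is photo in, grid of numbers out, flattened into one long input layer:

```python
import numpy as np
from PIL import Image

# "face.jpg" is a hypothetical cropped face from our photo album.
img = Image.open("face.jpg").convert("L")   # "L" = 8-bit greyscale
img = img.resize((28, 28))                  # shrink it onto a small grid

# Pillow's convention is 0 = black, 255 = white; the article's example
# flips that, but either way it's just numbers between 0 and 1.
grid = np.asarray(img, dtype=np.float32) / 255.0

layer1 = grid.flatten()            # stack the grid into one input layer
print(grid.shape, layer1.shape)    # (28, 28) (784,)
```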

Layer 2: looks for combinations of dark pixels (darker numbers) sitting next to white or light pixels (positive and negative space, for the art students out there), which can be assumed to make up the edge of an object.

  • If the value in the neuron is 0 then the network ignores it and concentrates on other numbers.

Once we have the edges of objects, we start to see the beginning of a pattern in all those numbers.
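Here's a heavily simplified sketch of what "dark next to light" could mean in code. (The real thing would be a convolutional layer, which I'm deliberately skipping.) We slide along the grid and keep the spots where neighbouring values jump:

```python
import numpy as np

# A hand-made 5x5 toy grid: one dark vertical stripe on a light background.
grid = np.array([
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
])

# Horizontal differences: big values where a light pixel meets a dark one.
edges = np.abs(np.diff(grid, axis=1))

print(edges)  # the two vertical edges of the stripe light up as 1.0
```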

Layer 3: looks for shapes on the grid, so a line or a curve. Think of this layer as though you are playing a game of Battleship: 5a to 10a on the grid all have the same or a very similar color value, so it seems to be a line! (You sunk my cruiser!)

  • If the value of the pixel is lower than 0.01 then the network ignores it and concentrates on other numbers.

Now we don't just have the edges and outlines of objects, we also have shapes, lines and curves.
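To stay with the Battleship analogy, here's a toy sketch (my own illustration, not how a real layer works internally): scan one column of the grid and call it a line when enough consecutive cells hold nearly the same dark value.

```python
import numpy as np

# One column of the grid, "5a to 10a": six similar dark values in a row.
column = np.array([0.02, 0.0, 0.81, 0.80, 0.82, 0.79, 0.81, 0.80, 0.01])

def find_run(values, min_length=6, tolerance=0.05, threshold=0.5):
    """Return (start, end) of a run of similar dark values, or None."""
    start = None
    for i, v in enumerate(values):
        if v > threshold and (start is None or abs(v - values[i - 1]) < tolerance):
            start = i if start is None else start
            if i - start + 1 >= min_length:
                return start, i
        else:
            start = None
    return None

print(find_run(column))  # (2, 7): it seems to be a line. You sunk my cruiser!
```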

Layer 4: goes a step further, and those pixels, edges and lines all get combined into more complicated representations of a nose, eyes, a mouth, etc. These more advanced combinations of numbers on a grid get close to the expected combination, and so after four steps we are ready to test our network against the output layer.

The output layer has a photo that has been tagged by us, so it knows the result already. (This is one of the key characteristics of machine learning and deep learning: we give them the input and the output. We train them, we don't program them.) It then compares the numbered patterns of that output with what the network said the input photo was. How close did it get? 30%? 60%? That's not good enough. We click thumbs down and curse AI's limitations… let's go back and tweak some of those weights (and some biases) and see if we can improve the score. (And then do it again and again.)

When people talk about training a neural network, it's this back and forth + tweaking and human feedback.

I know the neural network of my photo app is working when it can look at photos of my folks it’s never seen before and successfully tag them.

Hopefully you have a basic idea of a simple neural network and how information moves forwards through its layers and then back again, adjusting until it finally gets it right. This layered architecture works very well with human mess, each step allows it to get closer and closer to the structure it needs to be able to effectively and computationally operate.

Now, what about that feed-forward network that works in all those Matryoshka-like black boxes of the transformer architecture?


Have a guess… that's right: it doesn't move backwards. It's a pre-trained network, so it doesn't have to go back and forth to get it right. It picks up the embedding and pushes it through pathways that were worked out during the training of the model.
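As a last sketch (toy numbers of my own; a real model has learned weights and thousands of dimensions), this is what forward-only looks like: the transformer's feed-forward block is just two weight matrices and an activation applied to an embedding, with no backward step at inference time.

```python
import numpy as np

rng = np.random.default_rng(42)

d_model, d_hidden = 8, 32   # real models use e.g. thousands of dimensions

# In a trained transformer these weights are frozen results of training;
# here they are random stand-ins.
W1, b1 = rng.normal(size=(d_model, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, d_model)), np.zeros(d_model)

def feed_forward(embedding):
    """Forward pass only: no error check, no backpropagation."""
    hidden = np.maximum(0.0, embedding @ W1 + b1)   # ReLU activation
    return hidden @ W2 + b2

token_embedding = rng.normal(size=(1, d_model))   # one token's embedding
print(feed_forward(token_embedding).shape)        # (1, 8): same shape out
```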

Which pathways those are and why is something I'll explain in the next chapter :-)

Hope this was clear, and if you have any questions or remarks, let me know!
