Sticking it all together (part 1)

In the world of artificial intelligence, the power of the technology is often shrouded in mystery. But as we have seen, the key to unlocking this power lies in training an intricate set of logical rules, derived from a set of exemplary data. These rules form the foundation of the neural networks that drive most AI applications. The methods used to train these networks may vary, but the end result is always the same: a model, a configuration for the neural network, that is able to make sense of data similar to that on which it was trained.

For example, in the case of text recognition, we can start by feeding the system a set of scanned text fragments, the content of which is known to us. During training, the network will try different paths until it reaches the correct conclusions. Once the network is able to produce accurate text from training images, the configuration can be saved and tested in the real world.

However, this process requires tools for training, as well as the ability to save, edit and reuse network configurations. The data used for training must also be in a form that can be easily manipulated. In other words, important architectural decisions must be made before beginning the training process. Where do we begin? There is no easy or single answer to this question.

Let's start by better understanding the nature of the data and the results we would like to obtain from it. There are different ways to build and manage artificial neural networks, and each is more or less suited to the application at hand. There isn't always a way to know this in advance, but some things can be foreseen quite easily. As AI has developed, researchers have identified approaches that work more efficiently than others for certain applications. Building on this research and on various case studies, one can compare different solutions and decide which way to go with relative clarity.

Let's go back to our example with text recognition, which, by the way, is commonly known as OCR (Optical Character Recognition). In this case we're going to work with images, and the common wisdom is that a CNN, or Convolutional Neural Network, would be best suited for the job. It is not the only possible approach, and there are different types of CNNs one can employ, but the decision-making path is quite linear here. We need to work with images. CNNs are the class of neural networks most commonly used for images. So we shall use one.
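To make this concrete, here is a minimal sketch of what such a network might look like using Keras, the high-level API of TensorFlow (mentioned below). The layer sizes, the 50x50-pixel single-character input and the 36-character alphabet are illustrative assumptions on my part, not a reference OCR implementation.

```python
import tensorflow as tf

# A toy CNN that classifies a 50x50-pixel monochrome image of a single
# character into one of 36 classes (A-Z plus 0-9). All sizes are illustrative.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(50, 50, 1)),                   # one channel: black and white
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(36, activation="softmax"),     # one output per character
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training amounts to showing the network labelled examples, e.g.:
#   model.fit(train_images, train_labels, epochs=10)
# and the resulting configuration can be saved and reused later:
#   model.save("ocr_model.keras")
```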

OCR is a well-known and well-researched application, which is why it made a good starting point for my little exercise a few paragraphs ago. But more often than not, some heavy lifting needs to be done to understand which class of neural network would fit best and how to feed the data into the system. Sometimes one needs to employ several classes of neural networks as modules of a higher-level network; that is how complicated it may get. This is where the modern AI toolchains developed in recent decades come in handy: a lot of work has gone into building state-of-the-art implementations of neural network engines as well as comfortable training labs for them. This allows for some stick-poking experiments to determine empirically the best possible implementation. Be it TensorFlow, probably one of the most widely used engines at the moment, or the "synthetic petri dish" approach suggested by leading AI researchers, it is now possible to achieve astonishing results by building on top of the existing and diverse technological infrastructure.

In the face of the ever-growing and ever more powerful technological infrastructure at our disposal, even the most advanced methods are not without their limitations. The tireless work of researchers and engineers is focused on finding ways to reduce the computational complexity of neural networks, for even the slightest improvements in efficiency can yield significant results. The art of compromise is key: obtaining a result that is accurate 95% of the time, while using half the power and time required to reach 96% accuracy, can be a victory in its own right. Much of this effort centres on reducing the volume of the data itself by lowering its entropy, as every bit and byte counts in the quest for optimal performance.

In very simple terms, entropy measures how much information a block of data actually carries; anything that is predictable or irrelevant to the task can be stripped away without losing what we care about. For example, reducing a two-hour movie to a one-page synopsis may seem a sacrilege to some, but for those seeking to build a movie-classification engine that identifies certain plot twists, the synopsis contains the very information they need. The task at hand guides us in determining the useful part of the information and in devising a method to reduce the entropy and reveal the truth.
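As a rough illustration of the idea (my own sketch, not part of the original argument), here are a few lines of Python that measure the Shannon entropy of a block of bytes, i.e. the average amount of surprise each byte carries: repetitive data scores low, varied data scores high.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Average bits of surprise per byte in a block of data."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A highly repetitive block carries little surprise per byte...
print(shannon_entropy(b"aaaaaaaaab" * 100))     # roughly 0.47 bits per byte
# ...while varied data carries much more.
print(shannon_entropy(bytes(range(256)) * 10))  # exactly 8 bits per byte
```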

Let's go back to the OCR example now, as it is a great case for demonstrating entropy reduction in action. Let's start with a simple image that contains text. Be it a page of a book or a poster, it is basically a surface with some sparse letters on it. Letters are notoriously monochromatic: a change of colour within a single letter or word doesn't change its meaning. It may carry additional information, but we are not looking for it in this case; we just want our program to be able to see that a is a and b is b. Turning a 24-bit colour image into a 1-bit monochrome one reduces its size 24-fold, while keeping all the useful information intact. This is the other side of the "Information is in the eyes of the beholder" principle discussed in my previous piece: we only care about a certain part of what is being conveyed, and the rest may simply be discarded.
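Here is a minimal sketch of that first step, assuming the Pillow imaging library and a hypothetical input file name (neither of which comes from the original article):

```python
from PIL import Image  # Pillow; an assumed choice of library

# Discard colour: 24 bits per pixel (8 each for red, green and blue)
# become 1 bit per pixel, a 24-fold reduction, while the letter shapes
# survive untouched.
page = Image.open("scanned_page.png")    # hypothetical input file
mono = page.convert("1")                 # 1-bit black-and-white
mono.save("scanned_page_mono.png")
```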

Continuing our private battle against entropy, let's now look at the actual size of the image. All digital images are made of rows and columns of pixels. The number of pixels that corresponds to a certain amount of physical space, for example an inch, is what we call the image resolution. Most scanned images contain 300 pixels per inch of scanned paper. This is quite a lot, as the average text height in written documents is 12 points: 1/6 of an inch, or roughly 50 pixels at that resolution. We now need to find a good compromise between precision and performance: as we lower the resolution we lose some of the relevant information, but how much does it impact the precision of the results we obtain? This can be established empirically by running a few tests, and it has already been done, so we can say here that halving the resolution, from 300 to 150 pixels per inch, cuts the pixel count four-fold without compromising text-recognition results.
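The second step, again as a hedged Pillow sketch with hypothetical file names:

```python
from PIL import Image  # Pillow; an assumed choice of library

# Halve the resolution: 300 pixels per inch become 150. Each dimension
# shrinks by a factor of 2, so the total pixel count drops four-fold.
mono = Image.open("scanned_page_mono.png")   # hypothetical 300 dpi input
half = mono.resize((mono.width // 2, mono.height // 2))
half.save("scanned_page_mono_150dpi.png")
```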

Through the process of entropy reduction, we have managed to condense the original data to a mere fraction, 1/96th to be precise: a 24-fold saving from discarding colour multiplied by a 4-fold saving from halving the resolution. Yet, despite the drastic reduction in volume, the useful information remains intact and our system is now able to function with newfound speed and efficiency. The method we have devised can be turned into a program that processes incoming data, both for training and for analysis. Its replicability and efficiency make it a valuable asset to our architecture. As previously discussed, analysing visual data is best achieved through the use of a CNN. We have now established the first and second parts of our architecture: data ingestion and analysis. Next time, we shall delve further into the intricate details of how the various pieces of the puzzle come together to form a cohesive and intelligent machine.
