Overview of Computer Vision

Background

This article is for people who wonder what Computer Vision is all about and why there is so much hype around it. If you are already an expert who works with computer vision and related technologies on a day-to-day basis, you may not find it useful or interesting.

1. Introduction

Before we jump into the details of the technologies that power Computer Vision, let’s try to understand what it tries to achieve. 

Computer Vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images and videos. It seeks to understand and automate tasks that the human visual system can do. This is the Wikipedia definition of Computer Vision.

To put it in simpler terms, Computer Vision is an attempt to let computers mimic the way we humans view and analyze images and videos.

Typically computers help us automate a variety of tasks, many of them highly complicated.

But what do you think about this problem of enabling computers to understand digital images and videos? 

How complex do you think it can become?

[Image: two identity cards, each with a photo and a long numerical identifier]

Just to understand the background and complexity of this problem, let’s look at a small example. Imagine that the two identity cards shown in the picture above are shown to some people. Let’s say there are two different ways of identifying someone uniquely from these cards.

  • One way is to remember the long 32-digit numerical identifier
  • The other way is to remember the faces captured in the picture

As humans, which do we prefer? It is far easier to remember the faces than the long numerical identifier, right?

Now think about computers. Let’s say you are developing a small application to compare the details from two different cards and check whether they match or not.

As we know, for a computer to process any piece of information, it has to be represented digitally. Now imagine what it takes to represent and store the long numerical identifier and the picture. Pictures are stored as RGB or grayscale values for every pixel they contain. As you can see, it is far easier for a computer to store and process the numerical value than even a small picture.

Even if you manage to write a piece of code that compares the values of every single pixel, a small variation at one single pixel can break the comparison and the uniqueness check. You can see how a task we humans consider simple, identifying someone by their face, becomes a really hard one for computers to solve.

Hopefully you get a sense of the complexity involved in making computers mimic human behavior here.
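To make the point about pixel-by-pixel comparison concrete, here is a minimal sketch in plain Python. The 4x4 grayscale "photos" and the helper names `identical` and `mean_abs_diff` are made up for illustration; real ID photos are far larger and would normally be handled with an image library.

```python
# Hypothetical 4x4 grayscale "photos" (values 0-255); real ID photos are far larger.
photo_a = [
    [12, 40, 40, 12],
    [40, 200, 200, 40],
    [40, 200, 200, 40],
    [12, 40, 40, 12],
]

# The same picture with a single pixel changed by one intensity level,
# as might happen after re-scanning or re-compressing the card.
photo_b = [row[:] for row in photo_a]
photo_b[1][2] += 1


def identical(img1, img2):
    """Exact match: every pixel must be equal."""
    return img1 == img2


def mean_abs_diff(img1, img2):
    """Average absolute per-pixel difference between two same-sized images."""
    total = sum(abs(p - q) for r1, r2 in zip(img1, img2) for p, q in zip(r1, r2))
    return total / (len(img1) * len(img1[0]))


print(identical(photo_a, photo_b))      # False
print(mean_abs_diff(photo_a, photo_b))  # 0.0625
```

The exact comparison fails even though the two pictures are visually the same. Allowing a small tolerance on the difference helps with noise, but it still would not cope with a different pose or lighting, which is why learned features are used instead.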

2. Applications

Again, before we jump into the details of the technology, it is good to understand some of the applications of Computer Vision.

As we have seen, enabling computers to mimic human behavior by understanding pictures and videos is a hard problem to solve. There have been several developments in the fields of Computer Science and Neuroscience aimed at letting computers mimic the human brain.

But over time, instead of going after this broad objective, researchers have focused on specific tasks and have started achieving greater success.

[Image: possible applications of Computer Vision]

The picture above shows some of the possible applications that researchers have attempted to solve and have already had some success with. For example, even a problem as simple as letting a computer program tell a cat from a dog typically requires training a deep learning algorithm with a very large number of labeled images (i.e. each labeled as cat or dog), potentially millions. Projects such as ImageNet focus on building a repository of images with predefined labels for this purpose, so that anyone can train their deep learning algorithms with these datasets.

For a moment, ignore the underlying deep learning techniques that make these applications possible. Let’s try to understand how these building blocks help us solve various real-world business applications.

Object detection and localization, for example, is used in several domains, such as:

  • Retail, to detect which items are placed in the aisles of a supermarket and which are in someone’s shopping cart, as Amazon Go does.
  • Autonomous driving, to detect the various objects, humans, obstacles, etc. on the road ahead of a self-driving vehicle.
  • Industrial automation, for robots to detect the various objects they encounter in a factory.
  • Medicine and life sciences, to detect complex patterns in the scan reports of cancer patients, etc.

Similarly, all the other techniques listed in the picture above find applications in several real-world business setups. For example, video activity detection, intrusion detection, etc. help in video surveillance through CCTV footage.

3. Technology

Hopefully you now have some background on computer vision, so let’s review some of the technologies that are used behind the scenes.

3.1. Neural networks

The development of neural networks was the first major breakthrough in the attempt to mimic the human brain. They let us put together a bunch of simple neurons in a network to solve some of these problems.

[Image: basic building block of a neural network]

The picture above shows the basic building block of neural networks:

  • Typically used with structured, low-dimensional data
  • Every node in a layer is connected to every node in the subsequent layer
  • Raw data (features) is fed into the input layer
  • The output layer can produce a binary, multi-class or continuous value, depending on the prediction problem
  • The compute power required increases drastically as the input dimension grows, since the number of weights between two layers is the product of their sizes
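The points above can be sketched as a tiny fully connected network in plain Python. This is only an illustration under stated assumptions: the weights are hand-picked rather than learned, and the names `dense`, `relu` and `sigmoid` are helpers defined here, not taken from any library.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every input feeds every output node."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical hand-picked weights; a real network learns these from data.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]  # 2 hidden nodes x 3 inputs
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0]]                          # 1 output node x 2 hidden nodes
output_b = [0.0]

features = [1.0, 2.0, 3.0]                        # raw input features
hidden = dense(features, hidden_w, hidden_b, relu)
prediction = dense(hidden, output_w, output_b, sigmoid)[0]
print(hidden, prediction)
```

The sigmoid on the output node gives a value between 0 and 1, which suits a binary prediction. A multi-class or continuous output would use a different final activation.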

For more details on neural networks, you can refer to one of my previous articles or the many other resources available online.

3.2. Convolutional neural networks

A convolutional neural network (CNN) is one of the network architectures used in deep learning.

Many of you might assume that deep learning is just about training neural networks with a large number of layers. While it is true to some extent that a large number of layers is typically used to solve complex problems, deep learning is about more than just the number of layers.

Techniques like the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN) are what really let deep learning solve complex problems: CNNs for computer vision problems with images, as in our case, and RNNs for any problem where the sequence in the data matters, like NLP, videos, etc. You can think of an RNN being used in conjunction with a CNN in the case of video.

In fact, going deep with more layers is not just a matter of available computational power. We also encounter challenges like vanishing gradients while training models with more layers.

Now let’s try to understand what a CNN is and how it helps us solve problems like object detection with images. Representing and storing images typically requires a lot of storage. Even a small image has a huge number of pixels.

Imagine using a fully connected neural network like the one shown above, with each pixel feeding into a neuron in the network. We would end up with millions of neurons and connections, resulting in a huge network. Moreover, learning something from every single pixel separately does not help us; we end up overfitting the model to the dataset. Instead, the network should be able to detect smaller features of the images, like edges, curves, etc., and learn from them.

Hence we use a mathematical operation called convolution, which can be applied as a matrix computation on the image representation. It helps the network learn features like edges, lines, etc.
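A quick back-of-the-envelope calculation shows how fast a fully connected layer grows with image size. The image sizes and the first layer of 1000 nodes are assumptions chosen for illustration, and `dense_layer_weights` is a helper defined here.

```python
def dense_layer_weights(height, width, channels, n_nodes):
    """Weights plus biases when every pixel value feeds every node of one layer."""
    n_inputs = height * width * channels
    return n_inputs * n_nodes + n_nodes


# Assumed sizes for illustration: square RGB images and a first layer of 1000 nodes.
for side in (32, 224, 1024):
    print(side, dense_layer_weights(side, side, 3, 1000))
# 32 3073000
# 224 150529000
# 1024 3145729000
```

Even a single layer on a modest 224x224 image needs about 150 million weights, before any further layers are added.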
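As a rough illustration of convolution picking out an edge, here is a minimal sketch in plain Python. The 4x6 image and the vertical-edge kernel are hand-written assumptions, and `convolve2d` is a helper defined here. Like most deep learning libraries, it slides the kernel without flipping it (strictly speaking, a cross-correlation).

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the weighted sum at each position."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h)
                for dj in range(k_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x6 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

# Hand-written vertical-edge kernel; a CNN learns kernels like this from data.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

for row in convolve2d(image, vertical_edge):
    print(row)
# [0, 27, 27, 0]
# [0, 27, 27, 0]
```

The output is zero in the flat regions and large where the brightness changes, so the kernel acts as an edge detector. The same nine weights are reused at every position, so the number of weights does not depend on the image size.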

4. CNN network architectures

There are several libraries like TensorFlow, OpenCV, etc. which have reusable functions for data processing, setting up the network, training the model, etc. Choosing the right network architecture is really one of the biggest challenges in training CNN models. There are several famous and widely used architectures like VGG, LeNet, AlexNet, ResNet, Inception, etc., which have proven to work well on several challenges and problems. Hence it is worthwhile to try reusing one of these architectures instead of devising a new one when we have to train a CNN model.

Another set of challenges with CNNs arises from the need for a large volume of labeled data and an abundance of compute power to train them. The key point to emphasize here is the need for a labeled dataset, as it takes a lot of effort to generate one for computer vision problems.

Hence it is always advisable to understand the business problem at hand and check whether there is a real need to train a new model or whether a pre-trained model can be reused.

Another really useful technique here is transfer learning. For example, a pre-trained model that was trained on a large volume of data from ImageNet can be reused, with a transfer learning approach, on a completely different setup with a different dataset.
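The core idea of transfer learning, keeping the pre-trained layers frozen and training only a small new "head" on the new dataset, can be sketched in plain Python. Everything here is a toy assumption: `pretrained_features` is a fixed function standing in for a real pre-trained CNN, and the 2x2 images and labels are made up. In practice you would load an actual pre-trained model through a library such as TensorFlow.

```python
import math


def pretrained_features(image):
    """Stand-in for a frozen, pre-trained network: its filters are fixed
    and are NOT updated while training on the new task."""
    (a, b), (c, d) = image
    left_right = (b + d) - (a + c)  # responds to vertical edges
    top_bottom = (c + d) - (a + b)  # responds to horizontal edges
    return [abs(left_right), abs(top_bottom)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(image, weights, bias):
    feats = pretrained_features(image)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only a small logistic-regression head on the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for image, label in zip(samples, labels):
            feats = pretrained_features(image)
            error = predict(image, weights, bias) - label
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


# Tiny hypothetical labeled dataset for the new task:
# label 1 = vertical edge, label 0 = horizontal edge (2x2 images, values 0-1).
samples = [[[0, 1], [0, 1]], [[1, 0], [1, 0]], [[0, 0], [1, 1]], [[1, 1], [0, 0]]]
labels = [1, 1, 0, 0]

weights, bias = train_head(samples, labels)
print(predict([[0, 1], [0, 1]], weights, bias) > 0.5)  # True  (vertical edge)
print(predict([[1, 1], [0, 0]], weights, bias) > 0.5)  # False (horizontal edge)
```

Only the two head weights and the bias are learned here, which is why a handful of labeled examples is enough. That is the appeal of transfer learning when labeled data is expensive to produce.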

Conclusion

With this article I have just scratched the surface and provided a high-level overview. The idea is to introduce the reader to Computer Vision technology and to help recognize where it can be applied when there is a real-world problem at hand to solve.

I would strongly advise reviewing the lectures from deeplearning.ai to learn further details.
