Deep Learning Mechanisms in Applications
Alfred David
Tech Innovation Alchemist | AI-to-Blockchain Strategist | Building World-Class Engineering Teams | Future-First Leader
Deep learning is being raved about everywhere right now. For those of us unfamiliar with AI and its terminology: deep learning is a branch of machine learning, which in turn is a subset of AI. Deep learning itself is nothing new; researchers and application developers have simply found effective new ways to apply its concepts to long-standing, important problems, yielding results that are at times unexpected but state of the art all the same. This has driven its sudden new-found popularity.
I recently came across a very interesting visualization of how deep learning startups have taken over the trending novelty factor that iOS/Android application startups once enjoyed, not only in the acquisition space but also in hiring.
Why is deep learning making such a splash now? The underlying research dates back some 40 years, and for many years the field seemed to be at a dead end; suddenly everybody is talking about it as if it had just been discovered by a stroke of genius.
Deep learning gained a lot of the limelight when some companies started playing with its concepts for computer vision and were able to generate many improved results tackling newer problems in that realm.
So what is the zing that deep learning brings to the table? As I see it, there are three clever tricks that deep learning leverages far better than other computational techniques.
These so-called clever tricks turn complex problems into simpler ones by breaking them into smaller logical chunks:
- Variational Methods - which formulate intractable problems as approximate convex optimization problems, and then apply well-understood optimization algorithms which yield good performance and often have fast parallel and streaming variants.
- Distant supervision - self-training, or weak supervision: starting with an insufficient dataset and incrementally bootstrapping your way to enough data for supervised learning. With these methods you may or may not have some labeled data, you definitely have a bunch of unlabeled data, and you have a ‘function’ (read: a dirty yet clever hack) that assigns noisy labels to the unlabeled data. Once you have lots of data with noisy labels, you have turned the problem into vanilla supervised learning (a minimal sketch follows this list).
- Transfer learning - applying knowledge learned on one problem to a different but related problem. Transfer learning is especially exciting because we can learn from a data-rich domain with a totally different feature space and data distribution, and apply those learnings to bootstrap another domain where we have much less data to work with.
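To make distant supervision concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a prescribed recipe: the sentiment task, the keyword heuristic, and the scikit-learn classifier are all stand-ins. The point is only that a crude labeling function turns unlabeled text into noisy training data for an ordinary supervised model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A pool of unlabeled text: in a real project this would be large and cheap to collect.
unlabeled_reviews = [
    "great product, works perfectly",
    "terrible quality, broke after a day",
    "awesome value for the price",
    "awful support, total waste of money",
]

def noisy_label(text):
    """The 'dirty yet clever hack': keyword matching stands in for real labels."""
    positive = ("great", "awesome", "perfect")
    negative = ("terrible", "awful", "waste")
    score = sum(w in text for w in positive) - sum(w in text for w in negative)
    return 1 if score >= 0 else 0  # 1 = positive, 0 = negative (noisy!)

# Weakly label the pool, then treat the result as vanilla supervised learning.
noisy_labels = [noisy_label(t) for t in unlabeled_reviews]
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(unlabeled_reviews, noisy_labels)

print(model.predict(["great support", "total waste of money"]))
```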
The Real-World Problem Space
For many real-world problems, it is unfortunately rather expensive to get well-labeled training data. To elaborate on this issue, let’s consider two hypothetical cases:
- Medical vision: if we want to build a system that detects lymph nodes in Computed Tomography (CT) images of the human body, we need annotated images in which the lymph nodes are labeled. This is a rather time-consuming task, as the images are 3D and very small structures have to be recognized. Assuming a radiologist earns $100/hour and can carefully annotate 4 images per hour, we incur a cost of $25 per image, or $250k for 10,000 labeled images. Considering that several physicians need to label the same image to get the diagnosis close to 100% correct, acquiring a dataset for this medical task would easily exceed that $250k.
- Credit scoring: if we want to build a system that makes credit decisions, we need to know who is likely to default, so we can train a machine learning system to recognize them beforehand. Unfortunately, you only know for sure that somebody defaults when it happens. A naive strategy would therefore be to give loans of, say, $10k to everyone; but then every person who defaults costs us $10k. This puts a very expensive price tag on each labeled data point.
Obviously, there are tricks to lower these costs, but the overall message stands: labeled data for real-world problems can be expensive to obtain. Application builders are therefore using pre-training and fine-tuning to offset the costs illustrated above.
Pre-training: use cheap, large datasets from a related domain. These can be obtained as pre-trained models (e.g. ImageNet models, model zoos), or public databases can be leveraged to build models; in the absence of both, crawling tools such as Scrapy or ParseHub can be used to extract data from the web.
Fine-tuning: well-labeled data is hard to come by, expensive to generate because it requires human annotation, and usually in short supply. Finding a large, weakly labeled dataset that needs no expensive human annotation reduces the cost: a neural network can be pre-trained on the weakly labeled data and then fine-tuned on the smaller, well-labeled set. This yields a performance boost compared to training on the small dataset alone.
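As a rough sketch of this pre-train-then-fine-tune recipe, the Python/Keras snippet below loads an ImageNet-pre-trained backbone, freezes it, and trains only a small new classification head on the scarce well-labeled data. The five-class target task and the `small_labeled_ds` dataset are assumptions for illustration, not part of the original write-up.

```python
import tensorflow as tf

# Reuse cheap pre-training: an ImageNet-trained backbone (MobileNetV2 chosen arbitrarily).
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pre-trained features

# Fine-tune only a small new head on the expensive, well-labeled data.
num_classes = 5  # assumption: a 5-class target task
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# `small_labeled_ds` is an assumed tf.data.Dataset of (image, label) batches.
# model.fit(small_labeled_ds, epochs=5)

# Optionally unfreeze the top of the backbone afterwards and continue training
# with a low learning rate for a further boost.
```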
Today deep learning has risen like a phoenix across the application space, thanks to these factors:
1. Training Deep Networks - Researchers finally figured out how to train very deep networks. It had been assumed that more backpropagation layers meant better, faster, smarter models, yet networks with many layers often stubbornly refused to train. This set back research on neural networks and the field almost went into hibernation, but subsequent work on deep networks has allowed very deep networks to finally realize their potential and bring the field back to life.
2. Large labeled datasets were created. Large networks need lots of data to train effectively. In the last few years, more and more datasets have been opened to the public: ImageNet is one of the biggest, with over a million images and over 1,000 object categories (amusingly, about 120 of which are breeds of dogs). Datasets for speech, video, human poses, and much else have also been published by universities and companies alike, while the proliferation of smartphones (along with their cameras and sensors) provides incomprehensibly large datasets for the tech giants.
3. Easy-to-use frameworks - Deep learning frameworks like TensorFlow, Torch, Theano, Keras, and Caffe have taken over the basic jobs, freeing researchers to spend more time on interesting problems and less time reinventing the wheel (a tiny example follows this list). Newer frameworks such as H2O, Neon, DMLC MXNet, Chainer, etc. are also coming into the picture, performing specific tasks more efficiently and making use of cloud platforms, which gives researchers a wide choice of tools and frameworks targeted at the problem they are trying to solve. On the application side, mobile and web frameworks have rapidly built plumbing interfaces for consuming deep learning models, making prototyping and end-to-end solutions faster and more viable. This matters because it considerably reduces the go-to-market timeframe for solutions that employ deep learning in the background.
4. Cheaper, faster processing power - Large networks can take an intimidating amount of computing power to crunch their numbers. With cloud technologies maturing and public clouds becoming more accepted in the enterprise, researchers have gained the confidence to use large amounts of compute power on tap in the cloud. That compute comes not only from CPU nodes but also from GPU nodes, and from combinations of the two.
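To show how little boilerplate these frameworks leave to the researcher, here is a minimal Keras example; the tiny architecture and the MNIST dataset are arbitrary illustrative choices, but defining and training a network really does come down to a handful of lines.

```python
# A modern framework handles the plumbing: a small image classifier in a few lines.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))
```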
Where does this go from here?
I think the time is ripe for deep learning to become a commodity in its own right, much as mobility and IoT did in earlier trend cycles. Products are being designed specifically to leverage deep learning to solve business problems. Just as GPUs accelerate video games and scientific work such as protein folding, deep learning gains similar boosts: most deep learning frameworks now utilize GPUs, and some companies like Cerebras are going even further, using programmable chips or creating special-purpose hardware whose sole job is to train neural nets. Existing web applications such as browsers have also evolved to the point where neural network computation can run in the browser, for example with MIL WebDNN, speeding up computation compared to conventional frameworks. It utilizes WebGPU, which is currently supported only in Safari, but other browsers should soon catch up and make it more widely available. This raises the possibility that we could soon work on neural networks entirely from an iPad (a handheld device), eliminating much of the complexity and letting developers focus on building and solving more complex real-world problems in real time.
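As a small illustration of how transparently frameworks expose this hardware, the snippet below (TensorFlow assumed) lists the accelerators the framework can see and runs a computation on whichever device is available; the same code works unchanged on CPU or GPU.

```python
import tensorflow as tf

# See which accelerators the framework can use; no code changes are needed per device.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)

# Place a computation explicitly; frameworks otherwise pick a device automatically.
with tf.device("/GPU:0" if gpus else "/CPU:0"):
    x = tf.random.normal((1024, 1024))
    y = tf.matmul(x, x)  # this matrix multiply runs on the chosen device
print(y.shape)
```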
As computing power continues to get cheaper and smaller, networks that currently require supercomputers may soon fit in your HoloLens, smartwatch, AirPods, or whatever other wearable computing devices we use in the future, and life-size AR and VR content would become intrinsically woven into our day-to-day technology.
To illustrate pre-training and fine-tuning, I've built a small iOS mobile application using TensorFlow and incorporating an ImageNet model. The application harnesses the device camera to identify the object in view and suggest plausible labels from the graph generated using the ImageNet model and reinforcement learning.
The source can be found here.
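The app itself is written against TensorFlow's mobile tooling, so the snippet below is not its actual source; it is only a rough Python sketch of the same idea, labeling an image (a hypothetical saved camera frame) with an ImageNet-pre-trained model.

```python
import numpy as np
import tensorflow as tf

# An ImageNet-pre-trained classifier (MobileNetV2 chosen arbitrarily for illustration).
model = tf.keras.applications.MobileNetV2(weights="imagenet")

def plausible_labels(image_path, top=3):
    """Return the top ImageNet labels for one image file (the path is hypothetical)."""
    img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    x = np.expand_dims(tf.keras.utils.img_to_array(img), axis=0)
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
    preds = model.predict(x)
    return tf.keras.applications.mobilenet_v2.decode_predictions(preds, top=top)[0]

# Example usage with a hypothetical saved camera frame:
# print(plausible_labels("frame_from_camera.jpg"))
```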
A good write-up Alfy, with real-life examples. Are you seeing any particular trend in the adoption of TPUs (Tensor Processing Units) while customized chips for NNs are being developed?
Tech Innovation Alchemist | AI-to-Blockchain Strategist | Building World-Class Engineering Teams | Future-First Leader
But if you were running a startup and you wanted such labeled life-sciences data, what would you do? You still have to get that massive exercise underway of roping in doctors to label it and then collating it. If you did it with experts from only one region, there would be a regional bias; and why would a doctor do the labeling for free? He will charge consultation prices. That would add to your immediate burn rate.
Senior Manager | AI & Data | Ex Deloitte | UAE Golden Visa Holder
I understand that training datasets are expensive to get; however, it is a one-time cost. Further, since the system trains itself on live data, the past datasets can also help a great deal, solving the underlying data problem. I am not sure about the life-sciences example you provided; I am speaking only from the BFS standpoint. Very interesting article, Alfy