Deep Learning - Different Frameworks
Sanchit Tiwari
Associate Partner at McKinsey & Company | Senior Principal at QuantumBlack, AI by McKinsey
Many research areas are being transformed by the growth of new computing resources, new techniques, and large datasets, and deep learning is now used almost everywhere in our decision-making processes. Researchers and practitioners like me are exposed to multiple deep learning frameworks, and in my experience each framework is developed in a specific manner for specific objectives. In this article, I want to share my high-level learning about the different deep learning frameworks: which framework is most suitable for which kind of problem statement, and how each can be used to build your deep learning models.
Before going into the details of the frameworks, as a refresher, let’s start with: what is deep learning (DL)? In short, DL is a subset of machine learning (ML) that makes multi-layer computational neural networks feasible; typical DL architectures include deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and many more. In recent times, DL-based approaches have received significant attention in both theory and practice due to their amazing performance in various domains. DL algorithms provide learning strategies that outperform conventional ML algorithms and have shown remarkable performance across application areas. DL is a proven technique for analyzing big data of high variety, variability, and dimensionality, and for modeling complex spatial and temporal correlations. DL makes use of artificial neural networks (ANNs) containing multiple hidden layers. Training can be supervised or unsupervised: in supervised training, standard supervised methods are used to train the network, whereas in unsupervised training, higher-level representations are created from incoming data.
An exciting application of deep learning is autonomous driving, where neural networks are used for scene segmentation, object detection, route planning, and more. Keep in mind that while neural networks are considered mature and reliable, they have the disadvantage of being computationally expensive to train and use. In resource-constrained environments such as embedded systems for autonomous vehicles, it becomes paramount to optimize networks with respect to both memory and inference time. Thus, in domains where high accuracy is vital and resources are limited, methodologies that make neural networks more compact and efficient are essential, and the frameworks below help us address these DL challenges.
Now that we have some basic understanding of deep learning and some of its applications and challenges, let’s jump to the frameworks, starting with: why do we need DL frameworks at all?
In my understanding, coding a DL algorithm from scratch is good because you learn all the math behind it, but in the real world, focusing only on coding the algorithm can distract you from the problem statement you are trying to solve. That is where DL frameworks help: they let you work on the particular business problem while providing already-codified algorithms. In this article, I am covering a few popular ones: TensorFlow, CNTK, Caffe, Caffe2, PyTorch, and MXNet. As we go into the details of these frameworks, you will see that they are developed by the world’s largest software companies, such as Google, Facebook, and Microsoft. This makes sense, as these companies possess huge amounts of data, high-performance infrastructure, human intelligence, and investment resources, so they need these frameworks the most.
You will also see that there are high-level deep learning wrapper libraries built on top of the frameworks mentioned above, such as Keras, and I will share details about that too.
TensorFlow:- Let’s start with the most popular one, TensorFlow, which is open-source, fast-evolving, and backed by a company we all depend on: Google. :-)
TensorFlow is an open-source software library for numerical computation using data flow graphs. It was created and is maintained by the Google Brain team within Google’s Machine Intelligence research organization for ML and DL.
TensorFlow is designed for large-scale distributed training and inference. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The distributed TensorFlow architecture contains distributed master and worker services with kernel implementations.
TensorFlow provides about 200 standard operations, including mathematical, array manipulation, control flow, and state management operations, all written in C++. TensorFlow was designed for use in research, development, and production systems. It can run on single-CPU systems, GPUs, mobile devices, and large-scale distributed systems of hundreds of nodes, and it is used for setting up data flow graphs. Since a neural network can be expressed as a graph, TensorFlow works by having you assemble a graph of computational nodes and then invoke a session to execute the computations in those nodes.
TensorFlow workflow phases (a minimal sketch follows this list):-
- Phase 1: Assemble a graph
- Phase 2: Use a session to execute operations in the graph
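To make the two phases concrete, here is a minimal sketch assuming the TensorFlow 1.x graph-and-session API that this article describes (in TF 2.x, which executes eagerly by default, this style lives under tf.compat.v1):

```python
# A minimal sketch of the two-phase workflow (TensorFlow 1.x style).
import tensorflow as tf

# Phase 1: assemble the graph; no computation happens yet.
a = tf.placeholder(tf.float32, name="a")
b = tf.placeholder(tf.float32, name="b")
c = tf.add(a, b, name="sum")

# Phase 2: execute operations in the graph inside a session.
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 3.0, b: 4.0}))  # 7.0
```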
TensorFlow’s programming interfaces include APIs for Python and C++, and it is also supported in the Google and Amazon cloud environments. TensorFlow is efficient in multi-GPU settings and mobile computing, and scales computation across many machines and huge datasets.
Learn more at their website https://www.tensorflow.org/learn
Keras:-
Keras is Tensorflow’s official high-level API for building and training deep learning models. It greatly reduces programming complexity: while Tensorflow already makes it easier to implement deep learning algorithms, Keras provides one more layer on top, giving us a very simple way to implement some of the more popular deep learning architectures.
Keras is multi-backend and multi-platform; it allows fast prototyping, state-of-the-art research, and production use, and runs seamlessly on CPUs and GPUs. It has three API styles (a Sequential-model sketch follows this list):-
The sequential model:-
- The simplest – only for single input, single output, sequential layer stacks
The functional API:-
- More complex – multiple inputs, multiple outputs, arbitrary static graph topologies
Model subclassing:-
- Maximum flexibility
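As a quick illustration of the first style, here is a minimal sketch of a Sequential model using tf.keras; the layer sizes and training settings below are arbitrary choices for illustration:

```python
# A minimal Sequential-model sketch: single input, single output,
# layers stacked one after another.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5)  # train on your own data
```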
Learn more at their website https://www.tensorflow.org/guide/keras
CNTK:-
Microsoft Cognitive Toolkit (CNTK) is a commercial-grade distributed DL framework from Microsoft Research, built for large-scale datasets. It implements efficient DNN training for speech, image, handwriting, and text data. A network is specified as a symbolic graph of vector operations, such as matrix add/multiply or convolution, composed from building blocks (operations). CNTK supports FFNN, CNN, and RNN architectures and implements stochastic gradient descent (SGD) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK runs on both 64-bit Linux and Windows operating systems with Python, C#, C++, and BrainScript APIs. In addition, you can use the CNTK model evaluation functionality from Java programs.
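As a rough sketch of what this looks like in CNTK’s Python API (the toy OR-gate data and the learning rate below are illustrative assumptions, not from the article):

```python
# A minimal CNTK sketch: a symbolic graph of operations trained with SGD.
import numpy as np
import cntk as C

# Symbolic inputs and a one-layer network.
x = C.input_variable(2)
y = C.input_variable(1)
model = C.layers.Dense(1, activation=C.sigmoid)(x)

# Loss, SGD learner, and trainer.
loss = C.binary_cross_entropy(model, y)
learner = C.sgd(model.parameters,
                C.learning_rate_schedule(0.1, C.UnitType.minibatch))
trainer = C.Trainer(model, loss, [learner])

# Toy OR-gate data, trained for a few sweeps.
features = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
labels = np.array([[0], [1], [1], [1]], dtype=np.float32)
for _ in range(100):
    trainer.train_minibatch({x: features, y: labels})
```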
CNTK is also one of the first deep learning toolkits to support the Open Neural Network Exchange (ONNX) format, an open-source shared model representation for framework interoperability and shared optimization, which makes it easy to move models between CNTK, Caffe2, PyTorch, MXNet, and other DL tools. ONNX is co-developed by Microsoft and Facebook.
Learn more at their website https://docs.microsoft.com/en-us/cognitive-toolkit/
Caffe:- Caffe is a DL framework made with expression, speed, and modularity in mind. It was developed by Yangqing Jia at BAIR (Berkeley Artificial Intelligence Research) and by community contributors. DNNs are defined in Caffe layer by layer; the layer is the essence of a model and the fundamental unit of computation. Data enters Caffe through data layers; accepted data sources are efficient databases (LevelDB or LMDB), Hierarchical Data Format (HDF5), or common image formats (e.g. JPEG, PNG, TIFF). Common and normalization layers provide various data vector processing and normalization operations. New layers must be written in C++/CUDA, although custom layers are also supported in Python (but are less efficient).
Pretrained networks are available in the Caffe Model Zoo for fine-tuning, and Caffe is well suited to image processing with CNNs.
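For a flavor of the Python interface, here is a hedged inference sketch with pycaffe; the deploy.prototxt and weights.caffemodel file names are hypothetical placeholders for a network definition and, say, a Model Zoo checkpoint:

```python
# A minimal pycaffe inference sketch (file names are placeholders).
import numpy as np
import caffe

caffe.set_mode_cpu()
# The prototxt holds the layer-by-layer definition;
# the caffemodel holds the learned weights.
net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)

# Fill the data layer's blob and run a forward pass.
net.blobs["data"].data[...] = np.random.rand(*net.blobs["data"].data.shape)
output = net.forward()
print(list(output.keys()))  # names of the network's output blobs
```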
You can learn more at https://caffe.berkeleyvision.org/, but note that development is not as active as it used to be, and the static model graph definition does not fit many RNN applications that need variable-sized inputs.
If you like coding in C++, then this is the framework for you, as efficient custom layers must be written in C++.
Caffe2:- Caffe2 is a lightweight, modular, and scalable DL framework developed by Yangqing Jia and his team at Facebook. It aims to provide an easy and straightforward way to experiment with DL and to leverage community contributions of new models and algorithms. Caffe2 is used at the production level at Facebook, while research and development are done in PyTorch. Caffe2 differs from Caffe in several ways, notably by adding mobile deployment and new hardware support (in addition to CPU and CUDA). It is headed towards industrial-strength applications with a heavy focus on mobile. The basic unit of computation in Caffe2 is the operator, a more flexible version of Caffe’s layer.
There are more than 400 different operators available in Caffe2, and more are expected to be implemented by the community. Caffe2 provides command-line Python scripts capable of translating existing Caffe models into Caffe2. However, the conversion process requires manual verification of the accuracy and loss rates. It is also possible to convert Torch models to Caffe2 models via Caffe.
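To illustrate the operator abstraction, here is a minimal sketch using the caffe2.python API: feed a blob into the workspace, run a single Relu operator, and fetch the result:

```python
# A minimal sketch of Caffe2's operator abstraction.
import numpy as np
from caffe2.python import core, workspace

# Operators map named input blobs to named output blobs.
workspace.FeedBlob("x", np.random.randn(2, 3).astype(np.float32))
op = core.CreateOperator("Relu", ["x"], ["y"])
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("y"))
```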
It also supports the Open Neural Network Exchange (ONNX) format, which makes it easy to move models between CNTK, Caffe2, PyTorch, MXNet, and other DL tools.
Learn more at their website:- https://caffe2.ai/docs/tutorials
PyTorch:-
PyTorch is a Python library for GPU-accelerated DL. The library is a Python interface to the same optimized C libraries that Torch uses (FYI, Torch is a scientific computing framework with wide support for ML algorithms, based on the Lua programming language).
It has been developed by Facebook’s AI research group since 2016 and is written in Python, C, and CUDA. The library integrates acceleration libraries such as Intel MKL and NVIDIA’s cuDNN and NCCL. At the core, it uses CPU and GPU tensor and NN backends (TH, THC, THNN, THCUNN) written as independent libraries on a C99 API.
PyTorch supports tensor computation with strong GPU acceleration and DNNs built on a tape-based autograd system. It has become popular by allowing complex architectures to be built easily. In most frameworks, changing the way a network behaves means starting from scratch.
PyTorch instead uses a technique called reverse-mode auto-differentiation, which allows you to change the way a network behaves with little effort (i.e., a dynamic computational graph, or DCG). It is mostly inspired by autograd and Chainer. The library is used by both the scientific and industrial communities: an engineering team at Uber has built Pyro, a universal probabilistic programming language that uses PyTorch as its backend, and the DL training site fast.ai announced that their courses would be based on PyTorch rather than Keras/TensorFlow. The library is freely available under a BSD license and is supported by Facebook, Twitter, NVIDIA, and many other organizations. Strong points:-
- Dynamic computational graph (reverse-mode auto-differentiation), illustrated in the sketch below
- Supports automatic differentiation for NumPy and SciPy
- Elegant and flexible Python programming for development
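Here is a minimal sketch of the dynamic-graph idea: the graph is built as ordinary Python executes, so even data-dependent control flow is differentiable:

```python
# A minimal sketch of PyTorch's define-by-run dynamic graph.
import torch

x = torch.randn(3, requires_grad=True)
# Data-dependent control flow: the graph can differ from run to run.
if x.sum() > 0:
    y = (x * 2).sum()
else:
    y = (x ** 2).sum()
y.backward()  # reverse-mode auto-differentiation
print(x.grad)
```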
PyTorch also supports the Open Neural Network Exchange (ONNX) format, which makes it easy to move models between CNTK, Caffe2, PyTorch, MXNet, and other DL tools.
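Exporting to ONNX from PyTorch is a one-liner; in this sketch the tiny linear model, the example input, and the output file name are all illustrative:

```python
# A minimal sketch of exporting a PyTorch model to ONNX.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
dummy_input = torch.randn(1, 4)  # example input used to trace the graph
torch.onnx.export(model, dummy_input, "model.onnx")
```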
Learn more at their website:- https://pytorch.org/tutorials/
MXNet: Apache MXNet is a DL framework designed for both efficiency and flexibility. It was developed by a community of researchers from several institutions, including Carnegie Mellon University and the University of Washington. It allows mixing symbolic and imperative programming to maximize efficiency and productivity.
At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on-the-fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient.
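One way to see this mix in action is Gluon’s HybridBlock API, where a model written imperatively can be compiled into a symbolic graph with a single call; the layer sizes below are illustrative:

```python
# A minimal sketch of mixing imperative and symbolic styles in MXNet.
import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(64, activation="relu"), nn.Dense(10))
net.initialize()
net.hybridize()  # compile to an optimized symbolic graph

x = mx.nd.random.uniform(shape=(1, 100))  # imperative NDArray input
print(net(x).shape)  # (1, 10)
```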
MXNet is portable and lightweight, scaling effectively to multiple GPUs and multiple machines. It also supports efficient deployment of trained models to low-end devices for inference, such as mobile devices (using Amalgamation), IoT devices (using AWS Greengrass), serverless environments (using AWS Lambda), or containers. MXNet is licensed under the Apache-2.0 license and has broad API language support, including R, Python, Julia, and other languages.
Learn more at their website:- https://mxnet.apache.org/api
Theano:-
Theano is a pioneering DL tool supporting GPU computation, with development dating back to 2007. It is an open-source project released under the BSD license. It is actively maintained (although no longer developed) by the LISA group [now MILA, the Montreal Institute for Learning Algorithms] at the University of Montreal.
At its heart, Theano is a compiler for mathematical expressions in Python: it transforms symbolic structures into very efficient code using NumPy and efficient native libraries like BLAS, generating native code that runs as fast as possible on CPUs or GPUs. Theano supports extensions for multi-GPU data parallelism and has a distributed framework for training models.
This library assists in optimizing and utilizing the CPU and GPU, which further enhances the overall performance of data-intensive computation. Theano code is written to take advantage of how a compiler works; the library, in effect, serves as a building block for neural networks. It is a straightforward choice for practitioners who need flexibility as well as fine-grained customization.
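A minimal sketch of the compiler idea: define a symbolic expression, compile it into a callable, then evaluate it like an ordinary function:

```python
# A minimal Theano sketch: symbolic expression -> compiled function.
import theano
import theano.tensor as T

x = T.dscalar("x")
y = x ** 2 + 3 * x           # symbolic; nothing is computed yet
f = theano.function([x], y)  # compiled into efficient native code
print(f(2.0))                # 10.0
```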
Learn more at their website:- https://deeplearning.net/software/theano/
As you can see, each framework was developed in a specific manner for specific objectives, so you need to choose the framework that fits your problem statement and situation. I would say: if you are just starting with DL, begin with Keras; as you go into deeper development, use Keras with Tensorflow; and for production and research, based on the functionality and flexibility required, I would recommend TensorFlow or PyTorch. Some researchers I know use Torch for implementing their research papers because Torch makes debugging easier, so, as I said, it depends on your work and situation.
In the end, like many others, I believe TensorFlow will be the framework preferred by most, though that is based only on my interactions with different people in our DS community.
This post draws its content mostly from the frameworks’ websites, which I have listed above and found to be very good learning resources, so I highly recommend exploring them.