Top AI/ML frameworks and libraries
Hari Gottipati
Senior Technology Executive | Digital and Data Transformation Strategist | Data/AI/ML/Cloud Expert | Writer | Speaker | Transforming Ideas into Impactful Solutions | Driving Innovation and Growth Through Technology
Artificial Intelligence (AI) is the technology trend that everyone is talking about. More and more developers are getting their hands dirty with AI frameworks as demand in this field grows. Machine Learning (ML) and Deep Learning (DL) sit under the broader AI umbrella, and both are growing tremendously. When you start exploring this field, you may ask yourself, "Which AI/ML/DL frameworks and libraries should I be familiar with?" Here is a list of the most popular ones.
TensorFlow
TensorFlow is Google's open-source, end-to-end machine learning platform for developing, training, validating, and deploying models in large production environments. A JavaScript version, TensorFlow.js, supports training and deploying models in the browser and on Node.js, and TensorFlow Lite is a lightweight library for deploying models on mobile and embedded devices. TensorFlow supports many classification and regression algorithms and, more generally, deep learning algorithms. Python is the primary language for developing with TensorFlow; however, the community is adding interfaces for JavaScript, Java, Go, C++, C#, and Julia. TensorFlow 1.x is built around static graphs, trading flexibility for performance: you first define the graph entirely and then feed data into it to run, aka define-and-run. TensorFlow 2.x executes eagerly by default and builds graphs on request through tf.function.
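To make the graph idea concrete, here is a minimal sketch, assuming TensorFlow 2.x is installed. In 2.x a plain Python function runs eagerly, one operation at a time, while wrapping it in tf.function traces it into a graph on the first call and reuses that graph afterwards. The function name `affine` and the sample values are illustrative only.

```python
import tensorflow as tf


def affine(x, w, b):
    # As a plain Python function this runs eagerly in TensorFlow 2.x.
    return tf.matmul(x, w) + b


# tf.function traces the same Python code into a graph on its first call.
affine_graph = tf.function(affine)

x = tf.constant([[1.0, 2.0]])      # shape (1, 2)
w = tf.constant([[3.0], [4.0]])    # shape (2, 1)
b = tf.constant([0.5])

eager_result = affine(x, w, b)        # executed op by op
graph_result = affine_graph(x, w, b)  # executed as a traced graph

print(eager_result.numpy(), graph_result.numpy())
```

Both calls compute the same value; the difference is only in how the computation is executed, which matters for performance and deployment rather than for results.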
AI vs. ML vs. DL. Source: Nvidia
PyTorch
Based on the Torch library, PyTorch is Facebook's open-source machine learning framework, covering everything from research prototyping to production deployment. With its ecosystem of tools and libraries, PyTorch supports a wide variety of use cases in NLP and computer vision. It is supported on the major cloud platforms (AWS, GCP, Azure, and Alibaba Cloud), so you can get up and running quickly. The most significant advantage of PyTorch is its dynamic computational graphs: the graph structure is defined on the fly by the actual forward computation, aka define-by-run.
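Define-by-run is easiest to see with ordinary Python control flow. The sketch below assumes PyTorch is installed; the function `forward`, the threshold of 10, and the cap of 5 steps are made up for illustration. The number of matrix multiplications depends on the data, so the graph that autograd records is decided while the code runs.

```python
import torch


def forward(x, w):
    # A data-dependent loop: how many times w is applied depends on the
    # values in x, so the recorded graph can differ from call to call.
    h = x
    steps = 0
    while h.norm() < 10 and steps < 5:
        h = h @ w
        steps += 1
    return h.sum(), steps


w = torch.tensor([[2.0, 0.0], [0.0, 2.0]], requires_grad=True)
x = torch.tensor([[1.0, 1.0]])

loss, steps = forward(x, w)
loss.backward()  # gradients flow through exactly the steps that ran

print(steps, loss.item(), w.grad)
```

With this input the loop doubles the vector three times before its norm exceeds the threshold; a different input would take a different number of steps, and no graph has to be redefined for that.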
Caffe
Yangqing Jia created Caffe as a Ph.D. project at Berkeley AI Research, and the name stands for Convolutional Architecture for Fast Feature Embedding. It is written in C++ with a Python interface. Caffe can process more than 60 million images per day (roughly 700 images per second) on a single NVIDIA K40 GPU, and according to its Berkeley developers, it is among the fastest convolutional neural network implementations available. Caffe is heavily configuration-driven: models are defined in configuration files rather than hard-coded, and switching between CPU and GPU is a matter of setting a single flag.
Caffe2
Caffe2 is a lightweight, modular, and scalable deep learning framework and the successor to the original Caffe. It was created at Facebook while Yangqing Jia, the author of Caffe, worked there, and it was later merged into PyTorch. While the Caffe2 APIs continue to work, Facebook encourages developers to use the PyTorch APIs. Caffe2 aims to bring AI to mobile devices, using a lightweight framework to process, create, and improve models on massive data sets at speed.
Microsoft Cognitive Toolkit
Formerly known as CNTK, the Microsoft Cognitive Toolkit is a free, easy-to-use, open-source, commercial-grade distributed deep learning library designed to support large datasets and algorithms. It scales efficiently from a single CPU to GPUs on multiple machines without compromising speed or accuracy. It is available as a library for Python, C#, C++, and Java (model evaluation functionality) programs, or it can be used as a standalone ML tool with its own model description language called BrainScript.
Scikit-learn
Built on Python libraries such as NumPy, SciPy, and Matplotlib, scikit-learn is a simple and efficient machine learning library for AI problems as well as data mining and analysis. It supports both supervised and unsupervised machine learning, including classification, regression, and clustering. It also supports dimensionality reduction (reducing the number of random variables to consider), model selection (comparing, validating, and choosing parameters and models), and preprocessing (feature extraction and data normalization) out of the box.
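The pieces listed above compose naturally. Here is a minimal sketch, assuming scikit-learn is installed, that chains preprocessing, dimensionality reduction, and classification in one pipeline and then uses cross-validated grid search for model selection. The iris dataset, the step names, and the parameter grid are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipeline = Pipeline([
    ("scale", StandardScaler()),                      # preprocessing
    ("reduce", PCA()),                                # dimensionality reduction
    ("classify", LogisticRegression(max_iter=1000)),  # classification
])

# Model selection: try each parameter combination with 5-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={
        "reduce__n_components": [2, 3],
        "classify__C": [0.1, 1.0, 10.0],
    },
    cv=5,
)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Because the scaler and PCA live inside the pipeline, they are refit on each cross-validation fold, which keeps information from the validation data out of the preprocessing steps.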
Keras
Keras is a high-level neural network API that runs on top of popular low-level libraries such as TensorFlow, Microsoft Cognitive Toolkit, and Theano (which is no longer actively developed). It aims to hide the complexity of those libraries, reducing many common tasks to single-line function calls; however, this makes Keras less configurable than the low-level frameworks. Inspired by the Torch APIs, Keras provides intuitive APIs and is growing fast. It runs on both CPUs and GPUs and supports convolutional networks, recurrent networks, and combinations of the two.
Java-based
Though most popular AI/ML frameworks are based on Python, there are several options available for Java developers.
Apache MXNet is a flexible, ultra-scalable, and efficient deep learning library. It supports Julia, Scala, Clojure, C++, Perl, R, and Java, along with deep integration into Python. With a robust ecosystem of tools and libraries, MXNet supports computer vision, NLP, time series, and many more use cases.
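As a rough illustration of how compact the API is, the sketch below assumes the Keras bundled with TensorFlow 2.x (`tensorflow.keras`). It builds, compiles, trains, and queries a small dense classifier in a handful of calls. The layer sizes and the random data are placeholders, so the trained model itself is meaningless.

```python
import numpy as np
from tensorflow import keras

# A small fully connected classifier: 4 input features, 3 output classes.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Random placeholder data, used only to show the training call.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 4)).astype("float32")
labels = rng.integers(0, 3, size=32)

model.fit(features, labels, epochs=2, batch_size=8, verbose=0)
probabilities = model.predict(features[:5], verbose=0)
print(probabilities.shape)
```

Swapping the Dense layers for convolutional or recurrent layers follows the same pattern; the compile, fit, and predict calls stay the same.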
Eclipse Deeplearning4j is a commercial-grade, open-source, distributed deep learning library written for Java and the Java Virtual Machine (JVM), and it includes a Scala API. It brings AI to business environments for use on distributed GPUs and CPUs and integrates with Hadoop and Apache Spark.
Intel BigDL is a distributed deep learning library for Apache Spark. BigDL lets you run deep learning applications as standard Spark programs on Spark/Hadoop clusters.
Apache Mahout, a project of the Apache Software Foundation, is a distributed linear algebra framework for producing free implementations of distributed and scalable machine learning algorithms. In the past it relied on the Apache Hadoop platform; today Apache Spark is the recommended out-of-the-box distributed backend, and Mahout can be extended to other distributed backends. It implements popular machine learning techniques such as classification, clustering, and recommendation.
These are some of the popular AI/ML frameworks and libraries, each serving different algorithms and approaches. Which framework to use depends entirely on your use case and the goals you want to achieve.
Cover image source: Google Images
A version of this article was originally published on AZIndiaTimes.com (online and printed edition).
Note: Opinions expressed here are solely my own and do not express the views or opinions of my employer.
Head of Service Lines at T-Systems (Deutsche Telekom Group)
4 years ago: Thanks Hari. Frameworks and Libraries (FW/LIB) - For simple understanding - by looking at all options each one has its core advantages. At the end of the article it was mentioned - it is all depending on your use case. However - even if one decides to use different features of many options, each FW/LIB has its own limitations to work with FW/LIB. Do you have any suggestion or advice for people who are trying this for the first time - how to decide what to use and where to start ...
Doctorate in Data Science | Wells Fargo | Charles Schwab | American Express | Morgan Stanley | Bertelsmann AG
4 years ago: Thanks for sharing the great article Hari. You always share very thought-provoking products and insights. I sincerely appreciate it. Quick question: Caffe seems fascinating. I see Intel embedding AI in its process chipset. Do you know if Caffe exploits hardware-embedded AI capabilities? PyTorch seems very interesting due to its offering of dynamic computational graphs. Do you know if these products integrate with legacy modelling tools or offer a code conversion feature to ease migration to these offerings?