Top 10 Essential Machine Learning Libraries of 2020

Ashish Airon

VC | Entrepreneur | BlitzBusiness Podcast | Forbes 30U30 Asia | Mentor @ Niti Aayog (Govt. of India)

Published: April 29, 2020

In this article I discuss the 10 most widely used machine learning libraries. Together they can handle most machine learning tasks, and each is covered along with its pros and cons.

If you are just starting out in machine learning or data science, I highly recommend giving this a quick read. If you have been in the game for a long time, comment below with the libraries you use most often or like best.

1. NumPy

NumPy stands for Numerical Python. It is one of the most basic (yet advanced) Python libraries for scientific computing and can be used as a multi-dimensional container for data. It supports the linear algebra computations that underlie machine learning algorithms such as Linear Regression, Logistic Regression, Naïve Bayes and so on. Its core is mostly written in C (a low-level language), which is why it is fast.

Pros

NumPy arrays use less memory than equivalent Python lists.
Operations on arrays are faster than the same operations on Python lists.
Mathematical operations apply element-wise to whole arrays, which plain lists do not support.

Cons

NumPy arrays are homogeneous, contiguous blocks of memory, i.e. an array holds only one kind of data.
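
To make the points above concrete, here is a minimal sketch. The sample numbers and variable names are made up for illustration; the calls themselves (`np.array`, `np.column_stack`, `np.linalg.lstsq`) are standard NumPy.

```python
import numpy as np

# A Python list repeats its contents; a NumPy array multiplies element-wise.
prices = [1.0, 2.0, 3.0]
doubled_list = prices * 2              # list repetition: six elements
doubled_array = np.array(prices) * 2   # element-wise product: three elements

# Arrays are homogeneous: mixing ints with a float upcasts everything to float64.
mixed = np.array([1, 2, 3.5])

# Linear regression as a linear algebra problem: fit y = slope * x + intercept
# by ordinary least squares on points that lie exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
design = np.column_stack([x, np.ones_like(x)])   # columns: x, intercept term
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
slope, intercept = coef
```

The fitted `slope` and `intercept` recover the line used to generate the data, which is the kind of computation that sits underneath a Linear Regression model.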

2. Pandas

Pandas is one of the most important data analysis libraries, used mainly in statistics, finance, economics and data analysis. Its tabular data model is similar to an Excel spreadsheet. It is known for processing large datasets.

Pros

Pandas creates fast and efficient DataFrame objects with default or custom indexing.
It can manipulate large datasets and handles missing values in the data as well.
It provides built-in features for reading and writing Excel files, plotting, and complex data analysis tasks such as data wrangling and data transformation.

Cons

Pandas does not persist data by itself; you have to write results out explicitly.
It can only handle data that fits in memory, and memory fills up easily.
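
A small sketch of custom indexing, missing-value handling and explicit persistence follows. The city and sales figures are invented for illustration, and the CSV is written to an in-memory buffer rather than a real file so the example has no side effects.

```python
import io

import pandas as pd

# A DataFrame with a custom index and one missing value.
df = pd.DataFrame(
    {
        "city": ["Delhi", "Delhi", "Pune", "Pune"],
        "sales": [100.0, None, 80.0, 120.0],
    },
    index=["q1", "q2", "q3", "q4"],
)

# Count the missing values, then fill them with the column mean.
missing_count = int(df["sales"].isna().sum())
filled = df.fillna({"sales": df["sales"].mean()})

# A typical wrangling step: total sales per city.
totals = filled.groupby("city")["sales"].sum()

# Pandas keeps everything in memory, so results must be saved explicitly.
buffer = io.StringIO()
filled.to_csv(buffer)
```

In a real project you would pass a file path to `to_csv` (or use another writer such as `to_excel`) instead of the buffer.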

3. Matplotlib

Matplotlib is one of the most popular libraries for data visualization. It supports a wide variety of graphs such as histograms, bar charts, scatter plots, pie charts and so on. It is primarily a two-dimensional plotting library that produces clear, concise graphs, which are important for exploratory analysis. Lately I am leaning more towards Plotly as well :)

Pros

Plotting is easy: Matplotlib provides functions for choosing line styles and font styles and for changing and formatting axes.
The resulting graphs are easy to understand.
It contains the pyplot module, which provides a basic interface similar to MATLAB's.
It provides an object-oriented API that helps integrate graphs into applications and tools.
Matplotlib can produce research-quality graphs because it supports vector output formats instead of only pixels.

Cons

Graphs in Matplotlib offer only limited interactivity compared with libraries such as Plotly.
Limited variety of visualizations.
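
The sketch below uses the object-oriented API to draw a histogram and a scatter plot side by side and saves them as SVG, a vector format. The data values are made up, and the `Agg` backend is selected only so the example runs without a display; the output goes to an in-memory buffer instead of a file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt

values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# One figure, two axes: the object-oriented interface.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

ax_hist.hist(values, bins=5)
ax_hist.set_title("Histogram")

ax_scatter.scatter(values, [v ** 2 for v in values], marker="x")
ax_scatter.set_xlabel("value")
ax_scatter.set_ylabel("value squared")

# Save as SVG (vector graphics) so the figure scales without pixelation.
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")
plt.close(fig)
```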

4. Scikit-learn

Scikit-learn is one of the most effective libraries for machine learning, data modelling and model evaluation. It is built on top of NumPy and SciPy. It contains a lot of functions for model creation, covering both supervised and unsupervised machine learning algorithms.

Pros

Provides a set of standard datasets to help people get started with machine learning, e.g. the Iris dataset.
It has built-in functions for both supervised and unsupervised learning, including clustering, classification, regression, data mining, anomaly detection and so on.
It contains functions for feature extraction and feature selection, which help identify the significant attributes or variables in the data.
It also has functions for cross-validation to estimate the performance of a model.
It integrates really well with NumPy.

Cons

Sometimes it becomes really slow, especially during model training.
Less flexible.
Limited tuning capabilities for a few machine learning models.
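
As a quick illustration of the bundled datasets and cross-validation, here is a minimal sketch on the Iris dataset. The choice of a scaled logistic regression and of 5 folds is mine, picked only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the built-in Iris dataset as NumPy arrays.
X, y = load_iris(return_X_y=True)

# Chain feature scaling and a classifier into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation: one accuracy score per held-out fold.
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the held-out fold never leaks into preprocessing.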

5. TensorFlow

TensorFlow is a deep learning library maintained by Google. It is used for building strong and precise neural networks (algorithms inspired by the structure of the brain). TensorFlow programs represent data as tensors. It supports programming languages such as C++, Python and R. It can be used in Natural Language Processing (NLP), forecasting, text summarization, image/video analytics and handwriting recognition. It is known for faster deployment of algorithms while retaining the same APIs.

Pros

TensorFlow lets you train multiple neural networks, which helps accommodate large datasets.
It provides functions and methods for basic statistical analysis.
It also provides layer components that perform layer operations on weights and biases.
TensorFlow also comes with a visualization tool called TensorBoard.
Pre-trained models are available.
Multi-GPU deployment support.

Cons

Can get really slow if you don't know what is happening under the hood.
Due to its structure, debugging can get difficult.
Steep learning curve (although it has improved since I started).
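
To show what "tensor data structure" and "operations on weights" look like in practice, here is a minimal sketch assuming TensorFlow 2.x with eager execution. The tensor values are arbitrary; `tf.GradientTape` records the computation so the gradient of the loss with respect to the weights can be obtained.

```python
import tensorflow as tf

# Inputs are constant tensors; trainable weights are variables.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # shape (2, 2)
w = tf.Variable([[0.5], [-1.0]])            # shape (2, 1)

# Record the forward pass so TensorFlow can differentiate it.
with tf.GradientTape() as tape:
    pred = tf.matmul(x, w)                  # shape (2, 1)
    loss = tf.reduce_sum(pred ** 2)         # scalar: sum of squared outputs

# Gradient of the loss with respect to the weights, same shape as w.
grad = tape.gradient(loss, w)
```

An optimizer would use `grad` to update `w`; that step is left out to keep the sketch short.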

6. Keras

Keras provides full support for creating, analyzing, evaluating and improving neural networks. It is built on top of the Theano or TensorFlow libraries, which provide the additional features needed for complex, large-scale deep learning models.

Pros

Keras supports building a wide range of neural network types.
Lightweight and easy to use.
Building a deep learning model by stacking multiple layers is really straightforward, which is Keras in a nutshell.
It ships with several pre-processed datasets, such as MNIST, and pre-trained models.
It is easily extensible and supports adding new modules.
François Chollet is the primary author and maintainer (great guy, follow him on Twitter).

Cons

Errors are difficult to debug.
Customizing a layer can be difficult because Keras is built around pre-configured layers.
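
Stacking layers looks roughly like this. The sketch assumes the Keras API bundled with TensorFlow 2.x (`tensorflow.keras`); the layer sizes and the random training data are placeholders chosen only so the example runs end to end.

```python
import numpy as np
from tensorflow import keras

# A model is a stack of pre-configured layers.
model = keras.Sequential(
    [
        keras.Input(shape=(4,)),                        # 4 input features
        keras.layers.Dense(8, activation="relu"),       # hidden layer
        keras.layers.Dense(3, activation="softmax"),    # 3 output classes
    ]
)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Random placeholder data: 32 samples, 4 features, integer labels 0..2.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 4)).astype("float32")
labels = rng.integers(0, 3, size=32)

# Train briefly, then predict class probabilities for five samples.
history = model.fit(features, labels, epochs=2, batch_size=8, verbose=0)
probabilities = model.predict(features[:5], verbose=0)
```

Because the data is random, the accuracy here is meaningless; the point is how little code the create, compile, fit, predict cycle needs.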

7. PyTorch

PyTorch is an open-source, Python-based scientific computing library, used mainly to implement deep learning techniques and neural networks on large datasets. It competes with TensorFlow. The library is developed by Facebook's AI Research lab (FAIR).

Pros

It provides easy-to-use APIs.
Efficient thanks to its C++/CUDA backend (LuaJIT was the scripting language of the original Lua-based Torch, not of PyTorch).
Can create dynamic computation graphs.
Good documentation and community support.

Cons

It has no visualization tool of its own comparable to TensorBoard in TensorFlow, so a third-party tool or TensorBoard integration is needed.
An API server is needed for production.
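
A dynamic computation graph means the graph is built as the Python code runs, so ordinary control flow can depend on the data. The sketch below illustrates this; the function name `repeated_double` and the numbers are hypothetical, chosen for the example.

```python
import torch


def repeated_double(x: torch.Tensor, limit: float) -> torch.Tensor:
    """Double x until its norm reaches the limit.

    The number of loop iterations depends on the values in x,
    so the graph differs from input to input.
    """
    while x.norm() < limit:
        x = x * 2
    return x


# A leaf tensor that tracks gradients.
w = torch.tensor([1.0, 1.0], requires_grad=True)

# Forward pass: the graph is recorded while the loop executes.
out = repeated_double(w, limit=10.0)

# Backward pass through exactly the operations that ran.
out.sum().backward()
```

For this input the loop runs three times, so `out` is `8 * w` and `w.grad` holds 8 for each element.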

8. XGBoost

XGBoost stands for eXtreme Gradient Boosting. It is written in C++. The library is named after the XGBoost algorithm, an implementation of gradient boosted decision trees designed for speed and performance.

Pros

XGBoost executes fast compared with many other gradient boosting implementations.
It often delivers better model performance.
The core XGBoost algorithm is parallelizable and can use the power of multi-core computers.
It can process very large datasets and can be distributed across a cluster of machines.
It also provides built-in parameters for regularization, cross-validation, handling missing values and so on.

Cons

It is computationally expensive.
It is less interpretable.

9. OpenCV

OpenCV (Open Source Computer Vision) is a library for computer vision. It supports Python, C++ and Java. In OpenCV's Python bindings, all images are represented as NumPy arrays, which makes it easier to integrate with other libraries that use NumPy.

Pros

OpenCV is written in C++, making it fast.
Portable library

Cons

OpenCV's memory management is lacking.

10. NLTK

NLTK (Natural Language Toolkit) is a leading library for building Python programs that work with human language data, and it provides easy-to-use interfaces.

Functions performed using this library include classification, tokenization, parsing and so on. Some of its applications are text processing, recommendation systems and sentiment analysis.

There are many more libraries doing great work if you want to work with text data, such as spaCy and Gensim.

Pros

It supports a larger number of natural languages than most other libraries.

Cons

Slow.
Can get difficult to use.
It splits text into sentences directly, without analyzing the semantic structure of the sentence.
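
A minimal tokenization sketch follows. I deliberately use `wordpunct_tokenize` and `PorterStemmer` because, as far as I know, they work without downloading any extra NLTK data packages; the sample sentence is made up.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "NLTK splits text into tokens. Tokens can then be stemmed and counted."

# Split into word and punctuation tokens using a simple regular expression.
tokens = wordpunct_tokenize(text.lower())

# Keep alphabetic tokens only, dropping punctuation.
words = [token for token in tokens if token.isalpha()]

# Reduce each word to its stem, e.g. "tokens" becomes "token".
stemmer = PorterStemmer()
stems = [stemmer.stem(word) for word in words]

# Count how often each stem occurs.
counts = FreqDist(stems)
```

The tokenizer works purely on surface patterns, which is exactly the limitation mentioned in the cons above.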

Conclusion

For most standard, out-of-the-box work Keras stands way ahead. If you want to go deeper and build your own layers, go for PyTorch or TensorFlow. For other machine-learning-specific tasks such as text analytics there are great libraries like NLTK or spaCy.

So make a wise decision based on your current requirements when choosing machine learning libraries for personal projects or for your company. I will conclude by quoting the Occam's razor principle:

"The simplest solution is most likely the right one"

Shubham Goyal

Data Scientist, Quantitative Analytics, AML, Fraud Detection & Financial Crime

4 years ago

Nice article, Ashish Airon. Additionally, do you think pandas will soon be replaced by libraries like Dask because of its lack of parallel processing? Just a thought.
