Exploring new dimensions in Data Science

Sakshi Jain

Data Analyst | Transitioning to Data Science | A/B Testing & Statistical Analysis Expert | Passionate About Data-Driven Insights

Published: June 24, 2019

Big data, Hadoop, Apache Spark, MongoDB: these are all fun words, but scary ones too. In my journey as a data scientist with limited programming knowledge, I owe a lot to Python libraries for making the voyage so easy. In this article, I describe the most common Python libraries used in data analytics.

A Python library is simply a collection of functions and methods that lets you perform many actions without writing the underlying code yourself. These libraries come with built-in modules that provide different functionality and can be used directly. Python has an extensive set of libraries offering a broad range of facilities, and the best part is that the libraries covered here are all open source. We can divide them into three main groups.
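To make the idea concrete, here is a small sketch comparing a hand-written calculation with calls to Python's built-in `statistics` module. The `scores` list is made up for illustration.

```python
import statistics

# Hypothetical exam scores, invented for this example
scores = [72, 85, 90, 66, 85]

# Writing the logic by hand
manual_mean = sum(scores) / len(scores)

# Calling ready-made library functions instead
library_mean = statistics.mean(scores)
library_median = statistics.median(scores)

print(manual_mean, library_mean, library_median)
```

Both approaches give the same mean, but the library version is shorter and already handles details such as sorting the values to find the median.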

1. Scientific Computing Libraries: The first group is a collection of software designed specifically for scientific computing in Python. One of the most widely used packages for manipulating, aggregating, and analyzing data (better known as “data wrangling”) is Pandas. Pandas is well suited to structured data with labeled rows and columns.
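As a rough sketch of what data wrangling with Pandas can look like, the snippet below manipulates, aggregates, and filters a tiny table. The `sales` data and its column names are invented for illustration.

```python
import pandas as pd

# Hypothetical sales records, invented for this example
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "units": [10, 4, 7, 3, 6],
    "price": [2.5, 4.0, 2.5, 10.0, 4.0],
})

# Manipulate: derive a new labeled column from existing ones
sales["revenue"] = sales["units"] * sales["price"]

# Aggregate: total revenue per region
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Analyze: keep only the rows with more than 5 units sold
big_orders = sales[sales["units"] > 5]

print(revenue_by_region)
print(big_orders)
```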

The other commonly used library is NumPy, which takes arrays as its inputs and outputs. With minor code changes, the same array objects can represent matrices, and operations on them run much faster than equivalent Python loops. The beauty of NumPy is its ability to turn Python into a high-level language for manipulating numerical data, similar to MATLAB.
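A minimal sketch of the array-in, array-out style, using made-up temperature readings and a small matrix:

```python
import numpy as np

# Hypothetical temperature readings in Celsius
temps_c = np.array([21.5, 23.0, 19.5, 25.0])

# Arithmetic applies to every element at once, with no explicit loop
temps_f = temps_c * 9 / 5 + 32

# A two-dimensional array works as a matrix
matrix = np.array([[1, 2], [3, 4]])
vector = np.array([10, 20])

# The @ operator performs matrix multiplication
product = matrix @ vector

print(temps_f)
print(product)
```

Readers coming from MATLAB will find this style of whole-array arithmetic familiar.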

Another well-known package is SciPy, which includes functions for more advanced math problems such as integration and optimization.
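Two examples of the kind of math SciPy can handle, sketched with simple textbook functions chosen only for illustration: a numerical integral and a one-dimensional minimization.

```python
import math
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi (the exact answer is 2)
area, abs_error = integrate.quad(math.sin, 0, math.pi)

# Find the x that minimizes (x - 3)^2 (the exact answer is x = 3)
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2)

print(area)
print(result.x)
```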

2. Python Visualization Libraries: Visualization is the best way to tell a story based on complicated numbers and processes. Python libraries make it easy to create graphs, charts, and maps in just a few lines of code. Matplotlib is the best-known library for making highly customized graphs and plots. Another widely used visualization library is Seaborn. It is based on Matplotlib and is used to create plots such as heat maps, time series plots, and violin plots. It is mostly used for visualizing statistical models.
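Here is a short sketch of a customized line chart in Matplotlib. The visitor numbers are invented, and the non-interactive `Agg` backend and the in-memory buffer are choices made only so the example runs without a display; in a notebook you would typically call `plt.show()` instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render without a display; an assumption for this sketch
import matplotlib.pyplot as plt

# Hypothetical monthly visitor counts
months = ["Jan", "Feb", "Mar", "Apr"]
visitors = [120, 150, 90, 180]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, visitors, marker="o", color="tab:blue")

# Customize the title, axis labels, and grid
ax.set_title("Monthly visitors")
ax.set_xlabel("Month")
ax.set_ylabel("Visitors")
ax.grid(True, linestyle="--", alpha=0.5)

# Write the chart as PNG bytes into memory
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Because Seaborn is built on Matplotlib, its plots can be adjusted afterwards with the same kinds of title and label calls.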

3. Algorithmic Libraries: Machine learning algorithms are used to build models that make predictions from data. These libraries can handle everything from basic to complex machine learning tasks. The most widely used is Scikit-learn. It contains tools for statistical modeling, including regression, classification, clustering, and so on. It is built on NumPy, SciPy, and Matplotlib.
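A minimal classification sketch with Scikit-learn, using the small iris dataset that ships with the library. The choice of logistic regression, the 25% test split, and the fixed random seed are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the bundled iris dataset: flower measurements and species labels
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to check the model on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a classifier and measure accuracy on the held-out rows
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"Test accuracy: {accuracy:.2f}")
```

Most Scikit-learn models follow the same `fit` and `score` pattern, so swapping in a different algorithm usually changes only the line that creates the model.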

Statsmodels is another Python module in this group; it lets users explore data, estimate statistical models, and perform statistical tests.

In the next post, we will explore these libraries in detail, with some examples, to enjoy the magical world of data science.

Piyush Agarwal, Ph.D.

Principal Scientist at Pfizer | University of Waterloo | IIT Madras

5 years ago

Perfect! This will surely help me with my transition from MATLAB to Python. Looking forward to your next post Sakshi :)
