Data Analysis In Python

Data Analysis In Python

Data has been an integral part of business in their day-to-day operations to keep track and derive insights from it. Can you imagine your company without data? — Impossible.

But where does this start? I mean, how do you start handling such complex data , interpret it and derive insights from it? — Data Analysis is the answer!

Python is the top most useful programming language to perform data analysis. Powerful libraries like Pandas and NumPy makes it easier to perform efficient data analysis.

In this article, we will export its features and see its usage using Python code.

Now let’s get started!

Prerequisites

An Introduction to NumPy

NumPy — which stands for Numerical Python is a an open source library for performing numerical computing and data manipulations including linear algebra, statistics and fourier transforms.

Key Features of NumPy

  1. Includes a wide range of mathematical operations such as sum, mean, median, and standard deviation.
  2. If you compare NumPy arrays and Python lists, NumPy arrays is a sure short winner due to optimized memory usage and speed.
  3. NumPy provides functions to solve matrix multiplication, linear equations and much more!

Implementing NumPy

Importing NumPy In Python Code

To import this library, simply use this import in Python and the name of the library you wish to import.

import numpy as np        

Create A NumPy Array

In Python, NumPy array is created using array() of numpy library.

The array() takes an array of integers.

data = np.array([1, 2, 3, 4, 5])        

Perform operations

It’s time to perform some operations! The numpy library offers methods like mean() to calculate the mean, std to calculate standard deviation etc. Let’s see this in action. Here is the full code.

Screenshot taken by author

We are simply passing the data` variable which has the array defined.

Output

Screenshot taken by author

See how simply it is, isn’t it? ??

Let’s move forward to see what are Pandas.

Introduction to Pandas

Many a times, we are in a situation where you want to analyse your data and perform manipulations based on business requirements — Pandas is the solution.

It provides two primary data structures:

  • A one-dimensional labelled array known as “Series” that can hold data of any type.
  • A two-dimensional labelled data structure known as “DataFrame” that has columns of different types.

Key Features of Pandas

  1. Good candidate for handing duplicate and inconsistent data.
  2. Can perform operations like merging, grouping and pivoting.
  3. Can read from and write to various destination file formats like CSV, Excel, JSON etc)
  4. Are also integrated with NumPy arrays for numerical operations.

Implementing Pandas

Importing Pandas in Your Python Code

Pretty much similar how we imported NumPy.

import pandas as pd        

Creating A DataFrame

In order to create a DataFrame, we would first create a dictionary where keys represent column names with its corresponding data as values.

Next, pandas provide a method called DataFrame() that converts the dictionary into a structured table.

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 70000]}
df = pd.DataFrame(data)        
So in short, DataFrame means a structured table

Performing Operations

Screenshot taken by author

We would specifically access the Salary column and calculating the mean of the data provided, we would analyse our data using describe().

Output

This is how the mean would look like, followed by describe that describes all the parameters required.

Screenshot taken by author

Don’t get confused with the term mean. It literally means average. And we all know how to calculate average of numbers.

Screenshot taken by author

Combining NumPy and Pandas for Data Analysis

Having learnt the basics about NumPy and Pandas, it’s now time to see an example of combination of both.

NumPy is excellent for numerical computation. Pandas can be used for data manipulation and visualization.

Screenshot taken by author

  • Here, we have first imported required libraries. In our case, it is numpy and pandas.
  • Created a dictionary data with three columns “A,B, and C”.
  • Notice, the column B contains a NaN which means Not a Number. This represents missing data.
  • DataFrame() of pandas converts this dictionary into a table structure i.e., DataFrame.
  • The nanmean() from numpy calculates the mean provided in column B and ignores the NaN values.
  • The fillna(), which is again from numpy is a useful method that replaces the NaN value with calculated mean value that we obtained from the result of nanmean().
  • Finally, the sum() calculates the sum of each row and added as a new column Sum in the DataFrame and its value is printed.

Output

Screenshot taken by author

Common Use Cases Of These Libraries

These open source libraries in Python are widely used for various tasks which I have listed below:

Screenshot taken by author

This table summarizes some of the common use cases in day-to-day business operations.

Conclusion

Data Analysis is very vast topic in itself. I have provided you the basic explanation of how to implement it. Your minds would explode if I put everything into one — which I really don’t want.

I would further write more articles that demonstrates a real-world example, but I felt it is necessary to understand the basics first.

Pandas and Numpy are great tools to tackle a wide range of data-related tasks. Whether you are cleaning data, performing statistical analysis, or preparing data for visualization, these libraries will make your work more efficient and enjoyable.


Enjoyed This Article? ??

If you enjoyed this article and if you feel I was able to teach you some basics, please show your appreciation with a like.

Feel free to leave a comment below with your thoughts, experiences, or any additional tips on setting boundaries.

要查看或添加评论,请登录

Rohan Rao的更多文章

  • Top 10 Tricks to Master JavaScript

    Top 10 Tricks to Master JavaScript

    When it comes to the most powerful and versatile programming languages in the world, especially for web development…

    2 条评论
  • Top 15 JS Useful Array Methods

    Top 15 JS Useful Array Methods

    JavaScript is a very popular scripting, single-threaded, dynamic, object-oriented language used to create interactive…

  • Building A CRUD Application In Angular

    Building A CRUD Application In Angular

    Angular, a type-script based open-source single-web application framework developed by Google has gain its popularity…

社区洞察

其他会员也浏览了