Getting Started With Data Analysis in Python

Pavithra Nagaraj

6G, AI Researcher | Founder- Paaru Wireless | Director- Women in 6G

Published: October 4, 2018

Python is a great programming language for data analysis, primarily because of its rich ecosystem of data-centric packages. Pandas is one of those packages: it provides fast, flexible data structures designed to make working with data easier and more intuitive.

Do you want to load a .csv or Excel file and easily manipulate the data in it?

Do you want to replace missing values in your data or ignore them altogether?

Do you want a quick statistical summary of your data?

Well, pandas has it all covered. It provides a set of tools that make working with data simple and efficient.

The topics in this post will enable you to:

1. Load your data into a pandas DataFrame.

2. Examine the basic statistics of the data.

3. Modify the values.

4. Finally, output the result to a new file.

Loading Data:

Loading data with pandas is quite easy. The library provides functions to load data from Excel files (xls, xlsx), CSV, JSON and other formats. For this example I will be using data stored in a .csv (comma-separated values) file.

To load the data, we'll use the read_csv function. It takes a CSV file and returns a DataFrame object, a table-like data structure that makes it easier to manipulate the data set and extract information. From now on, ufo will be the name of our DataFrame.
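Here is a minimal sketch of the loading step. The CSV text below is a made-up stand-in for a real file (the "State" and "Duration" columns are my assumptions, not the actual data set); with a real file you would pass its path to read_csv instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical sample that stands in for a .csv file on disk.
csv_text = """City,State,Duration
Ithaca,NY,5
Holyoke,CO,12
Abilene,KS,3
"""

# read_csv accepts a file path or any file-like object.
ufo = pd.read_csv(io.StringIO(csv_text))

print(type(ufo))
print(ufo)
```

The first line of the file is used as the column names by default.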

Viewing data:

Pandas provides some methods to inspect the data we are working with.

ufo.head()

Shows the first few rows of our DataFrame. The default is 5 rows.

ufo.tail()

Similar to ufo.head(), tail returns the last few rows of our DataFrame.
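A short sketch of both calls, using a made-up ten-row DataFrame (the column names and values are placeholders for illustration only):

```python
import pandas as pd

# Hypothetical data: ten rows so that head() and tail() return different slices.
ufo = pd.DataFrame({
    "City": [f"City{i}" for i in range(10)],
    "Duration": list(range(10)),
})

first_five = ufo.head()    # no argument: the first 5 rows
last_three = ufo.tail(3)   # pass a number to change how many rows you get

print(first_five)
print(last_three)
```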

ufo.describe()

describe shows a quick statistical summary of the numeric columns in your data.

describe can also be used to see the core statistics of a particular column. Select the column with its name as a string inside square brackets, then call describe(), e.g. ufo['City'].describe().
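As a rough illustration, assuming a small made-up DataFrame with one text column and one numeric column ("Duration" is a hypothetical column name):

```python
import pandas as pd

# Hypothetical sample data.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Ithaca", "Holyoke", "Abilene"],
    "Duration": [5, 7, 12, 3],
})

# On the whole DataFrame, only the numeric column is summarized by default.
summary = ufo.describe()

# On a text column, describe reports count, unique, top and freq instead.
city_summary = ufo["City"].describe()

print(summary)
print(city_summary)
```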

ufo.shape and ufo.ndim

The shape attribute gives the size of the data set: it returns a tuple with the number of rows and the number of columns in the DataFrame. Another descriptive attribute is ndim, which gives the number of dimensions in your data. For a DataFrame this is 2.
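A quick sketch with a made-up DataFrame. Note that shape and ndim are attributes, so there are no parentheses after them:

```python
import pandas as pd

# Hypothetical sample data: 4 rows, 2 columns.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Abilene", "Ithaca"],
    "Duration": [5, 12, 3, 7],
})

n_rows, n_cols = ufo.shape   # tuple of (rows, columns)
print(n_rows, n_cols)
print(ufo.ndim)              # a DataFrame is two-dimensional
print(ufo["City"].ndim)      # a single column (Series) is one-dimensional
```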

Selecting and Manipulating Data:

Selecting Columns

There are 3 main methods of selecting columns in pandas:

using dot notation, e.g. data.column_name,
using square brackets and the name of the column as a string, e.g. data['column_name'],
using numeric indexing and the iloc selector, e.g. data.iloc[:, <column_number>].
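The three options above can be sketched as follows, on a made-up DataFrame (only the "City" column name appears in this article; the rest is assumed for illustration):

```python
import pandas as pd

# Hypothetical sample data.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Abilene"],
    "State": ["NY", "CO", "KS"],
    "Duration": [5, 12, 3],
})

by_dot = ufo.City             # dot notation
by_name = ufo["City"]         # square brackets with the column name
by_position = ufo.iloc[:, 0]  # iloc with the column position (0 = first column)

print(by_name)
```

All three return the same Series. Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so the square-bracket form is the safer habit.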

Selecting Multiple Columns

Selecting multiple columns at the same time extracts a new DataFrame from your existing DataFrame. For selection of multiple columns, the syntax is:

using square brackets with a list of column names, e.g. data[['column_name_1', 'column_name_2']],
using numeric indexing with the iloc selector and a list of column numbers, e.g. data.iloc[:, [0,1,20,22]].
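A small sketch of both forms, again on made-up data with assumed column names:

```python
import pandas as pd

# Hypothetical sample data.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Abilene"],
    "State": ["NY", "CO", "KS"],
    "Duration": [5, 12, 3],
})

# Note the double brackets: the inner pair is a Python list of names.
by_names = ufo[["City", "State"]]

# The same idea with column positions (first and third column here).
by_positions = ufo.iloc[:, [0, 2]]

print(by_names)
print(by_positions)
```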

Selecting Rows

Rows in a DataFrame are selected, typically, using the iloc/loc selection methods, or using logical selectors (selecting based on the value of another column or variable).

The basic methods are:

numeric row selection using the iloc selector, e.g. data.iloc[0:10, :] to select the first 10 rows.
label-based row selection using the loc selector, which matches the labels in the DataFrame's index (with the default index these are the integers 0, 1, 2, ..., and it becomes most useful once you have set your own index), e.g. data.loc[23, :].
logical-based row selection using evaluated statements, e.g. data[data["City"] == "Ithaca"] to select the rows where the City value is 'Ithaca'.
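The three row selectors can be sketched like this, assuming a made-up DataFrame that keeps the default integer index:

```python
import pandas as pd

# Hypothetical sample data; only the "City" column name comes from the article.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Abilene", "Ithaca"],
    "State": ["NY", "CO", "KS", "NY"],
})

first_two = ufo.iloc[0:2, :]            # rows at positions 0 and 1
row_label_2 = ufo.loc[2, :]             # the row whose index label is 2
ithaca = ufo[ufo["City"] == "Ithaca"]   # rows where the condition is True

print(first_two)
print(row_label_2)
print(ithaca)
```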

We can also filter on multiple values at once using the isin() method of a column, e.g. ufo[ufo['City'].isin(list_of_cities)].
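A minimal sketch of isin on made-up data (the city names other than Ithaca are placeholders):

```python
import pandas as pd

# Hypothetical sample data.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Abilene", "Ithaca"],
    "State": ["NY", "CO", "KS", "NY"],
})

# isin returns True for every row whose City is in the list.
mask = ufo["City"].isin(["Ithaca", "Abilene"])
selected = ufo[mask]

print(selected)
```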

Removing or deleting the data

To delete rows and columns from DataFrames, pandas uses the "drop" function.

Removing columns - To delete a column, or multiple columns, use the name of the column(s), and specify the "axis" as 1.

Removing rows - To delete a row, or multiple rows, use the label of the row(s), and specify the "axis" as 0.
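Both cases can be sketched as follows, on a made-up DataFrame with the default integer index ("Duration" is a hypothetical column):

```python
import pandas as pd

# Hypothetical sample data.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Abilene"],
    "State": ["NY", "CO", "KS"],
    "Duration": [5, 12, 3],
})

# axis=1 drops columns by name.
without_duration = ufo.drop("Duration", axis=1)

# axis=0 drops rows by index label (here the rows labeled 0 and 1).
without_first_two = ufo.drop([0, 1], axis=0)

print(without_duration)
print(without_first_two)
```

drop returns a new DataFrame and leaves ufo itself unchanged, so assign the result to a variable if you want to keep it.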

We can also use ufo.dropna() to remove incomplete data from our DataFrame.

To remove the rows with incomplete or missing data, call ufo.dropna() (the default is axis=0).

To remove the columns with incomplete or missing data, call ufo.dropna(axis=1).
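Here is a small sketch of both variants. The DataFrame is made up, with one missing city and one missing duration planted on purpose:

```python
import pandas as pd

# Hypothetical sample data with two missing values.
ufo = pd.DataFrame({
    "City": ["Ithaca", None, "Abilene"],
    "State": ["NY", "CO", "KS"],
    "Duration": [5.0, 12.0, float("nan")],
})

# Default (axis=0): drop every row that has at least one missing value.
complete_rows = ufo.dropna()

# axis=1: drop every column that has at least one missing value.
complete_cols = ufo.dropna(axis=1)

print(complete_rows)
print(complete_cols)
```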

Exporting and Saving Pandas DataFrame:

After manipulation, saving your data back to CSV format is the next step. Writing data with pandas is as simple as loading it: call the DataFrame's to_csv method with the name of the output file.
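A minimal sketch of the export step. The data and the file name ufo_clean.csv are made up, and the file is written to a temporary folder so the example does not touch your working directory:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke"],
    "Duration": [5, 12],
})

# Hypothetical output location inside a fresh temporary folder.
out_path = Path(tempfile.mkdtemp()) / "ufo_clean.csv"

# index=False leaves the row labels (0, 1, ...) out of the file.
ufo.to_csv(out_path, index=False)

# Read the file back to check what was written.
round_trip = pd.read_csv(out_path)
print(round_trip)
```

Without index=False, the row labels are written as an extra first column, which then shows up as an unnamed column when the file is loaded again.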

Pandas is a really powerful and fun library for data manipulation and analysis, with easy syntax and fast operations. This introductory article is just the tip of the iceberg. You can do much more with pandas by exploring the rest of its tools.

Happy Learning!! Happy Coding!!

To stay up-to-date on my posts and Articles, click the follow button at the top of this post.

If you like my articles, do share your thoughts in the comments section below as I learn just as much from you as you do from me.
