Python AutoViz: Data Exploration Made Easy!
In this article, we will explore AutoViz, a powerful Python library that helps you visualize a dataset of any size.
Whether you’re a beginner or an expert in data analysis, you can quickly generate insightful visualizations for your data with just a single line of code!
AutoViz allows us to:
Let’s try AutoViz in Python
First, install AutoViz and import the libraries needed for this example. The %matplotlib inline magic is needed to display charts inline in a notebook.
# Installation
pip install autoviz
# Necessary libraries
import pandas as pd
import numpy as np
# Load AutoViz
from autoviz import AutoViz_Class
%matplotlib inline
AV = AutoViz_Class()
Next, load the data. You can download the CSV file HERE or use your own dataset. Put the path to the dataset in the filename variable and the name of the target column in target_variable.
If you don’t have a target variable, you can leave it as an empty string.
filename = "Cars Data.csv"
target_variable = "Horsepower"
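Before running AutoViz, it can help to confirm that the target name really matches a column in the file, since a typo there is easy to miss. The sketch below is one way to do that with the standard library only. The helper name target_in_header and the sample rows are made up for illustration and are not part of AutoViz.

```python
import csv
import io


def target_in_header(csv_file, target, sep=","):
    """Return True if `target` is one of the column names in the first row."""
    reader = csv.reader(csv_file, delimiter=sep)
    header = next(reader, [])  # empty list if the file has no rows
    return target in [name.strip() for name in header]


# Tiny in-memory stand-in for a cars CSV; the columns and values are invented.
sample = io.StringIO("Make,Model,Horsepower\nAcme,Roadster,200\n")
print(target_in_header(sample, "Horsepower"))  # True
```

For a real file, you would pass an open file object instead, for example `with open(filename, newline="") as f: target_in_header(f, target_variable)`.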
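Now let’s start the visualization. Each parameter is explained in the comment next to it.
dft = AV.AutoViz(
    filename,                # path to the CSV file
    sep=",",                 # column separator used in the file
    depVar=target_variable,  # target (dependent) variable; "" if there is none
    dfte=None,               # a pandas DataFrame to use instead of a file
    header=0,                # row number that holds the column names
    verbose=2,               # 0 or 1: display charts; 2: save charts to disk without displaying them
    lowess=False,            # add LOWESS smoothing lines to scatter plots (suited to small datasets)
    chart_format="svg",      # output format, e.g. "svg", "png", "jpg" or "html"
    max_rows_analyzed=500,   # maximum number of rows sampled for the charts
    max_cols_analyzed=20,    # maximum number of columns analyzed
    save_plot_dir=None       # folder for saved charts; None uses the default location
)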
Note: you can also load your file into a pandas DataFrame and pass that to AutoViz instead.
import pandas as pd
import numpy as np
#Load Autoviz
from autoviz import AutoViz_Class
%matplotlib inline
df = pd.read_csv("Cars Data.csv")
df
AV = AutoViz_Class()
filename = ""  # no file path is needed; the DataFrame is passed through dfte
target_variable = "Horsepower"
dft = AV.AutoViz(
    filename,
    sep=",",
    depVar=target_variable,
    dfte=df,
    header=0,
    verbose=2,
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=500,
    max_cols_analyzed=20,
    save_plot_dir=None
)
Let’s look at the results
First, you get the Data Quality report, which flags potential issues in the dataset (image below).
You also get a variety of visualizations, as you can see in the image below.
You can also set the chart format to HTML to get interactive visualizations. The results are saved as files in a folder (picture below). When you open one of the files, you are taken to a page with a dynamic, interactive chart.
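As a rough sketch of the HTML workflow, the code below switches chart_format to "html", points save_plot_dir at a folder, and then lists the HTML files found there. The helper names run_html_report and list_html_charts are hypothetical, and the exact subfolder layout AutoViz creates is not assumed here, which is why the listing searches the folder recursively.

```python
from pathlib import Path


def run_html_report(csv_path, target, plot_dir):
    """Run AutoViz with HTML output. Requires autoviz; it is not called in this snippet."""
    # Imported lazily so the listing helper below can be used on its own.
    from autoviz import AutoViz_Class

    av = AutoViz_Class()
    return av.AutoViz(
        csv_path,
        depVar=target,
        verbose=2,
        chart_format="html",
        save_plot_dir=plot_dir,
    )


def list_html_charts(plot_dir):
    """Return every .html file found anywhere under `plot_dir`, sorted by path."""
    return sorted(Path(plot_dir).rglob("*.html"))
```

After a call such as `run_html_report("Cars Data.csv", "Horsepower", "plots")`, looping over `list_html_charts("plots")` gives you the paths you can open in a browser.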
If you only want to see the data quality issues, you can write:
from autoviz import data_cleaning_suggestions
data_cleaning_suggestions(df)
Autofix
You can automatically fix the data quality issues with FixDQ from AutoViz.
from autoviz import FixDQ
fixdq = FixDQ()
You can quickly see the problems it found (for example, a duplicated row).
If you want a quick fix, a single line of code is enough!
fixdq.fit_transform(df)
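In practice you will probably want to keep the cleaned data rather than only display it. The sketch below assumes that fit_transform returns the cleaned DataFrame, as its scikit-learn-style name suggests, and prints the shape before and after so you can see whether rows or columns were dropped. The helper fix_and_report and its fixer argument are hypothetical conveniences, not part of AutoViz.

```python
def fix_and_report(df, fixer=None):
    """Run a FixDQ-style transformer on `df` and print the shape before and after."""
    if fixer is None:
        # Imported lazily so this helper can be defined without autoviz installed.
        from autoviz import FixDQ

        fixer = FixDQ()
    cleaned = fixer.fit_transform(df)
    print(f"shape before: {df.shape}, shape after: {cleaned.shape}")
    return cleaned
```

A call such as `clean_df = fix_and_report(df)` stores the result in a new variable, so the original df stays available for comparison.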
That brings us to the end of the article. AutoViz is an excellent library: it not only visualizes data in an interactive and dynamic way but also reports data quality issues and offers a quick fix for them!