Python AutoViz: Data Exploration Made Easy!
Python AutoViz. Image source: https://twitter.com/_jaydeepkarale/status/1631655422208917504

In this article we will explore AutoViz, a powerful Python library that helps you visualize a dataset of any size.

With a single line of code you can quickly generate insightful visualizations for your data, whether you are a beginner or an expert in data analysis.

AutoViz allows us to:

  • Generate multiple informative plots with just one line of code, which removes the need for lengthy plotting code.
  • Handle large datasets by sampling the data, so charts are generated quickly while the main patterns remain visible.
  • Work without deep knowledge of plotting libraries, which makes it friendly for beginners and non-experts.
  • Identify data quality issues automatically and help address them, which shortens the path from insights to decisions.
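
To make the sampling idea concrete, here is a minimal sketch in plain Python. It is not AutoViz's internal code: the helper name sample_rows, the fixed seed, and the synthetic rows are assumptions for illustration. It simply caps a dataset at a maximum number of rows by drawing a uniform random sample.

```python
import random


def sample_rows(rows, max_rows, seed=42):
    """Return at most max_rows rows, drawn uniformly at random.

    Hypothetical helper for illustration only; AutoViz's own
    sampling strategy may differ.
    """
    if len(rows) <= max_rows:
        # Small datasets are returned unchanged.
        return list(rows)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(rows, max_rows)


# Usage: 10,000 synthetic rows reduced to 500 before plotting.
rows = [{"id": i, "horsepower": 50 + i % 300} for i in range(10_000)]
sample = sample_rows(rows, max_rows=500)
print(len(sample))
```

Plotting the 500 sampled rows instead of all 10,000 is what keeps chart generation fast on large files.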

Let’s try AutoViz in Python

First, install AutoViz and import the libraries needed for this example. The %matplotlib inline magic is needed to display charts inline in a notebook.

#Installation (run in a notebook cell)
%pip install autoviz

#Necessary libraries

import pandas as pd
import numpy as np

#Load Autoviz
from autoviz import AutoViz_Class
%matplotlib inline

#Create the AutoViz object used to generate the charts
AV = AutoViz_Class()

Load the data. You can download the CSV file HERE or use your own dataset. Then set the path of the dataset in the filename variable and name your target variable.

If you don’t have a target variable you can leave it blank.

filename = "Cars Data.csv"
target_variable = "Horsepower"        

Now let’s start the visualization. I will explain the parameters below.

dft = AV.AutoViz(
    filename,
    sep=",",
    depVar=target_variable,
    dfte=None,
    header=0,
    verbose=2,
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=500,
    max_cols_analyzed=20,
    save_plot_dir=None
)        

  • sep: File separator (comma, semi-colon, tab, or any column-separating value). I used “,” because it’s a CSV file.
  • depVar: Target variable in your dataset; leave empty if not applicable.
  • dfte: Input data frame for plotting charts; leave empty if providing a filename. (example below)
  • header: Row number of the header row in your file (0 for the first row).
  • verbose: 0 for minimal info and charts, 1 for more info and charts, or 2 for saving charts locally without display.
  • chart_format: ‘svg’, ‘png’, ‘jpg’, ‘bokeh’, ‘server’, or ‘html’ for displaying or saving charts in various formats, depending on the verbose option.
  • lowess: Draw a LOWESS regression line for each pair of continuous variables against the target variable. Use it on small datasets only; avoid it on large ones (>100,000 rows).
  • max_rows_analyzed: Limit the max number of rows for chart display, particularly useful for very large datasets (millions of rows) to reduce chart generation time. A statistically valid sample will be used.
  • max_cols_analyzed: Limit the number of continuous variables to be analyzed.
  • save_plot_dir: Directory for saving plots. The default is None, which saves plots under the current directory in a subfolder named AutoViz_Plots. If the save_plot_dir doesn’t exist, it will be created.
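
The parameters above can be collected and sanity-checked before the call. The sketch below is only an illustration: build_autoviz_options is a hypothetical helper, not part of AutoViz, and the accepted values are the ones listed in this article.

```python
# Accepted values as listed in the parameter descriptions above.
VALID_CHART_FORMATS = {"svg", "png", "jpg", "bokeh", "server", "html"}
VALID_VERBOSE = {0, 1, 2}


def build_autoviz_options(target="", verbose=0, chart_format="svg",
                          max_rows=500, max_cols=20, save_plot_dir=None):
    """Hypothetical helper: validate settings and return keyword arguments."""
    if verbose not in VALID_VERBOSE:
        raise ValueError(f"verbose must be 0, 1 or 2, got {verbose!r}")
    if chart_format not in VALID_CHART_FORMATS:
        raise ValueError(f"unsupported chart_format: {chart_format!r}")
    if max_rows <= 0 or max_cols <= 0:
        raise ValueError("max_rows and max_cols must be positive")
    return {
        "sep": ",",
        "depVar": target,          # empty string means no target variable
        "dfte": None,
        "header": 0,
        "verbose": verbose,
        "lowess": False,
        "chart_format": chart_format,
        "max_rows_analyzed": max_rows,
        "max_cols_analyzed": max_cols,
        "save_plot_dir": save_plot_dir,
    }


options = build_autoviz_options(target="Horsepower", verbose=2, chart_format="html")
print(options["chart_format"])
```

The resulting dictionary could then be unpacked into the call shown earlier, for example AV.AutoViz(filename, **options), so a typo in chart_format fails early instead of partway through chart generation.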

Note: you can also load your file into a pandas DataFrame and pass it to AutoViz through the dfte parameter.

import pandas as pd
import numpy as np

#Load Autoviz
from autoviz import AutoViz_Class
%matplotlib inline

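#Read the CSV into a DataFrame and preview the first rows
df = pd.read_csv("Cars Data.csv")
df.head()

AV = AutoViz_Class()

target_variable = "Horsepower"

dft = AV.AutoViz(
    "",  #filename stays empty because the data comes from dfte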
    sep=",",
    depVar=target_variable,
    dfte=df,
    header=0,
    verbose=2,
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=500,
    max_cols_analyzed=20,
    save_plot_dir=None
)        

Let’s look at the results

First, you get the Data Quality report, which covers several checks (image below).

Data Quality. Image source: Walid Soula

You also get several visualizations, as you can see in the image below.

AutoViz. Image source: Walid Soula

You can also set chart_format to "html" to get interactive visualizations. The results are saved in a folder (picture below). Opening one of the files takes you to a page with a dynamic, interactive chart.

Types of data visualization. Image source: Walid Soula
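
Once the charts are saved, it can be handy to list them from Python. This is a small sketch under stated assumptions: list_saved_charts is a hypothetical helper, and it only walks the AutoViz_Plots folder (the default location described above) without assuming anything about how the files are named.

```python
from pathlib import Path


def list_saved_charts(plot_dir="AutoViz_Plots", suffix=".html"):
    """Return saved chart files under plot_dir, sorted by path.

    Hypothetical helper; it searches subfolders too and returns an
    empty list when the folder does not exist yet.
    """
    root = Path(plot_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


# Usage: print every saved HTML chart below the current directory.
for chart in list_saved_charts():
    print(chart)
```

Pass a different suffix, such as ".svg", to list charts saved in another format.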


If you only want to see the data quality issues, you can write:

from autoviz import data_cleaning_suggestions
data_cleaning_suggestions(df)        
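
To give a feel for the kind of checks such a report is built from, here is a dependency-free sketch. It does not reproduce what data_cleaning_suggestions does internally: quick_quality_report is a hypothetical helper and the rows are made up. It counts missing values per column and flags columns that hold a single value.

```python
def quick_quality_report(rows):
    """Count missing values and flag constant columns.

    rows is a list of dicts sharing the same keys; None and ""
    are treated as missing. Hypothetical helper for illustration.
    """
    report = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing": len(values) - len(present),
            "constant": len(set(present)) <= 1,
        }
    return report


# Made-up rows, not taken from the Cars dataset.
cars = [
    {"Make": "A", "Horsepower": 265, "Fuel": "gas"},
    {"Make": "B", "Horsepower": None, "Fuel": "gas"},
    {"Make": "C", "Horsepower": 225, "Fuel": "gas"},
]
for column, stats in quick_quality_report(cars).items():
    print(column, stats)
```

A column with many missing values or only one distinct value is a typical candidate for cleaning before any plotting or modeling.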

Autofix

You can auto-fix the data quality issues by using FixDQ from AutoViz:

from autoviz import FixDQ
fixdq = FixDQ()        
fixdq. Image source: Walid Soula


You can quickly inspect the problems that were found (here, an example of a duplicated row).

fixdq. Image source: Walid Soula
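
The duplicated-row case is easy to illustrate by hand. The sketch below is not FixDQ's implementation: find_duplicate_rows and drop_duplicate_rows are hypothetical helpers working on made-up rows. With pandas, DataFrame.duplicated and DataFrame.drop_duplicates cover the same ground.

```python
from collections import Counter


def row_key(row):
    """Turn a dict row into a hashable key (values must be hashable)."""
    return tuple(sorted(row.items()))


def find_duplicate_rows(rows):
    """Return each repeated row key together with its number of occurrences."""
    counts = Counter(row_key(row) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = row_key(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up rows: the first and third are identical.
data = [
    {"Make": "A", "Horsepower": 265},
    {"Make": "B", "Horsepower": 200},
    {"Make": "A", "Horsepower": 265},
]
print(find_duplicate_rows(data))
print(drop_duplicate_rows(data))
```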


If you want a quick fix, a single line of code is enough:

fixdq.fit_transform(df)        

That brings us to the end of the article. AutoViz is an excellent library: it visualizes data in an interactive and dynamic way, reports data quality issues, and offers a quick fix for them!
