AUTOVIZ - Python package

Introduction:

AutoViz is a Python package that is designed to simplify the process of data visualization. It is an open-source package and is a part of the AutoML suite of tools. AutoViz is specifically designed to make it easier for data scientists, machine learning engineers, and data analysts to quickly and easily visualize their data without the need for extensive coding or knowledge of visualization techniques.

The main goal of AutoViz is to automate the visualization process so that data scientists and other users can focus on the insights provided by the data rather than the technicalities of creating a visualization. This is particularly useful for those who are new to data visualization or do not have the time or expertise to create custom visualizations from scratch.

AutoViz is designed to work with any type of data, including structured and unstructured data, and can be used with any machine learning or data analytics tool. It is also compatible with popular Python libraries such as pandas, NumPy, and scikit-learn.

AutoViz automatically visualizes a dataset of any size with a single line of code. It was officially released on Aug 5, 2019 as version 0.0.1. From version 0.1.50 onwards, AutoViz also analyses the dataset and suggests how to clean it: it detects missing values, identifies rare categories, finds infinite values, detects mixed data types, and more. This can considerably speed up data cleaning.

AutoViz Features:

Automatic Visualization: One of the main features of AutoViz is its ability to automatically generate visualizations based on the data provided. This means that users do not need to manually create visualizations or choose which type of visualization to use. AutoViz will analyse the data and automatically choose the most appropriate visualization based on the data type, distribution, and other factors.

Wide Range of Chart Types: AutoViz supports a wide range of chart types, including scatter plots, line charts, histograms, box plots, heat maps, and more. This makes it easy to explore and visualize different types of data and to compare data across different dimensions.

Customization: Although AutoViz is designed to automate the visualization process, it also allows for customization. Users can modify the visualization settings and styles, such as the color scheme, title, and axis labels. This makes it possible to create custom visualizations that fit the specific needs of the data and the analysis.

Data Exploration: In addition to visualizing the data, AutoViz also provides tools for exploring the data. This includes summary statistics, correlations, and data distribution analysis. These features can help users to identify patterns and trends in the data and to gain a deeper understanding of the data set.

Time-Saving: AutoViz is designed to save time and effort in the data visualization process. Because the charts are generated automatically, a first pass over a new data set takes a single call rather than a hand-written plot for each variable.


How to use AutoViz:

AutoViz is easy to use and can be installed and imported like any other Python package. Once it is installed, users import AutoViz_Class, create an instance, and pass their data set to its AutoViz method. AutoViz then generates a set of visualizations based on the data provided.

How to install AutoViz:

pip install autoviz

Usage:

from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()

Load a dataset (any CSV or text file) into a pandas dataframe, or give the path and filename of the file you want to visualize. If you are passing a dataframe and have no file, set the filename argument to "" (empty string).

Call AutoViz using the filename (or dataframe) along with the separator and the name of the target variable in the input.

# Leave filename empty when the data comes from a dataframe passed through dfte
filename = ""
sep = ","

dft = AV.AutoViz(
    filename,
    sep=sep,
    depVar="",
    dfte=None,
    header=0,
    verbose=0,
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=150000,
    max_cols_analyzed=30,
    save_plot_dir=None,
)

AV.AutoViz is the main plotting function in AV. Depending on what chart_format you choose, AutoViz will automatically call either the AutoViz_Main function or AutoViz_Holo function.

Notes:

  • AutoViz will visualize a file of any size using a statistically valid sample.
  • A comma is assumed as the default separator in the file, but you can change it with sep.
  • The first row is assumed to be the header, but you can change it with header.
  • verbose option
      • if 0, displays minimal information but still displays charts in your notebook
      • if 1, prints extra information in the notebook and also displays charts
      • if 2, does not display any charts; it simply saves them on your local machine in the AutoViz_Plots directory under your current working folder
  • chart_format option
      • if 'svg', 'jpg' or 'png', displays all charts or saves them, depending on the verbose option
      • if 'bokeh', plots interactive charts using Bokeh in your Jupyter Notebook
      • if 'server', displays charts in your browser with one chart type in each tab
      • if 'html', creates interactive Bokeh charts and silently saves them under the AutoViz_Plots directory or any directory you specify in the save_plot_dir setting
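
The notes above describe how verbose and chart_format combine. As a reading aid, here is a minimal sketch that restates those rules as a small lookup. The function describe_output and its return strings are hypothetical and are not part of AutoViz; it only encodes what the notes say and does not call the library.

```python
# Hypothetical helper (not part of AutoViz): restates the notes on
# verbose and chart_format as a small lookup, for reading purposes only.
STATIC_FORMATS = {"svg", "jpg", "png"}


def describe_output(verbose=0, chart_format="svg", save_plot_dir=None):
    """Return a plain-English summary of where the charts end up."""
    if verbose not in (0, 1, 2):
        raise ValueError("verbose must be 0, 1 or 2")
    # Per the notes, plots go to AutoViz_Plots unless save_plot_dir is given.
    target_dir = save_plot_dir or "AutoViz_Plots"
    if chart_format in STATIC_FORMATS:
        if verbose == 2:
            return f"charts saved as {chart_format} under {target_dir}, nothing displayed"
        detail = "minimal" if verbose == 0 else "extra"
        return f"charts displayed in the notebook with {detail} information"
    if chart_format == "bokeh":
        return "interactive Bokeh charts displayed in the Jupyter Notebook"
    if chart_format == "server":
        return "charts displayed in the browser, one chart type per tab"
    if chart_format == "html":
        return f"interactive Bokeh charts saved as HTML under {target_dir}"
    raise ValueError(f"unknown chart_format: {chart_format!r}")


print(describe_output(verbose=2, chart_format="png"))
print(describe_output(chart_format="html", save_plot_dir="reports"))
```

A lookup like this can be handy in a script that has to decide up front whether to expect inline charts or files on disk.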

Arguments

  • filename - the path and name of the file to load. If there is no file and you want to use a dataframe, set filename to an empty string ("") and pass the dataframe through dfte. Otherwise, fill in the file name and leave dfte unset. Only one of the two is needed to load the data set.
  • sep - the separator in the file. It can be a comma, semi-colon, tab, or any other value that separates the columns in your file.
  • depVar - the target variable in your dataset. You can leave it as an empty string if your data has no target variable.
  • dfte - the input dataframe, in case you want to plot charts from a pandas dataframe. In that case, leave filename as an empty string.
  • header - the row number of the header row in your file. If it is the first row, this must be zero.
  • verbose - it has 3 acceptable values: 0, 1 or 2. With 0, you get all charts but limited information. With 1, you get all charts and more information. With 2, you will not see any charts, but they will be quietly generated and saved in the AutoViz_Plots directory, which is created under your current directory. Delete this folder periodically, otherwise it will fill up with charts if you use verbose=2 a lot.
  • lowess - shows a regression line for each continuous variable against the target variable. It works well for small datasets. Don't use it for large data sets (over 100,000 rows).
  • chart_format - this can be 'svg', 'png', 'jpg', 'bokeh', 'server' or 'html'. With verbose=0 or 1, the charts are generated inline. With verbose=2, they are silently saved instead. The last three formats are useful for interactive charts.
  • max_rows_analyzed - limits the number of rows used to draw charts. If you have a very large data set with millions of rows, use this option to limit the time it takes to generate charts. AutoViz will take a statistically valid sample.
  • max_cols_analyzed - limits the number of continuous variables that can be analyzed.
  • save_plot_dir - the directory where you want the plots to be saved. The default is None, which means they are saved in a sub-folder named AutoViz_Plots under the current directory. If save_plot_dir does not exist, AutoViz creates it.
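
To make the argument list concrete, the sketch below shows one way to call AutoViz on an in-memory dataframe. The helpers autoviz_kwargs and visualize_dataframe are hypothetical wrappers written for this article, not AutoViz functions; the argument names and default values are the ones shown in the usage section, and the call assumes the AutoViz signature matches that section.

```python
def autoviz_kwargs(dataframe=None, target="", **overrides):
    """Build keyword arguments for AV.AutoViz from the defaults shown in Usage.

    Hypothetical helper: only the argument names and defaults come from AutoViz.
    """
    kwargs = {
        "sep": ",",
        "depVar": target,
        "dfte": dataframe,
        "header": 0,
        "verbose": 0,
        "lowess": False,
        "chart_format": "svg",
        "max_rows_analyzed": 150000,
        "max_cols_analyzed": 30,
        "save_plot_dir": None,
    }
    # Reject misspelled argument names early instead of passing them on.
    unknown = set(overrides) - set(kwargs)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    kwargs.update(overrides)
    return kwargs


def visualize_dataframe(dataframe, target="", **overrides):
    """Plot a pandas dataframe; filename stays "" because dfte carries the data."""
    # Imported lazily so autoviz_kwargs can be used without autoviz installed.
    from autoviz.AutoViz_Class import AutoViz_Class

    AV = AutoViz_Class()
    return AV.AutoViz("", **autoviz_kwargs(dataframe, target, **overrides))


# Inspect the arguments for a run that silently saves PNG charts.
print(autoviz_kwargs(target="price", verbose=2, chart_format="png"))
```

With autoviz and pandas installed, a call such as visualize_dataframe(df, target="price", chart_format="html") would pass df through dfte and leave filename empty, as the arguments above describe. Here df and the "price" column are placeholders for your own data.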

Sep-2022 Update: AutoViz now provides data cleansing suggestions! #autoviz #datacleaning

From version 0.1.50 onwards, AutoViz automatically analyses the dataset and suggests how to clean it. It detects missing values, identifies rare categories, finds infinite values, detects mixed data types, and more. This can considerably speed up your data cleaning. If you have suggestions for more data cleaning steps, please file an Issue on our GitHub and we will gladly consider it.

To get this feature, upgrade autoviz to the latest version:

pip install autoviz --upgrade

From the same version, you can get the data cleaning suggestions either by calling AV.AutoViz(......, verbose=1) or by importing the function and calling it on a dataframe df:

from autoviz import data_cleaning_suggestions

data_cleaning_suggestions(df)
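
To give a feel for the kind of checks listed above (missing values, rare categories, infinite values, mixed data types), here is a rough standard-library sketch. It is not AutoViz's implementation: the function cleaning_hints, the 1% rarity threshold, and the list-of-dicts input are all assumptions made for illustration.

```python
import math
from collections import Counter


def _kind(value):
    """Group ints and floats as 'number'; otherwise use the Python type name."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "number"
    return type(value).__name__


def cleaning_hints(rows, rare_threshold=0.01):
    """Return {column: [hint, ...]} for a list of dict rows (illustrative only)."""
    hints = {}
    columns = sorted({key for row in rows for key in row})
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and NaN as missing.
        present = [
            v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))
        ]
        notes = []
        missing = len(values) - len(present)
        if missing:
            notes.append(f"{missing} missing value(s)")
        if any(isinstance(v, float) and math.isinf(v) for v in present):
            notes.append("contains infinite values")
        kinds = {_kind(v) for v in present}
        if len(kinds) > 1:
            notes.append("mixed data types: " + ", ".join(sorted(kinds)))
        # A string category is "rare" if it covers less than rare_threshold of rows.
        counts = Counter(v for v in present if isinstance(v, str))
        rare = sorted(k for k, n in counts.items() if n / len(values) < rare_threshold)
        if rare:
            notes.append("rare categories: " + ", ".join(rare))
        if notes:
            hints[col] = notes
    return hints


rows = [{"city": "Paris", "price": 1.0} for _ in range(200)]
rows += [{"city": "Oslo", "price": float("inf")}, {"city": None, "price": "n/a"}]
for column, notes in cleaning_hints(rows).items():
    print(column, notes)
```

The real data_cleaning_suggestions function works on a pandas dataframe and may apply different rules and thresholds; the sketch only illustrates the categories of problems mentioned in the text.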

Dec-23-2021 Update: AutoViz now does Wordclouds! #autoviz #wordcloud

AutoViz can now create Wordclouds automatically for your NLP variables in data. It detects NLP variables automatically and creates wordclouds for them. 

Dec-17-2021 Update: AutoViz now uses HoloViews to display dashboards with Bokeh and save them as Dynamic HTML for web serving #HTML #Bokeh #Holoviews

Now you can use AutoViz to create interactive Bokeh charts and dashboards, either in Jupyter Notebooks or in the browser. Use chart_format as follows:

  • chart_format='bokeh': interactive Bokeh dashboards are plotted in Jupyter Notebooks.
  • chart_format='server': dashboards pop up in your web browser, one for each kind of chart.
  • chart_format='html': interactive Bokeh charts are silently saved as Dynamic HTML files under the AutoViz_Plots directory.

Dec 21, 2021: AutoViz now runs on Docker containers as part of MLOps pipelines. Check out Orchest.io

We are excited to announce that AutoViz and Deep_AutoViML are now available as containerized applications on Docker. This means that you can use a tool like orchest.io to build MLOps pipelines visually.
