A Compilation of my articles on various Data Visualisation tools
Parul Pandey
Principal Data Scientist | Co-author of Machine Learning for High-Risk Applications | Kaggle Grandmaster (Notebooks)
Don’t simply show data, tell a story with it!
I have been writing in the Data Science domain for almost six months now, and during this time I have written on a multitude of topics. While going through my previous articles, I couldn't help but notice that quite a few of them are about Data Visualisation. So, I thought of compiling them into an article of their own, which will make them easier to locate, both for me and for anyone else who is interested in them.
The articles are grouped by the tool used for the visualisation.
Data Visualisation with Python
1. PyViz: Simplifying the Data Visualisation process in Python.
If you work with data, then Data Visualisation is a vital part of your daily routine. And if you use Python for your analysis, you have probably been overwhelmed by the sheer number of Data Visualisation libraries to choose from.
This article is an overview of the PyViz ecosystem, which aims to make data visualisation in Python easier to learn, easier to use and more powerful.
2. Visualising Machine Learning Datasets with Google’s FACETS.
A Machine Learning dataset can consist of anywhere from thousands to millions of data points, each of which may contain hundreds or thousands of features. Additionally, real-world data is messy: it comes with missing values, unbalanced classes, outliers and so on. Visualising the data helps in locating these irregularities and pointing out where the data actually needs cleaning.
FACETS is an open source tool from Google for finding patterns in large amounts of data. It helps us understand the various features of a dataset and explore them without having to write code explicitly.
3. Exploratory Data Visualisation with Altair
There are a few well-developed visualization packages in Python, but they often have very imperative APIs. This means the user is required to focus more on the mechanics of the visualization — axis limits, legends, etc. — rather than the important relationships within the data. Altair is a package designed for exploratory visualization in Python that features a declarative API, allowing data scientists to focus more on the data than the incidental details.
Altair is based on the Vega and Vega-Lite visualization grammars, and thus automatically incorporates best practices drawn from recent research in effective data visualization.
4. Visualising Geospatial data with Python using Folium
The beauty of using Python is that it offers a library for almost every data visualisation need. One such library is Folium, which comes in handy for visualising geographic data (geo data). Geographic data science is a subset of data science that deals with location-based data, i.e. the description of objects and their relationships in space.
This article is an overview of the Folium library to visualize Geospatial data.
5. Animations with Matplotlib
Animations are an interesting way of demonstrating a phenomenon. We as humans are more easily enthralled by animated and interactive charts than by static ones. Animations make even more sense when depicting time series data such as stock prices over the years, climate change over the past decade, or seasonalities and trends, since we can then see how a particular parameter behaves over time.
The image above is a rain simulation built with Matplotlib, which is fondly known as the grandfather of Python visualisation packages. The example simulates raindrops on a surface by animating the scale and opacity of 50 scatter points. Today Python boasts a large number of powerful visualisation tools such as Plotly, Bokeh and Altair, to name a few, and these libraries offer state-of-the-art animation and interactivity. Nonetheless, the aim of this article is to highlight one rarely explored aspect of Matplotlib, namely animations, and to look at some of the ways of creating them.
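The idea of animating the scale and opacity of scatter points can be sketched roughly as follows. This is a simplified approximation written for this compilation, not the article's exact code. The growth rate, fade rate and frame count are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

N_DROPS = 50
rng = np.random.default_rng(0)

fig, ax = plt.subplots(figsize=(5, 5))
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_xticks([])
ax.set_yticks([])

# State of every drop: position, marker size, and RGBA edge colour
# (black, with the alpha channel acting as the drop's opacity).
positions = rng.uniform(0, 1, (N_DROPS, 2))
sizes = rng.uniform(50, 200, N_DROPS)
edge_colors = np.zeros((N_DROPS, 4))
edge_colors[:, 3] = rng.uniform(0, 1, N_DROPS)

scat = ax.scatter(
    positions[:, 0], positions[:, 1],
    s=sizes, lw=0.5, edgecolors=edge_colors, facecolors="none",
)

def update(frame):
    """Grow and fade every ring, then restart one drop at a new spot."""
    idx = frame % N_DROPS

    # All rings become larger and more transparent each frame.
    sizes[:] = sizes + 10
    edge_colors[:, 3] = np.clip(edge_colors[:, 3] - 1.0 / N_DROPS, 0, 1)

    # The drop at position idx is reborn: small, opaque, random location.
    positions[idx] = rng.uniform(0, 1, 2)
    sizes[idx] = 5
    edge_colors[idx, 3] = 1.0

    # Push the updated state into the scatter artist.
    scat.set_offsets(positions)
    scat.set_sizes(sizes)
    scat.set_edgecolors(edge_colors)
    return (scat,)

anim = FuncAnimation(fig, update, frames=200, interval=20)
```

With an interactive backend, `plt.show()` plays the animation. `anim.save(...)` can export it to a file, provided a suitable writer such as Pillow or FFmpeg is installed.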
Data Visualisation with R
A Comprehensive Guide to Data Visualisation in R for Beginners
R is a language and environment for statistical computing and graphics. It is also extremely flexible and easy to use when it comes to creating visualisations, and one of its strengths is producing good quality plots with minimal code. This article is primarily meant for beginners and deals with the visualisation capabilities of R. It begins with basic plots and moves on to more advanced ones.
Data Visualisation with Tableau
1. Data Visualisation with Tableau
Tableau is a data analytics and visualisation tool used widely in the industry today. In this tutorial, I explain how to analyse and display data using Tableau in order to make better, more data-driven decisions.
This is a fairly comprehensive Tableau tutorial that covers all of its major aspects from a beginner's point of view.
2. Python with Tableau
Python is a widely used general-purpose programming language, and a large number of Python libraries are available for statistical analysis, predictive modelling and machine learning. Connecting Tableau with Python is one of the best approaches for predictive analytics, and TabPy is a package developed to do exactly that. Tableau can connect to the TabPy server to execute Python code on the fly and display the results in Tableau visualisations.
3. R with Tableau
R is a popular statistical language used to perform sophisticated analysis and predictive analytics, such as linear and nonlinear modelling, statistical tests, time-series analysis, classification, clustering, etc. R functions and models can be used in Tableau by creating new calculated fields that dynamically invoke the R engine and pass values to R. The results are then returned to Tableau for use by the Tableau visualisation engine. This is accomplished through the Rserve package.
4. SQL with Tableau
Apart from the various visualisation advantages that Tableau offers, it also has impressive out-of-the-box connection capabilities. Tableau integrates easily with database management systems such as SQL Server. This extends its functionality and comes in handy for Data Scientists who are used to working in SQL. Tableau provides an optimised, live connector to SQL Server so that we can create charts, reports, and dashboards while working directly with our data.
5. Spreadsheets with Tableau
Excel is one of the most widely used analytics tools in the industry and the de facto choice for some of us when it comes to performing various financial, mathematical, and statistical operations. Tableau, on the other hand, is also a data analytics and BI tool.
The focus of this article is not to compare the two but to use both in conjunction to bring out the best of each. A lot of people with strong backgrounds in Excel find it particularly hard to switch to a new tool for analysis. Through this article, I aim to show how simple it is to replicate some of the most common functions of Excel in Tableau, thereby taking the analysis to the next level with flexible and responsive analytics.
6. Word Clouds in Tableau
A Word cloud, also known as a Tag cloud, is a visual representation of text data, typically used to depict keyword metadata (tags) on websites or to visualise free-form text [Wikipedia]. Word clouds are a popular type of infographic that shows the relative frequency of words in our data. The frequency can be depicted either by the size or by the colour of the chosen fields in the data. They are a pretty powerful way to draw attention to your presentation or story.
Tableau provides a native feature to create Word Clouds with a few mouse clicks and this is exactly the focus of this article. In addition, we shall also see some of the visual best practices with respect to using word clouds.
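The idea behind a word cloud, mapping relative word frequency to display size, can be sketched in a few lines of plain Python. This is only an illustration of the concept and not how Tableau implements it. The function name `word_sizes`, the size range and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

def word_sizes(text, min_size=10, max_size=48):
    """Map each word to a font size proportional to how often it occurs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}

    lowest, highest = min(counts.values()), max(counts.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match

    # Linearly rescale counts into the [min_size, max_size] range.
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / span
        for word, count in counts.items()
    }

sizes = word_sizes("data tells a story and data drives decisions with data")
```

Here "data" occurs most often and therefore gets the largest size, while words that occur only once get the smallest.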
7. Recreating Gapminder in Tableau: A Humble tribute to Hans Rosling
In this article, I have tried to recreate Hans Rosling’s famous visualisation to analyse how Life Expectancy in years (health) and GDP per capita (wealth) have changed over time in the world for various countries.
Drag and Drop Visualisation tools
If you are just getting started in Data Visualisation, have no experience in art or graphic design, don't want to code and want to start making graphs or maps instantly, then this article is just for you. It also tries to bring to light tools other than popular ones like Tableau Public, PowerBI and Google Charts, which are used quite commonly in the Data Science ecosystem.
So, here are 10 Free tools to get started with Data Visualisation-Easily & Instantly.
The following tools have been covered in the article
All the data and files related to the above articles have been consolidated into a single GitHub repository titled Data-Visualisation-libraries.
Data Visualisation is not merely a tool; it is an art of storytelling. A story told with data can change the way we see the world, creating a conviction that may even call us to action. Hopefully, you will find the above articles and libraries useful for making more data-driven decisions.