Data Visualization - Gather, Analyse & Summarize Through Tableau Analytics
From the dawn of civilization to the year 2003, humans produced five exabytes of data, according to Google; today we generate almost 2.5 exabytes of data every single day. In the current digital economy, this massive inflow of structured and unstructured data makes it challenging to turn information into something meaningful for our daily decision making.
Data visualization is the visual representation of data for the purposes of exploring it, making sense of it and communicating it. Quantitative data is perceived and understood more readily when values are displayed graphically than in a traditional table format, because that is how our brain works most effectively.
Vision is our most dominant sense for perceiving the world around us. It works with our brain to sense and process data, so that we can understand current patterns and make predictions about the future.
Data exploration, sense making and thinking work in intimate collaboration. Data needs to be presented visually to an audience so that they can see quantitative information in context and engage with it.
Below we discuss a few basic to advanced visualization techniques that you can use on your dynamic dashboard to make your viz more interesting and meaningful.
Below we can see the monthly sales values of a business from 2015 through 2018. At this broad level, the bar chart does not make it visually obvious that a month's data is missing, or which month it is.
A good data analyst's job is to drill down into the existing data, investigate missing values and add commentary explaining the rationale behind them.
In the example below, we can see that a few months' sales figures are missing. Tableau has a built-in feature called 'show missing values', which reveals the months for which no data was captured when the data was exported into the business intelligence software (Tableau).
You can also use data densification through the 'running total' functionality: you run a table calculation across the padded range, and Tableau in essence makes up data to fill in each placeholder. The sales figure does not change for those months because there is no data, so the running total keeps the value from the last month that had data.
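As a rough illustration outside Tableau, the Python sketch below mimics the two ideas above: padding the month range so that gaps become visible, and computing a running total across the padded range. The sales figures and the helper names (`month_range`, `pad_and_accumulate`) are made up for illustration; this is not a description of how Tableau implements densification internally.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def pad_and_accumulate(sales):
    """Return (missing_months, running_total) for a {(year, month): amount} dict.

    Months with no record add nothing to the total, so the running total
    simply carries the previous month's value forward.
    """
    months = list(month_range(min(sales), max(sales)))
    missing = [m for m in months if m not in sales]
    running, total = {}, 0
    for m in months:
        total += sales.get(m, 0)
        running[m] = total
    return missing, running


# Hypothetical data: March and April 2015 were never captured.
sales = {(2015, 1): 100, (2015, 2): 120, (2015, 5): 90}
missing, running = pad_and_accumulate(sales)
print(missing)             # [(2015, 3), (2015, 4)]
print(running[(2015, 4)])  # 220, unchanged since February
```

The padded months show up explicitly in `missing`, while the running total stays flat across them, which is the behaviour described above.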
The example below is one area where a proper visualization gives you the true picture. We generally presume that a higher engagement rate on your company's website will lead to more prospective business or more memberships. The objective of any business is to increase the number of users or potential customers through genuine engagement, so that they turn into genuine revenue-generating customers in the future.
As per the viz below, the number of subscribers has grown continuously since Jan 13. However, if you look at the compound growth rate, it is declining over time. This is a crucial visualization because it prompts a root-cause analysis of the declining trend. This point of reference can lead you to revise your promotional, marketing, operational or strategic decisions to turn the declining trend into an upward one.
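To make the point concrete, here is a minimal Python sketch with invented subscriber counts (not the data behind the viz). It shows how a total can rise every period while both the period-over-period growth rate and the compound growth rate since the start keep falling. The function names are hypothetical.

```python
def period_growth_rates(counts):
    """Growth rate of each period relative to the one before it."""
    return [(cur - prev) / prev for prev, cur in zip(counts, counts[1:])]


def compound_growth_rate(first, last, periods):
    """Constant per-period rate that would take `first` to `last`."""
    return (last / first) ** (1 / periods) - 1


# Hypothetical subscriber counts: the total rises every period.
subscribers = [1000, 1500, 2000, 2400, 2700]

rates = period_growth_rates(subscribers)
compound = [
    compound_growth_rate(subscribers[0], count, n)
    for n, count in enumerate(subscribers[1:], start=1)
]
print([round(r, 3) for r in rates])
print([round(c, 3) for c in compound])
```

Both lists decrease from left to right even though `subscribers` only ever increases, which is why plotting the rate next to the raw count tells a different story.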
In the text table below, we can see the sales amount per segment for each region. We wanted to see the actual sales per segment alongside the fixed total for the region, which we computed with the LOD expression { FIXED [Region] : SUM([Sales])}.
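For readers who think in code, the LOD expression above behaves roughly like the Python sketch below: sales are summed per region regardless of segment, and that regional total is then attached to every row. The rows, field names and the `add_region_total` helper are assumptions made for illustration, and the sketch ignores Tableau details such as filter ordering.

```python
from collections import defaultdict


def add_region_total(rows):
    """Attach the region-level sales total to every (region, segment) row."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["sales"]
    return [{**row, "region_sales": totals[row["region"]]} for row in rows]


# Hypothetical rows, one per region and segment.
rows = [
    {"region": "East", "segment": "Consumer", "sales": 300.0},
    {"region": "East", "segment": "Corporate", "sales": 200.0},
    {"region": "West", "segment": "Consumer", "sales": 400.0},
]
result = add_region_total(rows)
for row in result:
    print(row["region"], row["segment"], row["sales"], row["region_sales"])
```

Each row keeps its own segment-level sales, while `region_sales` repeats the same value for every segment within a region.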
Now let's discuss one of the most famous and widely used charts in the business world. The Pareto Principle is commonly known as the 80–20 rule, the law of the vital few or the principle of factor sparsity. It is named after the Italian economist Vilfredo Pareto, who derived the principle from his observation in the 19th century that 20% of landowners owned 80% of the land in Italy.
A Pareto chart analysis can be an eye-opener for decision makers: it helps you identify unproductive investments or processes, so that more focus can be put on the productive or profitable factors.
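The calculation behind a Pareto chart is simple enough to sketch in a few lines of Python: sort the categories by value, accumulate, and express the running sum as a share of the total. The profit figures and the function names below are invented for illustration.

```python
def pareto_table(values):
    """Sort categories by value (descending) and add the cumulative share."""
    total = sum(values.values())
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    table, running = [], 0
    for name, value in ordered:
        running += value
        table.append((name, value, running / total))
    return table


def vital_few(values, threshold=0.8):
    """Return the leading categories needed to reach the threshold share."""
    chosen = []
    for name, _, cumulative_share in pareto_table(values):
        chosen.append(name)
        if cumulative_share >= threshold:
            break
    return chosen


# Hypothetical profit per product line.
profit = {"A": 550, "B": 300, "C": 80, "D": 40, "E": 30}
for name, value, share in pareto_table(profit):
    print(name, value, round(share, 2))
print(vital_few(profit))  # ['A', 'B']
```

In this made-up data set, two of the five product lines account for 85% of the profit, which is the kind of concentration a Pareto chart is meant to expose.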
An interactive quadrant chart with parameter actions is another very useful way of segmenting data into different quadrants. In the example below, we have analysed profit ratio against the discount applied to customers. We then created parameters and the rules for the quadrants. We can change the values of those parameters to see how each product line's profit ratio and discount fall into different quadrants.
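The quadrant rules can be thought of as two adjustable cut-offs, one per axis. The Python sketch below is only an analogy for the Tableau parameters described above; the cut-off values, labels and the `quadrant` function are hypothetical.

```python
def quadrant(profit_ratio, discount, ratio_cutoff, discount_cutoff):
    """Label a point by comparing it with two adjustable reference lines."""
    high_ratio = profit_ratio >= ratio_cutoff
    high_discount = discount >= discount_cutoff
    if high_ratio and not high_discount:
        return "high profit, low discount"
    if high_ratio and high_discount:
        return "high profit, high discount"
    if not high_ratio and not high_discount:
        return "low profit, low discount"
    return "low profit, high discount"


# Hypothetical product line: 12% profit ratio at a 25% discount.
print(quadrant(0.12, 0.25, ratio_cutoff=0.10, discount_cutoff=0.20))
# Moving the "parameters" reclassifies the same point.
print(quadrant(0.12, 0.25, ratio_cutoff=0.15, discount_cutoff=0.30))
```

Changing the two cut-offs plays the same role as changing the parameter values on the dashboard: the data stays put and the quadrant boundaries move.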
Next is the 'Butterfly Chart' (also known as a Tornado Chart). It is a type of bar chart in which two data series are displayed side by side, mirrored around a central axis with the same parameters on both sides, to give you a quick view of the difference between two groups.
In our example below, we have visualized the percentage contribution to the workforce population in a particular year, grouped by age segment and gender. For each age group, we can see the male and female workforce contribution percentages in Australia side by side.
Our next chart is one of the most popular: the Radar Chart, also known as a Spider Chart, Web Chart, Polar Chart or Star Plot.
A radar chart is a graphical presentation of multivariate data in a two-dimensional chart, with multiple quantitative variables represented on axes starting from the same point. The relative position and angle of the axes are typically uninformative. All axes are arranged radially at equal angles to each other, and all use the same scale.
From the chart below you can see which variables have similar values and which stand out as outliers. A radar chart is useful for displaying performance, because it shows which variables score high or low.
One big drawback of this chart is that when there are multiple polygons, the top polygon covers the polygons underneath it. It is therefore advisable to keep a radar chart simple by limiting the number of variables.
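The radial layout described above (equally spaced axes sharing one scale) comes down to a little trigonometry. The Python sketch below computes the polygon vertices for one series; the scores and the `radar_vertices` helper are illustrative assumptions, not something taken from a Tableau workbook.

```python
import math


def radar_vertices(values, max_value):
    """Map each value to (x, y) on equally spaced axes sharing one scale."""
    n = len(values)
    points = []
    for i, value in enumerate(values):
        angle = 2 * math.pi * i / n   # equal angular spacing between axes
        radius = value / max_value    # same scale on every axis
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Hypothetical scores for four variables, each on a 0 to 10 scale.
points = radar_vertices([5, 10, 5, 10], max_value=10)
for x, y in points:
    print(round(x, 3), round(y, 3))
```

Joining the points in order (and closing back to the first) gives the polygon for one series; each additional series adds another polygon on the same axes, which is where the overlap problem comes from.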
Bump charts are an effective way to present how the ranking of a dimension's members changes. For example, if you are trying to show the sales rank of your products in various regions over time (yearly, quarterly or monthly), a bump chart is the best viz to choose. You can analyse your data from different angles, such as seasonality, the impact of an advertising or promotion campaign, or statistics about players or games.
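What a bump chart plots is the rank of each item within each period rather than the raw value. A minimal Python sketch of that ranking step, using invented sales figures and a hypothetical `rank_by_period` helper, could look like this:

```python
def rank_by_period(sales):
    """Return {period: {product: rank}}, with rank 1 for the highest sales.

    Ties are broken by the order in which products appear in the input.
    """
    ranks = {}
    for period, by_product in sales.items():
        ordered = sorted(by_product, key=by_product.get, reverse=True)
        ranks[period] = {product: i + 1 for i, product in enumerate(ordered)}
    return ranks


# Hypothetical yearly sales per product.
sales = {
    2016: {"Chairs": 120, "Tables": 90, "Phones": 150},
    2017: {"Chairs": 160, "Tables": 95, "Phones": 140},
}
ranks = rank_by_period(sales)
print(ranks[2016])  # {'Phones': 1, 'Chairs': 2, 'Tables': 3}
print(ranks[2017])  # {'Chairs': 1, 'Phones': 2, 'Tables': 3}
```

Plotting each product's rank against the period, with one line per product, gives the crossing lines that make rank changes easy to spot.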
A Control Chart, also known as a Shewhart chart or statistical process control chart, is used to study how a process changes over time. The control chart below has a central line for the average, an upper line for the upper control limit and a lower line for the lower control limit. These lines are determined from historical data. By comparing recent data with these lines, you can conclude whether the profit for the quarter is consistent (in control) or out of control (unpredictable due to special events). You can also change the number of standard deviations from one to three to see how that affects which points count as outliers.
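The limits described above can be sketched in Python as the historical mean plus or minus a chosen number of standard deviations. This is a simplified version under stated assumptions: it uses the sample standard deviation, whereas formal Shewhart charts often estimate the spread differently, and the profit figures and function names are made up.

```python
import statistics


def control_limits(history, n_sigma=3):
    """Return (lower, centre, upper) limits from historical values."""
    centre = statistics.mean(history)
    sigma = statistics.stdev(history)  # sample standard deviation
    return centre - n_sigma * sigma, centre, centre + n_sigma * sigma


def out_of_control(values, limits):
    """Return the values that fall outside the lower or upper limit."""
    lower, _, upper = limits
    return [v for v in values if v < lower or v > upper]


# Hypothetical quarterly profits: history first, then recent quarters.
history = [10, 12, 11, 9, 10, 12, 11, 9]
recent = [10, 13, 15]

print(out_of_control(recent, control_limits(history, n_sigma=3)))  # [15]
print(out_of_control(recent, control_limits(history, n_sigma=1)))  # [13, 15]
```

Tightening the limits from three standard deviations to one flags more points, which mirrors the scenario switching mentioned above.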
A Violin Chart is another advanced data visualization technique, used to see performance or response grouped into quintiles across other dimensions. We can analyse and compare the statistics for each segment's performance or response in each quintile to understand which quintile performed best or worst.
Another good viz is the symbol map, which represents quantitative values at locations given by latitude and longitude coordinates or by location names (if recognized by Tableau). The larger the symbol (circle), the higher the profit ratio. This map clearly shows which countries have a higher profit ratio, and which countries have consumers in all three segments and therefore potential for future market expansion. You can add any data you would like to appear to the 'Tooltip' section, so that it is displayed when you hover your cursor over a circle on the map.
If you have a huge amount of data, you can cluster it by colour code to make the groups visually identifiable and distinct, as your requirements dictate.
When mapping large point datasets, it's common to see your points overlap one another. But what happens when most of the points within the map are covering one another? The map starts to look like noise, rather than an informative narrative about the data.
Hexbin Chart: A hexbin chart groups pairs of numeric values into bins of similar values in a simple way. It is another great way to show concentration in an area. The Hexbin functions in Tableau provide a straightforward translation of the field values to the centre of the nearest hexagon in the visualization. These functions, HexbinX and HexbinY, were new in Tableau v9; they simply calculate the coordinates of the centroid of each hexagon, which allows you to use shape marks to display the actual hexagons.
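To give a feel for what "translating a point to the centre of the nearest hexagon" means, here is a generic hexagonal binning sketch in Python. It is not Tableau's implementation and makes no claim to match the exact size or orientation of Tableau's hexagons: it assumes hexagon centres spaced one unit apart, and the `hexbin_center` name is hypothetical. The trick is that a hexagonal grid of centres is the union of two offset rectangular grids, so the nearest centre is the closer of two rounded candidates.

```python
import math

ROW_HEIGHT = math.sqrt(3)  # vertical spacing within one rectangular lattice


def hexbin_center(x, y):
    """Return the centre of the hexagonal bin that contains (x, y)."""
    # Lattice A: centres at (i, j * sqrt(3)) for integers i and j.
    ax = float(round(x))
    ay = round(y / ROW_HEIGHT) * ROW_HEIGHT
    # Lattice B: lattice A shifted by half a cell in both directions.
    bx = round(x - 0.5) + 0.5
    by = (round(y / ROW_HEIGHT - 0.5) + 0.5) * ROW_HEIGHT
    # The nearer of the two candidates is the centre of the bin.
    dist_a = (x - ax) ** 2 + (y - ay) ** 2
    dist_b = (x - bx) ** 2 + (y - by) ** 2
    return (ax, ay) if dist_a <= dist_b else (bx, by)


# Two nearby points land in the same bin; a third falls in a neighbouring one.
print(hexbin_center(0.1, 0.1))
print(hexbin_center(0.05, -0.05))
print(hexbin_center(0.5, 0.8))
```

Counting how many points share each returned centre gives the density per hexagon, which is what the colour or size of each hexagon mark would encode.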
Have a look at this blog for a better understanding of how a Hexbin map works.
One of the most advanced visualization techniques is the Sankey diagram, named after the Irish engineer Captain Matthew H.R. Sankey. In a Sankey diagram, a directed flow is always drawn between at least two nodes, with a width proportional to the flow quantity, so the diagram shows not only the flow values but also the structure and distribution of the defined system. Sankey diagrams can be a strong alternative to bar or pie charts. They are very popular in energy management, process engineering and various kinds of business data visualization.
The Likert Scale visualization, sometimes known as a rating scale visualization, is the best way so far to present survey data visually. Steve Wexler first introduced this technique for survey data. A Likert Scale visualization takes the ratings (scores) given in a survey on various topics and pivots them to get the average score across various dimensions.
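The summary numbers behind such a chart are easy to sketch. The Python example below takes the answers to one survey question and returns the average score and the share of responses at each point of the scale; the answers, the 1 to 5 scale and the `likert_summary` helper are assumptions made for illustration.

```python
from collections import Counter


def likert_summary(responses, scale=(1, 2, 3, 4, 5)):
    """Return (average score, {score: share of responses}) for one question."""
    counts = Counter(responses)
    n = len(responses)
    average = sum(responses) / n
    shares = {score: counts.get(score, 0) / n for score in scale}
    return average, shares


# Hypothetical answers to a single question on a 1 to 5 scale.
answers = [5, 4, 4, 3, 5, 2, 4, 5]
average, shares = likert_summary(answers)
print(average)  # 4.0
print(shares)
```

Repeating this per question or per respondent group gives the averages and stacked percentage bars that are typically compared side by side.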
You can find a few good examples of Likert Scale charts at this link.
For accountants or business analysts, visualizing the income statement month by month for net sales, net profit and profit margin can be an effective way to track each month's bottom-line performance. We can track the profit margin for each month as shown below.
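As a reminder of the arithmetic, profit margin here is taken to be net profit divided by net sales. The short Python sketch below computes it per month from invented figures; the numbers and the `profit_margin` helper are purely illustrative.

```python
def profit_margin(net_sales, net_profit):
    """Return net profit as a fraction of net sales."""
    return net_profit / net_sales


# Hypothetical (net sales, net profit) per month.
monthly = {"Jan": (50000, 6000), "Feb": (42000, 3360), "Mar": (61000, 9150)}
margins = {month: profit_margin(s, p) for month, (s, p) in monthly.items()}
for month, margin in margins.items():
    print(month, f"{margin:.1%}")
```

In this made-up example February has both the lowest sales and the lowest margin, the kind of month-level dip that the visualization is meant to surface.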
Tableau animation is a great feature, mainly used for analyzing seasonal trends or simply for stepping through measures over a time series such as months, quarters or years. Each interval gets its own page. Simply placing the time field on the Pages shelf produces the animation for you.
Clustering and trend analysis: clustering is one of the vizzes you can consider in your business analysis, either for segmentation or for working through 'Why' questions until you identify the root cause of a problem. Clustering analysis is a very effective approach for predictive and prescriptive analytics.
In the example below, we have run a clustering analysis based on sales and profit ratio to see how significant the relationship is. Our model produced four clusters. The fourth cluster, an extreme outlier, was excluded from the visualization.
The linear regression analysis output:
R-square represents how much of the variation is explained by the model, so an R-square of 0.83 means that this model explains 83% of the variation within the data. R-square ranges from 0 to 1, and the greater it is, the better the model fits. The p-value above comes from the F-statistic hypothesis test of the null hypothesis that "the fit of the intercept-only model and this model are equal." If the p-value is less than the significance level (usually 0.05; in this model the p-value is <.0001), we reject that hypothesis, which tells us that our model fits the data significantly better than the intercept-only model.
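To show where an R-square value comes from, here is a small Python sketch of a simple least-squares fit on invented data (not the sales and profit data behind the output above). It computes R-square as one minus the ratio of the residual sum of squares to the total sum of squares; the `linear_fit_r_squared` name is hypothetical.

```python
def linear_fit_r_squared(xs, ys):
    """Fit y = a + b*x by least squares and return (a, b, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation around the mean (the intercept-only model).
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical data with some scatter around an upward trend.
intercept, slope, r_squared = linear_fit_r_squared([1, 2, 3, 4], [1, 3, 2, 4])
print(round(intercept, 2), round(slope, 2), round(r_squared, 2))  # 0.5 0.8 0.64
```

Here the fitted line explains 64% of the variation in the made-up values; the remaining 36% is scatter that the line does not capture.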
What happens to the R-square and the p-value if we do not create any clusters, after excluding all negative values? The R-square in this case (55%) is lower than in the previous model (84%); however, the p-value remains highly significant at less than .00001.