Data Visualisation: What do you want to Achieve?
Pic Credit: From Wikimedia Commons, the free media repository

While working on many kinds of data analytics projects over the last two decades, I have often wondered which visualization type is best for business users, and why. What factors should we keep in mind when deciding on one? A few days ago, while speaking with a close friend, I realized that there is still a gap, even though we do not need a certification to decide on basic visualization types. The trick is to select the one that best represents your data’s message and story.

Here is a short take on different visualization types (I cover six in this write-up). There are a few others that I have used to good effect in the past, such as Radar, Population Pyramid, Geospatial, Histogram and Venn Diagram.

Before we start, let us take note of a few relevant questions. What do you want to achieve?

  1. Do you want to compare values?
  2. Do you want to show the composition of something?
  3. Do you want to better understand the relationship between value sets?
  4. Do you want to analyse the trends in your data set?
  5. Do you want to understand the distribution of your data?

Now let us get going with a few chart and graph types.

Type: Bar Chart; Functions: Comparative, Patterns; Also known as: bar graph or column graph

A bar chart displays categorical data with rectangular bars whose length or height corresponds to the value of each data point.

The classic Bar Chart uses either horizontal or vertical bars (column chart) to show discrete, numerical comparisons across categories. One axis of the chart shows the specific categories being compared and the other axis represents a discrete value scale.

Bar charts use the length of each bar to show differences between values. Because of this, bar charts should always start at zero. When a bar chart does not start at zero, readers risk misjudging the difference between data values.
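To see why the zero baseline matters, here is a minimal Python sketch with made-up numbers. The values 90 and 100, the baseline of 80 and the function name `drawn_ratio` are illustrative assumptions, not taken from any real data set. It compares how long two bars are drawn relative to each other when the axis starts at zero and when it starts at 80.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b
    when the value axis starts at `baseline`."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must be above the baseline")
    return (b - baseline) / (a - baseline)


# Hypothetical values: two categories that differ by roughly 11%.
small, large = 90, 100

honest = drawn_ratio(small, large)            # axis starts at 0
truncated = drawn_ratio(small, large, 80)     # axis starts at 80

print(f"Axis from 0:  larger bar drawn {honest:.2f}x as long")
print(f"Axis from 80: larger bar drawn {truncated:.2f}x as long")
```

With the truncated axis the larger bar is drawn twice as long as the smaller one, even though the underlying value is only about 11% higher.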

Always

  • Use independent categories of data.
  • Label each category of data.
  • When visualizing data between 0% and 100%, start tick marks at 0%.

Never

  • Omit the space between bars – otherwise the bar chart will appear to be a histogram.
  • Use three-dimensional (3D) graphics as they distort the visual calculation of volume.

Recommended

  • Use bar charts when comparing large changes in data values.
  • Limit the number of bars or else a bar chart becomes difficult to understand.

Type: Bubble Chart; Functions: Correlation, Comparisons, Data over time, Distribution, Patterns, Proportions, Relationships

A Bubble Chart is a multi-variable graph that is a cross between a Scatterplot and a Proportional Area Chart. A bubble chart consists of a series of values that are plotted on an x-axis and y-axis, with each axis representing a variable and each value represented as a dot. The third variable value is then used to proportionally scale each bubble or dot. Bubble charts often include an independent variable, such as years of education, a dependent variable, such as annual income, and a proportional variable, such as population. When the dots are plotted against these two axes, bubble charts communicate the strength, type, and proportion of the relationship that exists between these variables.
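The proportional scaling of the third variable can be sketched in a few lines of Python. One assumption to flag: the text above does not say whether the value maps to the bubble's radius or its area. This sketch scales by area, which is a common convention, so the radius grows with the square root of the value. The function name `bubble_radii`, the `max_radius` parameter and the population figures are all hypothetical.

```python
import math


def bubble_radii(values, max_radius=40.0):
    """Return radii so that each bubble's AREA is proportional to its value.

    `max_radius` is the radius (in arbitrary drawing units) given to the
    largest value; it is an illustrative parameter, not a standard one.
    """
    if not values or min(values) < 0:
        raise ValueError("values must be non-empty and non-negative")
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    # area is proportional to value, so radius is proportional to sqrt(value)
    return [max_radius * math.sqrt(v / largest) for v in values]


# Hypothetical population figures for three groups.
populations = [1_000_000, 4_000_000, 16_000_000]
radii = bubble_radii(populations)
print(radii)
```

Scaling the radius directly instead would make a value four times larger look sixteen times larger by area, which overstates the difference.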

Always

  • Include a legend if more than one category of data is being visualized.
  • Ensure that smaller dots are visible when overlapping with larger dots, either by placing smaller dots above larger dots or by making the larger dots transparent.

Never

  • Use symbols for point markers that do not have a proportionate width and height. A symbol that has an equal width and height is a more accurate way to present a given data value.
  • Include more than 3 sets of values in a static bubble chart.

Recommended

  • Use a Title that includes the unit of analysis.
  • Include point labels or markers for specific observations.

Not Recommended

  • Using a bubble chart when there are so many values that the dots become illegible.

Type: Line Graph; Functions: Data over time, Patterns, Comparisons (when grouped); Also known as: Line Chart

Line graphs, or line charts, are a simple but effective staple for representing time-series data. They are visually similar to scatterplots, but the data points are separated by time intervals and joined by line segments. This allows for quick observation of features like growth (when the line goes up), decline (when the line goes down), and volatility (when the line moves up and down erratically).

While the simple line graph shown represents a single dataset, more complex line graphs may overlay several lines to represent different data. This is useful for spotting correlations or deviation. A common example of a line graph in action is the measure of stock market behavior or resource costs over time, e.g. the price of gold over several years.

Always

  • Start the y-axis at zero.
  • Label each line if there is more than one line.
  • Ensure that each line is legible.

Never

  • Use a legend for a graph with a single line.

Not Recommended

  • Using horizontal lines when you do not need to convey exact amounts.

Type: Scatter Plot; Functions: Patterns, Relationships, Correlation, Distribution; Also known as: scatterplot, scatter graph, scatter chart, scattergram, scatter diagram

A scatterplot displays the relationship between two variables on an x- and y-axis. Each item of data is shown as a single point, creating the chart’s visual ‘scatter’ effect. When there are three interrelated variables (i.e., if there is a z-axis), 3D scatterplots are also possible.

Scatterplots are best used for large datasets where time is not a significant factor. For instance, a simple scatterplot might measure people’s weight against height. This would help identify any correlation between the two measures. However, because other factors affect the data (e.g., people’s weights are also related to their diet) scatterplots are best for inferring relationships between variables rather than drawing firm conclusions. Nevertheless, they are an excellent tool for hypothesis creation.
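As a rough companion to eyeballing a scatterplot, the strength of a linear relationship can be summarised with the Pearson correlation coefficient. The sketch below is one way to compute it in plain Python. The height and weight figures are invented for illustration, and `pearson_r` is a hypothetical helper name. As noted above, a high value only suggests a relationship worth investigating; it does not establish a cause.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up sample: heights in cm and weights in kg for five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 88]

r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}")
```

Values close to +1 or -1 indicate a strong linear relationship, while values near 0 indicate little or none.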

A common variant of the scatterplot is the bubble chart. Displaying different-sized circles (rather than single points), bubble charts represent three dimensions of data, rather than the usual two.

Always

  • Include a legend if more than one set of values is being visualized.

Never

  • Overlap labels for points.
  • Use complex symbols as point markers.
  • Include more than 2 sets of values.

Recommended

  • The title should explain the unit of analysis.
  • Include point labels or markers for specific observations.

Not Recommended

  • Using a scatter plot when there are an excessive number of overlapping values.

Type: Pie Chart; Functions: Comparative, Part to a whole, Proportions; Also known as: Circle Chart

Before using a pie chart, consider using a bar chart or displaying numeric values directly for improved usability.

Another visualization you may remember from school is the pie chart. While pie charts are similar to bar charts in that they represent categorical data, this is where the similarities end. The main difference (besides how they look) is that bar charts represent numerous categories of data, while pie charts represent a single variable, broken down into percentages or proportions.

Each ‘slice of the pie’ in a pie chart is proportional to the quantity it contributes to the whole, i.e. the entire circle. For this reason, pie charts are best suited to data that is split into about five or six categories. Add more than that and the chart quickly becomes too complex to represent the data effectively.
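The share of each slice is simply its value divided by the total. Here is a small Python sketch of that arithmetic, using invented website visit counts. The name `slice_shares` and the six-category threshold in the final check are illustrative choices based on the guidance above, not fixed rules.

```python
def slice_shares(counts):
    """Convert raw category totals into percentage shares of the whole."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: 100.0 * value / total for name, value in counts.items()}


# Hypothetical visit counts by device type.
visits_by_device = {"Phone": 620, "Tablet": 130, "Desktop": 250}

shares = slice_shares(visits_by_device)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")

# The slices of a pie chart should total 100%.
shares_total = sum(shares.values())
print(f"Total: {shares_total:.1f}%")

# Rule of thumb from the text: beyond about six categories, reconsider the chart type.
if len(shares) > 6:
    print("Many categories: a bar chart may be easier to read.")
```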

Always

  • Include a legend or label slices directly.
  • Ensure that the values of the slices total to 100%.

Never

  • Use three-dimensional (3D) graphics as they distort the visual calculation of volume.

Not Recommended

  • Including large gaps between slices, such as in an exploded pie chart.
  • Using multi-level pie charts, as they are difficult to decipher.

Type: Stacked Bar Chart; Functions: Comparative

A stacked bar chart is a bar chart that includes subgroups of data in each bar.

The length of each bar communicates the total value of a group, which is the sum of its subgroup values, and the length of each subgroup segment represents that subgroup's individual value. Stacked bar charts are best used to compare data between groups and between subgroups.

Simple Stacked Bar Graphs place each segment's value after the previous one. The total value of the bar is all the segment values added together. They are ideal for comparing the total amounts across each group/segmented bar.

100% Stacked Bar Graphs show the percentage-of-the-whole of each group and are plotted by the percentage of each value to the total amount in each group. This makes it easier to see the relative differences between quantities in each group.
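The difference between the two variants comes down to one normalisation step. The Python sketch below uses invented quarterly sales figures and a hypothetical helper name, `to_percent_stack`. It computes the bar totals used by a simple stacked bar graph and the per-group percentages used by a 100% stacked bar graph.

```python
def to_percent_stack(groups):
    """For each group, convert subgroup values into percentages of that group's total."""
    result = {}
    for group, segments in groups.items():
        total = sum(segments.values())
        if total <= 0:
            raise ValueError(f"group {group!r} has a non-positive total")
        result[group] = {name: 100.0 * value / total for name, value in segments.items()}
    return result


# Hypothetical sales by region for two quarters.
sales = {
    "Q1": {"North": 30, "South": 70},
    "Q2": {"North": 120, "South": 80},
}

# Simple stacked bar: each bar's length is the sum of its segments.
totals = {group: sum(segments.values()) for group, segments in sales.items()}

# 100% stacked bar: every bar has the same length, segments show shares.
percent = to_percent_stack(sales)

print(totals)
print(percent)
```

In this made-up data, the simple version shows that Q2 sold twice as much overall, while the 100% version shows that North's share rose from 30% to 60%.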

Always

  • Begin the bar lengths at zero.
  • Include a legend along with labels.
  • Maintain an even amount of space between bars.

Never

  • Depict more than five subgroups.

Recommended

  • Order groups by their total value or the values of a selected subgroup.
  • Include no more than five subgroups.
  • Use a sequential or diverging color scheme if the subgroups have an order.
  • Use a qualitative color scheme if the subgroups do not have an order.

Not Recommended

  • Ordering groups alphabetically by their label.
  • Using many segments per bar. A major flaw of Stacked Bar Graphs is that they become harder to read the more segments each bar has.
  • Relying on them to compare individual segments with each other, which is difficult because the segments are not aligned on a common baseline.

Now for the questions that were mentioned right at the beginning, let me try to align the charts against each one of those:

  1. Do you want to compare values? Charts and graphs are perfect for comparing one or many value sets, and they can easily show the low and high values in the data sets. Use these types of graphs: Bar; Pie; Line; Scatter Plot.
  2. Do you want to show the composition of something? Use this type of chart to show how individual parts make up the whole of something, like the device type used for mobile visitors to your website or total sales broken down by sales rep. Use these charts: Pie; Stacked Bar.
  3. Do you want to better understand the relationship between value sets? Relationship charts can show how one variable relates to one or many different variables. You could use this to show how something positively affects, has no effect, or negatively affects another variable. Use these charts: Scatter Plot; Bubble; Line.
  4. Do you want to analyse the trends in your data set? If you want to know more information about how a data set performed during a specific period, there are specific chart types that do extremely well. Use these charts: Line.
  5. Do you want to understand the distribution of your data? Distribution charts help you to understand outliers, the normal tendency, and the range of information in your values. Use these charts: Scatter Plot; Line; Bar.
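The mapping above is small enough to keep as a lookup table. Here is a minimal Python sketch of that idea. The short goal keys ("compare", "composition" and so on) and the function name `suggest_charts` are my own labels for the five questions; the chart lists are copied from the answers above.

```python
# Goal keywords are illustrative labels for the five questions above.
CHARTS_BY_GOAL = {
    "compare": ["Bar", "Pie", "Line", "Scatter Plot"],
    "composition": ["Pie", "Stacked Bar"],
    "relationship": ["Scatter Plot", "Bubble", "Line"],
    "trend": ["Line"],
    "distribution": ["Scatter Plot", "Line", "Bar"],
}


def suggest_charts(goal):
    """Return the chart types listed in this article for a given goal keyword."""
    key = goal.strip().lower()
    if key not in CHARTS_BY_GOAL:
        known = ", ".join(sorted(CHARTS_BY_GOAL))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {known}")
    return CHARTS_BY_GOAL[key]


print(suggest_charts("trend"))
print(suggest_charts("Composition"))
```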

I hope you were able to align each question with the type of graph that was eventually selected. Remember, you will come across many other chart types that look really good, but business users care more about the data and how clearly it is represented. Simple is always best, so don't overcomplicate your visualizations.

To learn more and explore other visualization types, refer to this link: https://datavizcatalogue.com/index.html

Graphs used above in the post are from: https://datavizcatalogue.com/index.html
