Data visualization demystified
Jack Lampka
AI keynote speaker | Advisor | Executive sparring partner | 27 years’ data & AI experience
Data visualization is nothing more and nothing less than representing data graphically so that the human brain can process it quickly and efficiently. It started with cave drawings representing the location and number of mammoths for fellow tribe members and continued with building instructions for the Egyptian pyramids drawn on papyrus, one of the first portable media. A couple of thousand years later, a classic example of data visualization is the following drawing by Charles Joseph Minard describing the fate of Napoleon’s army in Russia, which depicts time, location, direction, temperature, and size of the army. According to Edward Tufte, the data visualization guru, this “may well be the best statistical graphic ever drawn”.
Today, data visualization has become such a hyped topic, almost as hyped as “big data”, that it’s easy to forget what it is for: enabling humans to comprehend large amounts of data in a short amount of time. It also carries a responsibility for business intelligence experts, who need to visualize findings in a way that facilitates sound business decisions. Data visualization sits at the beginning as well as the end of the analytics value chain.
Any data analysis starts with gathering, cleaning, and comprehending data. Although that comprehension can be supported by machines, for example through pattern recognition, a human is often required to identify patterns or outliers from the perspective of the business context.
The amount of data should drive how the data is visualized. The three years of product sales for one category in example 1 below can be easily grasped in tabular format, and the corresponding chart is not required to comprehend the information. However, how about example 2? With 40 data points, which representation lets you quickly spot the unusual trend for category C in years 9 and 10?
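To make the table-versus-chart comparison concrete, here is a minimal sketch in Python. The sales figures are invented for illustration (four categories over ten years, 40 data points) and are not the numbers behind the article's examples; `SALES`, `as_table`, and `sparkline` are hypothetical names. It prints the same data once as a table and once as a crude text chart.

```python
# Hypothetical sales figures, invented for illustration (not the article's data):
# 4 categories x 10 years = 40 data points.
SALES = {
    "A": [100, 104, 108, 112, 115, 119, 123, 127, 131, 135],
    "B": [80, 82, 85, 87, 90, 92, 95, 97, 100, 102],
    "C": [60, 63, 66, 69, 72, 75, 78, 81, 55, 40],
    "D": [50, 51, 53, 54, 56, 57, 59, 60, 62, 63],
}

BARS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest


def as_table(sales):
    """Render the data as a plain-text table, one row per category."""
    years = len(next(iter(sales.values())))
    header = "Cat " + " ".join(f"Y{y:<4}" for y in range(1, years + 1))
    rows = [f"{cat:<3} " + " ".join(f"{v:<5}" for v in values)
            for cat, values in sales.items()]
    return "\n".join([header] + rows)


def sparkline(values):
    """Map each value to a bar height, scaled to the series' own min and max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for a flat series
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)


print(as_table(SALES))
print()
for cat, values in SALES.items():
    print(cat, sparkline(values))
```

With this made-up data, the rows for A, B, and D climb steadily to their highest bar, while the row for C drops to its lowest bar in the final year. Each row is scaled to its own range, so the bars show the shape of each trend rather than comparing magnitudes across categories.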
With hundreds, thousands, or even millions of data points, data visualization becomes imperative for data comprehension. This can be as simple as using Excel pivot charts, which allow for quick manipulation to represent the same data from different angles. This is especially useful if the information behind that data is relatively well known. For data sets that are new to the analyst, more powerful visualization tools such as Tableau or Qlik Sense may be more effective, because they allow quick insight discovery through multiple views of the data.
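The idea of looking at the same data from different angles can be sketched in a few lines of Python. This is only an illustration of the pivoting concept, not of how Excel, Tableau, or Qlik Sense work internally; the records and the `pivot` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records, invented for illustration.
FIELDS = ("region", "category", "year", "units")
RECORDS = [
    ("North", "A", 1, 10),
    ("North", "B", 1, 7),
    ("South", "A", 1, 4),
    ("South", "B", 1, 9),
    ("North", "A", 2, 12),
    ("South", "B", 2, 11),
]


def pivot(records, by):
    """Sum units grouped by one field, similar to choosing a row field in a pivot table."""
    idx = FIELDS.index(by)
    units_idx = FIELDS.index("units")
    totals = defaultdict(int)
    for rec in records:
        totals[rec[idx]] += rec[units_idx]
    return dict(totals)


# The same six records, summarized from three different angles.
for field in ("region", "category", "year"):
    print(field, pivot(RECORDS, field))
```

Each call regroups the same records by a different field, which is the kind of quick manipulation a pivot chart offers interactively.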
While data visualization is needed to identify insights at the beginning of the analytics value chain, it’s even more crucial at the end, when analysis results are presented to guide business decisions. Since graphs can distort reality, whether intentionally or unintentionally, the business analyst needs to choose a visualization approach that will drive the right business decisions.
The following two charts represent the number of people by income tier in Russia and China in 2011 according to the Pew Research Center. Let’s say you would like to introduce a product that is targeted at the middle or higher income tiers. Based on the chart on the left, you would focus on Russia, where 80% of the population falls into those income tiers, and you would probably ignore China, where only about 20% of the population meets that criterion. However, based on the chart on the right, you would realize that China’s targeted population is more than twice the size of Russia’s. Both visuals represent the same data, but they will drive different business decisions.
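The arithmetic behind the two views can be checked with a short Python sketch. The shares are the rounded figures quoted above; the population totals are approximate 2011 figures supplied here as an assumption, not taken from the Pew data, and the names `POPULATION_M`, `TARGET_SHARE`, and `target_population` are hypothetical.

```python
# Approximate 2011 populations in millions: an assumption for illustration,
# not taken from the Pew Research Center data.
POPULATION_M = {"Russia": 143, "China": 1340}

# Rounded share of the population in the middle or higher income tiers,
# as quoted in the text.
TARGET_SHARE = {"Russia": 0.80, "China": 0.20}


def target_population(country):
    """Absolute size of the targeted group, in millions of people."""
    return POPULATION_M[country] * TARGET_SHARE[country]


for country in POPULATION_M:
    share = TARGET_SHARE[country]
    people = target_population(country)
    print(f"{country}: {share:.0%} of the population, roughly {people:.0f} million people")
```

Under these assumptions, the percentage view favors Russia while the absolute view favors China, which is why the choice of axis matters as much as the data itself.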
Data visualization is not the grand solution to all business needs that business news coverage would imply, and it didn’t just emerge with “big data”. Data visualization can be as simple as, literally, connecting the dots to draw a line that shows a trend. And since it has the potential to distort reality, data visualization needs to be handled with caution by professionals who understand the business context.