Data Visualization: The Topping Cream of Data Science


The Importance of Data Visualization

This post aims to show why data visualization is important and to encourage you to use it. Some people who work in the Big Data and data science fields overlook data visualization.

Data visualization is not a new science; it is a blend of art and science. It involves the creation and study of the visual representation of data, meaning information that has been abstracted in some schematic form, including attributes or variables for the units of information. Effective visualization helps users analyze and reason about data and evidence.

I will give you two practical examples to show that without data visualization we could not do the work “efficiently”.

Example from Big Data

If you have read Hadoop: The Definitive Guide, 4th edition, you know that the author, Tom White, works on a common example throughout the chapters: an analysis of weather data. The author needs to partition the weather data into classes, for example:

From -10 °C to 0 °C: Group A

From 0 °C to +10 °C: Group B

From +10 °C to +20 °C: Group C

But he found that the data is not uniformly distributed, and he wanted to divide the data fairly between the reducers. The first thing he did was run a job to visualize the data distribution; without this step he would not have been able to divide the data fairly. The figure he produced shows the data distribution. Without visualizing the data we would not be able to divide it fairly: 59% of the data is >= 10 °C, so the reducer that takes this range would consume more time and slow down the overall job.
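To make the idea concrete, here is a minimal sketch in Python. It is not the book's Hadoop job: the temperatures are synthetic, and the helper names (partition_counts, text_histogram) are my own assumptions for illustration. It draws a text histogram of fixed temperature ranges, then picks quantile-based boundaries so that each partition receives roughly the same number of records.

```python
import random
import statistics
from bisect import bisect_right
from collections import Counter

random.seed(0)
# Synthetic temperatures in °C (an assumption for illustration, not the book's dataset).
temps = [random.gauss(12, 8) for _ in range(10_000)]

def partition_counts(values, boundaries):
    """Count how many values fall into each range defined by sorted boundaries."""
    counts = Counter(bisect_right(boundaries, v) for v in values)
    return [counts.get(i, 0) for i in range(len(boundaries) + 1)]

def text_histogram(labels, counts, width=40):
    """Render one bar per partition so that any skew is visible at a glance."""
    total = sum(counts)
    lines = []
    for label, count in zip(labels, counts):
        bar = "#" * round(width * count / total)
        lines.append(f"{label:>10} | {bar} {count / total:.0%}")
    return "\n".join(lines)

# Fixed ranges in the spirit of the groups above: < 0, 0 to 10, >= 10.
fixed_counts = partition_counts(temps, [0.0, 10.0])
print(text_histogram(["< 0", "0 to 10", ">= 10"], fixed_counts))

# Quantile-based boundaries: two cut points that split the data into three equal shares.
balanced_boundaries = statistics.quantiles(temps, n=3)
balanced_counts = partition_counts(temps, balanced_boundaries)
print(text_histogram(["low", "middle", "high"], balanced_counts))
```

The first histogram makes the skew obvious; the second shows that boundaries taken from the observed distribution give each reducer a similar load. On a real cluster the boundaries would normally be estimated from a sample of the data rather than from the full dataset.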

Example from Machine Learning

Everyone who works with machine learning knows very well that visualizing the data is an important part of choosing the “best” model or algorithm to work with. For example, if you have data that comes from two classes, A and B, after visualizing the data you may see the magic happen: the classes are clearly separated, or their distributions are well defined, so the rest of the work is the easy part. The accompanying figure shows an example.
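As a rough illustration of this point, the sketch below plots a one-dimensional feature for two classes as a text histogram. Everything in it is an assumption made for the example: the data is synthetic, and bin_counts and render are hypothetical helper names. When the two classes barely overlap, as here, even a simple threshold would separate them.

```python
import random

random.seed(1)
# Synthetic one-dimensional feature for two classes (assumed data, for illustration only).
class_a = [random.gauss(2.0, 0.7) for _ in range(300)]
class_b = [random.gauss(6.0, 0.7) for _ in range(300)]

def bin_counts(values, low, high, bins):
    """Count values per equal-width bin; values outside [low, high) go to the edge bins."""
    counts = [0] * bins
    step = (high - low) / bins
    for v in values:
        index = min(max(int((v - low) / step), 0), bins - 1)
        counts[index] += 1
    return counts

def render(a_counts, b_counts, low, high, scale=3):
    """Draw one row per bin: one 'A' or 'B' character per `scale` samples."""
    step = (high - low) / len(a_counts)
    rows = []
    for i, (a, b) in enumerate(zip(a_counts, b_counts)):
        rows.append(f"{low + i * step:5.1f} | {'A' * (a // scale)}{'B' * (b // scale)}")
    return "\n".join(rows)

LOW, HIGH, BINS = 0.0, 8.0, 16
a_counts = bin_counts(class_a, LOW, HIGH, BINS)
b_counts = bin_counts(class_b, LOW, HIGH, BINS)
print(render(a_counts, b_counts, LOW, HIGH))
```

If the rows of A and B characters were interleaved instead of forming two separate clusters, that would be a hint that a simple model will not be enough and that more features or a more flexible algorithm are needed.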

Losses suffered by Napoleon's army

As a final example, the Minard diagram shows the losses suffered by Napoleon's army in the 1812-1813 period. Six variables are plotted: the size of the army, its location on a two-dimensional surface (x and y), time, direction of movement, and temperature. The line width illustrates a comparison (the size of the army at points in time), while the temperature axis suggests a cause of the change in army size. This multivariate display on a two-dimensional surface tells a story that can be grasped immediately while identifying the source data to build credibility. Tufte wrote in 1983: "It may well be the best statistical graphic ever drawn." I have attached the image below.

I can’t finish this post without referring to Professor Edward Tufte and his book “The Visual Display of Quantitative Information”. Our professor, Dr. Waleed Yousef, called him the Leonardo da Vinci of data visualization.

Finally, visualizing the data lets you get a sense of what you are working with and of the distribution of the data you will act on.

Ref:  https://shop.oreilly.com/product/0636920033448.do

https://en.wikipedia.org/wiki/Data_visualization

 

 
