Quantitative Data Humanism with Pokemon
Nik Bear Brown
In this project, I explore extending the “data humanism” of Giorgia Lupi, a reaction against computer-generated, harsh-toned graphs that tend to diminish rather than pique interest in visualization. Giorgia Lupi describes data humanism in her manifesto “Data Humanism: The Revolution Will Be Visualized”.
Image credit: giorgialupi.com
Quantitative Data Humanism (QDH)
The “noise” in the sketchy style can be used to convey important information in a graph. In particular, it naturally fits with representing uncertainty.
This article experiments with mapping the rough “sketchiness” of data humanism onto common charts, such as bar charts and bubble charts, which usually omit any quantitative display of uncertainty, often because it is not obvious how to show uncertainty visually in them.
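To make the idea concrete, here is a minimal base-R sketch of one possible mapping, assuming uncertainty is measured as the standard error of each bar's mean and that larger uncertainty should produce a rougher bar. The helper `uncertainty_to_roughness()`, the 0.5 to 6 roughness range, and the group values are my own illustrative assumptions, not part of ggrough and not real Pokemon statistics.

```r
# Hypothetical helper (not part of ggrough): linearly rescale an
# uncertainty measure u onto [min_rough, max_rough] so that the most
# uncertain element is drawn roughest.
uncertainty_to_roughness <- function(u, min_rough = 0.5, max_rough = 6) {
  rng <- range(u, na.rm = TRUE)
  if (rng[2] == rng[1]) {
    # Every element is equally uncertain: use the middle of the range
    return(rep((min_rough + max_rough) / 2, length(u)))
  }
  min_rough + (u - rng[1]) / (rng[2] - rng[1]) * (max_rough - min_rough)
}

# Standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# Made-up Attack values for three hypothetical groups (not Kaggle data)
attack <- c(49, 62, 82, 100, 52, 64, 84, 48, 65, 85, 130)
group  <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C")

group_mean    <- tapply(attack, group, mean)
group_se      <- tapply(attack, group, se)
bar_roughness <- uncertainty_to_roughness(group_se)

data.frame(group     = names(group_mean),
           mean      = as.vector(group_mean),
           se        = as.vector(group_se),
           roughness = as.vector(bar_roughness))
```

The linear rescaling is only one design choice; a log or square-root mapping might be easier to read when uncertainties span several orders of magnitude.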
Pokemon Data from Kaggle
I am using the Pokemon data from Kaggle because it is fun: https://www.kaggle.com/rounakbanik/pokemon
This dataset contains information on all 802 Pokemon from all Seven Generations of Pokemon. The information contained in this dataset includes Base Stats, Performance against Other Types, Height, Weight, Classification, Egg Steps, Experience Points, Abilities, etc. The information was scraped from https://serebii.net/
Using ggrough to create a sketchy humanistic style
I will use the library ggrough to convert ggplot2 charts into a humanistic style. Note that this library has not been updated in years, has many bugs, and appears to be abandoned. It is an important library that needs to be rewritten and updated, and a Python version also needs to be created.
ggrough is an R package that converts ggplot2 plots into rough, sketchy charts using the excellent JavaScript library roughjs.
In this article, I show only one example of its common use, which involves adding just a couple of lines of code after a ggplot2 graph is created. Showing how it was used to create every graph in this article is beyond its scope.
How to Install
Install ggrough from GitHub using devtools:
install.packages("devtools") # if you have not installed the "devtools" package
devtools::install_github("xvrdm/ggrough")
The other packages can be installed using install.packages.
install.packages(c("ggplot2", "dplyr", "showtext","scales"))
Playful Kids Color Scheme
I want to experiment with playful “kids' art” colors without rewriting the plotting code every time I change them, so I define them once in a color swatch.
library(scales) # provides show_col()

colorsKids <- c("#9B1E33", "#EEBD00", "#6A953F", "#9A6233", "#69359C", "#F7B0BE")
show_col(colorsKids, labels = FALSE, borders = NA)
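Because the swatch is a plain character vector, charts with more categories than colors need some rule for reusing it. A minimal sketch, assuming simple recycling is acceptable; `kids_palette()` is a hypothetical helper, not part of any package.

```r
# The kids' art swatch, repeated here so the sketch is self-contained
colorsKids <- c("#9B1E33", "#EEBD00", "#6A953F", "#9A6233", "#69359C", "#F7B0BE")

# Hypothetical helper: return n colors from the swatch, recycling it
# when a chart needs more colors than the swatch holds.
kids_palette <- function(n) rep_len(colorsKids, n)

# Eight categories reuse the first two colors at the end
kids_palette(8)
```

In a ggplot2 chart, the result could be passed as `scale_fill_manual(values = kids_palette(n))`, so changing the swatch in one place restyles every chart.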
Loading the Pokemon Data
The Pokemon data has a nice mix of categorical, continuous, and Boolean data, so many standard charts can be created. In the future, I will extend this data set with time-series, uncertainty, and image data to create a standard data set for all kinds of QDH visualization.
library(dplyr) # provides the %>% pipe

set.seed(5)
pokemon <- read.csv("data/Pokemon.csv")
pokemon %>% head(5)
##   X.                  Name Type.1 Type.2 Total HP Attack Defense Sp..Atk
## 1  1             Bulbasaur  Grass Poison   318 45     49      49      65
## 2  2               Ivysaur  Grass Poison   405 60     62      63      80
## 3  3              Venusaur  Grass Poison   525 80     82      83     100
## 4  3 VenusaurMega Venusaur  Grass Poison   625 80    100     123     122
## 5  4            Charmander   Fire          309 39     52      43      60
##   Sp..Def Speed Generation Legendary
## 1      65    45          1     False
## 2      80    60          1     False
## 3     100    80          1     False
## 4     120    80          1     False
## 5      50    65          1     False
A Simple Histogram Example
I will use a histogram to show how a simple computer-generated, harsh-toned graph can be converted into a playful humanistic style in which the roughness and sketchiness of the graph are quantitatively determined by parameters. From here on, I will refer to code-generated humanistic graphs whose roughness and sketchiness are set by parameters as Quantitative Data Humanism, or QDH.
library(ggplot2)

# Classic ggplot part
g1 <- pokemon %>%
  ggplot(aes(x = Attack)) +
  geom_histogram(bins = 22, fill = colorsKids[5], alpha = 0.6, color = "grey35") +
  ggtitle("Pokemon Attack Distribution") +
  xlab("Pokemon Attack") +
  ylab("Pokemon Attack Count")
g1
This graph shows that most Pokemon have Attack values near the mean, but a few have very high Attack.
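The histogram above shows no uncertainty at all. As one hypothetical QDH mapping, each bin count n can be treated as Poisson, whose relative standard error is sqrt(n)/n = 1/sqrt(n). For non-empty bins that value already lies between 0 and 1, so it could drive a parameter such as gap_noise directly. This sketch uses base R's `hist()` with fixed-width breaks of 25 (not ggplot2's `bins = 22`) and made-up Attack values, and it treats empty bins as maximally uncertain; all of these are assumptions.

```r
# Made-up Attack values (not the Kaggle data)
attack <- c(5, 20, 35, 40, 45, 48, 49, 50, 52, 55, 55, 60,
            62, 64, 65, 70, 75, 80, 85, 100, 110, 130, 150, 185)

# Bin the values without drawing anything
h <- hist(attack, breaks = seq(0, 200, by = 25), plot = FALSE)

# Relative Poisson standard error per bin; empty bins get the maximum, 1
bin_gap_noise <- ifelse(h$counts > 0, 1 / sqrt(h$counts), 1)

data.frame(from      = head(h$breaks, -1),
           to        = tail(h$breaks, -1),
           count     = h$counts,
           gap_noise = round(bin_gap_noise, 2))
```

Sparse bins end up with the noisiest hachure spacing, which is where a reader should trust the bar height least.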
Handwritten Typefaces Using showtext and Google Fonts
From a design perspective, handwritten typefaces tend to work better with sketchy graphs, so we show how to use them here with Google Fonts.
To use Google fonts, try the fantastic showtext package.
library(showtext) # also loads sysfonts, which provides font_add_google()

## Load a Google font (https://fonts.google.com/)
font_add_google("Gochi Hand", "gochi")
## Automatically use showtext to render text
showtext_auto()

# Classic ggplot part
g2 <- pokemon %>%
  ggplot(aes(x = Attack)) +
  geom_histogram(bins = 22, fill = colorsKids[5], alpha = 0.6, color = "grey35") +
  ggtitle("Pokemon Attack Distribution") +
  xlab("Pokemon Attack") +
  ylab("Pokemon Attack Count") +
  theme(
    plot.title   = element_text(family = "gochi", size = 14, face = "bold.italic"),
    axis.title.x = element_text(family = "gochi", size = 14, face = "bold"),
    axis.title.y = element_text(family = "gochi", size = 14, face = "bold")
  )
g2
Handwritten fonts alone already add a fun aspect to a graph
Sketchy Style with ggRough
Note that ggrough is a broken library with many bugs and needs to be rewritten. In principle, and once the library is rewritten, it should work as below. In practice, generating all of the graphs shown in this article is more complicated and beyond its scope. From here on, only the ggplot2 portion of the Pokemon graph code is shown.
library(ggrough)

# ggrough part: roughjs settings for the background and the bars
options <- list(
  Background = list(roughness = 4),
  GeomCol = list(fill_style = "hachure", fill_weight = 2, angle = 60,
                 angle_noise = 0.1, bowing = 0, roughness = 6)
)
get_rough_chart(g2, options)
A couple lines of code can add a humanistic style to a graph
I will use the following ggrough parameters to extend ggplot2 plots with varying degrees of sketchiness.
Fill Style — fill_style
A categorical variable with the following values: solid, hachure, cross, hatch, zigzag, and dots.
The default is solid.
Fill Weight — fill_weight
A continuous positive variable that reflects how densely the fill color is applied. Think of one coat of paint versus five coats.
The default is 4.
Roughness — roughness
A continuous positive variable that reflects how rough the element should be.
The default is 1.5.
Bowing — bowing
A continuous positive variable that reflects how much the lines (for example, the axes) are bowed.
The default is 1.
Gap — gap
A continuous positive variable that reflects the gap between each hachure line.
The default is 6.
Gap Noise — gap_noise
A percentage of noise to apply to the gap value, between 0 and 1. A gap_noise of 1 means that deviations of up to 2 * gap are allowed.
The default is 0.
Angle — angle
The angle, in degrees, of the hachure lines, from 0 to 360 degrees.
Angle Noise — angle_noise
The angle_noise is a value between 0 and 1, equal to the fraction of the maximum possible deviation from the set angle. An angle_noise of 1 means that deviations of up to 90° are allowed.
The default is 0.
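Since the list above documents a default and a valid range for most parameters, settings can be checked before they reach ggrough. A minimal sketch, assuming the defaults and fill styles exactly as listed; `rough_params()` and `max_angle_deviation()` are hypothetical helpers, not ggrough functions, and I have not verified that ggrough accepts every key for every geom. Bowing is checked as non-negative because the ggrough example in this article uses `bowing = 0`.

```r
# Documented defaults from the list above; `angle` has no documented
# default, so it stays unset unless supplied.
rough_defaults <- list(fill_style = "solid", fill_weight = 4, roughness = 1.5,
                       bowing = 1, gap = 6, gap_noise = 0, angle_noise = 0)
fill_styles <- c("solid", "hachure", "cross", "hatch", "zigzag", "dots")

# Hypothetical helper: merge user settings over the defaults and check
# the documented ranges.
rough_params <- function(...) {
  p <- modifyList(rough_defaults, list(...))
  stopifnot(
    p$fill_style %in% fill_styles,
    p$fill_weight > 0, p$roughness > 0, p$gap > 0,
    p$bowing >= 0,
    p$gap_noise >= 0, p$gap_noise <= 1,
    p$angle_noise >= 0, p$angle_noise <= 1
  )
  # Use [[ ]] for an exact match: p$angle would partially match angle_noise
  if (!is.null(p[["angle"]])) stopifnot(p[["angle"]] >= 0, p[["angle"]] <= 360)
  p
}

# Largest deviation from `angle` allowed by angle_noise (1 -> 90 degrees)
max_angle_deviation <- function(angle_noise) angle_noise * 90

# The settings from the ggrough example, built through the helper
rough_options <- list(
  Background = list(roughness = 4),
  GeomCol = rough_params(fill_style = "hachure", fill_weight = 2, angle = 60,
                         angle_noise = 0.1, bowing = 0, roughness = 6)
)
str(rough_options$GeomCol)
```

In principle, `rough_options` could then be passed as `get_rough_chart(g2, rough_options)`. Note that the helper also passes the documented defaults explicitly (for example `gap = 6`), which only matches the original behavior if ggrough's internal defaults agree with the list.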
Gallery
For the rest of this article, I show the effect of these parameters on common EDA graphs rather than the code details of how the graphs were created. The toolset needs to be updated, refactored, and rebuilt so that data visualization code in R and Python can easily manipulate the following parameters: fill_style, fill_weight, roughness, bowing, gap, gap_noise, angle, and angle_noise. Many other parameters could potentially be added.
In 1984, William S. Cleveland and his colleague Robert McGill published a seminal paper that established a general hierarchy of the visual encodings people decode most accurately, from most to least accurate:
- Position along a common scale (bar chart, dot plots)
- Positions along nonaligned, identical scales (small multiples)
- Length, direction, angle (pie chart)
- Area (treemap)
- Volume, curvature (3-D bar charts, area charts)
- Shading, color saturation (heat maps, choropleth maps)
Creating QDH libraries for R and Python would give designers much more control over shading, color saturation, angle, and similar encodings. This would let data visualizers more easily explore which types of graphical elements convey the most information to humans.
Conclusion
In this article, I explored extending the “data humanism” of Giorgia Lupi by using roughness and sketchiness to display additional information quantitatively. I call this approach Quantitative Data Humanism, or QDH. The “noise” in the sketchy style can be used to convey important information in a graph; in particular, it naturally fits with conveying uncertainty. QDH not only mitigates the stylistic issues of computer-generated, harsh-toned graphs but also has the potential to add many parameters that reflect statistical properties of the underlying data.
Design Notes
I’d like this to become a Medium and arXiv article in the not-too-distant future. The design is intentionally simple so that it can be pasted into medium.com almost as-is and easily adapted to the arXiv (https://arxiv.org/) LaTeX format.