Quantitative Data Humanism with Pokemon

Nik Bear Brown

Quantitative Data Humanism (QDH) with Pokemon

In this project, I explore extending the “data humanism” of Giorgia Lupi, a reaction against the computer-generated, harsh-toned graphs that tend to diminish rather than pique interest in visualization. Lupi describes data humanism in her manifesto, Data Humanism, The Revolution will be Visualized.

Image credit: giorgialupi.com

Quantitative Data Humanism (QDH)

The “noise” in the sketchy style can be used to convey important information in a graph. In particular, it naturally fits with representing uncertainty.

This article is an attempt to play with mapping the rough “sketchiness” of data humanism to common charts, like bar charts and bubble charts, where the quantitative display of uncertainty is usually not shown, often because it is not obvious how to visually display uncertainty in many charts.

Pokemon Data from Kaggle

I am using the Pokemon data from Kaggle because it is fun: https://www.kaggle.com/rounakbanik/pokemon

This dataset contains information on all 802 Pokemon from all seven generations of Pokemon. The information in this dataset includes Base Stats, Performance against Other Types, Height, Weight, Classification, Egg Steps, Experience Points, Abilities, etc. The information was scraped from https://serebii.net/

Using ggrough to create a sketchy humanistic style

I will use the library ggrough to convert ggplot charts into a humanistic style. Note that this library was last updated years ago, has many bugs, and seems abandoned. It is an important library and needs to be rewritten and updated. A Python version also needs to be created.

ggrough is an R package that converts your ggplot2 plots to rough/sketchy charts, using the excellent JavaScript roughjs library.

In this article, I will only show one example of its common use as it involves just adding a couple of lines of code after a ggplot2 graph is created. To show how it was used to create all of the graphs shown is beyond the scope of this article.

How to Install

ggrough is not on CRAN, so it has to be installed from GitHub using devtools:

install.packages("devtools") # if you have not installed "devtools" package
devtools::install_github("xvrdm/ggrough")

The other packages can be installed using install.packages.

install.packages(c("ggplot2", "dplyr", "showtext","scales"))
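
The snippets in the rest of this article call functions from these packages without showing the library() calls. A minimal sketch of the setup follows, assuming the packages above installed cleanly. Loading ggrough is guarded because, as noted above, it may not install or load on a current R setup.

```r
# Attach the packages used by the snippets in this article.
library(ggplot2)   # ggplot(), geom_histogram(), theme()
library(dplyr)     # the %>% pipe
library(showtext)  # showtext_auto(); also attaches sysfonts for font_add_google()
library(scales)    # show_col()

# ggrough comes from GitHub and may not load cleanly, so attach it only if present.
if (requireNamespace("ggrough", quietly = TRUE)) {
  library(ggrough) # get_rough_chart()
}
```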

Playful Kids Color Scheme

I want to play with playful “kids art” style colors, but don’t want to rewrite the code in order to make changes, so I created a color swatch.

colorsKids <- c("#9B1E33", "#EEBD00", "#6A953F", "#9A6233", "#69359C", "#F7B0BE")
# show_col() comes from the scales package
show_col(colorsKids, labels = FALSE, borders = NA)
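
One way to make the swatch easier to change later is to give each color a name, so plot code refers to a role rather than a position in the vector. This is only a sketch: the color names below are my own labels for the hex codes and are not part of the original swatch.

```r
# The hex codes are the swatch defined above; the names are hypothetical labels.
colorsKids <- c("#9B1E33", "#EEBD00", "#6A953F", "#9A6233", "#69359C", "#F7B0BE")
kidsPalette <- setNames(
  colorsKids,
  c("red", "yellow", "green", "brown", "purple", "pink")
)

# Look a color up by name instead of by position.
kidsPalette[["purple"]]  # "#69359C", the same color as colorsKids[5]
```

With a named palette, reordering or extending the swatch does not silently change which color a chart uses.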

Loading the Pokemon Data

The Pokemon data has a nice mix of categorical, continuous, and Boolean data so many standard charts can be created. I will extend this data set with some time-series, uncertainty information, and image data in the future to create a standard data set for all kinds of QDH visualization.

set.seed(5)
pokemon <- read.csv("data/Pokemon.csv")

pokemon %>% head(5)
##   X.                  Name Type.1 Type.2 Total HP Attack Defense Sp..Atk
## 1  1             Bulbasaur  Grass Poison   318 45     49      49      65
## 2  2               Ivysaur  Grass Poison   405 60     62      63      80
## 3  3              Venusaur  Grass Poison   525 80     82      83     100
## 4  3 VenusaurMega Venusaur  Grass Poison   625 80    100     123     122
## 5  4            Charmander   Fire          309 39     52      43      60
##   Sp..Def Speed Generation Legendary
## 1      65    45          1     False
## 2      80    60          1     False
## 3     100    80          1     False
## 4     120    80          1     False
## 5      50    65          1     False

A Simple Histogram Example

I will use a histogram to show how a simple computer-generated, harsh-toned graph can be converted into a playful humanistic style where the roughness and sketchiness of the graph are quantitatively determined by parameters. From here on, when I use code to create humanistic graphs whose roughness and sketchiness are determined by parameters, I will refer to this as Quantitative Data Humanism, or QDH.

# Classic ggplot part
g1 <- pokemon %>%
  ggplot(aes(x = Attack)) +
  geom_histogram(bins = 22, fill = colorsKids[5], alpha = 0.6, color = "grey35") +
  ggtitle("Pokemon Attack Distribution") +
  xlab("Pokemon Attack") +
  ylab("Pokemon Attack Count")
g1

This graph shows that most Pokemon have an Attack value near the mean, but a few have a very high Attack.

Handwritten Typefaces Using showtext and Google Fonts

From a design perspective, handwritten typefaces tend to work better with sketchy graphs, so I will show how to use them. For this, I will use Google Fonts.

To use Google fonts, try the fantastic showtext package.

## Loading Google fonts (https://fonts.google.com/)
font_add_google("Gochi Hand", "gochi")
## Automatically use showtext to render text
showtext_auto()
# Classic ggplot part
g2 <- pokemon %>%
  ggplot(aes(x = Attack)) +
  geom_histogram(bins = 22, fill = colorsKids[5], alpha = 0.6, color = "grey35") +
  ggtitle("Pokemon Attack Distribution") +
  xlab("Pokemon Attack") +
  ylab("Pokemon Attack Count") +
  theme(
    plot.title = element_text(family = "gochi", size = 14, face = "bold.italic"),
    axis.title.x = element_text(family = "gochi", size = 14, face = "bold"),
    axis.title.y = element_text(family = "gochi", size = 14, face = "bold")
  )
g2

Handwritten Fonts already add a fun aspect to a graph

Sketchy Style with ggrough

Note that ggrough is a broken library with many bugs and needs to be rewritten. In theory, once the library is rewritten, it will work as shown below. In practice, generating all of the graphs shown in this article is more complicated and beyond the scope of this article. After this, only the ggplot2 portion of the Pokemon graph code will be shown.

# ggRough part
options <- list(
  Background = list(roughness = 4),
  GeomCol = list(fill_style = "hachure", fill_weight = 2, angle = 60,
                 angle_noise = 0.1, bowing = 0, roughness = 6))

get_rough_chart(g2, options)

A couple lines of code can add a humanistic style to a graph
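
Because the options are just a nested list of numbers, they can be computed from the data instead of typed by hand, which is the core of the QDH idea. The sketch below uses base R only and does not call ggrough. The helper names (rescale_to, qdh_options), the standard errors, and the 0.5 to 6 roughness range are all assumptions made for illustration.

```r
# Linearly rescale x from its observed range to [lo, hi].
rescale_to <- function(x, lo, hi) {
  rng <- range(x)
  if (rng[1] == rng[2]) return(rep((lo + hi) / 2, length(x)))
  lo + (x - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
}

# Hypothetical helper: build a ggrough-style options list from one roughness value.
qdh_options <- function(roughness, geom = "GeomCol") {
  opts <- list()
  opts[[geom]] <- list(fill_style = "hachure", roughness = roughness)
  opts
}

# Made-up standard errors for three groups (illustrative numbers only).
se <- c(a = 1.2, b = 3.5, c = 7.9)

# Larger uncertainty maps to a rougher drawing, within the assumed range.
rough <- rescale_to(se, lo = 0.5, hi = 6)
opts_per_group <- lapply(rough, qdh_options)
str(opts_per_group$c)
```

As the options list above suggests, ggrough takes one set of options per geom, so per-group values like these would need either one chart per group or a rewritten library that accepts per-element settings.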

I will use the following parameters to extend ggplot plots with varying degrees of sketchiness.

Fill Style — fill_style

Categorical variable with the following values: solid, hachure, cross-hatch, zigzag, and dots.

The default is solid.

Fill Weight — fill_weight

A continuous positive variable that reflects how densely a color is applied. Think of one coat of paint rather than five coats of paint.

The default is 4.

Roughness — roughness

A continuous positive variable that reflects how rough the element should be.

The default is 1.5.

Bowing — bowing

A continuous positive variable that reflects how much the axes are bowed.

The default is 1.

Gap — gap

A continuous positive variable that reflects the gap between each hachure line.

The default is 6.

Gap Noise — gap_noise

A proportion of noise to apply to the gap value. Use a value between 0 and 1. A gap_noise of 1 means that deviations of up to 2 * gap are allowed.

The default is 0.

Angle — angle

The angle of the hachure lines in degrees, ranging from 0 to 360.

Angle Noise — angle_noise

The angle_noise is a value between 0 and 1, equivalent to the proportion of possible deviation from the set angle. An angle_noise of 1 means that deviations of up to 90° are allowed.

The default is 0.
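
To make the two noise parameters concrete, the arithmetic can be sketched in a few lines. This is my reading of the descriptions above, not ggrough's actual implementation: it assumes the largest allowed deviation grows linearly with the noise value, and the helper names are hypothetical.

```r
# Hypothetical helpers: largest deviation implied by each noise setting,
# assuming the deviation grows linearly with the noise value.
max_gap_deviation <- function(gap, gap_noise) {
  stopifnot(gap_noise >= 0, gap_noise <= 1)
  gap_noise * 2 * gap
}

max_angle_deviation <- function(angle_noise) {
  stopifnot(angle_noise >= 0, angle_noise <= 1)
  angle_noise * 90
}

# Default gap of 6 with half the maximum noise: deviations of up to 6.
max_gap_deviation(gap = 6, gap_noise = 0.5)

# The angle_noise of 0.1 used in the histogram example: deviations of up to 9 degrees.
max_angle_deviation(angle_noise = 0.1)
```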

Gallery

For the rest of this article, I will show the effect of the parameters on common EDA graphs and not the code details of how the graphs were created. The toolset needs to be updated, refactored, and rebuilt to make it easy for data visualization tools in R and Python to manipulate the following parameters: fill_style, fill_weight, roughness, bowing, gap, gap_noise, angle, and angle_noise. Potentially many other parameters could be added.

In 1984, William S. Cleveland and his colleague Robert McGill published a seminal paper that proposed a general hierarchy of the visual encodings people read most accurately, ordered from most to least accurate:

  • Position along a common scale (bar chart, dot plots)
  • Positions along nonaligned, identical scales (small multiples)
  • Length, direction, angle (pie chart)
  • Area (treemap)
  • Volume, curvature (3-D bar charts, area charts)
  • Shading, color saturation (heat maps, choropleth maps)

Creating QDH libraries for R and Python would give designers much more control over shading, color saturation, angle, etc. This would allow data visualizers to more easily explore which types of graphical elements convey the most information to humans.

See “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods” by William S. Cleveland and Robert McGill.

Conclusion

In this article, I explored the idea of extending the “data humanism” of Giorgia Lupi by using roughness and sketchiness to quantitatively display additional information. I call this approach Quantitative Data Humanism, or QDH. The “noise” in the sketchy style can be used to convey important information in a graph. In particular, it naturally fits with conveying uncertainty. QDH not only softens the stylistic issues of computer-generated, harsh-toned graphs but also has the potential to add many parameters that can reflect statistical properties of the underlying data.

Design Notes

I’d like this to be a Medium and arXiv article in the not too distant future. The design is intentionally kept simple so it can essentially be cut and pasted into medium.com and fit seamlessly and easily adapted to the https://arxiv.org/ LaTeX format.
