Data Visualization of Nobel Prizes by Age using R in Kaggle

Data Visualization of Nobel Prizes by Age using R in Kaggle

In my previous article we have seen the data of Nobel Prizes by Gender, Countries and Categories. However we did not check the average age of the people awarded the Nobel prize. In the present article we will focus on the analyzing the same data by age.

In order to analyze the data by age we need to get the Nobel prize data and derive the age. Before that we need to call all the required libraries and then read the Nobel prize data from csv file.

library(readr)
library(sqldf)
library(dplyr)
library(lubridate)
install.packages("ggstatsplot")
library(ggstatsplot)
nobel <- read_csv('../input/nobel-prize/complete.csv')
nobel

Nobel Prize data

Here in the nobel prize data we have the following variables like date when the prize was awarded (dateAwarded) and date when the recipient was born (birth_date). Then the age is derived by derivation (dateAwarded-birth_date)/365.25, however in the data we see that for some of the subjects the dateAwarded is missing but the year when award was given is available, so I derived the dateAwarded assuming that every year in the month of October the award is given, so imputed the missing dateAwarded using 1st of October. The conclusion of taking the October as month of award is decided by checking the data over the years and months as displayed in the Graph 1.

nobel2 <- nobel %>% select(awardYear,dateAwarded,category,gender,birth_continent,birth_country,birth_date,givenName)


nobel3 <- nobel2 %>% mutate(newdate=ifelse(is.na(dateAwarded),paste0(awardYear,'-10-01'),as.character(dateAwarded)),newdate2=as.Date(newdate), age2=ifelse(!is.na(birth_date),(newdate2-birth_date)/365.25,birth_date),age=round(age2,digits=1))


maxage2 <- nobel3 %>% left_join(.,maxage,by=c('category','age')) %>% mutate(outlier=ifelse(flag==1,paste0(givenName,' ','Age=',age),flag),month=month(newdate2))


ggplot(maxage2 , aes(x=awardYear,y=month)) + geom_bar(stat="identity", width=.5, position = "dodge") +
 labs(title = "Month of every year when the Nobel Prizes are given out") +
  xlab("Year") +
  ylab("Month")


Graph 1

No alt text provided for this image

Once the age is derived as per the code above, we can use the age variable to generate the graphs. The best way to identify the average age of the recipients when they actually received the award by category is boxplot. Please find below the boxplot with average age by Categories. Also we can see I annotated the outliers using geom_label(), as a input dataset for the same, I used the source dataset and flagged the outliers in outlier variable which I used to label. The overall average age is ranging between 45 - 75.

maxage <- nobel3 %>% filter(!is.na(age)) %>% group_by(category) %>% arrange(category,age) %>% slice(c(1,n())) %>% mutate(flag=1) %>% select(flag,category,age)
maxage
maxage2 <- nobel3 %>% left_join(.,maxage,by=c('category','age')) %>% mutate(outlier=ifelse(flag==1,paste0(givenName,' ','Age=',age),flag),month=month(newdate2))

maxage2

ggplot(maxage2,aes(x=category,y=age,fill=category)) +  geom_boxplot(outlier.colour="red", outlier.shape=8,outlier.size=4) +
 labs(title = "Average Age of Nobel Prize Recipients by Categories") +
  xlab("Category") +
  ylab("Age") +
theme(legend.position='none',axis.text.x = element_text(angle = 45,hjust = 1)) +  
geom_label(data=maxage2 %>% filter(flag==1 & category %in% c('Economic Sciences', 'Chemistry', 'Peace','Physics') & age>90 | age <20), aes(label=outlier),size=2)

Graph 2

No alt text provided for this image

Now lets see the average age of recipients by gender. Both male and female average age seems to be same, but for females it is 40-75 and males it is 45-75.

ggplot(maxage2 %>% filter(!is.na(gender)),aes(x=category,y=age,fill=category)) +  geom_boxplot(outlier.colour="red", outlier.shape=7,outlier.size=2) +
 labs(title = "Age wise Nobel Prizes by Category") +
  xlab("Category") +
  ylab("Age") +
theme(legend.position = "none",axis.text.x = element_text(angle = 45,hjust = 1)) +  
geom_label(data=maxage2 %>% filter(flag==1 & category %in% c('Economic Sciences', 'Chemistry', 'Peace','Physics') & age>90 | age <20), aes(label=outlier),size=2) +
facet_wrap(~gender) 

Graph 3

No alt text provided for this image

Note that I am not checking if the Recipient was alive when they received the award.

We can also confirm the average age of Nobel prize recipients by scatter plot as well as below, with scatter plot when compared with boxplot the average age is much clear it appears as 35-85

ggplot(maxage2, aes(x = age, y = category,col=category)) +
    geom_point(position='jitter') +
xlab('Age') +
ylab('Categories') +
guides(alpha=FALSE) + scale_x_continuous(breaks=seq(15,100,by=5)) + 
theme(legend.position='bottom',legend.box = "vertical",axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1)) +
scale_color_discrete(name="Country")  +
 labs(title = "Scatter Plot of Nobel Prize Categories by Age") 

Graph 4

No alt text provided for this image

We will also check by gender, the average age is same in both. But for males the data is more dense due to more awards compared to females.

ggplot(maxage2 %>% filter(!is.na(gender)), aes(x = age, y = category,col=category)) +
    geom_point(position='jitter') +
xlab('Age') +
ylab('Categories') +
guides(alpha=FALSE) + scale_x_continuous(breaks=seq(15,100,by=10)) + 
theme(legend.position='bottom',legend.box = "vertical",axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1)) +
scale_color_discrete(name="Country")  +
 labs(title = "Scatter Plot of Nobel Prize Categories by Age and Gender") +
facet_wrap(~gender) 

Graph 5

No alt text provided for this image

While writing this article I learned few new things in R, of how we can convert the numeric date to character date by as.character(), convert the character date to numeric date by as.Date(), use of ifelse() in mutate(), paste0() to concatenate the character variables, rounding of numeric variables to required digits while deriving the age variable round(x,digits=.), also the annotation in graphs to display the outliers.

P.S. The opinions and views expressed here are mine and not of anyone else's.

要查看或添加评论,请登录

Jagadish K.的更多文章

社区洞察

其他会员也浏览了