How to Visualise a Data Frame as a Heat Map in Python + R
The data science job to be done today is to visualise monthly company registrations with the Australian Securities and Investments Commission (ASIC), using data downloaded from the ASIC company register. Data visualisation is one of the key skills for all data scientists to master, no matter what language you code in. The code used to create visualisations in Python and R is generally quite different.
Let's first tackle the job of creating a heat map in R before we look at the very different way of performing a similar job using pandas in Python. A common task in both approaches is to divide the data into bins of year and month and to count the number of registrations in each bin. In R we will again leverage the timetk package and the summarise_by_time() function. This really is a handy function to have in your toolkit.
The above code generates the following plot. Heat maps in R are best created with the ggplot2 package, which gives fine control over the desired effects.
To perform the same task in Python, I have used a JupyterLab notebook because the pandas style formatter was not rendering for me using RMarkdown in RStudio.
The pandas code I have used to create a heat map is quite different thanks to the style accessor for data frames. There is no need to call Matplotlib directly to complete this job thanks to the background_gradient() method, although Matplotlib still needs to be installed because the method relies on its colour maps. The main trick for the Python solution was to create the necessary custom data types and date type.
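The binning and counting step described above can be sketched in pandas as follows. This is a minimal illustration rather than the original notebook code: the column name `registration_date`, the sample rows, and the use of an ordered `CategoricalDtype` for month names are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data: one row per company registration.
# The column name "registration_date" is an assumption, not the ASIC schema.
registrations = pd.DataFrame(
    {
        "registration_date": pd.to_datetime(
            ["2021-06-03", "2021-06-17", "2021-07-01", "2022-01-20", "2022-06-09"]
        )
    }
)

# Custom ordered categorical type so months sort Jan..Dec rather than alphabetically.
month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
month_type = pd.CategoricalDtype(categories=month_names, ordered=True)

# Split each date into a year bin and a month bin.
registrations["year"] = registrations["registration_date"].dt.year
month_codes = (registrations["registration_date"].dt.month - 1).to_numpy()
registrations["month"] = pd.Categorical.from_codes(month_codes, dtype=month_type)

# Count registrations per (year, month) and pivot months into columns.
# observed=False keeps months with no registrations as zero counts.
counts = (
    registrations.groupby(["year", "month"], observed=False)
    .size()
    .unstack("month", fill_value=0)
)

print(counts)
```

The result is a year-by-month table of counts, which is the shape the style accessor needs for a heat map. Using an ordered categorical for the month is one way to keep the columns in calendar order; other approaches, such as keeping month numbers, would work too.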
And finally, here is the output from pandas using the background_gradient() method, with the minor drawback that it lacks some of the usual graphical features, such as a title.
When I first generated this plot, I was quite surprised by the increase in the number of monthly new company registrations in Australia over the last 20 years. It is apparent that new companies are most likely to be registered in June, which makes sense given that the Australian financial year also ends in June. What stuck out most, however, was that 2021 was the biggest year to date in Australia's history for new company registrations. This is possibly an effect of the shift towards working from home, and of people considering other means of earning an income, since the start of the Covid pandemic in 2020. In each month of 2022 there has been a noticeable decrease in monthly company registrations, which is most evident from April onwards. I expect accounting and legal firms have enjoyed the surge in new company registrations.
As we continue to hone our data science skills, it becomes essential to find new ways to maintain our interest and focus. Using toy datasets is extremely helpful, but don't underestimate the motivation that you can derive from exploring datasets that you are curious about. To continue on your journey to data science mastery in the two biggest data science languages, subscribe to the weekly Data Science Code in Python + R newsletter. In just 5 minutes every week you can receive a little reminder of what you might already know and what you might learn in Data Science.