How to Visualise a Data Frame as a Heat Map in Python + R
Raw Data Source: ASIC Company Register data.gov.au

The data science job to be done today is to visualise monthly company registrations with the Australian Securities and Investments Commission (ASIC), using data downloaded from the ASIC company register. Data visualisation is a key skill for every data scientist to master, no matter what language you code in. The code used to create visualisations in Python and R is generally quite different.

Let's first tackle the job of creating a heat map in R before we look at the very different way of performing a similar job using pandas in Python. A task common to both approaches is to divide the data into bins of year and month and to count the number of registrations in each. In R we will again leverage the timetk package and its summarise_by_time() function. This really is a handy function to have in your toolkit.

Create a heatmap of monthly new company registrations in Australia in R

The above code generates the following plot. Heat maps in R are best created with the ggplot2 package, which gives fine control over the desired effects.

Plot of new company registrations from R

To perform the same task in Python, I used a JupyterLab notebook because the pandas style formatter would not render for me in R Markdown in RStudio.

The pandas code I used to create a heat map is quite different, thanks to the style accessor for data frames. There is no need to call Matplotlib directly to complete this job, thanks to the background_gradient() method. The main trick for the Python solution was to create the necessary custom data types and the date type.
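The data preparation described above might look something like the following minimal sketch. It assumes that the "custom data type" is an ordered categorical for month names and that the "date type" is a parsed datetime column. The column name registration_date, the day-first date format and the sample rows are all hypothetical and are not taken from the ASIC extract.

```python
import pandas as pd

# Hypothetical sample rows; the real ASIC extract uses different column names.
raw = pd.DataFrame(
    {
        "registration_date": [
            "15/06/2020",
            "30/06/2020",
            "01/07/2020",
            "12/01/2021",
            "28/06/2021",
        ]
    }
)

# Date type: parse the day-first strings into proper datetime values.
raw["registration_date"] = pd.to_datetime(
    raw["registration_date"], format="%d/%m/%Y"
)

# Custom data type (assumed): an ordered categorical so that months sort
# in calendar order rather than alphabetically.
month_names = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
]
month_type = pd.CategoricalDtype(categories=month_names, ordered=True)

raw["year"] = raw["registration_date"].dt.year
raw["month"] = (
    raw["registration_date"]
    .dt.month.map(lambda m: month_names[m - 1])
    .astype(month_type)
)

# Bin by year and month and count the registrations in each bin.
# observed=False keeps months with no registrations as zero-count columns.
counts = (
    raw.groupby(["year", "month"], observed=False)
    .size()
    .unstack(fill_value=0)
)
```

The result is a year-by-month table of counts with one row per year and twelve month columns in calendar order, which is the shape the style accessor needs.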

Code to plot a heat map from a data frame in Python
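A minimal sketch of the styling step, assuming a year-by-month table of counts is already available. The numbers are made up for illustration and are not ASIC figures, and the helper name style_heat_map and the YlOrRd colour map are arbitrary choices rather than the settings used for the output shown in this article. Note that although no Matplotlib function is called, pandas needs Matplotlib installed to supply the colour maps behind background_gradient().

```python
import pandas as pd

# Hypothetical counts for illustration only; these are not ASIC figures.
table = pd.DataFrame(
    {"Jan": [10, 14], "Feb": [12, 15], "Jun": [20, 26]},
    index=pd.Index([2020, 2021], name="year"),
)


def style_heat_map(counts: pd.DataFrame, cmap: str = "YlOrRd"):
    """Return a Styler that shades each cell by its value.

    axis=None scales the colours across the whole table instead of
    column by column, so cells are comparable between months and years.
    """
    return (
        counts.style
        .background_gradient(cmap=cmap, axis=None)
        .format("{:,.0f}")  # thousands separators, no decimals
    )


# In a Jupyter notebook, make this the last expression in a cell to render it:
# style_heat_map(table)
```

Using axis=None is a design choice: the default axis=0 shades each column independently, which would hide the year-on-year growth that a heat map of this kind is meant to reveal.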

And finally, here is the output from pandas using the background_gradient() method. The one minor drawback is that the styled table lacks some of the usual graphical features of a plot, such as a title.

Heat map of new company registrations in Australia by month using Python

When I first generated this plot, I was quite surprised by the increase in the number of monthly new company registrations in Australia over the last 20 years. It is apparent that new companies are most likely to be registered in June, which makes sense given that the Australian financial year ends on 30 June. What stood out most, however, was that 2021 was the biggest year to date for new company registrations in Australia. This is possibly an effect of the shift to working from home, and of people considering other means of earning an income, since the start of the Covid pandemic in 2020. Every month of 2022 so far shows a noticeable decrease in registrations, which is most evident from April onwards. I expect accounting and legal firms enjoyed the surge in new company registrations.

As you continue to hone your data science skills, it becomes essential to find new ways to maintain your interest and focus. Toy datasets are extremely helpful, but don't underestimate the motivation you can derive from exploring datasets that you are curious about. To continue on your journey to data science mastery in the two biggest data science languages, subscribe to the weekly Data Science Code in Python + R newsletter. In just 5 minutes every week you can receive a little reminder of what you might already know and what you might learn in data science.
