Loading Bank of England Base Rate to BigQuery

Loading Bank of England Base Rate to BigQuery

Data engineering is an integral part of the data lifecycle, playing a vital role in obtaining accurate and timely insights. In this post, I will explore the significance of data engineering by examining the example of UK average house prices and the Bank of England base rate (all the rage these days!).

Data engineering involves the process of acquiring data from diverse sources and consolidating it into a unified format. For our example, consider the average house prices in the UK and the Bank of England base rate. How do we get that data and load it into BigQuery? Well, usually we would find some kind of API or dataset readily available, but in this case BOE doesn't really provide a suitable API for historical data, so we would have to mock up a simple parser in Vertex AI managed notebooks.

I was using BeautifulSoup to parse the page and get the data from the table firstly into Pandas DataFrame and then into CSV for BQ load (so that data types can be inferred automatically, because setting DataFrame data types and converting them to BQ schema for DF load is a more challenging task than just dumping DF to CSV).

So I ended up with this notebook for the base rate extraction from 1975 till 2023.

No alt text provided for this image
https://github.com/Vadoid/VertexWorkbench

Now, upon running this simple notebook we get all historical base rate changes in BQ table. That simple!

No alt text provided for this image

We can immediately explore data with Looker Studio and build a graph like this:

No alt text provided for this image

Which (unfortunately) tells us that the current rate of 4.5% is on average still one of the lowest rates historically.

But how does the average house prices look in relation to base rate? Did base rate fluctuations historically dent average house prices in the UK? I don't know.

For that we need to get the data for average UK house prices from the Land Registry, using the similar methodology as before. The only difference - instead of parsing web page we are going to get CSV file (thanks Land Registry!).

Here's a much shorter notebook:

No alt text provided for this image
https://github.com/Vadoid/VertexWorkbench


Let's run it and get the data in the BQ table immediately:

No alt text provided for this image

The data can be easily and quickly plotted in Looker Studio the same way:

No alt text provided for this image

Now, we can blend two datasets together, report them side by side, or use the combo chart, doesn't matter. I usually prefer to look at the charts separately on the same page as it gives me more control.

But looking at the data this way allows us to see that there isn't much effect that a base rate has on the average house price growth trajectory (except for the few outliers). For instance, the last time the rate was at 4.5% (August 2005) the average house price in the UK was 159,431 GBP compared to 285,009 GBP today.

No alt text provided for this image

What else can we do? We can acquire inflation rates similarly and compare three different metrics. Now that you know the easiest way of acquiring data from various sources that do not provide easily accessible APIs, nothing stops you from experimenting.

If I wanted to productionalise this, I'd create a Dataflow pipeline[s] to gather, transform and write data to BigQuery which would have been a more robust method of acquiring data.

Thoughts?

要查看或添加评论,请登录

Vadim Zaripov的更多文章

社区洞察

其他会员也浏览了