Exploring Singapore's Open Data to find Market Trends: Insights from Data.Gov.SG and a Case Study of Pandemic Impacts

Recently, I attended an insightful event, Data.Gov.SG (GovTech) X DataScience SG, hosted by the DataScience SG meetup group, where Kuo Ting and Cliff Chew presented an inspiring talk and demo on the potential of the data.gov.sg portal and how this wealth of information can be used to learn, run analyses, or discover valuable insights. Cliff Chew presented a compelling public housing case study driven by recent trends in million-dollar home transactions in the HDB resale data. His analysis revealed how media coverage can sometimes obscure the true picture: when you actually take a look, you realize that the data is telling a different story. For a detailed look at Cliff Chew’s case study, visit his LinkedIn post here.

Inspired by the session, I set out to play around with the data myself. I had used URA private home rental data in the past for a personal project that ended up as a Power BI report, but it never made it into an article. I plan to publish that project in the future.

The open data portal, Data.gov.sg, from the Singapore government has 4000+ datasets from 70 different agencies, and many of them are available as real-time APIs. After browsing through many datasets, I finally chose the Job Vacancy data. My goal was to find trends in job vacancies in Singapore over the last few years. I chose Python and Colab as my working environment. I went with Colab because it is free, provides multiple compute environments such as CPUs and GPUs, and is easy to get started with. Now let's dive into the code. I will provide the notebook link here and at the end of the article as well. Readers are encouraged to download it and play around. You will need an API key to use Gemini 1.5, which at the time of writing is free, so go to aistudio.google.com/app/apikey and get your own key.

I am dividing the notebook into multiple sections, and here I give only a very high-level overview of the code. Let me know if you want more detail on any part and I can walk through it.

  1. Get Data and Prepare Data

The best part of data.gov.sg is that you can get the data with a simple API call, without any key or authentication. The API can page through the dataset as you download it, so in the first cell I loop through the dataset while there are more rows to download. Once done, I construct a pandas dataframe, convert the columns to proper data types, and fill the missing values with appropriate values. Once the dataframe is ready, I try to “Diagnose” and “Describe” the trends inherent in the data.

#@title Get Job Vacancy Data from Data.Gov.Sg
import requests
from urllib.parse import urlparse, parse_qs

resource_id = 'd_f3ce4cb8ec1910a4d4699f0e2ecaa21a'
recordsjson = []
base_url = "https://data.gov.sg"
url = base_url + "/api/action/datastore_search?resource_id=" + resource_id + "&limit=1000"
hasnext = True

# Page through the dataset until the next offset reaches the total row count
while hasnext:
    response = requests.get(url)
    response.raise_for_status()  # stop early on an HTTP error
    result = response.json()['result']  # parse the JSON body once per page
    recordsjson += result['records']
    # '_links.next' is a relative URL that carries the offset of the next page
    url = base_url + result['_links']['next']
    offset = int(parse_qs(urlparse(url).query)['offset'][0])
    # Use >= so that no extra, empty page is requested when offset equals total
    if offset >= result['total']:
        hasnext = False
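
The notebook cell that builds the dataframe is not shown above, so here is a minimal sketch of what that step could look like. The column names (`quarter`, `industry1`, `job_vacancy`), the sample records, and the choice to fill missing numbers with 0 are all assumptions for illustration; check the actual dataset schema and pick fill values that suit your analysis.

```python
import pandas as pd

def records_to_frame(records, text_cols=("quarter", "industry1")):
    """Build a typed DataFrame from datastore_search records.

    The column names in text_cols are hypothetical; adjust them to the dataset.
    """
    df = pd.DataFrame(records)
    for col in df.columns:
        if col in text_cols:
            # Keep label columns as text and mark missing labels explicitly
            df[col] = df[col].fillna("Unknown").astype(str)
        else:
            # API values often arrive as strings; non-numeric markers such as
            # "na" become NaN and are then filled with 0 (an assumption)
            df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)
    return df

# Made-up records shaped like the API's 'records' list
sample = [
    {"_id": 1, "quarter": "2020-Q1", "industry1": "services", "job_vacancy": "1200"},
    {"_id": 2, "quarter": "2020-Q2", "industry1": "services", "job_vacancy": "na"},
]
df = records_to_frame(sample)
print(df.dtypes)
```

In the notebook you would pass `recordsjson` instead of `sample`.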

  2. Diagnostic Analysis

In this code section, I explore the data across multiple dimensions, slicing and dicing it to find trends. Plotting the processed data shows job vacancy trends across different industries from 2020 to 2024, highlighting notable spikes.

Job Vacancies over the years for different Industries

A startling revelation to me was that the top 3 sectors with the most job vacancies during 2020-24 were Community, Social and Personal Services; Financial and Insurance Services; and Information and Communications. The Community, Social and Personal Services industry includes health workers among other occupations. It had a huge spike during 2006-2007 and then showed constant growth until 2020. After that there was a huge spike followed by 2 small peaks, and then a fall from 2023-24 onwards. While the fall is not over yet, there is some recovery in the FSI and IT sectors, which show some healthy growth. We all know this was due to the Covid pandemic, but we need to back this with data. In the next section I try to “Describe” this change by correlating it with Covid infections data.
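
As a rough illustration of the slicing described above, the sketch below pivots vacancies by quarter and industry and ranks sectors by total vacancies in the 2020-24 window. The column names and all the numbers are made up for the example; they are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumptions
df = pd.DataFrame({
    "quarter": ["2019-Q4", "2020-Q1", "2020-Q1", "2021-Q1", "2021-Q1", "2021-Q1"],
    "industry": ["finance", "finance", "infocomm", "finance", "infocomm", "retail"],
    "job_vacancy": [100, 120, 90, 150, 200, 40],
})

# Derive the year from the quarter label and keep only the 2020-24 window
df["year"] = df["quarter"].str[:4].astype(int)
recent = df[df["year"].between(2020, 2024)]

# One row per quarter, one column per industry
pivot = recent.pivot_table(
    index="quarter", columns="industry", values="job_vacancy", aggfunc="sum"
)

# Rank industries by total vacancies over the window
top3 = pivot.sum().nlargest(3)
print(top3)
```

Calling `pivot.plot()` on a frame like this gives one line per industry, which is roughly the kind of chart shown above.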

  3. Descriptive Analysis

In this code section, I get the Covid infections time series data from the Johns Hopkins CSSE GitHub repository. After some cleansing and grouping of the data to match the time dimension of the jobs dataset, I stack both plots vertically to show the relation.

Job Vacancies relation with Covid infections

It is evident from the graph that job vacancies were affected by the Covid pandemic. While some sectors show hockey-stick growth, many sectors saw slower growth or a fall. Nonetheless, Covid changed the dynamics, and we are still seeing some after-effects in some sectors.
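
The cleansing and grouping step for the Covid data could look roughly like the sketch below. The CSSE file is in wide format, with one column per date holding cumulative confirmed cases, so the sketch filters to Singapore, converts cumulative counts to new cases, and sums them per quarter. The inline CSV is a tiny stand-in with made-up numbers, and grouping by quarter is an assumption about the time dimension of the jobs dataset.

```python
import io
import pandas as pd

# Tiny stand-in with the same wide layout as
# time_series_covid19_confirmed_global.csv (values are made up)
csv_text = """Province/State,Country/Region,Lat,Long,3/30/20,3/31/20,4/1/20,6/30/20
,Singapore,1.28,103.83,10,15,18,50
,Malaysia,4.21,101.97,5,7,9,20
"""
wide = pd.read_csv(io.StringIO(csv_text))

# Keep only Singapore and drop the non-date columns
sg = wide[wide["Country/Region"] == "Singapore"].drop(
    columns=["Province/State", "Country/Region", "Lat", "Long"]
)

# Collapse to one cumulative value per date and parse the date labels
cumulative = sg.sum()
cumulative.index = pd.to_datetime(cumulative.index, format="%m/%d/%y")

# Cumulative counts -> new cases; the first date keeps its cumulative value
new_cases = cumulative.diff().fillna(cumulative.iloc[0])

# Sum new cases per calendar quarter
quarterly = new_cases.groupby(new_cases.index.to_period("Q")).sum()
print(quarterly)
```

With both series on a quarterly index, `matplotlib.pyplot.subplots(2, 1, sharex=True)` is one way to stack the two plots vertically on a shared time axis.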

Finally, this would not be a proper closure without some elements of Gen AI.

  4. Analyzing the Graph with Gemini 1.5

Before trying Gemini, I tried some open source models such as Gemma and even Microsoft’s Florence, which is more of a captioning model. One model that gave good results was OpenGVLab/InternVL2-1B. However, it uses flash-attn, which is a problem because Colab provides T4 GPUs, which are not supported, so I kept getting this error: RuntimeError: FlashAttention only supports Ampere GPUs or newer.
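
If you want to check up front whether a runtime is likely to hit this error, one option is to look at the GPU's CUDA compute capability before loading the model. Ampere corresponds to compute capability 8.0, while the T4 is a Turing card at 7.5. The sketch below is only a rough pre-check based on that threshold; the helper names are hypothetical and it is not how flash-attn itself decides.

```python
def supports_flash_attn(capability):
    """Return True if a (major, minor) compute capability is Ampere (8.0) or newer."""
    major, _minor = capability
    return major >= 8

def current_gpu_capability():
    """Return the current GPU's (major, minor) capability, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()

capability = current_gpu_capability()
if capability is None:
    print("No CUDA GPU detected")
else:
    print(f"Compute capability {capability}: flash-attn OK = {supports_flash_attn(capability)}")

# A T4 reports (7, 5), so the check fails; an A100 reports (8, 0)
print(supports_flash_attn((7, 5)), supports_flash_attn((8, 0)))
```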

Finally, I went ahead with the Gemini 1.5 model, as it was a much simpler integration without any need for custom tuning. It took a simple API call with the image, and the response was not bad.

import google.generativeai as genai
from google.colab import userdata

# Read the API key from Colab's secrets store
genai.configure(api_key=userdata.get("GEMINI_API_KEY"))

def upload_to_gemini(path, mime_type=None):
  # Upload a local file so it can be referenced in the chat
  file = genai.upload_file(path, mime_type=mime_type)
  print(f"Uploaded file '{file.display_name}' as: {file.uri}")
  return file

generation_config = {
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 64,
  "max_output_tokens": 8192,
  "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
  model_name="gemini-1.5-flash",
  generation_config=generation_config,
)
# The stacked job vacancies / Covid infections plot saved earlier
files = [
  upload_to_gemini("covidvacancies.jpg", mime_type="image/jpeg"),
]
# Start the chat with the image already in the history
chat_session = model.start_chat(
  history=[{
      "role": "user",
      "parts": [
        files[0],
      ],
    },
  ]
)

response = chat_session.send_message("Describe the graph below")
print(response.text)

Gemini was able to identify and describe many of the data points; however, it did miss some important ones. Look at the response below. It may be different for you though.

The graph shows the effect of Covid on job vacancies by occupation over the years.

The top graph shows the number of job vacancies by occupation from 2000 to 2023. The bottom graph shows the number of Covid infections from 2000 to 2023.

The top graph shows that the number of job vacancies in most occupations decreased slightly from 2000 to 2020. However, the number of job vacancies in all occupations increased dramatically from 2020 to 2023.

The bottom graph shows that the number of Covid infections was very low from 2000 to 2020. However, the number of Covid infections increased dramatically from 2020 to 2023.

The graph suggests that the increase in job vacancies from 2020 to 2023 is likely due to the Covid-19 pandemic. The pandemic has had a significant impact on the economy, leading to business closures, layoffs, and a decrease in consumer spending. This has led to a decrease in the demand for workers in many industries.

The graph also shows that the impact of the pandemic on different occupations has varied. For example, the number of job vacancies in the accommodation and food services sector has decreased significantly. This is likely due to the closure of restaurants and hotels during the pandemic. However, the number of job vacancies in the information and communications sector has increased. This is likely due to the increased demand for technology and online services during the pandemic.

Overall, the graph shows that the Covid-19 pandemic has had a significant impact on the job market. The pandemic has led to a decrease in the demand for workers in many industries, but it has also led to an increase in the demand for workers in some industries. The impact of the pandemic on different occupations has varied, but it is clear that the pandemic has had a major impact on the labor market.

The description is much better, but it can still be improved with a better prompt. That's for next time though.

I hope this article inspires you to explore the datasets available on Data.gov.sg and apply similar analytical techniques. Engaging with open data can yield valuable insights and enhance our understanding of various trends. I had fun learning and experimenting with different datasets. As you can see, proper descriptive analytics requires using multiple datasets to look at relationships and to reason about changes in the data.

I welcome your feedback and suggestions for improvement. Please share your thoughts and any questions you may have.

Thanks for reading


Links

Notebook - https://github.com/Naginder/learninglab/blob/master/DA_JobVacancy.ipynb

Dataset - https://beta.data.gov.sg/collections/689/datasets/d_f3ce4cb8ec1910a4d4699f0e2ecaa21a/view

AI Studio - https://aistudio.google.com/app/prompts/new_chat

Covid data - https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv




Yi Zhuan Foong

Senior Product Manager at Open Government Products (data.gov.sg)

4 months ago

Thanks so much Naginder Singh Virdi for sharing your analysis using the Ministry of Manpower data hosted on data.gov.sg Really interesting to see the trends!

Naginder, this is very insightful. I truly enjoyed reading your article, the thorough approach, from data collection, tooling, exploration and then to AI-assisted interpretation. It's a fantastic guide to anyone who's interested in exploring further the treasure trove that is data dot gov. Excellent job!

Anurag Singh Chauhan

Head of Data and AI

8 months ago

Nice article ! Government data portals ( and data tools too) are quite interesting and as you rightly mentioned, are not explored much. apart from https://beta.data.gov.sg/, there are https://data.gov/, https://data.go.id/home and https://www.data.gov.in/ ( india datasets have different portal by state also like https://kerala.data.gov.in/ or https://sikkim.data.gov.in/) with wealth of data to explore.
