Information Extraction through NLP on Job Portals
NLP effectively optimizes the analysis of job market trends

As the buzz around Natural Language Processing (NLP) technology continues to grow, organizations are beginning to realize its potential for streamlining tasks. While ChatGPT remains a hot topic, the capabilities of NLP extend far beyond such applications. For example, businesses can use NLP to analyze job market trends and screen applicants efficiently and accurately in order to find suitable candidates.


Project Objective

Analyzing job market trends or finding suitable candidates is a time-consuming task, because a search for a specific entity of interest can return hundreds or even thousands of results. The goal of this project was to develop an NLP model that extracts information about occupational fields and employee skills from job advertisements.

The implementation process for this project was divided into two steps:

  1. Web crawling of a job portal to gather job ads from over 100 companies.
  2. Creating a deep learning solution to recognize the targeted named entities in job ads from the 30 DAX members.


Dataset

To train our machine learning text extraction algorithm, we used a dataset of job ads obtained through a web crawl. The crawler can be reused for further analyses to detect new trends in the future.

Figure 1: Information extraction through NLP makes it possible to filter data of interest from datasets.


Challenges of Information Extraction through NLP

We faced three major challenges in the information extraction process:

  • Web crawling is prohibited on many job platforms.
  • The portal we used changes its source code constantly, which made the crawling process more difficult.
  • While training the algorithm, we encountered a considerable number of words with a wide range of meanings. Since only certain entities were of interest, the unnecessary ones were filtered out.
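One common way to check the first point before crawling is to read a site's robots.txt. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, the `portal.example` URLs, and the `job-trend-crawler` user agent are made up for illustration. A portal's terms of service may impose further limits that robots.txt does not express.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the portal.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /jobs/
Allow: /about/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://portal.example/jobs/123",
        "https://portal.example/about/team",
    ):
        allowed = is_crawl_allowed(EXAMPLE_ROBOTS_TXT, "job-trend-crawler", url)
        print(url, "allowed" if allowed else "disallowed")
```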


Methods of NLP Information Extraction

For efficient information extraction, we used a Python-based web crawler to collect job ads. The resulting structured dataset can be downloaded as a CSV file and contains:

  • positions
  • company names
  • job locations
  • publication dates
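As a rough illustration of such a structured dataset, the sketch below writes and reads job ads with Python's standard `csv` module. The column names and the sample rows are assumptions for this example; the article does not specify the crawler's actual schema.

```python
import csv
import io

# Assumed column names; the article only lists the kinds of fields collected.
FIELDNAMES = ["position", "company", "location", "published"]

# Made-up sample rows for illustration only.
SAMPLE_ADS = [
    {"position": "Data Scientist", "company": "Example AG",
     "location": "Berlin", "published": "2023-01-15"},
    {"position": "3D Developer", "company": "Sample GmbH",
     "location": "Munich", "published": "2023-01-16"},
]


def write_job_ads(rows, handle):
    """Write job-ad dictionaries to an open text handle as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def read_job_ads(handle):
    """Read CSV rows back into a list of dictionaries keyed by column name."""
    return [dict(row) for row in csv.DictReader(handle)]


if __name__ == "__main__":
    # An in-memory buffer stands in for the downloadable CSV file.
    buffer = io.StringIO(newline="")
    write_job_ads(SAMPLE_ADS, buffer)
    buffer.seek(0)
    for ad in read_job_ads(buffer):
        print(ad["company"], "-", ad["position"])
```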

We then used a text annotation tool to label the occupational fields and skills and downloaded the annotations in JSON format. The labeled entities were then divided into training and test datasets.
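A minimal sketch of this step might look as follows. The JSON layout (`text` plus `labels` as `[start, end, label]` triples), the label names `FIELD` and `SKILL`, the 80/20 ratio, and all sample sentences are assumptions; the export format of the actual annotation tool may differ. Character-offset spans of this shape are a form from which spaCy training examples can be built.

```python
import json
import random


def make_record(text, phrases):
    """Build one annotated record; offsets are located with str.find for the demo."""
    labels = []
    for phrase, label in phrases:
        start = text.find(phrase)
        labels.append([start, start + len(phrase), label])
    return {"text": text, "labels": labels}


# Made-up records serialized in the ASSUMED export layout.
EXPORT_JSON = json.dumps([
    make_record("We are hiring a data scientist with Python skills.",
                [("data scientist", "FIELD"), ("Python", "SKILL")]),
    make_record("Our logistics team needs a planner who knows SAP.",
                [("logistics", "FIELD"), ("SAP", "SKILL")]),
    make_record("Join our marketing department; SEO experience is a plus.",
                [("marketing", "FIELD"), ("SEO", "SKILL")]),
    make_record("The finance unit seeks an analyst fluent in Excel.",
                [("finance", "FIELD"), ("Excel", "SKILL")]),
    make_record("3D developer wanted, Blender knowledge required.",
                [("3D developer", "FIELD"), ("Blender", "SKILL")]),
])


def load_examples(raw_json):
    """Convert exported records to (text, [(start, end, label), ...]) pairs."""
    return [
        (record["text"], [tuple(span) for span in record["labels"]])
        for record in json.loads(raw_json)
    ]


def train_test_split(examples, test_ratio=0.2, seed=42):
    """Shuffle reproducibly, then hold out a fraction of examples for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = train_test_split(load_examples(EXPORT_JSON))
    print(len(train), "training examples,", len(test), "test examples")
```

Fixing the seed keeps the split reproducible, so the test set stays the same across training runs.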

  1. To train the algorithm, we used the spaCy NLP library. The trained model was then exported so that it is available at any time for subsequent analyses.
  2. The model was evaluated on the test dataset using sklearn metrics and reached a high accuracy (95.1%).
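To make the evaluation step concrete, here is a small sketch that computes accuracy and per-label precision and recall by hand. It assumes a token-level comparison of gold and predicted labels, which the article does not specify, and the label sequences are invented. The `accuracy_score` function in `sklearn.metrics` computes the same accuracy value for two label sequences.

```python
def accuracy(gold, predicted):
    """Fraction of positions where the predicted label equals the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def precision_recall(gold, predicted, label):
    """Precision and recall for one label, treating all other labels as negative."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(g == label and p == label for g, p in pairs)
    false_pos = sum(g != label and p == label for g, p in pairs)
    false_neg = sum(g == label and p != label for g, p in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up token labels; "O" marks tokens outside any entity.
GOLD = ["O", "O", "FIELD", "FIELD", "O", "SKILL", "O", "SKILL", "O", "O"]
PRED = ["O", "O", "FIELD", "FIELD", "O", "SKILL", "O", "O", "O", "O"]

if __name__ == "__main__":
    print("accuracy:", accuracy(GOLD, PRED))
    for name in ("FIELD", "SKILL"):
        print(name, precision_recall(GOLD, PRED, name))
```

Because most tokens in a job ad lie outside any entity, accuracy alone can look high even when entities are missed, which is why the sketch also reports precision and recall per label.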

Finally, we also crawled job posts from the 30 DAX members and saved the named entities that corresponded to the occupational fields and skills we were looking into. Stop words from a predefined list were removed automatically to increase the quality of the output data.
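The stop-word cleanup can be sketched roughly as below. The stop-word list, the punctuation handling, and the sample entities are assumptions for illustration; the project's actual list is not given.

```python
# Hypothetical stop-word list; the project's actual list is not published.
STOP_WORDS = {"and", "or", "with", "the", "m/f/d"}

# Punctuation stripped from token edges before the stop-word lookup.
EDGE_PUNCTUATION = "()[],.;:"


def clean_entities(entities, stop_words=STOP_WORDS):
    """Drop stop-word tokens inside each entity and discard entities left empty."""
    cleaned = []
    for text, label in entities:
        kept = [
            token for token in text.split()
            if token.strip(EDGE_PUNCTUATION).lower() not in stop_words
        ]
        if kept:
            cleaned.append((" ".join(kept), label))
    return cleaned


# Made-up extraction output as (entity text, label) pairs.
EXTRACTED = [
    ("Data Scientist (m/f/d)", "FIELD"),
    ("the Python", "SKILL"),
    ("with", "SKILL"),
    ("machine learning", "SKILL"),
]

if __name__ == "__main__":
    for text, label in clean_entities(EXTRACTED):
        print(label, "->", text)
```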


Project Results

The trained algorithm is able to extract information from text. For better visualization, a user-friendly dashboard presents the results of a query. According to individual needs, the user can choose and drill down into:

  • a company
  • specific skills
  • occupational fields

Filtering by company produces a tree map in which the size of each rectangle corresponds to the number of vacancies at that company.
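The numbers behind such a tree map could be prepared along these lines. This is a sketch with invented company names and counts, not the dashboard's actual code. It only computes the relative rectangle sizes and leaves the drawing to a plotting library.

```python
from collections import Counter

# Made-up job ads; only the company field matters for this aggregation.
SAMPLE_ADS = [
    {"company": "Example AG", "position": "Data Scientist"},
    {"company": "Example AG", "position": "Data Engineer"},
    {"company": "Example AG", "position": "3D Developer"},
    {"company": "Sample GmbH", "position": "Controller"},
]


def vacancies_per_company(ads):
    """Count open positions per company."""
    return Counter(ad["company"] for ad in ads)


def treemap_shares(counts):
    """Relative rectangle area per company, largest first."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {company: n / total for company, n in counts.most_common()}


if __name__ == "__main__":
    shares = treemap_shares(vacancies_per_company(SAMPLE_ADS))
    for company, share in shares.items():
        print(f"{company}: {share:.0%} of the tree map area")
```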


Selecting a specific skill or experience will show the relative number of open positions across all companies. Further analysis, such as combined skills, can be carried out as well.
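A query for a single skill or a combination of skills can be sketched as a simple set comparison. The ads and skill names below are invented, and the data model (one skill list per ad) is an assumption about how the extracted entities might be stored.

```python
# Made-up ads with the skills extracted from each one.
SAMPLE_ADS = [
    {"company": "Example AG", "skills": ["Python", "SQL"]},
    {"company": "Example AG", "skills": ["Python"]},
    {"company": "Sample GmbH", "skills": ["python", "sql", "Excel"]},
    {"company": "Sample GmbH", "skills": ["Blender"]},
]


def skill_share(ads, required):
    """Fraction of ads whose skills include every skill in required (case-insensitive)."""
    if not ads:
        return 0.0
    wanted = {skill.lower() for skill in required}
    hits = sum(wanted <= {skill.lower() for skill in ad["skills"]} for ad in ads)
    return hits / len(ads)


if __name__ == "__main__":
    print("Python:", skill_share(SAMPLE_ADS, ["Python"]))
    print("Python + SQL:", skill_share(SAMPLE_ADS, ["Python", "SQL"]))
```

Passing more than one skill gives the combined-skills view: an ad only counts if it mentions all of them.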


The algorithm can also be extended to other information extraction tasks, for example scanning the large number of job applications that a company receives. With information extraction through NLP, companies are able to save resources for future market research. Identifying current trends helps a wide range of industries, such as:

  • Car manufacturers
  • Construction companies
  • Food manufacturers
  • And many more

One look at the output data shows what kind of experts are in demand on the market, whether they are data scientists, 3D developers, or other specialists. Optimize processes within your organization and get in touch with our data science experts at Supper & Supper!
