Top 20 Software Dev Skills Over Time
Also posted on my personal site.
This is V2 of my last blog post. Last time, job postings were collected every few weeks in real time from remoteok.io. This time, the Wayback Machine was used to collect the postings instead. The result:
Legend
- # Posts: Total number of posts on remoteok.io saved at archive.org for that month
- Total # NEs: The sum of counts of the top 20 Named Entities (skills)
Pipeline
- Fetch the posts from the Wayback Machine (see the first sketch after this list)
- Process the posts, generating the top 50 named entities for each month (second sketch below)
- Manually reduce the top 50 named entities to the top 20 skills for each month
- Run a script in Blender (2.83 LTS) on the top 20 skills to create the animation (third sketch below)
- Render the animation in Blender to a video file
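As a rough illustration of the fetch step, here is a minimal sketch using the Wayback Machine's CDX API. The remoteok.io URL is from this project; the helper names, year range, and `requests` dependency are illustrative, and the actual script may differ.

```python
# Minimal sketch: list archived captures of remoteok.io via the Wayback
# Machine's CDX API, then fetch each snapshot's HTML. Helper names and the
# `requests` dependency are illustrative; the actual script may differ.
import requests

CDX_URL = "http://web.archive.org/cdx/search/cdx"

def list_snapshots(url="remoteok.io", year=2017):
    params = {
        "url": url,
        "output": "json",
        "from": f"{year}0101",
        "to": f"{year}1231",
        "filter": "statuscode:200",
        "collapse": "digest",  # skip byte-identical captures
    }
    rows = requests.get(CDX_URL, params=params).json()
    if not rows:
        return []
    header, entries = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in entries]

def fetch_snapshot(entry):
    # Archived pages live at /web/<timestamp>/<original-url>.
    url = f"http://web.archive.org/web/{entry['timestamp']}/{entry['original']}"
    return requests.get(url).text

for entry in list_snapshots()[:3]:
    print(entry["timestamp"], len(fetch_snapshot(entry)))
```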
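The NER step could look something like the following sketch with spaCy, assuming the posts have already been reduced to plain text; the original script may use a different NER library or model.

```python
# Minimal sketch of the NER step: count named entities across a month's
# posts and keep the 50 most common. Requires the small English model
# (python -m spacy download en_core_web_sm); the original may differ.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def top_entities(texts, n=50):
    counts = Counter()
    for doc in nlp.pipe(texts):
        counts.update(ent.text.strip() for ent in doc.ents)
    return counts.most_common(n)

# e.g. top_entities(posts_by_month["2017-05"])  ->  [("JavaScript", 41), ...]
# (posts_by_month is a hypothetical month -> list-of-texts mapping)
```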
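Finally, the Blender step boils down to keyframing object transforms from the skill counts and rendering the frame range. A minimal sketch; the object names, scaling, and frame spacing here are illustrative, not the original script's.

```python
# Minimal sketch of the Blender (2.83 LTS) step: set a bar's height per
# month and keyframe it, then render the scene's configured frame range.
# Object names, scaling, and frame spacing are illustrative.
import bpy

FRAMES_PER_MONTH = 24

def animate_skill(obj_name, monthly_counts):
    obj = bpy.data.objects[obj_name]
    for i, count in enumerate(monthly_counts):
        obj.scale.z = count / 10.0  # bar height derived from the count
        obj.keyframe_insert(data_path="scale", frame=1 + i * FRAMES_PER_MONTH)

animate_skill("Bar_JavaScript", [12, 15, 9, 21])

# Render the animation using the scene's configured output format and path.
bpy.ops.render.render(animation=True)
```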
Lessons Learned
Find Existing Sources of Data
The most obvious lesson: instead of sampling over a long period of time, seek out existing sources of data. Here the objective is to discover the most in-demand software development skills, and many preexisting sources are available for that kind of information.
Elbow Grease Makes a Difference
A different approach was taken compared to last time. This time the top 50 named entities were generated, then reduced by hand to the top 20 skills for each month.
- Instead of ignoring a named entity if a substring matches an ignore word, ignore it only if the whole string matches an ignore string. This time, named entities reflecting actual skills were not discarded due to a substring match on an ignore word. It was not obvious that the old script was discarding valid skills; the issue only became visible when working with the data by hand (see the sketch after this list).
- The existing synonym map was kept to help the merging process, but manual merging was also done due to the copious variety of synonyms. Here again skills were miscounted by the old script (synonyms appearing beyond the 20th rank went uncounted).
- Discovered "Happiness Engineer" is a role / skill (seen in 2017-05). This alone was worth all the effort.
- A much better sense of the data was acquired. For a one-time effort, the manual approach is the right way. For a repeated process, working with the data manually first and then programming that know-how into a machine makes more sense.
- All data science courses teach that you need to get your hands dirty to really understand the data; it is true.
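Both cleanup rules above are small in code. A minimal sketch, with illustrative ignore and synonym lists (the real ones are larger):

```python
# Minimal sketch of the cleanup rules above: drop an entity only on an
# exact, whole-string ignore match, and fold synonym variants into one
# canonical name before ranking. The word lists here are illustrative.
from collections import Counter

IGNORE = {"engineer", "team", "company"}         # exact matches only
SYNONYMS = {"js": "JavaScript", "golang": "Go"}  # variant -> canonical

def clean_counts(raw_counts):
    counts = Counter()
    for name, n in raw_counts.items():
        key = name.strip().lower()
        # The old substring test, any(w in key for w in IGNORE), would also
        # have discarded valid skills such as "Happiness Engineer".
        if key in IGNORE:
            continue
        counts[SYNONYMS.get(key, name)] += n
    return counts
```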
Have Clear Rules for How To Interpret the Data
What counts as a skill? Manually working with the data exposed the nuance involved. Here, languages (English, German, ...) are included as skills, but very broad terms (Engineer) are not. Perceived applicability of a named entity as a skill worth learning was the rule of thumb for this toy exercise. The detail and rigor of the rules should scale with the seriousness of the subject and audience. Discovering and refining rules is an iterative process, much like coding qualitative data.
Seek Large Amounts of Evenly Distributed Data
Using the Wayback Machine seemed like a great idea, and plenty of data was available. Unfortunately, the distribution was not ideal: many months are missing posts, and the majority of the posts are clustered across a few months. This project went ahead anyway, partially out of curiosity to see how months with sparse data compared to those flush with data.
Fortunately the Wayback Machine website shows you the distribution of the data up front:
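The same check can also be scripted. A minimal sketch, reusing the hypothetical `list_snapshots` helper from the fetch sketch above:

```python
# Bucket CDX snapshot timestamps (YYYYMMDDhhmmss) by month to spot gaps
# and clusters before committing to the data.
from collections import Counter

def snapshots_per_month(entries):
    return Counter(e["timestamp"][:6] for e in entries)

# e.g. Counter({"201705": 14, "201706": 2, ...}) -- sparse months stand out.
```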
Use LTS Tool Versions
The Blender script does not work in the most recent versions of Blender (>= 2.9x). Fortunately, an LTS version of Blender (2.83) was used originally, so it was easy to download the latest LTS release and get up and running again quickly.
Write Clear Code
The original version of the processing code is not well written in terms of clarity; in particular, returning bare tuples from functions makes the data flow hard to reason about. The new version of the script is a bit less complex in terms of data flow within the program (see the sketch below). Even if software is meant as a personal toy, it makes sense to invest some time in quality, at least for your future self.
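For example (illustrative, not the original code), replacing an anonymous tuple return with a `NamedTuple` makes each field self-documenting:

```python
# Illustrative, not the original code: a named return type instead of a
# bare tuple documents what each field means.
from typing import NamedTuple

class MonthStats(NamedTuple):
    month: str
    posts: int
    top_entity_total: int

def summarize(month, per_post_entity_counts):
    # Before: return (month, len(per_post_entity_counts), sum(...)),
    # leaving callers to remember what each tuple index meant.
    return MonthStats(month, len(per_post_entity_counts),
                      sum(per_post_entity_counts))

print(summarize("2017-05", [4, 2, 7]))
# MonthStats(month='2017-05', posts=3, top_entity_total=13)
```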