Building a Data Science Portfolio: A Newcomer’s Guide
Lillian Pierson, P.E.
The point of building a data science portfolio is to demonstrate your skills to prospective employers. So before going into too much detail, let’s identify what these prospects are actually seeking.
A Newcomer’s Guide to Building a Data Science Portfolio
Let’s be real. When it comes down to it, prospective employers want to hire data scientists who generate monetary value, either by reducing waste or by increasing revenue. That’s why you want to make sure your CV is value-driven, and not just the usual litany of the hundreds of skills you’ve acquired over the course of your career. That discussion, though, is part of a more detailed instructional activity that I’ve reserved for my private group of protégés.
Although it’s hard to showcase the value you’ll add, you can showcase your valuable expertise and data science skills. Prospects are all looking for something a little different, but the good news is that there are some fundamental skills common to most data science roles. Those are:
- Programming in Python and/or R
- Data munging
- Predictive modeling
- SQL experience
- Data storytelling
- Personality attributes: teamwork, problem-solving, and tenacity
Just by taking the time to publish a coding portfolio, you’re showing that you’re committed to and passionate about the field. That helps to demonstrate that you have the personality attributes that prospective employers are looking for.
Deciding where to publish your portfolio
When it comes to building a data science portfolio, there are a few good places to publish your work. Personally, I prefer to publish Jupyter Notebooks on GitHub for Python and to use RPubs for R code. You can, of course, also publish your code to Kaggle.
The best option, in my opinion, is to publish your portfolio on your blog, along with some explanation of the concepts you’re demonstrating. Doing this allows you to show off your technical communication skills. People who can communicate technical concepts in plain language are highly sought-after. You can use your blog and coding portfolio as a place to practice this through writing and videos.
Publishing code to your blog is made very easy by embedded viewers. The viewer for Jupyter Notebooks is called nbviewer, and the one for R is RPubs (here are the instructions for that).
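As a rough illustration of the nbviewer route, here is a minimal Python sketch that builds the nbviewer address for a notebook stored in a public GitHub repository and wraps it in an HTML link you could paste into a blog post. The username, repository, notebook path, and the `main` branch name are all hypothetical placeholders, and the helper names are mine, not part of any library. Whether your blog platform lets you go further and embed the page directly is something you would need to check for yourself.

```python
from html import escape
from urllib.parse import quote


def nbviewer_url(user, repo, path, branch="main"):
    """Build the nbviewer address for a notebook in a public GitHub repo."""
    # quote() leaves "/" alone by default, so nested folders in `path` survive.
    return "https://nbviewer.org/github/{}/{}/blob/{}/{}".format(
        quote(user), quote(repo), quote(branch), quote(path)
    )


def notebook_link(url, label):
    """Wrap the address in an HTML anchor tag for pasting into a blog post."""
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))


# Hypothetical account, repository, and notebook names, for illustration only.
url = nbviewer_url("your-username", "portfolio", "notebooks/churn-analysis.ipynb")
print(url)
print(notebook_link(url, "View the churn analysis notebook"))
```

Swap in your own GitHub details, and the printed link gives readers a rendered, read-only view of the notebook without asking them to install anything.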
Deciding what to publish when building a data science portfolio
Ultimately, you want a data science portfolio that concisely demonstrates your ability to carry out all of the data science tasks that’ll be required of you. To that end, I’d consider building a portfolio that shows people how to do the following:
- Data munging – In other words, show people how to clean, restructure, and reformat raw data into the form you need for use in modeling and analysis.
- Describing and inferring – Use statistical methods to describe and make inferences from your cleaned datasets.
- Data showcasing and story-telling – Here is where you show your proficiency at communicating data insights to different types of audiences.
- Predictive modeling and machine learning – Demonstrate how you’re able to use machine learning methods to make predictions (hopefully predictions that are relevant to business).
You can put these all together piecemeal, or build an end-to-end project that walks through each of the important components. The latter is probably the better bet.
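To make the end-to-end idea concrete, here is a deliberately tiny Python sketch that touches each of the four components above: munging, describing, predicting, and storytelling. The ad-spend dataset is made up for illustration, and the function names are my own. It sticks to the standard library so it runs anywhere, and it uses a simple least-squares line in place of a real machine learning model. In an actual portfolio piece you would more likely reach for tools such as pandas and scikit-learn, with a real dataset.

```python
import csv
import io
import statistics

# Made-up raw export with typical problems: stray spaces, "$" signs, a blank cell.
RAW_CSV = """month,ad_spend,revenue
Jan, 1000 ,$5200
Feb,1500,$6900
Mar,,$4100
Apr,2000,$9100
May,2500,$10800
"""


def munge(raw_text):
    """Data munging: parse the CSV, strip formatting, drop incomplete rows."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        spend = record["ad_spend"].strip()
        revenue = record["revenue"].strip().lstrip("$")
        if not spend or not revenue:
            continue  # skip rows with missing values
        rows.append({"month": record["month"].strip(),
                     "ad_spend": float(spend),
                     "revenue": float(revenue)})
    return rows


def describe(values):
    """Describing: basic summary statistics for one column."""
    return {"mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
            "min": min(values),
            "max": max(values)}


def fit_line(xs, ys):
    """Predictive modeling: ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, x):
    """Apply the fitted line to a new input value."""
    return slope * x + intercept


def tell_story(slope, planned_spend, forecast):
    """Storytelling: state the finding in plain language for a business reader."""
    return ("Each extra dollar of ad spend is associated with about "
            "${:.2f} of revenue; at ${:,.0f} of spend the model "
            "forecasts roughly ${:,.0f}.".format(slope, planned_spend, forecast))


rows = munge(RAW_CSV)
spend = [r["ad_spend"] for r in rows]
revenue = [r["revenue"] for r in rows]
print(describe(revenue))
slope, intercept = fit_line(spend, revenue)
print(tell_story(slope, 3000, predict(slope, intercept, 3000)))
```

Notice that the last step is a sentence a manager could read, not a table of coefficients. That closing translation into business terms is the part of an end-to-end project that shows the value you would add.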
Some excellent examples to inspire your portfolio
When you’re building a data science portfolio, it’s always nice to look at some examples. I have been quite impressed and inspired by the following data science portfolios:
You may notice that I left myself off the list. If you’re wondering, “What about you, Lillian? Where’s your data science portfolio?” Well, as a matter of fact, I pretty much use my Lynda / LinkedIn Learning courses as a coding portfolio. Although I have published some demos on GitHub, RPubs, and my blog, I’ve been so busy with paid work that I haven’t had the time or interest to do more.
And while we’re on the topic of paid work, let me point out that paid data work is my goal for you too. The entire purpose of this recent series of blog posts is to get you moving in the right direction, so that you, too, can find your way into the same position as me: having so many opportunities for paid work that you no longer have the time required for building a data science portfolio.
This was a short blog post, and there are so many more pointers I have to share with ya! Make sure to sign up for my newsletter so you can get some added motivation and inspiration to persevere on your way to the career of your dreams.