So what makes a good data science profile

Let's start with some stats

Data science was named the fastest-growing job in 2017 by LinkedIn, and in 2018 Glassdoor ranked data scientist as the best job in the United States. Furthermore, a recent study by PricewaterhouseCoopers states: “The best jobs right now in America include titles like data scientist, data engineer, and business analyst.”

  • 650% job growth since 2012 (source: LinkedIn).
  • An estimated 11.5 million new jobs by 2026 (source: U.S. Bureau of Labor Statistics).
  • An average annual salary of $120,931 for the job title of data scientist (source: Glassdoor).


A few years back, when I first heard about "Machine Learning", I found it fascinating. I started reading articles, watching YouTube videos, and visiting different knowledge sites. This made me even more eager to learn about the field, so I decided to go deeper.

I attended some conferences and live lectures, read various blogs, and joined some discussions. The problem with these methods was that they were short and stayed at a high level: they talk about where the industry is heading, what the use cases are, and so on. This is important and gives you good pointers, but not depth.

At that time there was very little authentic content available on the internet. Now it is the other way round: there is so much that it's called "information hell", and it is really difficult to choose the right content. So I decided to write this article for aspiring Machine Learning Engineers and Data Scientists.

How to start?

Searching for more material, I came across Stanford University's published lectures by Prof. Andrew Ng, recorded around 2007 or so. These lectures covered the basics of the field very well. Later they were turned into a course on Coursera, which Prof. Ng co-founded.

The course covers the essence of the field: supervised and unsupervised learning, regression techniques, how the cost function is derived, and how different models work in their basic form. All of this is explained very well. The exercises are done in MATLAB. Overall it is a very good foundation course, but it misses the latest advancements and algorithms of the field.
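To give a flavour of what "cost function" means here, below is a minimal sketch in plain Python (not the course's own MATLAB exercises). It assumes one-feature linear regression with a mean-squared-error cost, minimized by batch gradient descent. The function names, the learning rate, and the toy data are all made up for illustration.

```python
# Minimal sketch: squared-error cost and batch gradient descent for
# one-feature linear regression. Names and data are hypothetical.

def predict(theta0, theta1, x):
    """Hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """J = (1 / 2m) * sum of squared prediction errors."""
    m = len(xs)
    return sum((predict(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Repeatedly step both parameters against the gradient of J."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [predict(theta0, theta1, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

# Toy data generated from y = 1 + 2x, so the fit should approach (1, 2).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1, cost(theta0, theta1, xs, ys))
```

Libraries do this for you in practice, but writing it once by hand makes it much easier to understand what a model is actually optimizing.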

This is a good foundation course and a good starting point for everyone.

There are other very good courses on the market, such as the Udemy course "A-Z Machine Learning…". It covers the common and advanced models in both Python and R, and you can choose the language you want to follow. The course covers a good breadth of the subject but lacks in-depth explanation of how an algorithm is devised or how important formulas are derived, which can matter a lot if you want to make a career in this area. It can be a good course for a developer.

These two courses complement each other. But even if you combine them, you still miss a big chunk of essentials: statistics, EDA (exploratory data analysis), the basics of a scripting language such as Python or R, and most importantly SQL (or similar).
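As a rough illustration of how these pieces fit together, here is a small sketch that uses only the Python standard library: SQL through `sqlite3` for a grouped aggregate, and the `statistics` module for basic summary numbers. The `sales` table, its columns, and its values are invented for the example.

```python
import sqlite3
import statistics

# Hypothetical toy data: (region, amount). Made up for illustration only.
rows = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 200.0),
    ("south", 100.0),
    ("south", 150.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# SQL: row count and average amount per region.
per_region = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Statistics / EDA: summary numbers for the whole column.
amounts = [row[0] for row in conn.execute("SELECT amount FROM sales")]
summary = {
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "stdev": statistics.stdev(amounts),
}
conn.close()

print(per_region)
print(summary)
```

On real projects you would more likely use a proper database and a data-frame library, but the questions you ask of the data are the same.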

To take on the field completely, you need a thorough course: lengthy, well structured, covering topics in depth, with recent content that includes the latest advancements in the field. But all of this may still be a one-way transfer of information, by which I mean you are only reading the material and watching the videos. To gain practical knowledge you need more dimensions: live interactions, practical problems to solve, and real-world case studies.

There are various courses which provide such multi-dimensional knowledge of the field. Many micro-degree courses are available, for example on edX, and many diplomas are available, like the one from IIIT-Bangalore. They offer programs from 6 months to 2 years where you can get expertise in

So in short, if you are still exploring your interest in the field, try the foundation courses I mentioned at the top. But once you have decided to build your career in the field, you need a thorough course which gives you in-depth knowledge, good case studies, the latest advancements, and good connections.

This is good when you are starting your career in this direction. But is this all you need? What about people who are already exploring the field?

Making your profile 360 degrees

Well, that was about gaining knowledge of Machine Learning. But when you open up to the market and hunt for a job, you are required to know more. Nowadays every organization is looking for a Superman or some sort of Avenger :). (I generally hear requirements like: the person should be an individual contributor and should also handle a team of more than 30-40 members; should have a technical architect background and be able to do resource management, project planning, etc.; should be very good at coding and designing and be able to handle organizational activities; should play Scrum master and have expertise in multiple cloud platforms… and so on and so forth :). Anyway, that's a topic for later.)

When you are doing data science, you are definitely exploring the data, massaging it, and making decisions based on it. But who is going to get you the data? … The Data Engineer.

So your profile is not really 100% complete unless you have good knowledge of the data engineering side of the world. Understanding ETL, ELT, and data pipelines is an integral part of the journey. Here Python, PySpark, and some of the big data modules come in handy.
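To make the ETL idea concrete, here is a tiny extract-transform-load sketch using only the Python standard library. The CSV content, the `users` table, and the cleaning rules are hypothetical; a real pipeline would typically read from source systems and run on something like PySpark or a managed cloud service. In an ELT setup the order changes: the raw rows are loaded first and the transformation runs inside the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """user_id,signup_date,country
1,2021-01-05,in
2,,us
3,2021-02-11,IN
"""

def extract(text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: drop rows without a signup date, normalize country codes."""
    cleaned = []
    for rec in records:
        if not rec["signup_date"]:
            continue  # skip incomplete rows
        cleaned.append((int(rec["user_id"]), rec["signup_date"], rec["country"].upper()))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (user_id INTEGER, signup_date TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT user_id, country FROM users ORDER BY user_id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes each one easy to test and to swap out, which is the same idea the larger pipeline tools build on.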

As technology evolves, managing infrastructure is not that easy. Maintaining large on-premise infrastructure to handle this volume and variety of data is neither cost effective nor feasible, and security is another concern. So the cloud comes to the rescue. Every organization is trying to move its infrastructure and data to the cloud, depending on its needs. This takes away the pain of maintenance and security and is also cost effective in the long run. It is quick to provision a server or deploy an application, and the cloud comes with various integrated services ready-made.

When we talk about Machine Learning and Data Science, we are generally dealing with a good volume of data which we want to analyze. So it is an obvious choice to build all of this on a cloud platform in the form of ML pipelines. For building data pipelines you have ADF/Synapse/Databricks in Azure and Glue/Step Functions/Databricks in AWS.

For data science and ML work you have SageMaker/SageMaker Studio in AWS and ML Studio in Azure. They all provide a development environment based on Jupyter notebooks, along with most of the Python libraries and their own implementations of the algorithms. These custom algorithms are tuned to perform best in their respective environments.

So even if we ignore MLOps for the time being, Data Engineering + ML + Cloud knowledge is essential to grow in the field. You can start with one and expand to the others as you get opportunities and experience on different assignments, but eventually you'll need all of these.

I hope this gives a good overview of the path for data scientists and machine learning enthusiasts. You can always reach out to me for discussion or guidance.

Thanks.
