Data Science is a Team Sport

Today, I am going to cover why I consider data science a team sport.

From data science use-case identification to the deployment of the models in production, so much goes into data science projects.

It is really rare for one person to have all the skills needed to deliver a data science project end-to-end.

And it depends on the ecosystem you are working in: whether it's a start-up or an enterprise, the size of the team, data maturity, et cetera.

So what is it like to work on a data science project? What is the high-level process? What roles are involved, and who does what?

Let's find out…

Let's have a look at the high-level steps in a data science project.

Identify

Most of the time, you start by defining the problem statement, but quite often you may not have a problem at hand to solve.

In that case, you first need to identify the use cases for data science and then qualify them.

If you have identified and qualified many use cases, you may also need to prioritize them based on their return on investment (ROI).
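To make the prioritization step concrete, here is a minimal sketch in Python. It assumes the common definition ROI = (benefit - cost) / cost. The use-case names and all the figures are made up purely for illustration, and in practice the estimates themselves are the hard part.

```python
def roi(expected_benefit, expected_cost):
    """Return on investment as a ratio: net gain divided by cost."""
    return (expected_benefit - expected_cost) / expected_cost


# Hypothetical use cases with invented benefit and cost estimates
use_cases = [
    {"name": "invoice classification", "benefit": 90_000, "cost": 60_000},
    {"name": "demand forecasting", "benefit": 500_000, "cost": 250_000},
    {"name": "churn prediction", "benefit": 300_000, "cost": 100_000},
]

# Highest ROI first
ranked = sorted(
    use_cases,
    key=lambda uc: roi(uc["benefit"], uc["cost"]),
    reverse=True,
)

for uc in ranked:
    print(f'{uc["name"]}: ROI = {roi(uc["benefit"], uc["cost"]):.2f}')
```

Note that the biggest project in absolute terms (demand forecasting here) does not necessarily come first once cost is taken into account.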

Define

After identifying the use case, you define the problem statement, gather the business or domain context, and start building your understanding of the available data.

You design a high-level approach for the solution, then discuss and define the key performance indicators (KPIs) with the business sponsors.

Assess

Most of the time, it is worthwhile to start with a prototype or proof of concept (POC) rather than jumping into a full-fledged project.

Building a prototype is a way to assess the feasibility of the data science project before investing heavily. Here you go through all the steps required in a data science project, but on a smaller scale.

Once you have built a prototype and the stakeholders give the go-ahead, you start the project formally.

Build

You collect and explore the data, validate and clean it, and apply transformations to make it ready for the core data science tasks.

Then you build the necessary features, split the data into training, validation and test sets, and train, validate and tune the model.

These steps are iterative, which means you keep munging the data, building and modifying features, and training, validating and tuning the models until you get the required results.
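The split, train and validate part of this loop can be sketched with nothing but the Python standard library. Everything here is an assumption for illustration: the toy data, the 60/20/20 split and the deliberately simple "model" (a single threshold placed midway between the two class means). In a real project you would normally use a proper library, and the tuning step would repeat this loop over different features and settings.

```python
import random
from statistics import mean


def split_dataset(rows, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle the rows and split them into train, validation and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


def fit_threshold(rows):
    """'Train' a toy model: the midpoint between the two class means."""
    positives = [x for x, label in rows if label]
    negatives = [x for x, label in rows if not label]
    return (mean(positives) + mean(negatives)) / 2


def accuracy(rows, threshold):
    """Share of rows where 'feature >= threshold' agrees with the label."""
    correct = sum((x >= threshold) == label for x, label in rows)
    return correct / len(rows)


# Toy data: (feature, label) pairs; the label is True when the feature is >= 50
rows = [(x, x >= 50) for x in range(100)]

train, validation, test = split_dataset(rows)
threshold = fit_threshold(train)                        # train
validation_accuracy = accuracy(validation, threshold)   # validate
test_accuracy = accuracy(test, threshold)               # final check on held-out data

print(len(train), len(validation), len(test))
print(f"validation accuracy: {validation_accuracy:.2f}")
print(f"test accuracy: {test_accuracy:.2f}")
```

The point of keeping the test set aside is that you look at it only once, after you have finished iterating on the training and validation sets.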

Deploy

Once your model reaches the required accuracy, you deploy it in an environment where you can get feedback from the business stakeholders.

After getting positive feedback, you build the required dashboards for the business KPIs and take your data science solution live.

Monitor

Once your model is in production, you need to monitor the data and the model's performance over time for any degradation.

If the model's performance goes down, you do a root-cause analysis, replicate the issue in a different environment and repeat the steps mentioned above to identify and resolve it.
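As a rough idea of what such monitoring could look like, here is a small Python sketch that flags the periods in which accuracy dropped noticeably below a baseline. The weekly numbers, the baseline and the tolerance are all invented for illustration. Real monitoring would usually also track the input data itself, not just one metric.

```python
def degraded_periods(accuracy_by_period, baseline, tolerance=0.05):
    """Return the indices of periods whose accuracy fell more than
    `tolerance` below the baseline measured at deployment time."""
    return [
        index
        for index, value in enumerate(accuracy_by_period)
        if value < baseline - tolerance
    ]


# Hypothetical weekly accuracy of a model after go-live
weekly_accuracy = [0.91, 0.90, 0.89, 0.84, 0.80]

alerts = degraded_periods(weekly_accuracy, baseline=0.90)
print(alerts)  # weeks (0-based) that should trigger a root-cause analysis
```

With these made-up numbers, the last two weeks would be flagged, and that is where the root-cause analysis would start.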

So this is the end-to-end process of a data science project.

Now, let's have a look at the different roles in data science teams.

Please note that these roles may vary based on many factors.

For example, in a start-up, one or two people might be doing everything.

In an enterprise, on the other hand, you may have even more specific roles than the ones I have mentioned here.

The business sponsor is the stakeholder who funds the project. They are involved at the start and at the end of the project.

The data science leader manages the project and the team to deliver the project as per the business sponsor’s expectations.

The data engineer collects, processes and refines data as per the data scientist’s requirements, while the data scientist works on the core data science tasks of the project, such as feature engineering, training and evaluating model performance.

The DevOps engineer looks after the deployment aspects of the data science project, such as automating the preparation of the build and its iterative deployment to an environment.

You may need a cloud engineer for your project if you are using cloud services, which are available in the form of IaaS, PaaS and SaaS.

Once you have deployed the model in production, you need a business intelligence (BI) engineer to build a dashboard where the business can look at the results and measure performance against the KPIs.

If you have many data science projects that share and reuse data and infrastructure components, you also need a data architect to make that sharing as efficient as possible.

As mentioned at the start, I would like to repeat that these roles may vary across organizations based on many factors.

So here we looked, at a high level, at who does what in a data science project.

I will cover more details of this aspect in an upcoming episode, where I plan to provide a ‘process vs role mapping’ for a data science project.


You may notice here that data science projects require a variety of skills, which a single person rarely acquires without years of experience.

As a beginner, I would suggest you build a T-shaped skill set, which means building depth in one particular area, maybe core data science or core data engineering tasks, and having breadth in all the other related areas we discussed earlier.

Why do I say so? Because I have seen enough data scientists sitting and waiting for data to be available in the required format before starting their work.

Some data scientists find it really difficult to work on the cloud; others struggle with writing an efficient pipeline or with version maintenance.

Having just enough understanding of these areas can take you a long way, and if required you can perform these tasks yourself rather than waiting for an expert.

In my view, it will help you get a job faster and will also make you quite effective in the team.

So, this is it for now.

I hope you found this article useful.

Let me know your views in the comments section.

If you liked this video, please subscribe to my channel to get an update whenever I upload new content.


Ankit Rathi is an AI architect, published author & well-known speaker. His interest lies primarily in building end-to-end AI applications/products following best practices of Data Engineering and Architecture.

Why don’t you connect with Ankit on YouTube, Twitter, LinkedIn or Instagram?
