What Should Your Data Science Team Look Like, and Which Organization Should It Be Part Of?

In a large organization, many components have to exist and many conditions have to hold for a data science team to become a value-creation machine. In this article, I examine who should lead the data science team, how to determine its size and composition, and where it should sit in the organization.

Who Should Head The Data Science Team?

The best-case scenario is when the organization knows what it is seeking and can articulate its requirements. In that scenario, anyone who is skilled at project execution can lead the team. The organization can even get away with such a leader when the exact requirements are vague, as long as the direction is clear, i.e., the requirements can be discovered by talking to different people in the organization. If the organization lacks clarity on requirements but knows that it needs new insights, innovations, or new ways of driving revenue, and does not know where to look for guidance, then an industry veteran is better suited: that person is likely to be familiar with the industry’s unmet, unarticulated, and upcoming needs and can lead a team of data scientists to address them.

What Should Be The Team Size and Team Composition?

While there is no simple answer to how big the data science team should be, the primary factors should be the number of outstanding problems and the complexity of the top problems (ranked by expected business value). Another guideline is to have at least one and a half primary projects per person, sustained for at least two years, so that each person can make progress on a secondary project while roadblocks on the primary project are being addressed.
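
To make the one-and-a-half-projects guideline concrete: if P is the number of high-value primary projects expected to keep the team busy for two years, the guideline caps headcount at roughly P / 1.5, and conversely a team of n people needs a backlog of at least 1.5 × n projects. The sketch below is a minimal illustration, assuming the backlog has already been counted and prioritized by expected business value; the function names are hypothetical, and exact fractions are used to avoid floating-point rounding surprises.

```python
import math
from fractions import Fraction

# Ratio taken from the guideline above: 1.5 primary projects per person.
PROJECTS_PER_PERSON = Fraction(3, 2)


def max_headcount(primary_projects: int) -> int:
    """Largest team that still has at least 1.5 primary projects per person."""
    return math.floor(primary_projects / PROJECTS_PER_PERSON)


def min_projects(headcount: int) -> int:
    """Smallest two-year project backlog that keeps a team of this size busy."""
    return math.ceil(headcount * PROJECTS_PER_PERSON)


if __name__ == "__main__":
    print(max_headcount(12))  # 8 people
    print(max_headcount(10))  # 6 people (7 would need 10.5 projects)
    print(min_projects(5))    # 8 projects
```

This is only a ceiling from backlog size; the number and complexity of the top problems should still drive the final decision.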

A typical data science project in a large organization requires database skills, software development skills, machine learning and/or statistics skills, visualization skills, scientific/analytical skills, collaboration skills and, finally, excellent communication skills. While it is impossible to find individuals who have all of these skills, the most beneficial team is the smallest group of people who collectively bring all of them. For example, database skills, such as expertise in various SQL flavors, SQL-on-Hadoop engines such as Hive and Impala, and NoSQL databases such as Cassandra and MongoDB, are necessary to access data from various enterprise data stores. Software development skills are necessary to write Java/MapReduce, Scala/Spark, and Python/Spark/H2O programs, to deploy APIs, and to develop quick POCs using visualization software such as Tableau and Spotfire. People with scientific training and a background in machine learning or statistics are necessary to characterize the nature of the solution and to design appropriate experiments to develop and validate it. In addition, team members should have excellent collaboration and communication skills, because interacting with business partners and customers is critical to delivering value.

Today, there is a severe shortage of talented machine learning experts, skilled big data engineers, and developers who can write Java/MapReduce or Scala/Spark code. Allot sufficient time, therefore, to build a team with the right skills.
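
Finding the smallest group of people who together cover every required skill is an instance of the minimum set cover problem. The sketch below uses the standard greedy heuristic (repeatedly pick the candidate who adds the most still-missing skills), which is fast but only approximates the true minimum. The candidate names and skill labels are hypothetical placeholders, not a recommended roster.

```python
from typing import Dict, List, Set


def greedy_team(required: Set[str], candidates: Dict[str, Set[str]]) -> List[str]:
    """Greedy approximation to the smallest team covering all required skills."""
    uncovered = set(required)
    team: List[str] = []
    while uncovered:
        # Choose the candidate who adds the most still-missing skills
        # (ties go to the candidate listed first).
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError(f"No candidate covers: {sorted(uncovered)}")
        team.append(best)
        uncovered -= gained
    return team


# Hypothetical skill labels drawn from the skills listed above.
REQUIRED = {"sql", "nosql", "spark", "machine_learning", "statistics",
            "visualization", "communication"}

# Hypothetical candidates and the skills each one brings.
CANDIDATES = {
    "candidate_a": {"sql", "nosql", "spark"},
    "candidate_b": {"machine_learning", "statistics", "communication"},
    "candidate_c": {"visualization", "communication", "sql"},
    "candidate_d": {"machine_learning", "visualization"},
}

if __name__ == "__main__":
    print(greedy_team(REQUIRED, CANDIDATES))
    # ['candidate_a', 'candidate_b', 'candidate_c']
```

Greedy selection is a reasonable default for a long list of candidates; for a small pool, checking every combination would find the exact minimum.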

Where Should The Data Science Team Be Housed?

At one extreme, there is a central data science team that has access to all the data and all the tools in the organization, and anyone across the organization can seek the team’s help. Because any department within the company can bring problems to a central team, it is unlikely to run out of high-value problems. Such teams are typically large and are more likely to tackle large, complex problems that need varied expertise. Large teams can also afford skill-set redundancy, which avoids single points of failure when people leave. The drawback of this model is that team members may not understand any part of the business in depth. To compensate, the owner of the problem collaborates closely with the team while the solution is being implemented and is responsible for rapidly giving the data scientists the necessary business background.

At the other extreme, multiple departments (such as product management, IT, and marketing) each have their “own” data science team, or just one person. This model has many drawbacks. First, an individual department is unlikely to have enough problems to justify a permanent data science team. Second, buying multiple copies of common tools wastes investment. Third, members of small teams are unlikely to have colleagues with very different skill sets whose expertise they can draw on, so solutions may take longer to implement.

Many organizations are still figuring out the right place for data science teams. Given the cost of infrastructure and staff, and the varied skill sets needed to tackle today’s most complex problems, a centralized data science team is more likely to succeed.

If you have feedback on this article, please contact me or leave a comment. Also, if you think your network can benefit from this article, please like or share it.