From Zero to Hero: Forming a Strong Data Science Team
The data science game is easy with the right team!
Not all of us are Zuckerberg, Page, or Brin, but we always have the opportunity to succeed in something. I believe that we all have the power to do great things; the only conditions are passion and consistency. And yes, we also need to generate great ideas and have a wonderful team to make them a reality!
And while the first part, the ideas, is entirely up to you, today I will take on the second part and explain how to build a powerful team. Below are the main steps for building a powerful data science team from scratch, as I see it.
As always, grab your coffee and a comfortable chair, and dive in!
#1 Step. Why Do You Actually Need a Data Science Team?
You will probably agree that you can’t just adopt some technology and wait until your project bursts into a successful endeavor. It’s the same with data science: it’s not a magic cure for every ailment. You must clearly understand your goals, your needs, and your future prospects. Shape the idea you want to realize. This idea should solve an existing problem or open up opportunities that never existed before.
After that, make sure you have a good knowledge of data science technologies and their business applications. You should be well-versed in, or at least have a basic understanding of, what data scientists actually do and what benefits you can reap from involving them in your project.
So, before making any decision, decide on what niche you’d like to tackle. There are a variety of options for services from data scientists. Data science can either be a service (DS consultancy) or a product (DS-enabled apps), or both. Based on your target market or locale, one may be more viable than the other.
Whether service or product — you want to focus on use-cases for your core IP. This can mean functional areas like Marketing, Finance, Ops, or Risk — or an industry specialization like Financial, Retail, Telco, Utilities, etc. In either case, you want to identify a list of common business problems that data can be used to solve for. Customer acquisition (marketing), price
#2 Step. Choose a Model for Integrating Your Data Science Team
Before getting acquainted with all the roles in a data science team, you need to understand the possible ways your team and the data scientists can interact. If you decide to parachute data scientists in right from the start without a proper understanding of team models, be prepared for that approach to be a road to nowhere.
So, for successful integration you need to be aware of the following team models:
- Center-of-Excellence approach (CoE). The most centralized model, this works like a research group within your company. The main goal for data scientists is to identify big bets and build prototypes, so they do the most innovative part of the work.
- Accounting or BI approach. Here, the task for the data science team is to produce reports and presentations on a recurring basis (usually monthly and quarterly). This model is useful if there is a necessity to inform the organization of notable movements in top-level metrics. Once the team identifies an interesting trend, they would work with product teams to investigate the root cause.
- Consultant approach. This model is the most appropriate choice if there is only a small amount of work for the data science team. For example, you are developing a product and just need some advice from a data specialist or answers to a few direct questions.
- Embedded approach. This is the most complicated approach for a company to apply. In the embedded model, you hire your own data specialists into the product teams. What does that mean? Each engineering manager is in charge of planning data scientist headcount, hiring, and allocation. The data scientist within each product team has the engineering team members as their peers.
- Democratic approach. This model gives product managers, designers, and engineering managers easy and straightforward access to data. Why so? Because with the right tooling built by engineers, the need for a data science role is lessened or removed. Many identify the main reason for hiring data scientists as the lack of proper infrastructure for fast and easy dashboard creation.
- Product Data Science (PDS) approach. Between the CoE and embedded models there is a spectrum of hybrid models. This means you can take something from one model and something else from another, and the main point is that you build your own approach that focuses on the product you have. The PDS approach is inspired, in part, by the matrix structure. Here, individuals are simultaneously part of the data science function and the product. This means that data scientists, each a member of a product team, report into a central data science team.
Well, as you can see, there are lots of models to choose from. But how do you evaluate which approach is better? I think it all depends on your purpose, so before you bring in a data science team, be careful to clarify all the details of your situation. Anyway, here is some of my advice:
- if you have a large organization looking for innovation in different parts of the business but not specifically in a particular area, choose the center of excellence approach.
- if you control a high-functioning team that has a good idea of the direction the data scientist is going to take, pay attention to the embedded approach.
- if you have a small team, then a consultant approach will work best for you. In a large organization, this model is inefficient because everything will take a long time. If something is urgent, you might as well hire your own data scientist, and if not, this system might work. In a small, immature organization, an in-house data scientist will just end up doing anything data-related and get overwhelmed with tasks.
- if the organization already has specific models that it develops, maintains, and runs in production, then the accounting approach might be a good idea.
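For readers who think in code, the rules of thumb above can be sketched as a tiny Python function. This is only an illustration: the `OrgProfile` fields, the `suggest_team_model` name, and the order in which the rules are checked are assumptions made for this sketch, not a formal method.

```python
from dataclasses import dataclass


@dataclass
class OrgProfile:
    """Hypothetical description of an organization (field names are illustrative)."""
    is_large: bool                  # large organization vs. small team
    seeks_broad_innovation: bool    # wants innovation across the business, no single focus area
    has_clear_ds_direction: bool    # high-functioning team that knows where data science should go
    has_models_in_production: bool  # already develops, maintains, and runs models in production


def suggest_team_model(org: OrgProfile) -> str:
    """Return a suggested team model; the rule order is an assumption of this sketch."""
    if org.has_models_in_production:
        return "accounting"
    if org.has_clear_ds_direction:
        return "embedded"
    if org.is_large and org.seeks_broad_innovation:
        return "center of excellence"
    if not org.is_large:
        return "consultant"
    # None of the rules of thumb apply cleanly.
    return "no clear fit"


small_team = OrgProfile(
    is_large=False,
    seeks_broad_innovation=False,
    has_clear_ds_direction=False,
    has_models_in_production=False,
)
print(suggest_team_model(small_team))
```

Real organizations rarely fit one box, so treat the result as a starting point for discussion rather than a verdict.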
#3 Step. Hire Remarkable Data Scientists
Of course, you need to build a list of ideal candidates and calibrate with the hiring manager to gauge fit against the reality of the talent market. But apart from this, you also need to work out which types of data scientists will be the best fit for your needs.
I suggest hiring not a ‘jack of all trades’ but a ‘master’ engineer. This is quite obvious, and by the way, it is also the most important point. Data science is a very broad area, so look for a person who can focus on a few things, not one who tries to cover the whole of data science.
For example, a veteran computer programmer will know multiple coding languages such as Python, Java, and C++, and some data tools such as Hadoop, TensorFlow, and NoSQL databases. A scientist, on the other hand, might recognize only a few of the coding languages but know advanced techniques such as MapReduce and machine learning.
Moreover, it is often wise to hire a data engineer first to build the data pipelines, and then hire a business intelligence expert or a statistician to optimize the output.
#4 Step. Learn Data Science Team Roles
There are essentially two types of data scientists — Type A and Type B — based on their roles.
Type A stands for analysis: a person who can make sense of data without necessarily having strong coding skills. They can perform tasks such as data cleaning, forecasting, modeling, visualization, etc.
Type B, in turn, is about building. It calls for a strong statistical background and the ability to build complex systems such as recommendation engines, algorithms, and more.
Whichever type the company uses, the following team roles apply in both cases:
- CAO and CDO: The Chief Analytics Officer and Chief Data Officer are leadership roles that oversee all the other roles in the analytics team. They are business translators who bridge the gap between data science and domain expertise, acting as both visionary and technical lead.
- Data analyst: The role involves the collection of relevant data and its interpretation. Some of the skills required are R, Python, JavaScript, C/C++, SQL.
- Business analyst: A business analyst takes care of tasks at an operational level. They are involved with converting business expectations into data analysis and are involved with data visualization, business intelligence, SQL and more.
- Data scientist: They are involved with data preparation, cleaning, using data mining techniques and solving business tasks. Some of the skills required by them are R, Python, SQL, Hadoop, among others.
- Data architect: Data architects work with large amounts of data and are crucial for data warehousing, defining database architecture, and more.
- Data engineer: They test and maintain the infrastructure components that data architects design. In most organizations, the data architect and data engineer roles are merged, as the tasks they accomplish are closely related.
- Visualization engineers: They may be required to deliver data science results in end-user-facing applications. In most companies, IT units deliver this function, while companies with specialized data science roles may have a separate role for it.
#5 Step. Forget About Perfect Planning
Agile and Scrum may have some benefits in software development, but data science is a different game. An unexpected statement, isn’t it? But let’s clarify the point.
Data science is about hypothesizing, investigating, testing, trying, failing, trying again, and failing again. This type of work cannot be estimated the same way that, for example, the implementation of a new button in an app can be estimated. There is no clear path from point A to point B; the path depends very much on what you find along the way.
What is more, at some point you may even decide that you really should go to point X. And no, breaking the work down into smaller tasks will not solve the problem at all: the uncertainty will still be there, and splitting it up will not make it disappear.
And what does this mean in practical terms? It means that tasks don’t get done on time and are carried over to the next sprint, people get needlessly stressed out, and friction arises in the team. It means that some days at your daily stand-up meeting you may have to tell your team “I made no progress on this task yesterday”, which is quite disturbing. From a psychological perspective, a person would rather chase imaginary “story points” than admit the real truth in front of the whole team.
It also means that, in order to win these imaginary points, people begin to cut corners more and more, and quality goes down as a result. This is something you cannot afford in data science: you may not get the chance to just “fix the bug” in a second iteration as you would in software development, because by the time you find your mistake, your wrong insights may already have influenced important decisions at the C-level.
………………………
If you do anything cool with this information, leave a response in the comments below or reach out at any time on my Instagram and Medium blog.