Driving Data Science Initiative: a Simple Four-Stage Model
An “initiative” is defined by the Cambridge Dictionary as “a new plan or process to achieve something or solve a problem”, and a “data science initiative” creates value from data: it may take the form of a series of data insights that guide a company’s strategic decisions, or a data product that serves customers directly, such as a recommendation engine. For Data Scientists, driving an initiative is a common way to create impact inside the organization, and it is usually more complicated than just “writing code to build a machine learning model”.
In this article, I will share my thoughts on the general stages (or components) involved in driving a data science initiative, using one of my previous projects as a case study. I hope this sheds some light for those on the Data Science career path.
In my opinion, driving an initiative usually involves the following four stages (or components): Vision, Sponsorship, Alignment, and Execution.
This four-stage model is a simple way of conceptualizing the complexity involved. The stages can also be viewed as components, since each permeates the initiative’s full life cycle. For example, Vision is not established only at the beginning but requires constant reinforcement; Execution does not necessarily wait until everyone is aligned, and one often needs to execute on the idea (e.g. a proof-of-concept) even before sponsorship is granted. It is also worth pointing out that different types of initiative shift the focus between stages: “top-down” initiatives might require more effort on “execution”, while “bottom-up” initiatives may need equal (or even more) effort on “vision” and “sponsorship”.
With the four-stage model introduced, let’s take a look at a case study: a “bottom-up” initiative to create a data product inside a large organization. I focus mainly on the structural perspective rather than the technical side, to highlight how the initiative moved through these four stages. For those interested in the technical side, please refer to these patent applications (https://uspto.report/patent/app/20200005215, https://uspto.report/patent/app/20200005412, https://uspto.report/patent/grant/11,270,234) for more details.
Case study: the creation of “Skill Match Index”
It all starts with a vision
In the summer of 2017, I joined the LinkedIn Data Science team focusing on the Learning Solution business. LinkedIn Learning is an online learning platform that helps members to develop and build new skills through e-learning and online classes, and the brand was formerly known as “Lynda.com”. I was very excited about this opportunity, as Lynda.com helped me a lot back in graduate school: I learned many useful tools (e.g. SQL, Python) through high-quality online classes from Lynda.com, and eventually landed my first Data Scientist job. This is all thanks to Lynda.com and its close partnership with the university which offered me access to these online learning resources.
In my first month, as I onboarded and was amazed by the tremendous amount of data available on LinkedIn (commonly referred to as the “LinkedIn Knowledge Graph”), one question came up: can I leverage such data to help students land their first job smoothly?
My experience tells me that learning something new is difficult, but what’s more challenging is knowing what to learn, especially for students whose goal is to land a job right after graduation. I remember that back in 2012, only after reading hundreds of job postings and chatting with alumni did I learn that SQL was more important than C for data analytics, and that Python was becoming more popular than Perl in the industry. Could there be a better solution for graduating students? What if we could show students the “skill gap” between their own skills and those required by industry jobs? This would be fantastic.
So I advocated for the concept and convinced several Data Scientist peers who were onboarding at the same time to work together on this idea as part of the Data Science New Hire project. We developed a simple yet reasonable algorithm to quantify the skill gap, and the presentation was well received (within the Data Science organization).
(A featured picture about LinkedIn Knowledge Graph)
The path seeking sponsorship
The Data Science onboarding project was a fun experience. However, to move this idea forward, we needed sponsorship from the Learning organization, which would grant the resources to start the initiative and explore applications. Our earlier presentation was viewed as an interesting idea with an obvious issue: a student might consult such an insight only once a year, and with such low-frequency engagement it would have little impact on either engagement or monetization metrics. So a key next step was to connect this vague vision to concrete applications that would materialize into real business impact, and to get the right sponsorship.
We then brainstormed with many people, and eventually came up with the following three potential applications to address each team’s business target:
To seek sponsorship, we took two approaches in parallel:
These efforts were not in vain: eventually, we got sponsorship from the Sales executive. The initiative was now formally established, and we could allocate resources to work on it!
(One figure inside one patent application to illustrate the concept)
Better alignment with others
Once executive sponsorship is granted, an initiative is supported from the leadership perspective. However, organizations usually have competing forces, especially when an initiative requires many other teams to engage and contribute. The same applied to our case.
Our initiative aimed to build a data product that provides insights for customers, and it required multiple partner teams to be involved: we worked with the various Engineering teams that create the upstream datasets to make sure we interpreted the data properly, and we worked with the Insights team to ensure consistent sales messaging. Moreover, we asked to sit in on related client calls alongside our sales partners to collect first-hand information and feedback.
It took quite an effort to get this alignment, and I remember biking across the LinkedIn campus on a daily basis to chat with various partner teams. Every time I walked into a conference room, I’d start with an introduction (of myself and the initiative), and then continue with follow-up discussions and meetings. Eventually, we reached the point where other teams were 1) aware of our initiative and 2) willing to provide the help we needed.
(Source: https://en.wiktionary.org/wiki/alignment)
Execution toward milestones
This was the time to fire on all cylinders and build the data product! It didn’t take long for me to realize that the following two project management skills are quite practical.
[Set milestones] Although our algorithm design was advanced, it was neither realistic nor effective to go directly with the most complicated version. We needed to deliver a reasonable solution with the required reliability to enable a v1 launch, and then iterate on it continuously with further improvements. So setting proper milestones for v1, v2, and later iterations was of foremost importance. In our case, a simple representation of the milestones is:
[Proper delegation] With a few team members joining forces, it’s also important to ensure that each member’s work is in sync and delivers the expected results. It was impossible for me to know every execution detail, so proper delegation was key. For our initiative, each team member owned one specific area with a concrete deliverable. For example, one owned the data foundation, with the goal of creating the right result dataset and passing specific quality checks; another owned communication with the sales team, with the deliverable of educating them on methodology details and consolidating customer feedback. Through delegation, each team member had clear ownership, and together we moved toward the milestones.
In April 2018, the insights from this data product were announced at a summit for educational institution customers, and the feedback was predominantly positive: this could be a game-changer for higher education institutions seeking to understand student career readiness and curriculum effectiveness. Subsequently, these insights were incorporated into our product offerings to better serve LinkedIn Learning’s mission. For me, it was a dream come true: finally, I had contributed to the product that had helped me in the past.
(Presented the work as keynote speaker at the 2018 LinkedIn Data Science MeetUp)
Learnings and Looking Back
I learned a great deal from driving this data science initiative from start to end. At a high level, I found the following three elements most relevant and important:
Looking back, the story always sounds smoother than it was, and in reality there were many challenges. For instance, in the “vision” stage, I was uncertain whether my idea was merely inspirational but impractical, and whether anything concrete would ever land. In the “sponsorship” stage, there were multiple times I walked into a meeting with hope but left without it. In the “alignment” stage, I was once told that the same idea had already been done by another team (luckily, we found out later that we just needed to revise our name to avoid a conflict). In the “execution” stage, it was nerve-racking when the milestone deadline was approaching and there were still unresolved bugs in the system.
Still, I had a lot of fun meeting lots of great people along the way and bringing value to the organization. Even after I moved to a different business organization a few years later, I still received “Thank you” emails telling me how insights from our data product had won a customer back. Driving initiatives is an essential skill for advancing one’s data science career, and I hope this article is helpful.
— — —
The same content is also published on Medium and can be accessed here. If you enjoyed this article, please help spread the word by liking, sharing, and commenting.