Doing Machine Learning Without Hiring Data Scientists

Summary

Many data and analytics leaders face a chicken-and-egg situation: without data scientists, venturing into machine learning and data science is tricky, but without any successful pilots, convincing the business to hire data scientists is tricky too. Here, we offer proven solutions to this dilemma.

Overview

Key Challenges

  • Most organizations are still early in their data science journey and thus struggle to understand what machine learning and data science can do for them.
  • They lack the internal skills needed to plan and execute initial pilots.
  • They don't exactly know which skills are needed and hiring data scientists seems really difficult.
  • Without successful pilots, getting serious business commitment to fund data science projects is complicated.

Recommendations

To address these challenges, data and analytics leaders can:

  • Train existing staff into (citizen) data scientists — Many organizations have mathematically skilled employees without knowing it. These employees might have been math geeks since high school or are using their quantitative skills in other roles.
  • Partner with academia — Many universities and colleges offer data-science-related degrees now. Students are always in need of hands-on, practical projects for their graduation. If all cutting-edge companies do it, why shouldn't this also be of interest to mainstream companies?
  • Hire third-party professionals — There are hundreds of consultancies that can provide a spectrum of assistance, from creating project ideas, early piloting, coaching and teaching of junior staff, to the fully fledged creation of managed services.
  • Utilize packaged applications — These often provide superb cost-time-risk trade-offs, significantly lower the skills barrier and can provide a solution much faster than creating one from scratch.

Introduction

Many organizations are still in the early phases of their data science journey. More than 40% of organizations practicing advanced analytics say "the lack of adequate skills" is a challenge (see "Survey Analysis: Customers Rate Their Advanced Analytics Platforms, 2015"). Without previously successful pilots, getting serious business commitment to fund data science projects is complicated, and without experienced data scientists on board, implementing successful pilots is difficult too.

For these organizations, hiring experienced candidates can be very difficult for a number of reasons. Experienced data scientists will want to avoid being the first to join a company. The amount of energy needed to just get access to data, get it integrated, and have the first machine-learning models deployed into the business can be staggering. Really good candidates know about these risks and prefer to avoid them, because these are not the learning situations they seek, and in the current job market they have many options to pursue.

Retaining quality data scientists is also a struggle. In large urban areas, they currently favor job hopping, as this gives them exposure to a broader range of tasks, ideally in different industries.

In this research, we outline four solutions to the dilemma of hiring data scientists and creating successful pilots:

  1. Train existing staff into (citizen) data scientists.
  2. Partner with academia.
  3. Hire third-party professionals.
  4. Utilize packaged applications.

These are the directions large data science labs are also taking to increase productivity of their existing data scientists, and to allow for contingency planning ahead of potential staff attrition.

Analysis

Upskill Existing Staff

The task is to cast a wide net across your organization to identify as many professionals as possible who possess the important characteristics of data scientists:

  • The right mindset — curiosity and entrepreneurial drive.
  • A holistic attitude — the whole data science pipeline, from data collection to delivery of the analytics results, must be questioned and analyzed.
  • The right dose of mathematical affinity — data is often noisy and messy, and the situations data scientists deal with are loaded with uncertainty and high dimensionality.

The easiest of the above characteristics to find in a potential candidate is mathematical affinity. The importance of mathematics in modern business is increasing, and over the past 25 years many students have been exposed to much more advanced mathematics. An estimated 5% to 10% of professional white-collar staff are actually quite math-literate. 1, 2 Let's call the first group math natives: people who were math geeks during their high-school years, but were told "you cannot make money with mathematics or physics, so go and get some real education that can pay your bills."

A second group could be called math-trained. They had to study advanced calculus at university (e.g., computer scientists, physicists, chemists, engineers, biologists) and never realized what this might lead to. Now we know the answer: advanced calculus and statistics provide the underpinnings of data science and machine learning.

A third group are math professionals who already perform quite specific data science work in their day-to-day jobs (see Table 1 below for several examples).

Table 1. Data-Science-Related Disciplines

Discipline | Description
Actuaries | Analyze the financial costs and uncertainty around claims and pricing in the insurance industry.
Process engineers | Use statistical techniques to understand quality control and design of processes.
Financial analysts and accountants | Financial forecasting and cash flow analysis, typically using spreadsheets or specialized tools.
Statisticians | A core discipline of data science. However, data scientists have stronger focus on predictive analytics and are more fearlessly applying modern machine-learning algorithms.
Operations researchers | Apply optimization, simulation and prescriptive techniques.
Marketing analysts | Evaluate the impact of marketing spend and the effectiveness of campaigns by analyzing data.
Business analysts | Simpler business intelligence reporting and dashboard creation tasks, albeit the boundaries are often blurred.

Source: Gartner (June 2016)

Recommendations:

  • Find interested candidates by having internal or external experts give galvanizing lightning talks on data science, either companywide or across a line of business. Cast a wide net. Include an explicit call to action stating that the organization has funds available for this upskilling process.
  • Offer access to self-paced courses such as Massive Open Online Courses (MOOCs). These are typically low cost or even free, and many are available on YouTube, but they rely on the candidates' capacity for self-directed learning.
  • Send seasoned professionals to one of the many data science courses or events (e.g., IBM's Big Data University, the SAS Academy for Data Science, Dato's Data Science Summit, Strata + Hadoop World, H2O World), as they will need a more personal approach.
  • Organize "brown bag" lunch sessions, where participants from across the organization meet weekly or biweekly for informal presentations and discussions that are typically intense yet relaxed. 3

Partner With Academia

Using universities for specific projects serves the dual purpose of an organization getting skilled resources, while also providing students with real-world learning experiences. The relationship can take four main forms:

Internship: As part of an internship, a student is dedicated to the company for a period of time, whether over a summer or a semester. It is important to have clear goals for the work the student should perform, as well as milestones and objectives. This approach is good when you have an existing data science lab and the student becomes an additional person in the lab. It is less suitable when the organization has few skilled resources to supervise the student.

Class project: As part of a real-world learning experience, the professor is the primary supervisor and a team of students completes an analysis as a class project. This approach is ideal when there is a particular dataset or hypothesis that the company wants to explore.

Innovation lab: Analytics programs have computer and innovation labs with new technology to try, which may include a Hadoop cluster, statistical programs or machine-learning software. An organization may rent time at the innovation lab to investigate new technical capabilities.

Hackathons: Participating in or hosting a hackathon can reveal valuable insights, models and applicants. In a hackathon, an organization provides a dataset and a problem statement, and individuals and teams of data scientists apply analytics to solve the problem. For example, Bayes Impact is a nonprofit organization that hosts annual hackathons on behalf of nonprofits and for the public good. In 2016, it analyzed datasets related to opiate prescriptions, consumer ratings of health insurers and the impact of the Affordable Care Act, and higher-than-average suicide rates among military veterans. Kaggle is a crowdsourcing platform that Walmart used to host a virtual hackathon, letting hackers analyze the relationship between store sales and price cuts. The best hackers were then hired. 4

The following are some examples of how university partnerships have helped companies with their analytics:

  • Coca-Cola partners with Georgia Tech for innovation in machine learning and robotics.
  • When Nielsen wanted to analyze millions of samples of data on TV viewership and advertising, it partnered with Stanford Graduate School of Business and built the Nielsen Innovation Lab for collaborative experimentation in marketing research.
  • The Data Science Initiative (DSI) at the University of North Carolina at Charlotte is a consortium of academics and industry executives from companies such as Bank of America, Duke Energy, Carolinas HealthCare System, and SAS.
  • Ford has a formal university research program that funds hundreds of university research projects around the world. Students from Wayne State University developed an optimization model to save on the costs of producing prototypes (the model saved Ford $12 million initially, and it may go on to save as much as $250 million). 5

Students will rarely be able to question other parts of a data science pipeline (especially data collection and ingestion). Student internships therefore tend to work well only if:

  • The academic supervisor has a strong commitment and enough experience in the practical area to make this a success.
  • There is existing staff that can, to a large extent, guide the student interns.
  • The students can work autonomously on the core analytical parts (feature engineering, modeling, testing) of the data analytics pipeline.

Recommendations:

  • Utilize senior academics who know the business processes and quantitative methods in your area really well. Get them to advise you. They will benefit from the practical traction with you, and you, in turn, will benefit from their knowledge and their students.
  • Be selective about which part of the university or college you collaborate with. Some highly specialized departments can be very beneficial (e.g., industrial optimization, B2B marketing, supply chain optimization).
  • Organize hackathons either citywide or campuswide.
  • Hire more than one student intern, so that teamwork is encouraged and students can coach one another. Interns don't cost much, and if they succeed and you like them, hire them permanently.
  • Keep academic advisors close if they have a really strong, pragmatic attitude. They will thank you by sending their best students. You can even utilize them to keep expensive consultants in line.

See "Citi Uses Hackathons to Accelerate Digital Innovation."

Hire Third-Party Professionals

In this time of immense machine-learning skills shortage, third-party professionals can kick-start data science programs and accelerate their success.

Experienced external professionals can start creating a portfolio of project ideas and perform initial piloting, potentially jointly with student interns.

Data and analytics leaders should expect their analytics service suppliers to provide the following:

  • Upskilling of existing staff: Suppliers can educate existing math-literate staff (see "Upskill Existing Staff" above) and can further assist in hiring full-time data scientists.
  • A range of value-adding knowledge assets and knowledge transfer: Frameworks, collateral and knowledge artifacts that improve solution quality, repeatability, project delivery effectiveness, and time to value. Any frameworks and methods should be modular, transparent and reusable. Designs should be based on shareable and repeatable ideas and principles.
  • Critical thinking, imagination and creativity: Analytics service suppliers should not be focused solely on project execution. While a service partner's capability to deliver technical implementations is clearly important, it is not the only criterion that delivers business value. Service partners and consultancies should also be seeking to enhance your business; that means offering proactive input and bringing innovation and creativity to every interaction.
  • A learning experience and knowledge transfer: Coaching and mentoring of the client's own in-house team will ensure that the data science capability is sustainable beyond the term of engagement.

There are hundreds of consultancies offering business analytics and data science services, both within local markets and internationally. These cluster in four broad groups:

  1. Global consulting companies and system integrators: Multinational organizations that offer a broad range of business consulting and technology integration services, and that incorporate a specialist analytics practice. Representative examples include Accenture, Atos, Capgemini, Cognizant, Deloitte, IBM Global Services, Hitachi, Hewlett Packard Enterprise, Infosys, KPMG, NEC Labs America, PwC, TCS, Wipro (see also "Magic Quadrant for Business Analytics Services, Worldwide 2015").
  2. Specialist midsize consultancies: These analytics service providers operate in multiple markets, but focus specifically on a range of analytics and machine-learning solutions and services. Examples include Affine Analytics, Clarity Solution Group, Fractal Analytics, LatentView Analytics, Ma Foi Analytics, Mu Sigma, Opera Solutions, Palantir, Tessella, ZS Associates (see also "Market Guide for Advanced Analytics Service Providers").
  3. Local companies and analytics solution specialists: Companies that focus on a particular geographic market or on specialized machine-learning solutions for a specific industry or business process setting. Representative examples include Black Swan, Blue Yonder, CognitiveScale, Comma Soft, DataScience, Empolis, Expert System, H2O.ai, QuantumBlack, Silicon Valley Data Science.
  4. Analytics consultancy service brokers: Companies such as Kaggle and Topcoder act as a "middle agent" to bring together clients that require specialist input and analytics specialists who work on a project-by-project basis.

Recommendations:

  • Perform a thorough assessment of your internal advanced analytics capabilities to determine which service provider engagement and pricing model is the most suitable for your overall advanced analytics program in relation to skills, funding and bandwidth.
  • Look for narrative and communication behaviors built into the vendor's methodologies and culture, both during the sales cycles and within the delivery process, to help with engagement and the uptake of analytics solutions.
  • Do not treat a shift from staff augmentation to managed services as the goal in itself. Focus instead on ensuring that specific business outcomes are achieved.

See also "How to Engage Business Analytics Services Providers" and "Toolkit: Finding the Right BI and Analytics Service Provider."

Utilize Packaged Applications

Machine-learning capabilities are often packaged as targeted software applications that solve specific machine-learning problems. There is already an enormous and growing wealth of prefabricated solutions available. Customer-facing examples are presented in the "Hype Cycle for Customer Analytics Applications." They include sentiment analytics, customer-best-next action, predictive lead scoring, marketing mix optimization, claim fraud, database campaign management and demand signal management.

Further examples come from the "Hype Cycle for Back-Office Analytic Applications, 2015" — supply chain performance management, workforce analytics, asset performance management, predictive coding, warehouse resource planning and scheduling, IT operations analytics, and security intelligence.

In particular, if organizations don't have data scientists on staff, there are many reasons to consider such packaged applications very seriously. For example:

  • Fast time to solution
  • Easy maintenance
  • Lowered skill requirements

Even for organizations with larger data science teams, packaged applications are an important consideration, especially as a productivity gain. Building complete data science solutions often takes three to 15 months. Packaged applications, however, can be deployed within a few weeks (say four to six) from point of purchase. The gap between needing a solution and having one can therefore be narrowed significantly. Even if data scientists are aplenty, building solutions may not be a good use of their precious time. It is not clear that your own data science solutions will be any better than the packaged ones, especially if the data scientists don't have the luxury of going through dozens of iterations to converge on really mature solutions for particular business applications.
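The time-to-solution gap described above can be made concrete with a little arithmetic. The sketch below is only illustrative: it uses the ranges quoted in the text (three to 15 months to build, four to six weeks to deploy a package) and assumes 52 weeks and 12 months per year for the conversion. The function name weeks_saved is hypothetical.

```python
WEEKS_PER_YEAR = 52   # assumption used to convert months to weeks
MONTHS_PER_YEAR = 12


def weeks_saved(build_months, package_weeks):
    """Weeks gained by deploying a package instead of building from scratch."""
    build_weeks = build_months * WEEKS_PER_YEAR / MONTHS_PER_YEAR
    return build_weeks - package_weeks


# Ranges quoted in the text.
BUILD_MONTHS = (3, 15)    # time to build a complete solution
PACKAGE_WEEKS = (4, 6)    # time to deploy a packaged application

# Smallest gap: fastest build versus slowest package deployment.
min_saved = weeks_saved(BUILD_MONTHS[0], PACKAGE_WEEKS[1])
# Largest gap: slowest build versus fastest package deployment.
max_saved = weeks_saved(BUILD_MONTHS[1], PACKAGE_WEEKS[0])

print(f"Weeks saved: {min_saved:.0f} to {max_saved:.0f}")
# prints "Weeks saved: 7 to 61"
```

Under these assumptions, a packaged application is in place roughly 7 to 61 weeks sooner. The figures ignore procurement, integration and evaluation effort, which the text does not quantify.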

Recommendation:

  • Check the vast and rapidly growing arsenal of packaged machine-learning solutions, and think very carefully before deciding against using such a package. Even with data science skills available, packaged applications offer a really good time to solution and can provide significant productivity gains over building machine-learning solutions from scratch.

Gartner Recommended Reading


"Staffing Data Science Teams"

"How Data Scientist Skills and Qualifications Differ From Those of BI Analysts and Statisticians"

"How to Take a First Step to Advanced Analytics"

"Market Guide for Advanced Analytics Service Providers"

"How Data Science Projects Deliver Business Impacts"

Evidence

1 Fast Facts, National Center for Education Statistics (NCES).

2 "Why Many Students With A's in Math Don't Major in It." The Hechinger Report. U.S. News. 22 June 2015.

3 Brown Bag Lunch. About.com. 4 May 2015.

4 "Walmart: The Big Data Skills Crisis and Recruiting Analytics Talent." Forbes.

5 "How Big Data Brought Ford Back From the Brink." Dataconomy. 24 August 2015.
