Advice for Fellow Data & Analytics Leaders
There are several ways to embrace an analytical culture and leverage it to gain a competitive advantage, while at the same time controlling the costs attached to working with one or multiple teams of data scientists spread across your company. Here I discuss a few tips and examples that deal with hiring the right people, working with good enough data, and using appropriate analytical techniques.
Hiring Process
Should you hire PhD data scientists? My opinion is that you may already have the team in place. Software engineers who have an analytical mind and are looking for more variety in their job can easily be trained to develop algorithms and perform other advanced analytics tasks, especially now that so many libraries (for instance, in Python) offer rather comprehensive packages to implement various advanced machine learning techniques, and a lot of source code can be found in free repositories like GitHub. I once suggested that a client interested in building and updating taxonomies share my own articles on the subject, written in simple English, with their software engineers. They were paying a lot of money to a vendor to process their data, and could have developed an in-house solution at a fraction of the cost.
In addition, many professionals have considerable experience working with big and complex data. You may be able to hire a bioinformatics engineer or a physicist instead of a PhD data scientist, possibly offering training with one of the many companies that run data camps and similar programs (you need to find the right ones and look at who the instructors are). This broadens your hiring options. Also, some PhD data scientists may find corporate work boring, too applied, or mundane, and may not be happy in the long term.
As an example, Wall Street routinely hires physicists and even applied mathematicians to design sophisticated financial models. These professionals are called quants.
Data Issues
Your BI analysts should work with data engineers and decision makers to decide what data needs to be collected, which metrics should be tracked, and what you plan to do with them (so that the most valuable data is collected). They should also work with external vendors providing all sorts of data, to make sure you get what you need from external sources (in particular, data about your market and your competitors). Some external data can be collected with web scrapers, at little cost.
Data engineers typically decide on the various levels of data aggregation needed to create efficient dashboards. They should also help you decide how frequently some of the data needs to be updated, and who can access it.
Any data scientist worth her salt should be able to assess the quality of the data and to deal with messy or missing data. Any data that goes public (think about Yelp reviews) needs to be discussed with your legal team, as it is a frequent source of litigation.
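To make the data quality point above concrete, here is a minimal sketch of a first-pass check: it counts missing values per column and exact duplicate rows. The function name `quality_report`, the list of missing-value markers, and the sample records are all hypothetical, chosen for illustration; a real assessment would go much further.

```python
from collections import Counter

def quality_report(rows, missing_markers=("", None, "NA", "N/A", "null")):
    """Summarize missing values per column and exact duplicate rows.

    `rows` is a list of dicts sharing the same keys (one dict per record).
    The markers treated as "missing" are an assumption; adjust to your data.
    """
    columns = list(rows[0].keys()) if rows else []
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if row.get(col) in missing_markers:
                missing[col] += 1
    # Count rows that are exact copies of an earlier row
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    n = len(rows)
    return {
        "rows": n,
        "missing_share": {col: missing[col] / n for col in columns} if n else {},
        "duplicate_rows": duplicates,
    }

# Hypothetical sample records, for illustration only
sample = [
    {"user": "a", "rating": "5"},
    {"user": "b", "rating": ""},
    {"user": "a", "rating": "5"},
    {"user": "c", "rating": "NA"},
]
report = quality_report(sample)
print(report)
```

A report like this will not fix anything by itself, but it gives the team a quick, repeatable view of how messy a new data source is before anyone builds a model on it.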
Analytics
By analytics, I mean techniques to extract insights from data, usually for prediction, user segmentation for better targeting, risk minimization, optimization (e.g., supply chain, pricing, or purchase optimization), forensics, A/B testing, better control of automated processes, or fraud detection (e.g., detecting users who share their passwords). This involves statistical techniques beyond the use of simple SQL queries.
When it comes to implementing modeling techniques, my advice is:
- Use the 80/20 rule. A model with 80% accuracy may need a few days to build, but one with 99% may need six months. And much of your data is likely to be 80% correct, at best.
- Correlation does not mean causation. Correlations should be interpreted by industry experts to see if they make sense.
- Repeating the same tests over and over will eventually produce any result that you want. Data scientists know how to navigate this issue. It is also one reason why p-values, used on their own, have come under growing criticism.
- Get industry experts and other human beings to look at your data. So much fake news is posted on social networks partly because these companies did not hire the right people to define ‘fake news’ and to check how well the algorithms used to detect it perform. In other words, the spammers seem to be smarter and to adapt faster.
- Do you only capture the tip of the iceberg? With Covid data, only a fraction of those infected were reported: for every person who tested positive, as many as five were never tested (at least in the first six months), got sick, and recovered on their own. They are not included in the statistics, despite being the largest group. Some people were tested multiple times and have been counted as multiple people.
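The point in the list above about repeating tests can be illustrated with a short simulation. This is a sketch under stated assumptions, not a recommended testing procedure: it runs many A/B tests on data with no real effect, counts how many look "significant" at the 5% level, and compares that with a Bonferroni-corrected threshold (one common correction, used here only as an example). All function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def two_sample_p_value(a, b):
    """Two-sided p-value from a large-sample z-test on the difference in means."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_null_tests(n_tests=200, n=100, alpha=0.05, seed=0):
    """Run many A/B tests where no real effect exists.

    Returns the number of 'significant' results without correction and
    with a Bonferroni-corrected threshold (alpha / n_tests).
    """
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_tests):
        # Both groups come from the same distribution: any "effect" is noise
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        p_values.append(two_sample_p_value(a, b))
    naive = sum(p < alpha for p in p_values)
    bonferroni = sum(p < alpha / n_tests for p in p_values)
    return naive, bonferroni

naive, bonferroni = simulate_null_tests()
print(f"Uncorrected 'discoveries': {naive}, after Bonferroni: {bonferroni}")
print(f"Chance of a false positive in 20 tests: {familywise_error_rate(0.05, 20):.2f}")
```

With 20 independent tests at the 5% level, the chance of at least one false positive is already about 64%, which is why a result that only appears after many attempts deserves extra scrutiny.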
Finally, one way to further optimize ROI on data science: many practitioners complain about spending 80% of their time on massaging data. If you hire someone who can automate exploratory analysis, you will save a lot of money. There are many tools nowadays to automate this, and any data scientist with good exposure to software engineering can do that easily.
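As a rough illustration of what automating exploratory analysis can look like, the sketch below profiles every column of a dataset in one call, treating a column as numeric when all of its non-empty values parse as numbers and as categorical otherwise. The function names, the type-inference rule, and the sample records are assumptions made for this example; dedicated profiling tools do far more.

```python
from collections import Counter
from statistics import mean

def _to_float(value):
    """Return value as a float, or None when it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def profile(rows):
    """Build a per-column summary from a list of dict records."""
    columns = list(rows[0].keys()) if rows else []
    summary = {}
    for col in columns:
        # Ignore empty cells when summarizing a column
        values = [r.get(col) for r in rows if r.get(col) not in ("", None)]
        numbers = [_to_float(v) for v in values]
        if values and all(x is not None for x in numbers):
            summary[col] = {
                "type": "numeric",
                "count": len(numbers),
                "min": min(numbers),
                "max": max(numbers),
                "mean": mean(numbers),
            }
        else:
            summary[col] = {
                "type": "categorical",
                "count": len(values),
                "top": Counter(values).most_common(3),
            }
    return summary

# Hypothetical order records, for illustration only
orders = [
    {"region": "east", "amount": "10"},
    {"region": "west", "amount": "30"},
    {"region": "east", "amount": ""},
]
summary = profile(orders)
for column, stats in summary.items():
    print(column, stats)
```

Once a routine like this exists, it can be run on every new dataset as a matter of course, so that analysts start from a summary rather than from raw files.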
For more advice, check out this upcoming webinar panel discussion with data and analytics leaders at Wayfair, Cardinal Health, bol.com, and Slickdeals titled “How to Make Smarter Data-Driven Decisions at Scale”. Each panelist will share an overview of their data & analytics journey, and how they are building a self-service, data-driven culture that scales. Wednesday, March 31, 2021 | 11:00 AM PT (2:00 PM ET).
Save your spot here:
I look forward to seeing you there, and I hope you find this event useful.