AI, Data Science Too Abstract? Channel Your Inner Moneyball
Some of you are about to invest seven or eight figures in new AI platforms with nothing more to go on than the word of account executives who want their bonus. Others are about to hire millions of dollars' worth of headcount in data analytics, data science and/or applied business statistics, simply because the competition is doing the same. Before you cut that check, let me introduce you to your new best friend: professional baseball statistics.
Part One: The fundamentals:
- Professional baseball statistics refers to the entire data lake of freely available, peer-reviewed industry data on the game from 1871 to present. That’s not a typo. I mean the last 148 years. You can even go back to October 24, 1845 if so inclined.
- AI (artificial intelligence) refers to programs that perform tasks historically requiring human intelligence, such as visual perception, speech recognition, and decision-making.
- Data science is the study and strategy of deeper trends and opportunities through large amounts of data (structured and unstructured) to make highly confident decisions in “moments” that might normally take lifetimes.
- A data lake is a place to store vast amounts of raw (hopefully accurate) data in its unaltered format until needed for deeper logic.
Part Two: The opportunity statement:
Fortune 500 companies are investing millions of dollars in AI licenses/platforms and salaries for expert-level data scientists without a precise and scalable going-in position. The reason: they don’t have a like-sized, reliable data lake on which to first pressure-test exactly what they are trying to solve. By embracing professional baseball, one of the largest free and clean data lakes in recorded human history, companies can build the needed muscle before making large AI and data science investments. This empowers success for the company, investors/shareholders, employees, and customers.
Part Three: The challenges:
The three most common (but not exclusive) challenges for companies investing heavily in AI and data science include:
Poor data
- This may be for political reasons (each team internally guards their data like a proprietary asset for personal job security).
- Multiple platforms either fail to leverage technology to safely access data, or each platform’s data is in a format that doesn’t aggregate cleanly if at all.
- Data collection is non-repeatable, unclean and/or inaccurate. We still live in a world where Fortune 500 companies run on spreadsheets, with folks typing in data that arrives by email in static files. Seriously.
No clear strategy for AI or data science
- “Look, our CIO was at a conference and s/he came back insisting we ‘get in the game’.”
- Not knowing employee skill sets and hoping somehow a robot or algorithm will solve everything human management ignores in their people development.
- AI account sales associates or data science candidates use six-syllabic words that seem to come from a thesaurus at an Elon Musk rocket-launching party.
Company politics handcuff data science
- Scenario one: the company hires data scientists but partitions (either technologically or politically) the data from said data scientists.
- Scenario two: the data scientist took the job because it pays 3x what lab work at her/his alma mater does, but has little interest in deeply understanding the company to maximize data usage.
Part Four: Leveraging baseball to help your AI & data science going-in position:
“But Justin, we’re not a sports team. My company sells solar panels, or invests billions of dollars in global investment banking, or we are the second-largest gluten-free pizza chain in Silicon Valley.”
I get it. We’re going to play the odds. Baseball provides trillions upon trillions of free, independently verified data points that funnel up to more complex algorithmic logic. The data is easily accessible to anyone on the net and goes back as far as October 24, 1845, with strong collective data starting in 1871.
Our plan? Let’s translate your company language into baseball language and then surf one of the sexiest data lakes in human history. That is, if you think applied business mathematics and data modeling are sexy, which you should, if you like making money.
Company language: We’re the second-largest gluten-free pizza chain in Silicon Valley. We have a lean and repeatable business model and believe, based on customer feedback, that we can make a move: invest what amounts to 7 percent of our fiscal year budget to surpass our largest competitor and become the top chain in the market. With that, we’d earn double our investment in one quarter.
Baseball language: We’re in second place in the very competitive American League Central. You need deep pockets in this division to compete against big-spending AL clubs, but we also use our fair share of analytics to invest wisely. If the season were to end today, we’d make the playoffs in a Wild Card spot, which is great but still risky, since the Wild Card game is one-and-done. So we want to make a move and acquire more talent, because we believe we have a strong shot at winning the division. We can make that money back, 2x on our investment, from the extra home playoff games and related "division champion" merchandise sales.
Synthesized language (needed to run the baseball comparative model): Your cutting-edge pizza chain is the 2019 Cleveland Indians, who at the time of this article’s publication are 69-46 and battling Oakland, Boston and Texas for two AL Wild Card spots. You are also only one game behind the Minnesota Twins for the American League Central lead. Whoever wins the division automatically starts in the AL Division Series. Your 2019 committed pizza “player” contracts are about $116M, which means a 7% investment in AI and data science resources would run you about $8.1M. If you win the division, you’ll make $16.2M in October.
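For readers who want to check the arithmetic, here is a minimal Python sketch of the numbers in the synthesized scenario. The record, payroll, 7% share and 2x return come from the scenario above; the function and variable names are illustrative assumptions, not part of any real model.

```python
# Figures come from the scenario above; names are illustrative assumptions.
WINS, LOSSES = 69, 46           # team record at time of publication
PAYROLL_USD = 116_000_000       # committed 2019 "player" contracts (approximate)
INVESTMENT_SHARE = 0.07         # 7% of the budget
RETURN_MULTIPLE = 2             # "2x on our investment"


def winning_pct(wins: int, losses: int) -> float:
    """Share of games won so far."""
    return wins / (wins + losses)


def investment_usd(payroll: float, share: float) -> float:
    """Dollar size of the AI and data science investment."""
    return payroll * share


def payoff_usd(investment: float, multiple: float) -> float:
    """Expected return if the division is won."""
    return investment * multiple


def in_millions(amount: float) -> str:
    """Format a dollar amount the way the article quotes it."""
    return f"${amount / 1_000_000:.1f}M"


investment = investment_usd(PAYROLL_USD, INVESTMENT_SHARE)
payoff = payoff_usd(investment, RETURN_MULTIPLE)

print(f"Winning percentage: {winning_pct(WINS, LOSSES):.3f}")
print(f"Investment: {in_millions(investment)}")
print(f"Payoff if the division is won: {in_millions(payoff)}")
```

Swap in your own budget and share to see what your "trade deadline" move would cost.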
Part Five: Your free baseball data lake:
One free baseball data lake is Baseball-Reference.com/leagues/. We’ll leverage data from organized leagues, so we’ll “only” go back to 1876 (1871 for National Association fans). The Wild Card is relatively new, but no matter, since the data is still the same for teams like yours.
Without making this LinkedIn article a 75-page baseball abstract (which I would love to do, but I also appreciate your time is valuable), we’ll leverage a few team and player stats out of the trillions upon trillions of algorithmic data points available to test out our pizza AI and data science hiring hypothesis.
Part Six: Mapping the metrics:
- (Company) How much additional value would our AI solution and/or data science hires bring above our current staff to achieve our market growth goals? (Baseball) WAR (wins above replacement).
- (Company) Sales growth due directly to AI decisions at point-of-sale and market share growth through strategic business plans from data science insights? (Baseball) OPS (on-base-plus-slugging).
- (Company) How satisfied are customers with their interactions with company employees, and how does that translate to our social media reputation? (Baseball) WHIP (walks plus hits per inning pitched) * (1 + average change in home stadium attendance).
- (Company) If we overtake our top competitor for market share, the 7% total company investment in AI and data scientists will be paid back two-fold. (Baseball) run differential.
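The baseball side of that mapping rests on a few simple formulas. Below is a minimal Python sketch of the standard definitions of OPS, WHIP and run differential, plus the WHIP-times-attendance composite from the list above. The function names and the sample inputs are hypothetical, and WAR is left out because its calculation is far more involved and varies by provider.

```python
def on_base_pct(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_pct(total_bases, at_bats):
    """SLG = total bases / at-bats."""
    return total_bases / at_bats


def ops(hits, walks, hit_by_pitch, at_bats, sac_flies, total_bases):
    """OPS = on-base percentage plus slugging percentage."""
    obp = on_base_pct(hits, walks, hit_by_pitch, at_bats, sac_flies)
    return obp + slugging_pct(total_bases, at_bats)


def whip(walks, hits, innings_pitched):
    """WHIP = (walks + hits) / innings pitched.

    Pass innings as a true decimal (200 1/3 innings is 200.333...),
    not in box-score notation such as 200.1.
    """
    return (walks + hits) / innings_pitched


def satisfaction_proxy(whip_value, attendance_change):
    """Composite from the list above: WHIP * (1 + average attendance change)."""
    return whip_value * (1 + attendance_change)


def run_differential(runs_scored, runs_allowed):
    """Runs scored minus runs allowed."""
    return runs_scored - runs_allowed


# Hypothetical sample inputs, for illustration only.
print(f"OPS: {ops(150, 50, 5, 500, 5, 250):.3f}")
print(f"WHIP: {whip(40, 160, 200.0):.2f}")
print(f"Satisfaction proxy: {satisfaction_proxy(whip(40, 160, 200.0), 0.05):.2f}")
print(f"Run differential: {run_differential(800, 700)}")
```

One thing to keep in mind when reading the composite: in baseball a lower WHIP is better, so decide up front which direction counts as "good" for your customer satisfaction proxy.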
Part Seven: Pennant race. Simulation results:
Here’s the example summary of the results and one way to extrapolate it:
- The company found the quality of its own data was only 70% as clean as the baseball data. This was caused primarily by (1) human error in pizza sales data entries and (2) pizza staff not interpreting all the questions the same way. Too much is open to interpretation.
- When the pizza company grows 5%, the quality of its data degrades 10% at present staffing and without further training. This means that 15% growth lowers the pizza company’s overall data integrity to roughly 50%, aka a coin flip.
- Learning: Automate more data entry and algorithmic logic. Train staff on more context of the data -- when & why they need to enter it.
- As for the simulation results, modeling the company-to-baseball metrics above, we extrapolated the following results:
- Learning: You have a 91% probability that making the investments will not decrease your current company sales performance. This may seem obvious, but business and baseball have plenty of examples where adding new “strong players” actually degrades the current team’s performance for multiple reasons, including chemistry, newly created inefficiencies or other factors.
- Learning: You have an 81% chance that your pizza company will overtake your competitor to claim top market share by adding the AI and data science resources at a 7% investment. This translates into the baseball equivalent of a four-in-five chance that the extra “players” at your available budget will empower your team to win the 2019 American League Central Division. Per your “scouts,” you’ll double the return on your investment.
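The data-quality numbers above are easy to sanity-check. Here is a minimal Python sketch that starts at 70% quality and applies a 10% loss for every 5% of growth. The article does not say whether the loss compounds, so both readings are shown as assumptions; the function names are hypothetical.

```python
def quality_compounded(start_quality, growth_pct, step_pct=5, loss_per_step=0.10):
    """Assumption: each 5% growth step removes 10% of the remaining quality."""
    steps = growth_pct / step_pct
    return start_quality * (1 - loss_per_step) ** steps


def quality_linear(start_quality, growth_pct, step_pct=5, loss_per_step=0.10):
    """Assumption: the 10% losses simply add up across growth steps."""
    steps = growth_pct / step_pct
    return start_quality * (1 - loss_per_step * steps)


START_QUALITY = 0.70   # company data is 70% as clean as the baseball data
GROWTH_PCT = 15        # 15% company growth

print(f"Compounded reading: {quality_compounded(START_QUALITY, GROWTH_PCT):.0%}")
print(f"Linear reading: {quality_linear(START_QUALITY, GROWTH_PCT):.0%}")
```

Under either reading the result lands within a point or two of 50%, which is where the coin-flip comparison comes from.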
Part Eight: The big finish:
This free model alone isn’t enough for your Silicon Valley gluten-free pizza company to make massive real-world strategic decisions, but look at all the baseline/comparative data your management now has to work with. When your company does meet with AI vendors and data science candidates, you can combat their six-syllabic thesaurus with your own personal Moneyball.
###
Justin Lacche is President (Emeritus) of an affiliate in the San Francisco Giants organization.