How to Kickstart an AI Venture Without Proprietary Data
Kartik Hosanagar
AI, Entrepreneurship, Digital Transformation, Mindfulness. Wharton professor. Cofounder Yodle, Jumpcut
AI startups have a chicken-and-egg problem. Here's how to solve it.
A few years ago, I learned that banks lose billions of dollars to credit card fraud every year. Better detection or prediction of fraud would be incredibly valuable, so I considered convincing a bank to share its transactional data in the hope of building a better fraud detection algorithm. The catch, unsurprisingly, was that no major bank was willing to share such data. They felt they were better off hiring a team of data scientists to work on the problem internally. My startup idea died a quick death.
Despite the tremendous innovation and entrepreneurial opportunities around AI, breaking into the field can be daunting for entrepreneurs: they face a chicken-and-egg problem before they even begin, one that existing companies are less likely to contend with. I believe specific strategies can help entrepreneurs overcome this challenge and create successful AI-driven ventures.
What is the chicken-and-egg problem in AI entrepreneurship?
Today’s AI systems need to be trained on large datasets, which can pose a challenge for entrepreneurs. Established companies with a sizable customer base already have a stream of data from which they can train AI systems, build new products and enhance existing ones, generate additional data, and rinse and repeat (for example, Google Maps has over 1 billion monthly active users and over 20 petabytes of data). But for entrepreneurs, the need for data poses a chicken-and-egg problem: because their company hasn’t yet been built, they don’t have data, and without data they can’t easily create an AI product.
Data is also more than a prerequisite for getting started with AI; it is key to AI performance. Research has shown that while algorithms matter, data matters more. Among modern machine learning methods, the performance differences between algorithms are relatively small compared with the differences the same algorithm shows when trained on more or less data (Banko and Brill 2001).
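To make the "data matters more" point concrete, here is a minimal toy simulation. It is not a reproduction of Banko and Brill's experiments: the synthetic dataset, the two classifiers (nearest centroid and Gaussian naive Bayes), and every name and constant below are assumptions chosen purely for illustration. The sketch trains both classifiers on training sets of increasing size and reports accuracy on one fixed test set, so the effect of more data can be compared with the effect of switching algorithms.

```python
import math
import random

DIM = 50      # number of features (assumed for this toy example)
SHIFT = 0.2   # class 1 is centred at +SHIFT, class 0 at -SHIFT (assumed)


def sample(n, rng):
    """Draw n labelled points with unit-variance Gaussian noise."""
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so classes stay balanced
        centre = SHIFT if label == 1 else -SHIFT
        data.append(([rng.gauss(centre, 1.0) for _ in range(DIM)], label))
    return data


def class_stats(train):
    """Estimate per-class, per-feature means and variances."""
    stats = {}
    for label in (0, 1):
        cols = list(zip(*[x for x, y in train if y == label]))
        means = [sum(c) / len(c) for c in cols]
        variances = [
            sum((v - m) ** 2 for v in c) / len(c) + 1e-6  # avoid zero variance
            for c, m in zip(cols, means)
        ]
        stats[label] = (means, variances)
    return stats


def predict_centroid(stats, x):
    """Nearest centroid: pick the class whose mean is closest to x."""
    dist = {
        label: sum((a - m) ** 2 for a, m in zip(x, means))
        for label, (means, _) in stats.items()
    }
    return min(dist, key=dist.get)


def predict_naive_bayes(stats, x):
    """Gaussian naive Bayes with equal class priors."""
    cost = {
        label: sum(
            (a - m) ** 2 / v + math.log(v)
            for a, m, v in zip(x, means, variances)
        )
        for label, (means, variances) in stats.items()
    }
    return min(cost, key=cost.get)


def accuracy(predict, stats, test):
    return sum(predict(stats, x) == y for x, y in test) / len(test)


def run(train_sizes=(20, 200, 2000), test_size=2000, seed=0):
    """Return {train_size: {algorithm_name: test accuracy}}."""
    rng = random.Random(seed)
    test = sample(test_size, rng)
    results = {}
    for n in train_sizes:
        stats = class_stats(sample(n, rng))
        results[n] = {
            "centroid": accuracy(predict_centroid, stats, test),
            "naive_bayes": accuracy(predict_naive_bayes, stats, test),
        }
    return results


if __name__ == "__main__":
    for n, accs in run().items():
        print(n, accs)
```

In a setup like this, one would expect both classifiers to improve noticeably as the training set grows. How the gap between the two algorithms compares with that improvement depends on the assumed data distribution, so treat the output as an illustration of the idea rather than as evidence for it.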
There are several strategies that can help entrepreneurs navigate this chicken-and-egg problem and access the data they need to break into the AI space.
Entrepreneur with a passion for building great brands. Executive Fellow at Cambridge Central Asia Forum (University of Cambridge)
3 years ago: Thanks for sharing. An excellent read.
Director Living Machine Institute | Startup Mentor & Advisor | Career Transition Mentor | Ex Tata Sons | Ex TCS
3 years ago: Kartik Hosanagar, excellent article, and very relevant even for large companies trying to create domain-specific AI solutions. For example, an IT consulting firm may try to create a new AI model to predict diseases using biomarkers but would require large healthcare datasets that it doesn't own; collaborating with a hospital would be a good way forward, as you suggested. Similarly, working with university students on projects requires companies to share relevant data so the students can build AI models, which is often difficult.
ML at Meta
3 years ago: Really interesting! I would probably add using publicly available pretrained models with transfer learning to significantly reduce the need for new data.