the next #4: Data-centric AI - 3 Questions to Johannes Hötter, Co-Founder @ Kern AI.
Sebastian Heinz
Founder & CEO @ statworx | Co-Founder & CEO @ AI Hub Frankfurt | Board Member @ AI Frankfurt e.V. | Investor | Advisor | Speaker
Welcome to the fourth edition of the next. No video today, as I am on “workation” on the beautiful island of Crete (see picture below). Thank you for tuning in again.
Edition 4 of the next is dedicated to the emerging topic of “data-centric AI”. Many companies are struggling with their data assets for various reasons: access, availability, quality, etc. In particular, data quality is a key factor for training AI models that are reliable and trustworthy.
Data is a key asset for any company. As highlighted in the 2021 State of AI study from McKinsey, ways of working with data are a key differentiator between AI leaders and laggards. The study identified a large gap in “Having scalable internal processes for labeling AI training data” (48% of high-performers vs. 22% of low-performers), which is also a key component of data-centric AI.
This finding aligns with what I observe in the market. While many companies are aiming to roll out their AI initiatives across the company, a common bottleneck is data quality and availability. I am confident that the data-centric AI movement will cause a (necessary) shift in how companies think about data, toward treating it as an asset rather than just a “by-product”. Bottom line: Data matters in AI.
Data-centric AI: 3 Questions to Johannes Hötter, Co-Founder at Kern AI
Recently, I had the opportunity to speak with Johannes Hötter, Co-Founder at Kern AI.
Johannes and his team are developing (open source) tools for data-centric AI. Their current flagship product Kern AI refinery tackles data-centric NLP. Interesting fact: Johannes and his Co-Founder Henrik Wenck are currently raising a seed financing round for further developing Kern AI.
Me: "Johannes, what does data-centric AI mean to you?"
Johannes: “When building AI, you can improve the quality/reliability of a model either by choosing more complex algorithms or by building a better training database. Data-centric AI is all about the latter option, i.e. systematically engineering better datasets, ultimately building better models.”
Me: "What are you doing at Kern AI to support data-centric AI?"
Johannes: “We build (open-source) tools for data scientists to:
In other words, with our tooling, data scientists can both build prototypes within an afternoon and also continuously improve core models to gain reliable predictions in a data-centric approach.”
Me: “What recommendation would you give teams that are getting started on data-centric AI?”
Johannes: “As with the general implementation of AI, it is a good idea to get started with a simple prototype. We usually see implementations of chatbots or sentiment predictions here. This can be implemented within a day or two. For those that are interested, we have great educational content that we share publicly.
Once that prototype is implemented, it is easy to see how data-centric AI is a continuous task. It is about treating training data as a software artifact, so you can now start looking deeper into your datasets. Where are potential mislabels? With new technologies - and tools such as our open-source refinery - it becomes easier to identify them. Improving these subsets of your training data yields the next percentages in the accuracy of your model.”
Me: "Johannes, thank you very much for your valuable input on this important topic!"
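To make the question "Where are potential mislabels?" a bit more tangible, here is a minimal Python sketch of one simple heuristic: flag every example whose label disagrees with the majority label of its most similar neighbours. This is an illustration under my own assumptions, not how Kern AI refinery works. The function names, the token-overlap similarity and the toy reviews are all hypothetical, and a real project would likely use embeddings or model predictions instead of word overlap.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into a set of whitespace-separated tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token overlap of two token sets (0.0 = disjoint, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_potential_mislabels(records, k=3):
    """Return indices of records whose label disagrees with the majority
    label of their k most similar neighbours.

    `records` is a list of (text, label) pairs. Hypothetical helper for
    illustration only; it is a heuristic, not a proof of a labeling error.
    """
    tokens = [tokenize(text) for text, _ in records]
    flagged = []
    for i, (_, label) in enumerate(records):
        # Rank all other records by similarity to record i, keep the top k.
        neighbours = sorted(
            (j for j in range(len(records)) if j != i),
            key=lambda j: jaccard(tokens[i], tokens[j]),
            reverse=True,
        )[:k]
        if not neighbours:
            continue  # Nothing to compare against.
        votes = Counter(records[j][1] for j in neighbours)
        majority_label, _ = votes.most_common(1)[0]
        if majority_label != label:
            flagged.append(i)
    return flagged


# Toy sentiment data (made up); record 3 is deliberately labeled wrong.
records = [
    ("great product love it", "positive"),
    ("love this great product", "positive"),
    ("really great product love it", "positive"),
    ("great product love it so much", "negative"),
    ("terrible product hate it", "negative"),
    ("hate this terrible product", "negative"),
    ("really terrible product hate it", "negative"),
]

flagged = flag_potential_mislabels(records)
print(flagged)  # [3]
```

The flagged subset is what a human would then review and relabel, which is the continuous improvement loop Johannes describes.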
About Kern AI
Founded in November 2020 with offices in Eichwalde and Bonn, Kern AI consists of 9 full-time engineers working on tooling for data-centric AI. Before starting Kern AI, the founders built an AI consultancy, working on diverse projects such as weather predictions, database chatbots and e-commerce shopping cart predictions. Kern AI is a Venture Capital-backed startup.
Study of the week - How data-centric AI bolsters deep learning for the small data masses
Here, I want to share an article from Datanami featuring deep learning legend Andrew Ng that illustrates the importance of shifting mindsets from model-centric to data-centric AI. Andrew Ng is the founder and CEO of Landing AI, a 2022 Datanami Person to Watch and a prominent entrepreneur who actively promotes data-centric AI.
“We know that in consumer software companies, you may have a billion users [in] a giant data set. But when you go to other industries, the sizes are often much smaller,” Ng said during his NVIDIA GTC session titled “The Data-centric AI Movement.” “From where I’m sitting, I think AI - machine learning, deep learning - has transformed the consumer software Internet. But in many other industries, I think it’s frankly not yet there.”
According to the paradigm of data-centric AI, one barrier to the widespread adoption of AI is small datasets. Many companies do not have enough data to train their models. This is why data-centric AI focuses on high-quality labeled datasets instead of the pure quantity of data. Below is a chart from Ng that illustrates the problem.
Andrew Ng encourages viewers to spend more time with the following things:
Opinion: The data-centric approach: bridging the gap of sector-wide AI adoption or just another marketing buzzword, by Benedikt Mueller, Data Engineer at statworx
What’s the issue?
Even though AI has recently taken a huge leap forward in language understanding and computer vision, the adoption of AI is unequally distributed between different sectors - with the manufacturing industry coming in last. For many use cases, a lack of curated data from the domain in question is among the main reasons why even the newest AI models often fail to deliver any value.
What’s data-centric’s take?
A data-centric AI approach promotes building AI systems based on high-quality data, ensuring that the data clearly conveys what the AI must learn. This is addressed by:
Data tasks, however, tend to be labor-intensive, which is why operationalizing the data process is another integral selling point of data-centric AI.
What’s the status quo?
While it has not yet been assessed whether a data-centric approach can re-energize use cases in various industries, it has certainly helped to restore some recently lost awareness of the less glamorous part of AI: data curation.
3 LinkedIn Gems
About
My name is Sebastian, I am the founder and CEO of statworx, one of the leading companies for data science, machine learning and AI in the German-speaking region. I am a board member of AI Frankfurt Rhein-Main e.V. and an active business angel for AI start-ups. In my spare time, I love to travel, cook and make music with my drum computer.
Get in touch: If you want to connect, or you're looking for an exchange or in-depth discussion, feel free to contact me through LinkedIn. I am always there to help!