the next #4: Data-centric AI - 3 Questions to Johannes Hötter, Co-Founder @ Kern AI.
Johannes Hötter, Co-Founder of Kern AI - a tool for data-centric AI

Welcome to the fourth edition of the next. No video today, as I am on “workation” on the beautiful island of Crete (see picture below). Thank you for tuning in again!

[Image: Crete]

Edition 4 of the next is dedicated to the emerging topic of “data-centric AI”. Many companies are struggling with their data assets for various reasons: access, availability, quality, etc. In particular, data quality is a key factor for training AI models that are reliable and trustworthy.

Data is a key asset for any company. As highlighted in McKinsey's 2021 State of AI study, ways of working with data are a key differentiator between AI leaders and laggards. The study identified a large gap in “Having scalable internal processes for labeling AI training data” (48% of high-performers vs. 22% of low-performers), a capability that is also a key component of data-centric AI.

This finding aligns with what I observe in the market. While many companies aim to roll out their AI initiatives across the organization, a common bottleneck is data quality and availability. I am confident that the data-centric AI movement will bring a (necessary) shift toward treating data as an asset rather than just a “by-product”. Bottom line: Data matters in AI.

Data-centric AI: 3 Questions to Johannes Hötter, Co-Founder at Kern AI

Recently, I had the opportunity to speak with Johannes Hötter, Co-Founder at Kern AI.

Johannes and his team are developing (open-source) tools for data-centric AI. Their current flagship product, Kern AI refinery, tackles data-centric NLP. Interesting fact: Johannes and his Co-Founder Henrik Wenck are currently raising a seed financing round to further develop Kern AI.

Me: "Johannes, what does data-centric AI mean to you?"

Johannes: “When building AI, you can improve the quality/reliability of a model either by choosing more complex algorithms or by building a better training database. Data-centric AI is all about the latter option, i.e. systematically engineering better datasets, ultimately building better models.”

Me: "What are you doing at Kern AI to support data-centric AI?"

Johannes: “We build (open-source) tools for data scientists to:

  • programmatically scale the transformation of raw data (e.g. emails) into training data, and
  • identify critical data slices in existing training data, which cause data scientists headaches during training - and send these slices to experts or crowd workers with a single click

In other words, with our tooling, data scientists can both build prototypes within an afternoon and also continuously improve core models to gain reliable predictions in a data-centric approach.”
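To make the first point more concrete: programmatically turning raw text into training data is often done with small heuristic "labeling functions" whose votes are combined. The following is only a minimal sketch of that general idea in plain Python. It is not the API of Kern AI refinery, and the function names, keywords, labels, and example emails are hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its rule does not apply


# Hypothetical keyword rules; real projects would encode domain knowledge here.
def lf_invoice(text):
    return "billing" if "invoice" in text.lower() else ABSTAIN


def lf_refund(text):
    return "billing" if "refund" in text.lower() else ABSTAIN


def lf_password(text):
    return "support" if "password" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_invoice, lf_refund, lf_password]


def weak_label(text):
    """Combine all labeling functions by majority vote; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # conflicting rules: better left to a human annotator
    return top


# Hypothetical raw emails
emails = [
    "Please resend the invoice for March",
    "I forgot my password",
    "Hello, just checking in",
]
labels = [weak_label(email) for email in emails]
```

Records on which the rules abstain or disagree are natural candidates for the second point, i.e. slices that are handed over to experts for manual labeling.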

Me: “What recommendation would you give teams that are getting started on data-centric AI?”

Johannes: “As with the general implementation of AI, it is a good idea to get started with a simple prototype. We usually see implementations of chatbots or sentiment predictions here. This can be implemented within a day or two. For those that are interested, we have great educational content that we share publicly.

Once that prototype is implemented, it is easy to see how data-centric AI is a continuous task. It is about treating training data as a software artifact, so you can now start looking deeper into your datasets. Where are potential mislabels? With new technologies - and tools such as our open-source refinery - it becomes easier to identify them. Improving these subsets of your training data yields the next percentages in the accuracy of your model.”
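One simple way to look for potential mislabels is to check whether an example's label disagrees with the labels of its most similar neighbors. The sketch below illustrates this with word-overlap (Jaccard) similarity in plain Python. It is an assumption-laden toy, not the method used in refinery: the dataset, the function names, and the choice of k are hypothetical, and real setups would typically use embeddings or model confidence instead.

```python
from collections import Counter


def tokens(text):
    """Very rough tokenization: lowercase words as a set."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suspicious_labels(dataset, k=3):
    """Return indices whose label disagrees with the clear majority label
    of the k most similar other examples."""
    token_sets = [tokens(text) for text, _ in dataset]
    flagged = []
    for i, (_, label) in enumerate(dataset):
        neighbors = sorted(
            (j for j in range(len(dataset)) if j != i),
            key=lambda j: jaccard(token_sets[i], token_sets[j]),
            reverse=True,
        )[:k]
        majority, count = Counter(dataset[j][1] for j in neighbors).most_common(1)[0]
        if majority != label and count > k / 2:
            flagged.append(i)
    return flagged


# Hypothetical sentiment data; the third record carries a deliberately wrong label.
dataset = [
    ("great product love it", "positive"),
    ("love this great product", "positive"),
    ("really great product love it", "negative"),
    ("terrible product hate it", "negative"),
    ("hate this terrible product", "negative"),
    ("really terrible product hate it", "negative"),
    ("love it great product overall", "positive"),
    ("hate it terrible product overall", "negative"),
]
flagged = suspicious_labels(dataset, k=3)
```

Flagged records are only candidates for review. A human still decides whether the label is actually wrong.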

Me: “Johannes, thank you very much for your valuable input on this important topic!”

About Kern AI

Founded in November 2020 with offices in Eichwalde and Bonn, Kern AI consists of 9 full-time engineers working on tooling for data-centric AI. Before starting Kern AI, the founders built an AI consultancy, working on diverse projects such as weather predictions, database chatbots and e-commerce shopping cart predictions. Kern AI is a Venture Capital-backed startup.

Link to Johannes' profile

Link to Kern AI

Study of the week - How data-centric AI bolsters deep learning for the small data masses

Here, I want to share an article from Datanami featuring deep learning legend Andrew Ng that illustrates the importance of shifting the mindset from model-centric to data-centric AI. Andrew Ng is the founder and CEO of Landing AI, a 2022 Datanami Person to Watch, and an entrepreneur who actively promotes data-centric AI.

“We know that in consumer software companies, you may have a billion users [in] a giant data set. But when you go to other industries, the sizes are often much smaller,” Ng said during his NVIDIA GTC session titled “The Data-centric AI Movement.” “From where I’m sitting, I think AI (machine learning, deep learning) has transformed the consumer software Internet. But in many other industries, I think it’s frankly not yet there.”

According to the data-centric AI paradigm, one barrier to the widespread adoption of AI is small datasets. Many companies do not have enough data to train their models. This is why data-centric AI focuses on the quality of labeled datasets rather than on the sheer quantity of data. Below is a chart from Ng that illustrates the problem.

[Image: chart from Andrew Ng]

Andrew Ng encourages viewers to spend more time on the following:

  • Having a high-quality set of human-curated data: This makes the human aspect more important than ever. We have to train our staff, and they have to decide which datasets make sense and which do not.
  • Making data improvement a core part of iterating on a machine learning system: In the past, we focused mostly on improving the models, but we have to switch and make sure that we improve the quality of the data we use. This works best through good labeling.
  • Getting better data rather than just more data: What matters is not the quantity but the quality of the data we use for our models.

Link to full article

Link to data-centric resource hub

Opinion: The data-centric approach: bridging the gap of sector-wide AI adoption or just another marketing buzzword? By Benedikt Mueller, Data Engineer at statworx

What’s the issue?

Even though AI has recently taken a huge leap forward in language understanding and computer vision, the adoption of AI is unequally distributed between different sectors, with the manufacturing industry coming in last. For many use cases, the lack of curated data from the domain in question is among the main reasons why even the newest AI models often fail to deliver any value.

What’s the data-centric take?

A data-centric AI approach promotes building AI systems based on high-quality data, ensuring that the data clearly conveys what the AI must learn. This is addressed by:

  • collecting and knowing your data meticulously,
  • monitoring and improving its quality continuously,
  • and only then thinking about the modeling phase.

Data tasks, however, tend to be labor-intensive, which is why operationalizing the data process is another integral selling point of data-centric AI.

What’s the status quo?

While it has not yet been assessed how far a data-centric approach can re-energize use cases in various industries, it has certainly helped to restore some recently lost awareness of the less glamorous part of AI: data curation.

Link to blog post

3 LinkedIn Gems

  • What is Data-Centric AI? Here is a short explanation (Link to post)
  • Data-Centric AI: Decoding the Hype (Link to post)
  • Here is the difference between a model-centric and a data-centric approach (Link to post)

About

My name is Sebastian. I am the founder and CEO of statworx, one of the leading companies for data science, machine learning and AI in the German-speaking region. I am a board member of AI Frankfurt Rhein-Main e.V. and an active business angel for AI start-ups. In my spare time, I love to travel, cook and make music with my drum computer.

Get in touch: If you want to connect, or you're looking for an exchange or in-depth discussion, feel free to contact me through LinkedIn. I am always there to help!

Link to my profile
