Unlock Data Insights with Semantic Labeling

Many businesses today are looking for ways to analyze the human-generated content on their platforms. Are conversations trending positively? Are certain issues or topics coming up more often than others?

This 'natural language' content, such as surveys, performance-management notes, and social media posts, runs into a complicated issue: human-made data.

"How do I label human-generated content when the information it provides is inconsistent, poorly formed, or full of jargon?"

Imagine your job is to read a piece of content and label it several times over: by subject, sentiment, outcomes, actions, and more.

Here's an example: "I love your institution, everyone is always so nice to me. I wish there were better accommodations oh and can someone plz email me my latest account balance? thx!"

The sentence above is full of grammatical errors and shorthand, and its sentiment can be measured in multiple ways.

Is this all positive? Neutral-positive? Hard to tell, and extremely subjective.

What you would normally do:

Traditionally, you would build an ML model by...

  • Identifying a large amount of historical, pre-labeled data
  • Massaging that data into a training set and a test set
  • Training a model and testing for accuracy (and retraining if accuracy does not meet expectations)

This is a time-consuming process, riddled with manual effort, and it can take weeks or months to implement depending on the cleanliness of the data in front of you.

Instead, let's try a novel approach to labeling... Semantic Labeling!

Using the power of a vector database (Weaviate) and Anthropic models on Amazon Bedrock (AWS), we were able to quickly deploy and solve for this use case with minimal training effort.

The core concept is this...

Can we use the semantic meaning of synthetically generated user content to identify the most likely label for a new piece of user content?

Here's how it works.

The goal is to break apart complex human conversations into...

  • Manageable sentences
  • Thematically or topically grouped content

...while keeping the same words and tone the user originally provided, to maintain the authenticity of the request.

First, let's prime the pump.

  1. Take historical pieces of content (in this scenario, say, a user comment)
  2. Use an LLM to break the content apart into thematically grouped sentences
  3. Use these new sentences as samples to generate semantically similar synthetic data
  4. Store the synthetic data, pre-labeled, in Weaviate

Example, for science.
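The priming steps can be sketched as a short Python loop. This is a toy, self-contained sketch: `split_into_themes` and `synthetic_variants` are deterministic placeholders for the Anthropic-on-Bedrock prompts, and a plain list stands in for a Weaviate collection (all names here are illustrative, not a real API).

```python
def split_into_themes(comment):
    # Placeholder for an LLM prompt that splits a comment into
    # thematically grouped sentences.
    return [s.strip() for s in comment.replace("?", ".").split(".") if s.strip()]

def synthetic_variants(sentence, n=2):
    # Placeholder for LLM-generated, semantically similar paraphrases.
    return [f"{sentence} (variant {i})" for i in range(1, n + 1)]

labeled_store = []  # stands in for a Weaviate collection of labeled records

def prime(comment, label):
    # Steps 1-4: split the content, generate variants, store pre-labeled.
    for sentence in split_into_themes(comment):
        for text in [sentence] + synthetic_variants(sentence):
            labeled_store.append({"text": text, "label": label})

prime("I love your institution. Everyone is always so nice to me", "positive")
```

In production, each record would be inserted into Weaviate (which handles the embedding), and you would generate many more variants per sample sentence.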

Now let's compare a new piece of content and find a label

Repeat this process for a new piece of content

  1. Retrieve your new user content
  2. Separate it into thematically grouped sentences via the same LLM prompt
  3. Do not generate synthetic variations this time
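Here is a hedged sketch of what the splitting request could look like for Anthropic on Amazon Bedrock. The prompt wording is illustrative, and the request body follows the Anthropic Messages format used on Bedrock; you would send it via the `bedrock-runtime` `InvokeModel` API with an Anthropic Claude model ID.

```python
import json

# Illustrative prompt; in practice you would tune this per content source.
SPLIT_PROMPT = (
    "Split the user comment below into standalone sentences, grouped by "
    "theme or topic. Keep the user's original words and tone. "
    "Return one sentence per line.\n\nComment: {comment}"
)

def build_split_request(comment: str) -> str:
    # Request body in the Anthropic Messages format used on Bedrock.
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [
            {"role": "user", "content": SPLIT_PROMPT.format(comment=comment)}
        ],
    })
```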

Perform a Semantic Label search!

Now with your new sentences, perform a looping semantic search over Weaviate.

  • For each sentence...
  • Find the closest-matching item in Weaviate based on your query
  • Retrieve the label from the associated synthetic sentence
  • Assign it to your new sentence and to the original piece of content!
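The labeling loop can be illustrated end to end with a toy in-memory search. Cosine similarity over bag-of-words counts stands in for Weaviate's real vector search, and the `store` list plays the role of the pre-labeled synthetic sentences; everything here is a sketch, not the production implementation.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; Weaviate would supply real vectors.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Stands in for pre-labeled synthetic sentences stored in Weaviate.
store = [
    {"text": "everyone at the branch is always so friendly", "label": "positive"},
    {"text": "please email me my latest account balance", "label": "account-request"},
]

def label_sentence(sentence):
    # The highest-similarity stored sentence wins; its label is assigned.
    best = max(store, key=lambda item: cosine(embed(sentence), embed(item["text"])))
    return best["label"]

def label_content(sentences):
    # Loop over every sentence from the new piece of content.
    return [(s, label_sentence(s)) for s in sentences]
```

A quick usage example: `label_sentence("can someone plz email me my latest account balance")` lands on the "account-request" label, because the stored synthetic sentence about account balances is its nearest neighbor.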

Pros and Cons of Semantic Labeling

When it comes to anything created by humans, the information can be subjective. People use different phrases and tone (like sarcasm) to create double meanings. As a result, there are trade-offs to weigh...

  • Cons: You will have to review your historical data for abnormal data points that don't fit a consistent theme or tone (an LLM can help identify these!)
  • Cons: The prompt you write to generate sentences may need to be dynamic, depending on where the content you're analyzing originates and whether the users who create it have their own community language and lingo.
  • Pros: You can easily add human-in-the-loop review for content labeled in Weaviate, validating the output label and confirming that the semantic meaning meets label expectations.
  • Pros: Easily modifiable, and not expensive to train or implement.
  • Pros: COST EFFECTIVE: embedding and searching content in Weaviate can be an optimal use of dollars without long-running inference requirements.

You can use Vector databases in unique ways

Hopefully this article showcases a unique way to leverage vector databases and, more importantly, the different ways to implement a solution such as Weaviate on Amazon Web Services (AWS).

Not every Vector DB article needs to be about Chatbot RAG :)

If you're interested in solutions like this

Reach out to me and Innovative Solutions !

Our GenAI System "Tailwinds" solves for problems like these and many others.

More articles by Travis Rehl
