Using Raw Data to Discover New Intents

[This blog post was first published on HumanFirst.ai by Gordon Glasgow]

Customers communicate with businesses every day, with a range of objectives expressed in many different ways. When does my package arrive? Can you refund my credit card? I love this product, is it still sold out? As customers come to expect immediate, live communication and grow less tolerant of poor support, companies are under pressure to capture every utterance they receive and respond to it.

This type of data is aggregated from a variety of sources: e-mail, chat logs, surveys, feedback, call-center transcripts, etc. It comes as no surprise that 80-90% of it is unstructured. Intimidated by the volume of the data and its disorderly nature, many companies struggle to address the requests of their persistent customers. A Pure Storage blog post makes the same point:

Clearly, the problem associated with unstructured data has never been its rarity. Rather, it’s been the lack of tools and technologies able to extract business value from this diverse and disordered digital resource. If anything, in fact, the daunting volumes of unstructured data have actually discouraged companies from even attempting to mine it for nuggets of useful information.

Once companies get over this fear (perhaps with the help of useful tools), they realize that their historical, raw data is the path towards building a better customer experience. It’s domain-specific, it’s real-world, and you have direct access to it. It’s like being given the answers to an exam before taking it! It’s the place to discover customers’ intents and address them.

Intent Discovery Starts from Raw Data

Intent discovery is the act of categorizing phrases by meaning, based on what the user wants to achieve. It is an important part of designing digital conversational experiences.

Starting your intent discovery with a bottom-up approach is one way to deliver the experience your customers crave. It applies divide and conquer to the problem of transforming large datasets into increasingly structured data that can be mapped to each organization's exact domain. Instead of expecting a human or an unsupervised algorithm to correctly “predict” which intents exist in the data, it provides a simple framework for discovering them iteratively from your unstructured data.

Discovering Intents with HumanFirst

Tools like HumanFirst aren’t scared of large datasets: they apply this tried and tested bottom-up approach to discover new intents from raw data.

Users are prompted to map out the high-level intents found in their raw data with the aid of clustering algorithms, which automatically group semantically similar utterances into categories.
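To make the idea of grouping similar utterances concrete, here is a minimal sketch. It is not HumanFirst's algorithm: it assumes plain word overlap (Jaccard similarity) as a crude stand-in for semantic similarity, and the function names and the `threshold` parameter are hypothetical. A production system would more likely compare sentence embeddings.

```python
import re

def tokens(utterance):
    """Lowercase word set, used here as a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z']+", utterance.lower()))

def jaccard(a, b):
    """Word overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster(utterances, threshold=0.3):
    """Greedy single pass: join the most similar cluster or start a new one."""
    clusters = []  # each cluster holds its vocabulary and its member utterances
    for text in utterances:
        toks = tokens(text)
        best, best_score = None, 0.0
        for c in clusters:
            score = jaccard(toks, c["vocab"])
            if score > best_score:
                best, best_score = c, score
        if best is not None and best_score >= threshold:
            best["members"].append(text)
            best["vocab"] |= toks
        else:
            clusters.append({"vocab": set(toks), "members": [text]})
    return [c["members"] for c in clusters]

utterances = [
    "when does my package arrive",
    "when will my package arrive",
    "can you refund my credit card",
    "please refund my credit card",
    "is this product still sold out",
]
groups = cluster(utterances, threshold=0.3)
for group in groups:
    print(group)
```

With this sample data, the two delivery questions land in one group, the two refund requests in another, and the stock question stays on its own. Raising `threshold` produces smaller, stricter groups.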

Most teams are familiar with clustering, but the majority of existing methods are static by nature and lead to noisy results (i.e., they still need a lot of cleaning up). That’s why HumanFirst created interactive clustering, which is the most efficient way to accelerate the discovery of topics and intents in your data. Here’s a rundown of interactive clustering:

1) Selection: Users can select a cluster, glance at and edit the utterances within it, and label it on the fly.

2) Re-clustering: Users can re-cluster utterances within a specific intent to discover sub-intents and tap into the long tail of data (hence, divide and conquer).

3) Granularity & Cluster size: To remove any noise, users can decide the size and level of precision of the clusters.

4) Accept/reject flows: All workflows around adding utterances/clusters to intents use active learning, so the more you accept and reject suggestions, the better the next set of suggestions will be.
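The accept/reject loop in step 4 can be illustrated with a deliberately simplified sketch. It assumes a word-weight heuristic instead of a real active learning model, and the `IntentSuggester` class and its methods are hypothetical names, not part of HumanFirst. The point is only to show how each decision changes the next ranking.

```python
import re
from collections import Counter

def tokens(text):
    """Set of lowercase words in an utterance."""
    return set(re.findall(r"[a-z']+", text.lower()))

class IntentSuggester:
    """Ranks candidate utterances for one intent using per-word weights."""

    def __init__(self, seed_utterances):
        self.weights = Counter()
        for text in seed_utterances:
            self.accept(text)

    def accept(self, text):
        # Words from accepted utterances count in favour of the intent.
        for tok in tokens(text):
            self.weights[tok] += 1

    def reject(self, text):
        # Words from rejected utterances count against the intent.
        for tok in tokens(text):
            self.weights[tok] -= 1

    def score(self, text):
        toks = tokens(text)
        if not toks:
            return 0.0
        return sum(self.weights[t] for t in toks) / len(toks)

    def suggest(self, candidates, k=1):
        return sorted(candidates, key=self.score, reverse=True)[:k]

suggester = IntentSuggester(["refund my credit card"])
candidates = [
    "can you refund my order",
    "my card was declined",
    "please refund the charge",
]
first = suggester.suggest(candidates)          # best guess before any feedback
suggester.reject("my card was declined")       # user: not a refund request
suggester.accept("can you refund my order")    # user: yes, this is a refund
remaining = ["my card was declined", "please refund the charge"]
second = suggester.suggest(remaining)          # ranking after feedback
print(first, second)
```

Before any feedback, the declined-card utterance scores highest because it shares "my" and "card" with the seed. After one reject and one accept, the refund request moves to the top.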

The end result is a cascading hierarchy of intents and utterances, at the level of precision of your choice. This is a perfect example of how to leverage unstructured data to discover new intents.

[Figure: Cascading hierarchy of intents and utterances]
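One way to picture such a cascading hierarchy is as a tree in which each intent holds its own utterances plus any sub-intents. The sketch below is only an illustration: the `Intent` class, its fields, and the sample intent names are assumptions, not HumanFirst's data model.

```python
from dataclasses import dataclass, field

@dataclass
class Intent:
    name: str
    utterances: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def total_utterances(self):
        """Count utterances on this intent and on all of its sub-intents."""
        return len(self.utterances) + sum(
            child.total_utterances() for child in self.children
        )

    def outline(self, depth=0):
        """Return indented lines, one per intent, with utterance counts."""
        lines = ["  " * depth + f"{self.name} ({self.total_utterances()})"]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines

billing = Intent("billing", children=[
    Intent("refund", ["can you refund my credit card"], children=[
        Intent("refund_status", ["where is my refund"]),
    ]),
    Intent("update_card", ["i need to change my card"]),
])
print("\n".join(billing.outline()))
```

Re-clustering an intent corresponds to adding children to a node, so the tree can be as deep or as shallow as the level of precision you need.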

Discovering the Long-Tail of Intents

Ideally, every utterance from your unstructured data should be associated with an intent. With HumanFirst, you can sort your unlabeled data by NLU Uncertainty to obtain a list of utterances that don't match well with existing intents. This is another way to uncover new intents from your raw data.
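As a rough sketch of what sorting by uncertainty can look like, the example below assumes that an NLU model returns a probability for each known intent and uses the entropy of that distribution as the uncertainty score. How HumanFirst actually computes NLU Uncertainty is not stated here, so the metric, the function names, and the sample probabilities are all assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy of an intent distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def sort_by_uncertainty(predictions):
    """Order (utterance, probabilities) pairs from most to least uncertain."""
    return sorted(predictions, key=lambda item: entropy(item[1]), reverse=True)

# Hypothetical model output for three unlabeled utterances.
predictions = [
    ("when does my package arrive",
     {"track_order": 0.95, "refund": 0.03, "stock": 0.02}),
    ("do you ship to canada",
     {"track_order": 0.40, "refund": 0.30, "stock": 0.30}),
    ("is it still sold out",
     {"track_order": 0.05, "refund": 0.05, "stock": 0.90}),
]
ranked = sort_by_uncertainty(predictions)
for utterance, probs in ranked:
    print(f"{entropy(probs):.2f}  {utterance}")
```

The shipping question rises to the top because no existing intent fits it well, which makes it a candidate for a new intent.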

A coverage analysis is also a great way to see how far along you are in your intent discovery process. It lets you visualize labeled and unlabeled data in an intuitive, purpose-built way.
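At its simplest, coverage is the share of utterances that already carry an intent label. The sketch below assumes labels are kept in a plain dictionary from utterance to intent name. The `coverage_report` function and the sample data are hypothetical.

```python
from collections import Counter

def coverage_report(utterances, labels):
    """Return the labeled fraction, per-intent counts, and unlabeled utterances."""
    unlabeled = [u for u in utterances if u not in labels]
    per_intent = Counter(labels[u] for u in utterances if u in labels)
    labeled_count = len(utterances) - len(unlabeled)
    fraction = labeled_count / len(utterances) if utterances else 0.0
    return fraction, per_intent, unlabeled

utterances = [
    "when does my package arrive",
    "can you refund my credit card",
    "please refund my order",
    "is it still sold out",
    "do you ship to canada",
]
labels = {
    "when does my package arrive": "track_order",
    "can you refund my credit card": "refund",
    "please refund my order": "refund",
}
fraction, per_intent, unlabeled = coverage_report(utterances, labels)
print(f"coverage: {fraction:.0%}")
print(dict(per_intent))
print(unlabeled)
```

The unlabeled list is where the next round of discovery starts.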

The majority of data today is an untapped resource. As mentioned in a previous blog post, there is an entire dimension to discovering new intents that can only happen when you are able to explore your unlabeled data, which is where HumanFirst has a definite edge. To learn more about HumanFirst, visit our blog.
