How to measure the success of a Conversational AI project

Designing and developing a conversational interface requires a great deal of time and effort, and usually a significant investment. Consequently, it is essential to measure how it performs once it is released and users start interacting with it.

Understanding which data are relevant, and how to read them, is not easy. That is what AI Data Analysts are there for!

Moreover, the release marks the beginning of a whole new phase, usually called ‘continuous improvement’, in which the bot’s ability to understand and answer users will evolve.

But how does this happen? It is often believed that conversational solutions learn and evolve autonomously, while interacting with users, thanks to machine learning.

However, it is not that simple. At least not today, and not in business scenarios.

Behind a bot’s ability to learn, there are experienced professionals who analyze conversations, highlight problems and provide data that allow the team to understand what is not working and to make informed decisions on how to improve it.

Different metrics for different criteria

That said, how can we choose what to analyze?

First of all, we must understand what we want (and can) measure. To do so, we can divide the metrics into three areas:

  • tech performance, which refers to NLU, STT, API calls, etc.
  • business goal achievement, which might mean reducing the number of tickets opened with customer care, generating leads, selling products, etc.
  • user behavior and satisfaction, which refers to how users interact with the virtual assistant and whether they find what they are looking for.

Each area has different success criteria and thus calls for different KPIs.

Secondly, we should understand which metrics make sense for the specific type of solution we are dealing with.

For example, the conversational interface:

  • might be chat-only, voice-only or multimodal;
  • if it has a chat, it might allow interaction through buttons, carousels and other graphical elements, or through text only, so we might want to measure how many users prefer one modality over the other (user behavior)
  • if it has a chat, it might have an NLU engine (while if it is voice-only, it always needs one); in that case, its ability to understand users can be measured with the classic KPIs used in AI: accuracy, precision, recall, error rate and F1 score (tech metrics; see the sketch after this list)
  • if it is a voice interface, the word error rate might be measured, that is the ratio between the number of transcription errors (substitutions, deletions and insertions) and the total number of words spoken (tech metric)
  • might be integrated with third-party systems, so we might want to measure whether the APIs always respond, and within an acceptable time range (tech metric)
  • might be active on multiple channels, such as Facebook Messenger, WhatsApp, Telegram or websites, so we might want to measure the traffic distribution across channels (user behavior)
  • might have human handover by design, so we might want to measure how many users end up needing the help of a human agent (user behavior and business goal achievement)
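
To make these tech metrics concrete, here is a minimal sketch in Python of how intent-classification quality and word error rate could be computed from logged data. It assumes scikit-learn is available; the intent labels and transcripts are hypothetical, standing in for an annotated evaluation set.

```python
# A minimal sketch: intent-classification metrics and word error rate.
# The labels and transcripts below are hypothetical examples.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical evaluation set: human-annotated vs. bot-predicted intents.
y_true = ["track_order", "refund", "refund", "greeting", "track_order"]
y_pred = ["track_order", "refund", "greeting", "greeting", "track_order"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F1={f1:.2f}")


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words spoken,
    computed with a classic edit-distance dynamic program."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution over four spoken words -> WER = 0.25.
print(f"WER={word_error_rate('cancel my order please', 'cancel my water please'):.2f}")
```

Note that macro-averaging weighs every intent equally; with a skewed intent distribution, a weighted average may better reflect what users actually experience.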

Quantitative and qualitative metrics

Another distinction can be made between quantitative and qualitative metrics.

The former are certainly easier to extract but can be misleading if not properly interpreted; the latter require more effort but can be more enlightening.

As always, however, preferring one over the other depends on what we want to measure and what we want to achieve with that measurement. Let’s look at an example.

Quantitative metrics can be used, for example, to analyze user behavior. Popular metrics in this regard include the following (a short computation sketch follows the list):

  • Number of total conversations
  • Average duration of conversations
  • Average number of interactions for each session
  • Number of unique users
  • Distribution of users across channels
  • Sentiment for each session
  • Top intents
  • Tasks completed
  • Average time to complete a task
  • Abandoned conversations, and the steps of the flow at which users most often abandon the interaction
  • Human handover rate
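
As an illustration, several of these indicators can be derived from raw session logs with a few lines of code. The sketch below is in Python; the log schema (channel, number of turns, duration, completion and handover flags) is a hypothetical simplification, since every platform exports logs differently.

```python
# A hedged sketch: deriving a few quantitative KPIs from session logs.
# The session records below are illustrative, not real data.
from collections import Counter
from statistics import mean

sessions = [
    {"user_id": "u1", "channel": "whatsapp",  "n_turns": 8,  "duration_s": 95,
     "completed": True,  "handover": False},
    {"user_id": "u2", "channel": "messenger", "n_turns": 3,  "duration_s": 40,
     "completed": False, "handover": True},
    {"user_id": "u1", "channel": "whatsapp",  "n_turns": 12, "duration_s": 180,
     "completed": True,  "handover": False},
]

print("total conversations:", len(sessions))
print("unique users:", len({s["user_id"] for s in sessions}))
print("avg duration (s):", mean(s["duration_s"] for s in sessions))
print("avg interactions per session:", mean(s["n_turns"] for s in sessions))
print("channel distribution:", Counter(s["channel"] for s in sessions))
print("task completion rate:", sum(s["completed"] for s in sessions) / len(sessions))
print("handover rate:", sum(s["handover"] for s in sessions) / len(sessions))
```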

These indicators are certainly useful, but very often they do not explain why a certain phenomenon occurs.

For example, why do users prefer to chat via WhatsApp rather than Facebook Messenger? Why do users tend to abandon the interaction after a certain question? Why do users spend a certain amount of time in a conversation?

These questions might find an answer in a more precise, qualitative analysis, for example by conducting a user test or asking for explicit feedback. Feedback can aim at measuring different things: whether users reached their goal, e.g. with a closed yes/no question such as “Did you find what you were looking for?”, or how satisfying the experience was, e.g. by asking users to rate it with numbers (1-5), emojis (smiling, neutral, sad face), or words (positive, neutral, negative).
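
As a minimal sketch, explicit ratings collected this way could be aggregated into a satisfaction score along the following lines; the 1-5 scale and the convention of counting 4-5 as “satisfied” are illustrative assumptions, not a fixed standard.

```python
# Aggregating explicit end-of-conversation ratings into a simple CSAT
# score. Treating 4-5 on a 1-5 scale as "satisfied" is one common
# convention, assumed here for illustration.
ratings = [5, 4, 2, 5, 3, 1, 4, 4]  # hypothetical 1-5 ratings

csat = sum(1 for r in ratings if r >= 4) / len(ratings)
avg_rating = sum(ratings) / len(ratings)
print(f"CSAT: {csat:.0%} (average rating {avg_rating:.1f}/5 "
      f"over {len(ratings)} responses)")
```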

Analyze, improve, repeat

Summing up, measuring the success of a conversational interface after its release means defining the right KPIs, analyzing conversations, and listening to the users’ voice. All this work should ultimately improve the bot’s ability to understand users and provide helpful answers, but it is not a stand-alone activity. On the contrary, monitoring and improving should be key throughout the entire project lifecycle.
