Enterprise Grade Chat Bot: Natural Language Processing - Design, Plan and Model #DataScience #NLP
Aaron Butler
Deep Tech Leadership | Data Science & ML Engineering | Data Strategy | Cloud Data Transformation
The following is a professional project plan I created. I am open to any feedback and/or criticism.
Because of the extensive amount of work involved, turning the sub-steps into Data Science products used by your departments will increase ROI in the early stages and secure the much-needed input from the relevant subject matter experts in Sales and Marketing.
Step 1 - Design the corpus
Understanding bag-of-words: a common misunderstanding is that text analytics works on the text itself. What does this mean? It isn't text; every word is transformed into a unique number, like a lookup dictionary (for example, 1 = aardvark, 2 = aaron, and so on), because algorithms are naturally better at processing and seeing patterns in numbers. A minimal sketch of that lookup is shown below. A specific complication, covered next, arises when a given word can have multiple meanings.
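To make the lookup idea concrete, here is a minimal sketch in Python (my own illustration, not the author's code; the example sentences are made up):

```python
# A minimal sketch of the word-to-number lookup described above. Each unique
# word gets an integer id, and every sentence becomes a vector of counts over
# that vocabulary.
from collections import Counter

corpus = [
    "the batsman hit the ball with the bat",
    "the bat flew out of the cave at night",
]

# Build the lookup dictionary: word -> unique number
vocabulary = {}
for sentence in corpus:
    for word in sentence.split():
        vocabulary.setdefault(word, len(vocabulary))

def to_bow(sentence):
    """Convert a sentence into a bag-of-words count vector over the vocabulary."""
    counts = Counter(sentence.split())
    return [counts.get(word, 0) for word in vocabulary]

for sentence in corpus:
    print(sentence, "->", to_bow(sentence))
```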
Synsets and Hypernyms, Lemmas and Synonyms fun facts:
"A synset is a set of one or more synonyms, whereas a hypernym is a word or phrase whose semantic field includes that of another, more specific word; a lemma is the canonical word form that stands as the dictionary entry, whereas a lexeme is the group of word forms that share the same meaning."
The point is that the same word can be given to two entirely different things.
Sentences that relate to the sport of cricket and contain "bat" will also contain words whose synsets relate to cricket, such as bowling, stadium, sport, etc. Whereas sentences about the mammal "bat" will typically contain words relevant to the activities, beings and things of an animal.
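As a quick, hedged illustration of how those separate senses are kept apart, NLTK's WordNet interface lists a distinct synset, with its own definition, for each sense of "bat" (this assumes NLTK is installed and the WordNet corpus has been downloaded):

```python
# Assumes nltk is installed and nltk.download("wordnet") has been run once.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("bat"):
    # Each synset has its own definition, so the cricket/baseball bat and the
    # mammal are never confused with one another.
    print(synset.name(), "-", synset.definition())
```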
It is the density of the language used within a given window that provides context for the topic at hand. To explain: where there is a need to calculate the relationship between words based on their location, words become far more useful as vectors. Here is a quick example:
Shown above is the calculated difference between two sentences after converting them to vectors. Pretty cool, right?
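As a rough sketch of the same idea (not the exact calculation from the slide), two sentences can be turned into count vectors and compared with cosine similarity; the sentences below are illustrative only:

```python
# Turn two sentences into count vectors and measure how close they are
# with cosine similarity (1.0 = identical direction, 0.0 = no shared words).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "the batsman hit the ball with the bat",
    "the bat flew out of the cave at night",
]

a, b = CountVectorizer().fit_transform(sentences).toarray()

similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {similarity:.2f}")
```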
Step 2 - Design the NLP Data Model for the Data Science work.
This model is used to assemble the relevant data required by the algorithms, derived from the transcribed voice files. The model is based on recorded call centre calls and pulls together the relevant metadata to be attached to each record. Example:
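The original post shows the model as a diagram. Purely as an illustration of the kind of record it describes (the field names below are my assumptions, not the author's actual schema), a call record might look something like this:

```python
# Illustrative only: one possible shape for a call-centre record that attaches
# the transcribed text to its metadata.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CallRecord:
    call_id: str
    agent_id: str
    customer_id: str
    started_at: datetime
    duration_seconds: int
    transcript: str                                        # text from the voice file
    token_ids: list = field(default_factory=list)          # transcript mapped to numbers
    sentiment_scores: list = field(default_factory=list)   # per-utterance sentiment
```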
Step 3 - Fastest Path to Answer - metrics and business value
This step is about establishing an understanding of succinct delivery: the conversations held by efficient representatives versus those of wafflers. There is no quality control in just throwing an algorithm at conversation data, and this step is a luxury for those with "lots" of data. The actual methodologies are not covered here, but the essence is that we extract metrics from the conversations along the lines of "problems a, b, c and d are all solved by these 6,000 sentences, and they all have this in common". From those examples we can shape the training data around the best 5,000, since 1,000 of the candidate observations are considered low quality and are omitted.
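A hedged sketch of the filtering idea is below. The conversation structure and the quality metric (shorter resolutions rank higher) are stand-ins for the methodologies the post deliberately leaves out:

```python
# Rank resolved conversations by a proxy quality metric and keep the
# strongest examples for training, dropping the low-quality tail.
def select_training_examples(conversations, keep=5000):
    """conversations: list of dicts with 'sentences' and 'resolved' keys (assumed shape)."""
    resolved = [c for c in conversations if c["resolved"]]
    # Proxy for succinct delivery: fewer sentences to reach a resolution.
    ranked = sorted(resolved, key=lambda c: len(c["sentences"]))
    return ranked[:keep]
```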
Step 4 - Communication and Rebuttal
This step works well in unison with step 2. Additionally, this part can be used for extracting metrics on call centre quality and target objectives. An example could be that the encoder offers a product or service; we know when this happens, and we can capture the decoder's response, e.g. 12354, 13245, 654 ("that sounds fantastic"). Across thousands of calls, we now know more about our customers than ever before.
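As a sketch of how a captured response becomes a sequence of ids like the "12354, 13245, 654" example (the actual ids depend entirely on the vocabulary; the toy one below starts from 1):

```python
# Map each word of a captured response to a unique integer id.
vocabulary = {}

def encode(text):
    return [vocabulary.setdefault(word, len(vocabulary) + 1)
            for word in text.lower().split()]

print(encode("that sounds fantastic"))   # e.g. [1, 2, 3] with a fresh vocabulary
```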
Step 5 - Sentiment Analysis (everyone's favourite)
As explained in the slide, we have the ability to track the sentiment of the conversation; this lets us know whether the sentiment is good or bad and whether it changes. Simple metrics that give a wealth of understanding can be extracted, such as how many customers exit a call with positive sentiment, or how many customers start with negative sentiment and leave positive. Powerful insights can now be delivered from the sentiment calculation data.
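A minimal sketch of per-utterance sentiment tracking is below. VADER is just one possible analyser (the post does not name a library), and it needs nltk.download("vader_lexicon") to have been run once; the utterances are illustrative:

```python
from nltk.sentiment import SentimentIntensityAnalyzer

analyser = SentimentIntensityAnalyzer()
call = [
    "I have been waiting on hold for an hour",
    "thank you, that actually fixed the problem",
]

# Compound score per utterance: negative values = negative sentiment.
scores = [analyser.polarity_scores(u)["compound"] for u in call]
started_negative, ended_positive = scores[0] < 0, scores[-1] > 0
print(scores, "turned around:", started_negative and ended_positive)
```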
Step 6 - Tracking commitments
Have you ever been promised something that wasn't delivered? Let's face it, we all have. This Data Science product is directed at reducing churn and monitoring promises made by both humans and machines. - VERY IMPORTANT.
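Illustrative only: one naive way to flag commitment language in a transcript so promised follow-ups can be monitored (the phrases below are my assumptions, not a production rule set):

```python
import re

COMMITMENT_PATTERNS = [
    r"\bI will (call|email|send|escalate)\b",
    r"\bwe('ll| will) get back to you\b",
    r"\bby (tomorrow|end of (the )?day|next week)\b",
]

def find_commitments(transcript):
    """Return the commitment patterns that appear in a transcript."""
    return [p for p in COMMITMENT_PATTERNS
            if re.search(p, transcript, re.IGNORECASE)]

print(find_commitments("No problem, I will email the new contract by tomorrow."))
```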
Step 7 - Chat Bot, we finally made it!
Now that all of the statistics are done (the probability density functions themselves are more or less not covered here), we can start to apply neural networks and build the chat bot; a rough sketch follows below.
The chat bot is reliant on all previous steps.
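As a heavily hedged sketch of the neural step, here is a tiny intent classifier built with Keras on top of tokenised call data. The architecture, sizes and dummy data are placeholders, not the author's actual model:

```python
import numpy as np
from tensorflow import keras

vocab_size, max_len, num_intents = 5000, 20, 8   # assumed sizes

# Embed token ids, average them, and classify the utterance into an intent.
model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 64),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(num_intents, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x: padded sequences of token ids from the earlier steps, y: intent labels.
x = np.random.randint(0, vocab_size, size=(32, max_len))
y = np.random.randint(0, num_intents, size=(32,))
model.fit(x, y, epochs=1, verbose=0)
```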
Data Life Cycle - Final and most important step
The data life cycle is based on all the previous steps. It is built on the business logic that has been embedded within the recordings and the metrics. This represents a continuous flow of data that is either purged or retained at the "sub_products" level. This is actually the most ingenious part of the whole process: an automated data pipeline in which the data model feeds the data science model, increasing accuracy over time.
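A hedged sketch of the purge-or-retain decision at the "sub_products" level is below; the retention window and record shape are assumptions for illustration only:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)   # assumed retention window

def lifecycle_pass(sub_products, now=None):
    """Split records into those fed back into the models and those purged."""
    now = now or datetime.now(timezone.utc)
    retained, purged = [], []
    for record in sub_products:   # each record is assumed to carry a tz-aware 'created_at'
        bucket = retained if now - record["created_at"] < RETENTION else purged
        bucket.append(record)
    return retained, purged
```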