Codification of Emotions
@ravinps ; https://www.facebook.com/StoriesByRavi ; https://mirror.xyz/ravinps.eth/tIgPlwhidJmHpz5dj9UWl70h1-tD-RvgIeLNX4vnR5s

Abstract

Every customer communication on the internet is a potential opportunity for customer acquisition. More than ? of the world exists on the internet, and more than 4.5 billion people [1] actively communicate there through mediums such as social media. The Internet is becoming a virtual database of customer conversations across industries. It is imperative for organizations to establish a customer acquisition methodology on top of such internet data.

Most organizations still rely on legacy thinking, CRM tools and traditional methodologies for customer acquisition. Globally, 60% of CRM implementations [2] have failed to generate a return on investment, even though organizations invest more than $50 billion in them every year [2].

Hence loyalty and CRM must be re-established for this digital age through an inversion of control from definition to derivation. Organizations must rethink the ineffectiveness of CRM-driven, predefined campaigns for predefined market segments and start applying AI to internet conversations to derive new customer opportunities. For example, new buyers for luxury cars can be derived from customers enquiring about expensive apartments. Sectoral mutual funds can be sold to new customers who show positive sentiment towards banking stocks. Regular Brooks Brothers buyers can be identified as prospects for BMW cars. Aggregating all such conversations builds an overall dataset of customer intents and emotions about their consumables and aspirations. All this information is processed to discover hidden opportunities. Such opportunities exist not only in discovering new customers but also in product definitions, innovative services and insights into new customer trends. This paper showcases the Intelli-Discovery Pipeline framework for generating business opportunities from the Internet.

In this paper, customer behavior is modelled as a mathematical depiction of multi-domain communications, in order to establish a unified opportunity profile. New customers for specific products and services can be discovered by analyzing their emotions towards certain features and their general likes. At the same time, new wishlist trends can be gathered to identify opportunities for future products and services.

Overview

Most customers do not like to give feedback on designated channels that are owned or regulated, such as company websites or government websites. The real communication happens on open, free and personalised channels. Those are the truthful communication channels, which depict the actual emotions of customers.

The Internet is a large cloud of data holding customer conversations across domains like banking, travel, health, e-commerce, etc. In general, a customer puts forward his/her communication on the internet in the form of a complaint, feedback or a resolution (though not limited to these). The approach to discovering opportunities is shown in the following diagram.

Figure 1: Concept Overview

Emotions are embedded in all such communications, and they provide a basis for business opportunity. For capitalism, i.e. businesses, these emotions are a goldmine for knowing existing customers and acquiring new ones; for socialism, i.e. government services, they are an introspection lens for understanding the core issues of citizens. Part 1 of this paper targets capitalism.

As every new communication can generate new opportunities, the Discovery pipeline processes every conversation in five stages. The pipeline generates a set of opportunities in real time and can serve multiple use cases, such as customer acquisition.

The Discovery pipeline framework is explained in the following sections.

Discovery Pipeline Framework

The following section describes the opportunity framework, which takes the form of a staged workflow. The workflow has five stages: Listen/Store, Detect Features, Segment Customers, Analyze Customers and Discover Opportunities.

Figure 2: Intelli-Discovery Pipeline Framework

Every new communication on the internet triggers the five stages of the workflow. In the first stage, all customer communication is listened to, acquired and stored as customer conversation data. In the second stage, sentiments and emotions are detected and stored as feature/emotion data. In the third stage, customers are grouped into logical segments. In the fourth stage, all historical communications are retrieved for customer analysis, and the different intents of customers are detected. Finally, in the fifth stage, a list of opportunities with their scores is generated. The score of an opportunity depicts the overall intensity of emotions associated with the aggregated customer communication.

Every stage is described below in detail.

Stage 1 - Listen/Store

All the customer communication is listened to, acquired and stored. For each communication, metadata is captured, which may include the place of communication, communicator ID, timestamp, etc. All this information is stored in big data infrastructure using an inverted index so that keyword search is possible.

DataStream is defined as an abstract stream of all the customer messages m1, m2, … from the internet. Examples of these messages are a blog entry, a Facebook post, a tweet, a retweet, a LinkedIn reply, an Instagram like … (any kind of communication on the internet).
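The keyword search enabled by an inverted index can be sketched as follows. This is a minimal in-memory illustration, not the big data infrastructure the paper refers to; the class name InvertedIndex, the tokenizer and the sample messages are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each keyword to the set of message ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.messages = {}

    def add(self, message_id, text):
        """Store the message and index every token it contains."""
        self.messages[message_id] = text
        for token in tokenize(text):
            self.postings[token].add(message_id)

    def search(self, *keywords):
        """Return the sorted ids of messages containing all keywords."""
        sets = [self.postings.get(k.lower(), set()) for k in keywords]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


# Hypothetical sample messages, for illustration only.
index = InvertedIndex()
index.add("m1", "Flight delayed again, terrible service")
index.add("m2", "Loved the hotel service in New York")
index.add("m3", "Looking for a flight to New York")

service_hits = index.search("service")
flight_york_hits = index.search("flight", "york")
```

Here index.search("flight", "york") returns only the messages that contain both keywords, which is the lookup that later stages need when pulling a customer's conversations by topic.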

DataStream = {m1, m2, m3 …}        

Every message on the internet is persisted as a communication vector. F(m) signifies the communication vector for message m. It contains the message text, customer, timestamp, hashtags and receiving party. Every social platform stores a lot more information for every communication. In the context of this paper, only these 5 attributes are considered, as they exist across all communication frameworks on the Internet.

F(m) = {txt, C=Customer, time, hashtags, receiver} 
--> Communication vector

The F(m) vector in the dataflow contains the text, customer ID, timestamp, relevant hashtags (if any) and information about the receiver (if any).

For every message from a data-stream, customer vector CD(i) is initialized. For every message, a unique customer vector is maintained. At every subsequent stage, new datasets are added into the customer vector. At this stage, the communication vector pertaining to the message is added into the customer vector CD(i).

CD(i) = {mi, F(mi)} 
--> Customer vector        
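The Stage 1 vectors can be sketched as plain data structures. This is only an illustration under stated assumptions: the class names CommunicationVector and CustomerVector, the extras field used to hold data added by later stages, and the sample messages are hypothetical; only the five attributes of F(m) come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CommunicationVector:
    """F(m): the five attributes kept for every message."""
    txt: str
    customer: str
    time: str
    hashtags: tuple = ()
    receiver: Optional[str] = None


@dataclass
class CustomerVector:
    """CD(i): starts as {m_i, F(m_i)}; later stages attach more data."""
    message_id: str
    communication: CommunicationVector
    extras: dict = field(default_factory=dict)  # assumption: holds DE, DB, DP, ...


def init_customer_vectors(data_stream):
    """Build one customer vector per (message id, F(m)) pair in the stream."""
    return [CustomerVector(mid, f_m) for mid, f_m in data_stream]


# Hypothetical DataStream = {m1, m2}
data_stream = [
    ("m1", CommunicationVector("Flight delayed again", "cust-17",
                               "2021-03-01T10:15:00Z", ("#delay",), "@abc_air")),
    ("m2", CommunicationVector("Great lounge!", "cust-42",
                               "2021-03-01T11:00:00Z")),
]
customer_vectors = init_customer_vectors(data_stream)
```

Keeping F(m) immutable while CD(i) stays mutable mirrors the text: the communication itself never changes, but every subsequent stage adds a new dataset to the customer vector.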

Stage 2 - Detect Features

For each communication, sentiments and emotions are detected using machine learning. Most machine learning approaches to this task are based on “bag of words” analysis. To bring in NLU (Natural Language Understanding) and achieve a higher confidence level, deep learning is used for each feature. NLU is a set of techniques that analyze natural communication to understand its intent and meaning from the communicator's perspective, beyond the words and their individual meanings. Typically, this platform detects compliment, complaint, anger, threat and emergency as soft emotions, and sentiment (positive, negative, neutral) as hard emotions. The platform correlates these two dimensions to isolate suspicious samples. As extra dimensions are added, the accuracy of the system improves, with much finer control over such suspicious samples.

Every feature detection is attached to a confidence level, which indicates accuracy. Due to the fuzziness of language, the confidence level varies for every parameter detected in every text.

Language fuzziness begins where classical 'yes' or 'no' logic ends, that is, where contradictions begin. A linguistic term may carry its own opposite, and so resists any single reading of its meaning within one sentence, paragraph or section of a communication. Humans understand and manage this fuzziness during communication: we learn to reduce or enlarge it, reshape it, and analyze or synthesize it as needed to understand a message better or to make our own viewpoint clearer. Customer communication on the Internet tends to be a natural way of expressing his/her emotions. Such common concerns of life have neither precisely defined answers nor unique scientific solutions. To understand such contexts from the internet, a broader analysis from multiple viewpoints is required.

To understand fuzziness, take an example. One person says, “I like the service of your hotel just due to the food, nothing else”. Another person says to the same hotel, “I disliked the service of your hotel only on the first day of my stay”. The general polarity of the word “like” is towards a compliment, yet the first sentence is a complaint; the polarity of “dislike” is towards a complaint, yet the second statement is a compliment. Such fuzziness of emotions creates ambiguity when analyzed by ML.

In all feature detections, emotional polarities are mapped onto each word; the overall polarity of the text is thus a vector product of all such polarities.

Our two-dimensional approach avoids such ambiguous outcomes by also inspecting the sentiment of each sentence: the sentiment of the first sentence is positive and that of the second is negative. With this cross-check, the first sentence is moved into the suspicious bucket.
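The cross-check between the two dimensions can be sketched as a simple consistency rule. This is a hedged illustration, not the platform's actual logic: the function name is_suspicious and the exact conflict rules (positive sentiment with a complaint or threat, negative sentiment with a compliment) are assumptions, and the labels are taken as already detected by upstream models.

```python
# Assumption: these soft emotions conflict with a positive hard sentiment.
NEGATIVE_SOFT_EMOTIONS = {"complaint", "threat"}


def is_suspicious(sentiment, soft_emotions):
    """Flag a sample whose hard sentiment contradicts its soft emotions.

    sentiment     -- "positive", "negative" or "neutral" (hard emotion)
    soft_emotions -- set of detected soft emotions, e.g. {"complaint"}
    """
    if sentiment == "positive" and soft_emotions & NEGATIVE_SOFT_EMOTIONS:
        return True
    if sentiment == "negative" and "compliment" in soft_emotions:
        return True
    return False


# First hotel sentence: positive sentiment, but detected as a complaint.
first_sentence_flag = is_suspicious("positive", {"complaint"})

# A consistent sample: negative sentiment and a complaint.
consistent_flag = is_suspicious("negative", {"complaint"})
```

Samples that are flagged go to the suspicious bucket for closer inspection instead of being trusted on either dimension alone.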

The emotion_detector function detects the set of emotions for communication vector F(m) and provides the outcome in the form of DE(m). DE(m) is an emotion vector over the message m, which contains values for sentiment, emergency, threat, complaint and compliment. The meaning of each is described below.

Emergency - Communication is tagged as an Emergency if it refers to a serious, unexpected and often dangerous situation requiring immediate action.

Threat - Communication is tagged as a Threat if it shows an intention to cause damage, or any other hostile action, against a person or an organization.

Compliment - Communication is tagged as a Compliment if it shows praise or admiration for a product, service, person or organization.

Complaint - Communication is tagged as a Complaint if it shows dissatisfaction or unacceptance.

Sentiment - The view or opinion that is held or expressed in the communication. Views can be positive or negative. If no such view is expressed, the sentiment is treated as neutral. Sentiment for each communication is tagged as negative, positive or neutral.

In the current context of this paper only 5 emotions are considered. In general, emotion vectors are not limited to only these values.

emotion_detector( F(m) ) = DE(m) 

DE(m) = {sentiment, emergency, threat, complaint, compliment}
--> Emotion vector        

Once emotion vector DE is detected, DE is inserted into the customer vector. So now customer vector contains message, communication vector and emotion vector.

CD(i) = {mi, F(mi), DE(mi)} 
--> New customer vector.        

The following diagram shows the two-dimensional feature detection discussed above.

[Figure: two-dimensional feature detection]

Note - The difficulty of each detection is defined by its detection success ratio over a large sample of data.

Success-ratio = Total-True Positives / Total Samples

A true positive is an outcome where the model correctly predicts the emotions for the communication.

Low – If success ratio > 90%

Medium – If success ratio < 80%

High – If success ratio < 70%

Very High – If success ratio < 60%
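The success ratio and the difficulty levels can be computed as in the sketch below. The ratio follows the formula above. The band boundaries are an assumption: the thresholds are read as a cascade in which Very High means a success ratio below 60%, High below 70% and Medium below 80%, and the range from 80% to 90%, which the list does not assign to any level, is left unclassified. The function names are hypothetical.

```python
def success_ratio(true_positives, total_samples):
    """Success-ratio = total true positives / total samples."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return true_positives / total_samples


def difficulty(ratio):
    """Map a success ratio (0.0 to 1.0) to a detection difficulty level.

    Assumption: the bands cascade, so the most severe matching band wins.
    Returns None for 80%-90%, which the listed bands do not cover.
    """
    if ratio > 0.90:
        return "Low"
    if ratio < 0.60:
        return "Very High"
    if ratio < 0.70:
        return "High"
    if ratio < 0.80:
        return "Medium"
    return None


# Hypothetical evaluation: 65 correct detections out of 100 samples.
ratio = success_ratio(65, 100)
level = difficulty(ratio)
```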

Each industry defines its own brand attributes, which are detected at this stage. Every industry will have a different set of attributes. To illustrate, the following tables showcase brand attributes of different industries.

[Table: brand attributes by industry]

For the subsequent sections of this paper, brand attributes of the Travel domain are considered. All travel domain attributes are detected using ML. As shown in the table, the service-category attribute identifies the exact service category the message relates to. The travel-class attribute identifies the class of travel, such as economy, business, etc. The travel-location attribute identifies the locations the customer will travel to in future. The service-request attribute indicates whether the message is a ‘request for service’ or just information.

The travel_param_detector() function identifies travel-related attributes from the communication vector and prepares the brand vector DB(“travel”,m) for message m. The “travel” label is required in DB so that brand vectors for multiple domains can be held side by side.

travel_param_detector(F(m)) = DB(“travel”, m)        

The brand vector for the travel domain, DB(“travel”,m), contains the service category, travel class, service request and travel location.

DB(“travel”,m)={service-cat, travel-class, service-req,travel-loc}
-->Customer Brand vector        

New customer vector is prepared by adding the brand vector into the customer vector.

CD(i) = {mi, F(mi), DE(mi), DB(“travel”,mi) } 
--> New customer vector        
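The shape of the travel brand vector can be illustrated with a toy detector. The paper detects these attributes with ML; the keyword rules below are only a stand-in to make the data flow concrete. All keyword tables are hypothetical, and for brevity the function takes the message text rather than the full communication vector F(m).

```python
import re

# Hypothetical keyword tables; a real system would use trained models.
SERVICE_CATEGORIES = {"flight": "airline", "hotel": "hotel", "cab": "ground-transport"}
TRAVEL_CLASSES = ("economy", "business", "first")
KNOWN_LOCATIONS = ("New York", "London", "Paris")
REQUEST_CUES = ("please", "need", "looking for", "can you")


def travel_param_detector(txt):
    """Return a toy DB("travel", m) brand vector for the message text."""
    lower = txt.lower()
    words = set(re.findall(r"[a-z]+", lower))
    service_cat = next(
        (cat for kw, cat in SERVICE_CATEGORIES.items() if kw in words), None)
    travel_class = next((c for c in TRAVEL_CLASSES if c in words), None)
    travel_loc = [loc for loc in KNOWN_LOCATIONS if loc.lower() in lower]
    service_req = any(cue in lower for cue in REQUEST_CUES)
    return {
        "service-cat": service_cat,
        "travel-class": travel_class,
        "service-req": service_req,
        "travel-loc": travel_loc,
    }


brand_vector = travel_param_detector(
    "Looking for a business class flight to New York")
```

The returned dictionary has the four fields of DB(“travel”,m), so it can be attached to the customer vector CD(i) as the text describes.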

Stage 3 - Segment Customers

This platform applies unsupervised learning techniques to segment all the customer communications. Unsupervised learning covers machine learning techniques that look for previously undetected patterns in a data set with no pre-existing labels (in supervised learning, human intervention is required to label data elements). In the previous stage, different emotions were classified for each communication. All such information forms the vector used for clustering. Segmentation yields multiple dense clusters along with a few outlier clusters. The centers of the dense clusters are studied, and only specific clusters are selected for detailed customer analysis. Manual involvement is required to select the right cluster to work on.

For example, a cluster whose samples are “customers with more complaints, but without threats, regarding competitors” is suitable for discovering opportunities.

segmenter_travel is a function specific to the travel domain that clusters customers into segments. Each segment Si contains a set of customer vectors SDij.

segmenter_travel( {CD}n ) = { S1, S2, S3, S4 …} 
--> Si is a segment of similar customers

Si = { SDi1, SDi2, SDi3 … SDin } 
--> SDij is the jth customer vector from the ith segment
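Segmentation of this kind can be sketched with a small k-means implementation. The paper only states that unsupervised learning is used, so the choice of k-means is an assumption, as are the numeric encoding of the emotion vector (sentiment as -1/0/1, the other emotions as 0/1), the deterministic seeding and the function names.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def segment_customers(encoded, k):
    """Group customer ids into k segments; also return the cluster centers."""
    ids = list(encoded)
    assignment, centroids = kmeans([encoded[i] for i in ids], k)
    segments = {c: [] for c in range(k)}
    for customer_id, cluster in zip(ids, assignment):
        segments[cluster].append(customer_id)
    return segments, centroids


# Hypothetical encoding: [sentiment, emergency, threat, complaint, compliment]
encoded = {
    "cust-1": [-1, 0, 0, 1, 0],
    "cust-2": [1, 0, 0, 0, 1],
    "cust-3": [-1, 0, 0, 1, 0],
    "cust-4": [1, 0, 0, 0, 1],
    "cust-5": [-1, 0, 1, 1, 0],
}
segments, centers = segment_customers(encoded, k=2)
```

The centers are returned alongside the segments because, as described above, a person studies the cluster centers and chooses which segments go forward to customer analysis.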

Stage 4 - Analyze Customers

Detailed customer analysis is performed on all the communications in the selected clusters. Unique customers are identified from these clusters, and all historical communications regarding them are extracted from the internet. Identity correlation is required to extract data from multiple channels. For example, if a Twitter communication exists in a selected cluster, then the Twitter ID and all of that customer's Twitter communication are extracted. The Facebook ID is derived from the Twitter information, and all Facebook communication is extracted as well. Data from every other channel is extracted in the same way, by deriving the relevant ID. Data crawlers keep crawling the internet for all correlated identities across channels.

All this information is used to identify fine-grained customer behavior and properties, i.e. general interests, profession, purchasing power, opinions, preferred brands, etc. This information is useful for identifying the right actionable intelligence for realizing the opportunity.

profile_detector() function works on the customer profile C and prepares profile vector DP(C). C signifies the customer profile data related to any of the communication vectors.

profile_detector(C) = DP(C) 
--> detect profile based on all historical data of customer C.        

In the current context of this paper, profile_detector() identifies the personal interests, buying strength, favorite brands and opinions of any customer.

DP(C) = {Interest, BuyingStrength, FavBrands, Opinions} 
--> Customer personal vector        

Customer vector CD(i) is modified by adding the personal vector into CD(i).

CD(i) = {mi, F(mi), DE(mi), DB(“travel”,mi), DP(C), history(C) } 
--> New customer vector         

During the same stage, deep learning is used on CD(i) to detect the intent of individuals. The algorithms run on the complete customer vector, which includes the message, emotions, personal data, brand data and historical communication. The following customer intents are detected by deep learning.

Customer Intent

Intent to Travel (DL)

  • Intent to travel by ABC airline
  • Intent to travel to New York
  • Intent to travel for vacations

Intent for change (DL)

  • Intent to change the xyz coffee
  • Intent to change the house location
  • Intent to change the college

Intent to Buy (DL)

  • Intent to buy a luxury car
  • Intent to buy a shirt
  • Intent to buy the BI tool

DL_intent_detector() works on the customer vector and identifies the intents. Each intent is signified by Wij. So every customer can have multiple intents generated from a single customer vector.

DL_intent_detector(CD(i)) = {Wi1, Wi2, Wi3 …} 
--> Wij is the specific intent of the customer vector.         

intent_data_finder() works on the customer vector and identifies the data points that matter for realizing the intent. An intent may or may not have a data point DATAij for its realization. For example, if the customer's intent is a vacation, then the data points can be New York, London and any other cities mentioned.

intent_data_finder(CD(i)) = {DATAi1, DATAi2, DATAi3} 
--> Set of all the data-items which are mapped on the specific nouns of intent.        

Stage 5 - Discover Opportunities

In this stage, opportunities for customer acquisition are identified for the organization. The fourth stage provides a refined set of customer emotions and behaviors in one industry domain. Deep learning models are trained using large sets of labeled data and neural network architectures that learn features directly from the data, without the need for manual intervention. The major opportunities lie in improving brand value, qualifying management decisions at various levels and improving the bottom line with new customers.

Opportunity_set(domain) is the master set which contains all the possible opportunities for customers in that domain.

Opportunity_set(domain) = {op1, op2, op3 … opn} 
--> Predefined set of opportunities for the given domain

Following vector shows the opportunity vector for the travel domain.

Opportunity_set(“travel”) = {“acquire”, “new route”,“increase_flights” } 
--> Predefined set of opportunities for Travel.         

DL_opportunity_detector(opi) is the function that identifies all the customer-intent vectors WCDij to which opportunity opi applies. For example, if the opportunity “vacation” is looked for, the outcome of this function is all the customers who intend to go on vacation.

DL_opportunity_detector(opi) = { WCDi1, WCDi2, WCDi3 .. WCDin} 
--> WCD are opportunity customers, where specific opportunity opi is qualified.         

Each WCDi vector comes with a score, which determines the probability of success.

WCDi = { CDj , score } 
--> For each opportunity customer, realization score will be associated.         
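The selection of opportunity customers can be sketched as a filter and a ranking. In the paper this step is performed by deep learning; the version below is only a stand-in that assumes each customer already carries detected intents with a confidence value, and uses that confidence as the realization score. The names OpportunityCustomer, opportunity_detector, the intent-to-opportunity mapping, the min_score cut-off and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OpportunityCustomer:
    """WCD = {CD_j, score}: a customer qualified for an opportunity."""
    customer_id: str
    score: float


def opportunity_detector(opportunity, customer_intents, intent_for_opportunity,
                         min_score=0.5):
    """Return customers whose intent matches the opportunity, best score first.

    customer_intents       -- {customer_id: {intent: confidence}}
    intent_for_opportunity -- {opportunity: intent that realizes it}
    """
    wanted = intent_for_opportunity[opportunity]
    hits = []
    for customer_id, intents in customer_intents.items():
        score = intents.get(wanted, 0.0)
        if score >= min_score:
            hits.append(OpportunityCustomer(customer_id, score))
    return sorted(hits, key=lambda w: w.score, reverse=True)


# Hypothetical intents with confidence values from the intent detector.
customer_intents = {
    "cust-1": {"travel to New York": 0.9, "buy a luxury car": 0.4},
    "cust-2": {"travel to New York": 0.6},
    "cust-3": {"change the airline": 0.8},
}
intent_for_opportunity = {
    "new route": "travel to New York",
    "acquire": "change the airline",
}

new_route_customers = opportunity_detector(
    "new route", customer_intents, intent_for_opportunity)
```

Sorting by score puts the customers most likely to be realized first, which matches the role of the score described above.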

The Intelli-Discovery Pipeline framework identifies a new set of potential customers. All action items for dealing with areas of dissatisfaction are identified. The platform also scores action items based on behavioral patterns and customer feedback. The platform maintains a list of all historical aggregated reasons, their action points and their realizations. All this information and feedback is fed back as a training set to improve actionable intelligence over time.

Conclusion

The Intelli-Discovery Pipeline framework provides a complete vectorized model to establish the concept of customer acquisition from Internet data. It not only discovers new customers but also provides detailed data points and a property map to understand their alignment with products and services. Every product-related decision can be monitored, justified and approached with a feedback path back to internet data, in the form of newly aligned emotions. In such scenarios, the Intelli-Discovery Pipeline framework acts as a fine-grained mechanism to appraise management decisions with 100% traceable data points.

An organization's brand value and reach are naturally aligned with the customer satisfaction discovered from the internet. Customer datasets on the Internet are the most reliable data source for determining present and future demand in an industry; hence industries that follow the inversion of control from definition to derivation will sustain high growth in the long run.

Glossary

[Glossary table]

References:
