White Paper Blog: An Automated Consumer Insights Engine for Brands

With the evolution of social and digital media over the last decade, the way brands and consumers communicate and share information has changed drastically. This realm is an amalgam of conversations, opinions and feedback from a brand’s consumers. If used well, this data can give brands insights that lead to higher ROI, reduced research time and increased capital efficiency.

With consumer preferences evolving rapidly, brands have realized that they need to offer versatile products, services, colors and fashions to stay on trend and appeal to a changing and sensitive customer base. Thus, there is a need for an automated system that delivers actionable, customer-centric insights. The system must analyze data in real time, perform in-depth analysis using machine learning and natural language processing, and generate consumer insights. These insights should not only be backed by data analysis but also be validated against ongoing trends and aligned with industry knowledge.

Consumer insights provide a deep understanding of a brand’s audience (customers, fans or consumers) and originate from a thorough analysis of consumer data. This data includes their buying behaviour, queries, sentiments, verbatim comments and social affinities.

In this article, I will discuss such a system, which we have developed at Prophesee using the data of more than 20K brands and their audiences that we track. The image below describes the overall architecture of the automated engine and its components.

The complete consumer insights engine is an end-to-end automated pipeline with several components. The first component (A) is the data extraction module, composed of data connectors for different data sources: Twitter, Facebook, Instagram, YouTube, websites, news, feeds, offline data and so on. The module consists of several REST and streaming APIs and web mining packages. It runs continuously on cloud servers, streams the data in real time and stores it in databases. The next component (B) is the data preprocessing layer, where all of the extracted data is cleaned and structured. The cleaning involves text cleaning, quantitative cleaning and miscellaneous noise handling. This component also contains a deduplication layer that ensures the uniqueness of data points, irrespective of the data source. Component (C) is the data integration layer, where user conversation data is linked with user metadata such as demographics and interests. At this step, the complete data set is standardized, formatted and pushed to Elasticsearch.
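To make the deduplication step in the preprocessing layer (B) more concrete, here is a minimal sketch of how duplicate posts coming from different sources could be collapsed. It is only an illustration, not our production code: the normalization rules, the `fingerprint` and `deduplicate` helpers and the sample records are all assumptions made for this example.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text.lower())
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(text: str) -> str:
    """Hash the normalized text so equal content maps to an equal key."""
    return hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first record per fingerprint, regardless of its source."""
    seen = set()
    unique = []
    for record in records:
        key = fingerprint(record["text"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records from three different connectors
records = [
    {"source": "twitter", "text": "Love the new camera!  https://t.co/abc"},
    {"source": "facebook", "text": "love the new camera"},
    {"source": "instagram", "text": "Is the blue one in stock?"},
]
unique_records = deduplicate(records)
print([r["source"] for r in unique_records])
```

Exact hashing only catches posts that are identical after normalization. Catching near-duplicates (for example, reposts with small edits) would need a fuzzier technique, which is outside the scope of this sketch.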

Components (D) and (E) are the most important parts of the consumer insights engine. All the machine learning, natural language processing and information retrieval modules run here. First, different industry-wide classifiers are trained on different types of data. Since the majority of the data is text, convolutional neural networks are used in most cases, along with other models (XGBoost and linear SVM). This component is further split into several ML stages (data cleansing, feature engineering, feature selection, training and tuning), all packed into a sequential pipeline. The classification covers noise classification, sentiment classification and industry-wise theme categorization. All of this tagged data is then consumed by the natural language processing engine, where concepts and entities are identified. The NLP module consists of dependency grammar, part-of-speech tagging, regular expressions, topic modeling and n-gram analysis. For example:

Conversation: “I really like the blue color of this camera, can you please share the price”

  • Themes: { “theme1” : “Color” }, { “theme2” : “product” }, { “theme3” : “prices” }
  • Sentiment: { “theme1” : “positive” }, { “theme2” : “neutral” }, { “theme3” : “neutral” }
  • Conversation Type: { “isNoise” : “No” } , { “type1” : “question”} , { “type2” : “opinion”}
  • Entities: { “entity1” : “blue color”, “entity2” : “camera”, “entity3” : “price” }
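Two of the simpler NLP techniques mentioned above, n-gram analysis and regular expressions, can be sketched on the same conversation. This is a toy illustration under stated assumptions: the stopword list and the question-detection pattern are hypothetical and far smaller than anything a real engine would use, and the trained classifiers and dependency parsing are not shown.

```python
import re

# Hypothetical, deliberately tiny stopword list
STOPWORDS = {"i", "the", "of", "this", "can", "you", "please", "really"}

# Hypothetical rule: a question mark, a request phrase or a question word
QUESTION_PATTERN = re.compile(
    r"\?|\b(can|could|would) you\b|\b(what|when|where|how)\b"
)


def tokenize(text):
    """Split lowercased text into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all runs of n adjacent tokens."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


def candidate_phrases(text, n=2):
    """Keep only the n-grams that contain no stopword."""
    grams = ngrams(tokenize(text), n)
    return [" ".join(g) for g in grams if not any(t in STOPWORDS for t in g)]


def is_question(text):
    return bool(QUESTION_PATTERN.search(text.lower()))


conversation = (
    "I really like the blue color of this camera, "
    "can you please share the price"
)
keywords = [t for t in tokenize(conversation) if t not in STOPWORDS]
candidate_bigrams = candidate_phrases(conversation, n=2)
print(keywords)
print(candidate_bigrams, is_question(conversation))
```

On this sentence the only stopword-free bigram is "blue color", which matches the first entity in the example. The question rule fires on "can you", which is consistent with the "question" conversation type.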

This data is used by the insights generation layer (components F, G and H), which tries to make sense of the data and produce useful results. This layer consists of four sub-modules: the correlation engine, the forecasting engine, the real-time trends engine and the data aggregation engine.

The correlation engine checks whether any significant correlations exist among the data points. Example: consumer queries about products increase by 15% on weekends and jump by 9% when the brand posts about a new feature.
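As a rough sketch of what such a check could look like, the snippet below computes a Pearson correlation between a weekend flag and daily query counts, plus the percentage uplift on weekends. The daily counts are made-up numbers chosen for illustration, and the helper names are assumptions; the real engine works on far more data and also has to test for statistical significance.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def uplift_pct(values, flags):
    """Percentage change of the flagged mean relative to the unflagged mean."""
    flagged = [v for v, f in zip(values, flags) if f]
    unflagged = [v for v, f in zip(values, flags) if not f]
    return 100.0 * (mean(flagged) - mean(unflagged)) / mean(unflagged)


# Hypothetical daily product-query counts, Monday to Sunday
daily_queries = [98, 102, 100, 101, 99, 114, 116]
is_weekend = [0, 0, 0, 0, 0, 1, 1]

r = pearson(is_weekend, daily_queries)
uplift = uplift_pct(daily_queries, is_weekend)
print(f"correlation={r:.2f}, weekend uplift={uplift:.1f}%")
```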

The forecasting engine is used to predict consumer behavior, for example that the likely increase in consumer queries before the holiday season will be about "places to visit" in the travel industry and "deals and discounts" in the e-commerce industry.
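The article does not go into which forecasting models are used, so the snippet below uses simple exponential smoothing purely as a stand-in to show the idea of a one-step-ahead forecast. The weekly query counts and the `alpha` value are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Return a one-step-ahead forecast using simple exponential smoothing.

    alpha close to 1 weights recent observations heavily;
    alpha close to 0 produces a smoother, slower-moving forecast.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical weekly counts of "deals and discounts" queries before a holiday
weekly_queries = [120, 130, 125, 160, 210]
forecast = exponential_smoothing(weekly_queries, alpha=0.5)
print(f"forecast for next week: {forecast:.1f}")
```

Simple smoothing ignores trend and seasonality, so a holiday spike like the one described above would in practice call for a seasonal model.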

The real-time trends engine links the information coming out of the analysis with the news, events and buzz happening in real time, and the aggregation engine is used to figure out which entities, concepts and themes top the conversations. All statistical measures of central tendency are calculated by the aggregation engine. Finally, the results are pushed to dashboards using REST APIs and quick auto-generated reports.
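The aggregation step can be sketched in a few lines: count how often each theme appears, turn the counts into mindshare percentages, and compute central tendencies over daily volumes. The theme labels and daily counts here are invented for the example, and the function names are assumptions rather than the engine's actual interface.

```python
from collections import Counter
from statistics import mean, median


def mindshare(theme_labels):
    """Percentage share of each theme, ordered from most to least frequent."""
    counts = Counter(theme_labels)
    total = sum(counts.values())
    return {
        theme: round(100.0 * n / total, 1)
        for theme, n in counts.most_common()
    }


def central_tendency(values):
    """Mean and median of a numeric series."""
    return {"mean": mean(values), "median": median(values)}


# Hypothetical theme tags produced by the classification step
tagged_themes = [
    "color", "price", "color", "product", "color",
    "price", "color", "product", "color", "price",
]
# Hypothetical number of conversations per day
daily_conversations = [120, 135, 128, 150, 142]

shares = mindshare(tagged_themes)
summary = central_tendency(daily_conversations)
print(shares)
print(summary)
```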

Some example insights for different industries:

Travel: "Popular Cities", "Natural Beauty" and "Family Vacations" are the most talked-about travel themes among consumers, with 39.8%, 36% and 16% mindshare respectively. These themes are also associated with the highest negative sentiment. Top consumer queries: "places to visit during night", "hotel offers for families" etc.

Food Ordering: Payment gateways garnered the highest negative sentiment, particularly for the mobile app. "payment failure", "no returns", "payment options" and "incomplete payments" are the top mentions. Conversations are 18% higher during holidays, and 65% of them are about offers.

Makeup Brands: "red" (1,078 mentions), "pink" (450) and "black" (306) are the colors most talked about by consumers. "Red" is mostly used with the keywords "love", "favorite", "dream" and "best", leading to the highest positive sentiment. Top consumer queries: "is the red color available in Pune stores", "any offer for new users", "is this a lip balm or a lipstick".

Retail: Gifting appeared as one of the top themes; birthday gifts, Diwali gifts, and gifts for brothers and dads are the top consumer conversations. Perfumes are the top gift choice among consumers, followed by jeans and tops. Top consumer queries: "can i order online", "alternate colors available".

The consumer insights engine is scalable and robust, and can be extended to any industry. It can do much more than what is discussed here. We also analyzed the last 6 months of data for several brands and industries; check out the white paper here. Feel free to share your feedback and thoughts in the comments, or drop me a line via LinkedIn inbox or email.

Joel Binn

CEO, Original Digital Corporation

8 years ago

There are predictive engines that use preference models rather than behavior. For example Truechoicesolutions uses econometric models to determine consumer action.

Venkat Raman Chandrasekaran

Customer Experience, Corporate branding and SEO

8 years ago

Appreciate your expertise in this Shivam

Vincent Vukovic

Cyber Security | AI & Tech Adoption

8 years ago

Hi and thank you for the post! Something I'm wondering is how you identify spam/irrelevant data? I worked on global comms for a brand that had millions of mentions online. A portion of this was coming from bots/spam.
