
A Look at Text Analytics

Seth Redmore

Published: February 3, 2016

Mining for Gold

The first step in text analytics is text mining: the process of identifying and collecting high-quality information from unstructured text. Text mining itself begins with information retrieval, that is, building a database to analyze. This database can hold virtually any type of text, from a mass of Twitter posts to a collection of scientific papers, depending on the focus of the organization conducting the analysis.

In the case of my company, Lexalytics, once a database of text has been established, we engage a number of sophisticated text analytics systems that aim to answer three broad questions:

Who is talking and who/what are they discussing?
What are they saying?
How do they feel?

These three questions map roughly onto our core features:

Named entity extraction
Themes
Categories
Intentions
Sentiment analysis
Summarization

Named Entity Extraction

Recognizing named entities means identifying the things a text refers to by name: most often people, places, organizations, products, and brands, but Named Entity Extraction can be configured for whatever your organization requires. Names of traded stocks, specific abbreviations, even specific strains of a disease can be identified and tagged as entities. In addition to specific named entities, Lexalytics identifies pattern-based entities such as street addresses, phone numbers, and email addresses. Once the entities have been extracted, you can get the sentiment, themes, and summary associated with each one.
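To make the idea of pattern-based entities concrete, here is a minimal sketch using regular expressions. It is not how Lexalytics implements the feature; the `PATTERNS` table and the `extract_pattern_entities` function are hypothetical names invented for this illustration, and the two patterns are deliberately simplified.

```python
import re

# Illustrative patterns only (assumption): real phone numbers and email
# addresses vary far more widely than these expressions allow.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]\d{4}"),
}


def extract_pattern_entities(text):
    """Return (entity_type, matched_text) pairs in order of appearance."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), entity_type, match.group()))
    found.sort()  # order by position in the text
    return [(entity_type, value) for _, entity_type, value in found]


sample = "Call 555-867-5309 or write to help@example.com for support."
print(extract_pattern_entities(sample))
```

Named entities such as people or brands cannot be captured this way; they generally need dictionaries or trained models, which is why the two kinds of entity are treated separately.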

Themes

Contextual clues can be vital when dealing with words that have multiple meanings: the word crane, for instance, could refer to a machine used to lift heavy objects, a type of bird, or even a movement of someone’s neck. Lexalytics determines the context of entities through themes and facets, identifying the topics of discussion. Our context determination involves highly complex text mining techniques that will show you what consumers are saying and why they feel the way they do.

Themes are lexically important noun phrases. Think of them as the “buzz” from the document. They work really well when rolled up across many documents, so you can get a feel for what, exactly, people are saying. They are extracted completely automatically. We can also tell you the themes that are lexically associated with an entity, not just the themes that are important inside a document.
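The roll-up across documents can be sketched in a few lines. This is only a rough approximation: instead of true noun-phrase detection it treats any run of two or more consecutive non-stopwords as a candidate phrase, which is a swapped-in simplification and not the Lexalytics method. The stopword list and the function names are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny stopword list for illustration (assumption); a real system would use
# a part-of-speech tagger to find noun phrases instead.
STOPWORDS = {
    "a", "an", "the", "is", "was", "were", "and", "but", "or", "of", "to",
    "in", "on", "at", "for", "with", "it", "i", "my", "this", "that",
    "very", "so", "too",
}


def candidate_phrases(text):
    """Collect runs of two or more consecutive non-stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?]", text.lower()):
        run = []
        # The trailing None flushes the last run of each fragment.
        for word in re.findall(r"[a-z']+", fragment) + [None]:
            if word is None or word in STOPWORDS:
                if len(run) >= 2:
                    phrases.append(" ".join(run))
                run = []
            else:
                run.append(word)
    return phrases


def roll_up_themes(documents):
    """Count how many documents mention each candidate phrase."""
    counts = Counter()
    for doc in documents:
        counts.update(set(candidate_phrases(doc)))
    return counts


reviews = [
    "The battery life is great, but the screen is too dim.",
    "I love the battery life on this phone.",
    "Battery life was poor and the customer service was slow.",
]
themes = roll_up_themes(reviews)
print(themes.most_common(3))
```

Counting documents rather than raw occurrences keeps one long, repetitive document from dominating the roll-up.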

Intentions

Intentions are "predictions of future behavior." A very simple example is "Hey, I dropped my camera, guess I need to buy a new one." That's a buy intent. We have four intent types out of the box: Buy, Sell, Recommend, and Quit. Using intentions will let you find new customers as well as prevent customer churn. Unlike any other text analytics system that provides intention extraction, we don't just tell you that there is an intention; we tell you who the "intender" is, what the object of their intention is, and what the intention itself is. This lets customers act on the information immediately, jumping on opportunities to build business and responding to problems without delay.

In Salience Server, you can create your own intention types as well. Say you want to configure a "desire" or a "vote" intention: you have complete control over the intentions. Intentions are an important part of our Industry packs, as the language for an intention varies widely from industry to industry. The word "return" is a "Buy" or "Recommend" intention in the hospitality space, but a "Quit" intention in the consumer packaged goods space.
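The three parts of an intention (intender, intention type, object) can be illustrated with a toy rule-based extractor. This is a sketch under stated assumptions, not the Salience implementation: the `Intention` class, the `TRIGGERS` verb table, and the single sentence pattern are all hypothetical, and a production system would need far richer, industry-specific rules.

```python
import re
from dataclasses import dataclass


@dataclass
class Intention:
    intender: str      # who holds the intention
    intent_type: str   # Buy, Sell, Recommend, or Quit
    target: str        # the object of the intention


# Hypothetical trigger verbs (assumption) mapped to the four intent types.
TRIGGERS = {
    "buy": "Buy", "purchase": "Buy", "sell": "Sell",
    "recommend": "Recommend", "quit": "Quit", "cancel": "Quit",
}

PATTERN = re.compile(
    r"\b(?P<intender>I|we)\s+(?:need to|want to|plan to|will)\s+"
    r"(?P<verb>" + "|".join(TRIGGERS) + r")\s+(?P<target>[^.,!?]+)",
    re.IGNORECASE,
)


def find_intentions(text):
    """Return one Intention per matching clause in the text."""
    return [
        Intention(
            intender=m.group("intender"),
            intent_type=TRIGGERS[m.group("verb").lower()],
            target=m.group("target").strip(),
        )
        for m in PATTERN.finditer(text)
    ]


print(find_intentions("Hey, I dropped my camera, guess I need to buy a new one."))
```

Swapping in a different `TRIGGERS` table per industry is one simple way to model the point about "return" meaning different things in different spaces.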

Sentiment Analysis

Speaking of feeling, our Sentiment Analysis feature will show you exactly how consumers feel about their subject of discussion. Our sentiment analysis is the most powerful, accurate, and reliable in the business: beyond telling you whether a given document is positive, negative, or neutral, we assign a specific score to show just how strong that sentiment is. What’s more, we attach sentiment scores to entities, themes, and facets, in addition to showing a general document sentiment score. This multi-level analysis can be configured and optimized to match your individual needs.
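The difference between a document-level score and an entity-level score can be shown with a small lexicon-based sketch. The `LEXICON` weights, the function names, and the "sentences that mention the entity" heuristic are all assumptions made for illustration; they are not the scoring model Lexalytics uses.

```python
import re

# Hypothetical hand-built lexicon (assumption): word -> polarity in [-1, 1].
LEXICON = {
    "great": 0.8, "love": 0.9, "good": 0.5,
    "slow": -0.4, "terrible": -0.9, "rude": -0.7,
}


def score(text):
    """Average the polarity of lexicon words; 0.0 means neutral or no hits."""
    hits = [LEXICON[w] for w in re.findall(r"[a-z']+", text.lower()) if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def entity_scores(text, entities):
    """Score each entity using only the sentences that mention it."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    result = {}
    for entity in entities:
        mentions = [s for s in sentences if entity.lower() in s.lower()]
        result[entity] = score(" ".join(mentions))
    return result


review = "The pizza was great. The waiter was rude and slow."
print(score(review))                               # whole-document score
print(entity_scores(review, ["pizza", "waiter"]))  # per-entity scores
```

In this review the document as a whole comes out mildly negative, while the two entities pull in opposite directions, which is exactly the detail a single document score hides.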

Summarization

Summarization is meant to give humans a quick grasp of a long document. “Long” could be a 200-page analyst report you’re reading on your laptop, or a missive from your boss that you’re trying to scan along with another 20 emails on your phone. Lexalytics has highly tunable summarization technology to give exactly the right results for your application. The summaries we provide are based on the words actually in the document: we give you the most important sentences. One of the most interesting features is the ability to give Entity Summaries, which contain only the sentences relevant to a given entity. These are very useful if you’re trying to crank through a few hundred large research reports to understand just what they’re saying about the one company (of dozens) you need to learn about.
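Picking "the most important sentences" is known as extractive summarization. A minimal sketch follows, assuming a simple word-frequency score for importance; that scoring rule, the stopword list, and the `summarize` function with its `entity` parameter are illustrative assumptions rather than a description of the Lexalytics algorithm.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "its", "for", "to", "of", "this", "and", "own"}


def content_words(text):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2, entity=None):
    """Return the highest-scoring sentences, kept in their original order.

    If entity is given, only sentences that mention it are considered.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def weight(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    candidates = [
        (i, s) for i, s in enumerate(sentences)
        if entity is None or entity.lower() in s.lower()
    ]
    top = sorted(candidates, key=lambda pair: weight(pair[1]), reverse=True)[:max_sentences]
    return [s for _, s in sorted(top)]  # restore document order


report = (
    "Acme shipped a new router this quarter. "
    "Analysts praised the router for its speed. "
    "Globex delayed its own launch. "
    "Acme expects router sales to grow."
)
print(summarize(report, max_sentences=2))
print(summarize(report, max_sentences=2, entity="Globex"))
```

Because every output sentence is copied verbatim from the input, the summary never says anything the document did not say, which matches the point about summaries being based on the words actually in the document.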

The Upshot

Text analytics is no mean feat. At Lexalytics we've spent over a decade refining our systems so that you, the busy professional, can sit back and let our products save you time, money, and headaches. That matters because we're quickly moving into a world where, if you can't hear what your audience is saying, you cannot adapt. If you can't adapt, your company will die. Text analytics is no longer a luxury; it's a necessity. Every major company on earth now deploys text analytics in some capacity, allowing them to capitalize on waves, avoid pitfalls, and better serve their increasingly global customer base. Perhaps it's time for you to explore the text mining tools your business needs to unlock the insights hidden in unstructured text.

Damien Hutchens

Director of 'Rethink' | Helps Organisations Evolve | Breakthrough Solutions | All about People | Loves Data Too

8 years ago

Thanks Seth. Great breakdown of how text analytics works.

Koos Vanderwilt

Independent Researcher at n/a - between jobs - who wants me? I want to work!

8 years ago

Nice. Just as in politics everyone has slightly different views of things, so definitional articles are very useful. I have understood by Text Analytics classification, clustering, summarization, and retrieval. Allow me to shamelessly toot my own horn and announce my effort to combine statistical methods with the Stanford Parser. Working on TF IDF right now, but keep running into programming mistakes I made. https://www.academia.edu/18302162/Linguistics_aids_Kullback_Leibler_Divergence_and_Naive_Bayes_Document_Classification

Raymond Doctor

NLP applied to Indic scripts and languages

8 years ago

Great "summarisation" of the state of the art. Lucid and easy to read.

Vladimir S.

Deputy CEO for R&D at Mediascope

8 years ago

Maybe the best short review of Text Analytics fundamentals.


