
Understanding Semantic Analysis (and why this title is totally meta)

Shannon Johlic

VP of Marketing | Strategic Advisor | Revenue Architect | Expert in Product, Demand, and Partner Marketing

Published: July 11, 2015

The purpose of this article is to explain what semantic analysis is, what it means in the context of machine learning and data science, and why it’s important to marketers. But chances are, you knew some of that before you even read this sentence. “Semantic analysis” is right there in the title, and you know this publication targets marketers, not linguists. You might also have noticed that I work for a company that specializes in machine learning technology and that there are some computer-y sounding headings a little farther down.

“Relevance is both the goal and the unit of measure when it comes to Semantic Analysis.”

You used the contextual clues surrounding the words and phrases on this page to better understand the implied or practical meaning of the content of this article. That’s semantic analysis (SA). As humans, we do this really efficiently and almost unconsciously. We filter all the context surrounding a word/phrase/object/scenario, pull out the relevant pieces, compare them against our past experiences, and use them to deepen our understanding of the content at hand.

Machines have historically sucked at this because they lacked that filter, that ability to determine what is relevant and why. Advances in Machine Intelligence and Natural Language Processing (NLP) have changed deep semantic analysis considerably: through advanced algorithms, powerful computers, and a lot of practice, machines are getting much better at it.

Machine-driven semantic analysis has a number of real-world applications. It helps:

extract relevant and useful information from large bodies of unstructured data
find an answer to a question without having to ask a human
discover the meaning of colloquial speech in online posts
uncover specific meanings of words used in foreign languages mixed with our own

Before we get into some practical examples of why that matters to you as a marketing professional, let’s take a brief look at the history of text analysis (aka text mining) in marketing.

In the beginning, there was textual analysis… and it was… not good.

In the early days of AdTech, people wrote programs that could scrape huge amounts of data and look for words and phrases that recurred frequently. (Remember word clouds?) The implication was that frequency was a signal of importance. Even if we overlook that erroneous assumption for a minute, there are still a few glaring gaps. First, someone has to look at those results and determine why a word recurs so frequently and what it means to them. Second, it’s very difficult to do that with words taken out of context, especially when words can have so many different meanings and connotations:

whip (Cool Whip, bullwhip, whip-smart, ghost ride the whip)
jaguar (more on that example below)
run, take, break, apple, crane, date, foil (the list goes on)
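
For the curious, here is roughly what that kind of frequency counting looks like in Python. This is a minimal sketch, not any particular AdTech product: the function name top_words and the three sample sentences are made up for illustration.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent lowercase words in text (hypothetical helper)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Made-up sample text: three unrelated uses of the same word.
posts = (
    "Cool Whip on pie is great. "
    "He cracked the whip. "
    "She is whip smart."
)

print(top_words(posts, 1))  # [('whip', 3)]
```

The counter happily reports that "whip" is the top term, but it has no idea that the three mentions are about a dessert topping, a bullwhip, and a clever person. That is exactly the gap described above.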

And then there was tagging…

Tagging was essentially an attempt to use a human’s nuanced understanding of content to create a system that a machine could propagate on a large scale. We choose some words (taken out of context!) that we hope will convey some meaning to a reader. The errors pile up fast—redundant tags, misspelled tags, inconsistently applied tags, over-tagging—and get multiplied by every person using the system. As systems began to improve, at least we saw people actually using search behavior to guide tag taxonomies, but we’re still only guessing at how an individual user will conceptualize or search for a piece of content.

(We are not saying that you shouldn’t tag your content. Tags are an important component of semantic understanding, and they serve other purposes too (see our post on Open Graph Tags). Just have an authoritative, data-driven taxonomy for your tags or at least a defined set of rules.)

Sentiment Analysis makes a splash

As social media and user-generated content took over the web, marketers got hungry to mine this massive data set for meaning, but discovered a new challenge: knowing if someone is talking about a given topic or brand is less important than knowing how they are feeling and talking about you. A number of social analytics platforms began offering “hot or cold” analyses of topics and brands. While this seems like a nuanced understanding of language, it is really just a layering of explicit understanding (e.g. if the word “sucks” appears alongside my brand, and I know that sucks = negative, then I can infer that what’s being said about my brand is negative). This is still the computer equivalent of rote learning, and we’re never going to get SkyNet to become sentient that way.
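
To make the "rote learning" point concrete, here is a rough sketch of a word-list approach to "hot or cold" scoring. The word lists, the brand name Acme, and the function naive_sentiment are all assumptions for illustration; real social analytics platforms are not necessarily built this way.

```python
# Hypothetical, hand-written word lists: this is the "explicit understanding".
NEGATIVE = {"sucks", "terrible", "hate"}
POSITIVE = {"love", "great", "awesome"}


def naive_sentiment(post, brand):
    """Label a post hot, cold, or neutral by counting known words."""
    words = [w.strip(".,!?").lower() for w in post.split()]
    if brand.lower() not in words:
        return "not mentioned"
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "hot"
    if score < 0:
        return "cold"
    return "neutral"


print(naive_sentiment("Acme sucks!", "Acme"))
print(naive_sentiment("The new Acme vacuum really sucks up dirt", "Acme"))
```

Both posts come back "cold," even though the second one is a compliment. The program knows that sucks = negative, and nothing else.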

“Semantic analysis is not about teaching the machines, it’s about getting them to learn.”

Enter Semantic Analysis

Here’s where we have to do a bit of hand waving, because the science behind true SA is not something you can really elucidate in a 1000-word article. (If you would like to read 17,000 MORE words on Semantic Analysis and Natural Language Processing, this is a good piece.) Semantic analysis is not about teaching the machines, it’s about getting them to learn. From a data processing point of view, semantics are “tokens” that provide context to language. They provide clues not only to the meaning of words, but to their relationships with other words and other tokens. The goal, as it is for any good reader, is to look beyond the words on the page to see the meaning.

Successful SA requires that a program look at capital-m-massive data sets, and at that scale, it has to be making a lot of (correct) assumptions for itself. It’s about taking things that a computer can easily glean from data by looking at frequency, proximity (and many, many other factors) and using them to make meaningful cognitive leaps. For example, a computer can see patterns that tell it these things:

“dalmatian” and “dog” are semantically related.
“dalmatian” and “spotted” are more closely related than “dog” and “spotted.”
“dalmatian” is more frequently capitalized than other nouns.
“spotted” can mean “seen” or “dotted.”

To achieve the goal—true semantic understanding—the computer would have to make the connection that a Dalmatian is a spotted breed of dog.
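
The first two patterns in that list can be approximated with simple co-occurrence counting. The sketch below is a toy under stated assumptions: the five-sentence corpus is invented, the function relatedness is hypothetical, and the score (sentences containing both words divided by sentences containing either) is just one simple stand-in for the "many, many other factors" a real system would use.

```python
def relatedness(sentences, a, b):
    """Share of sentences mentioning a or b that mention both (a Jaccard score)."""
    with_a = {i for i, s in enumerate(sentences) if a in s.lower().split()}
    with_b = {i for i, s in enumerate(sentences) if b in s.lower().split()}
    either = with_a | with_b
    return len(with_a & with_b) / len(either) if either else 0.0


# Invented toy corpus; a real system would look at massive data sets.
corpus = [
    "the dalmatian is a spotted dog",
    "a spotted dalmatian puppy",
    "the dog chased the ball",
    "my dog is a dalmatian",
    "the dog barked all night",
]

print(relatedness(corpus, "dalmatian", "dog"))
print(relatedness(corpus, "dalmatian", "spotted"))
print(relatedness(corpus, "dog", "spotted"))
```

On this toy data, "dalmatian" and "spotted" score higher than "dog" and "spotted," which matches the second pattern in the list. Note what is still missing: the numbers say the words travel together, not that a Dalmatian is a spotted breed of dog.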

Why is Semantic Analysis so Important to Deliver Relevant Content?

Why do we care if a computer knows that a Dalmatian is a spotted dog? If it knows that, then when it sees someone looking for “spotted dog,” it knows to connect them to content containing “Dalmatian Puppies.” (Settle down, Cruella… it’s easier said than done.) Now multiply that across millions of users and tens of millions of interactions, and you have a hint of where the value lies.

“If we can understand the content and the user behavior at a deep, semantic level, we can deliver more relevant content and thereby create a more resonant user experience.”

In order to make sure content is relevant to the user, you need two basic components: an understanding of the user and an understanding of the content. Fundamentally, the problem with establishing relationships between pieces of content is that most “scraping” or data capture technology simply doesn’t understand the language within a document very well. There MAY be very simplistic levels of machine learning involved, but they rely heavily on provided tags and a cursory understanding of the individual words on the page, which leaves a lot of room for improvement.

Let’s look at another example:

If you search for the term “jaguar,” you will get results for:

A luxury car
A large feline predator
A football team
An operating system
And others that might surprise you

The goal of SA is to pair you with the “jaguar” content you’re actually looking for, and it will take a two-pronged approach to achieve that goal:

Find contextual clues in your past or real-time behavior (Did your search include the word “sedan?” Did you search for “zoo” recently?).
Look at all the content at its disposal where “jaguar” or related words occur to determine whether that other content will be the best match for your search. (“Leopard” also occurs frequently with “OS,” but not with “car.” “Panther” also occurs frequently with “Jaguar” and “NFL.”)
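
A bare-bones way to picture those two prongs working together is sketched below. It assumes the content side has already been boiled down to a few clue words per meaning (here typed in by hand from the examples above) and that the user side is just a list of recent search terms. The names SENSES and best_sense are hypothetical, and a real system would learn these associations from data instead of hard-coding them.

```python
# Prong two, simplified: clue words that tend to occur with each meaning.
# Hand-written here for illustration; in practice these would be learned.
SENSES = {
    "car": {"sedan", "car"},
    "animal": {"zoo", "feline"},
    "football team": {"nfl", "panther"},
    "operating system": {"os", "leopard"},
}


def best_sense(user_terms):
    """Prong one: match the user's recent terms against each meaning's clues."""
    terms = {t.lower() for t in user_terms}
    scores = {sense: len(clues & terms) for sense, clues in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(best_sense(["jaguar", "sedan"]))   # car
print(best_sense(["zoo", "tickets"]))    # animal
print(best_sense(["jaguar"]))            # None
```

With no contextual clues at all, the sketch returns None: it cannot tell which "jaguar" you mean, so all it could do is show you everything.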

How many connections it can make and how well it can understand the relationships between those connections determines the relevance of your experience. And, ultimately, relevance is both the goal and the unit of measure when it comes to Semantic Analysis. If we can understand the content and the user behavior at a deep, semantic level, we can deliver more relevant content and thereby create a more resonant user experience.

This post originally appeared on the Boomtrain blog.
