How Google Sees Duplicate Content – The Canonical Conundrum
Typically, Google speaks, and we all jump. Only this time, when Google spoke, hardly anyone was listening, and now you might get caught out by your own content strategy.
Repeat after me: ‘kah-non-ik-al-ize-ay-shun’ – canonicalisation.
OK, so ‘canonicalisation’ is not a word that’s part of everyone’s daily vocabulary, but not knowing the word doesn’t diminish its importance. And, if you write or commission online content that’s published in more than one location, diligently inserting the canonical URL as you go, the phrase may be about to become ‘[expletive] canonical URL’.
Note: If you’re doing the above but haven’t been using a canonical URL in your duplicate content, you’re already in trouble. So much trouble.
What is a canonical URL?
Here’s the Cambridge Dictionary’s definition: “Canonical. Considered to be among the best and most important, and worth studying”
A canonical URL is declared with a single line of code (a link tag marked rel="canonical" in the page’s head) that’s added to the second and subsequent versions of a piece of similar or identical content. When Google arrives to index one of these pages, the canonical URL tells Google, “I am not the original version, so do not index this page. The original is over there.” This process is called canonicalisation.
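To make that ‘single line of code’ concrete, here is a minimal sketch in Python that reads a page and reports the canonical URL it declares. The page, its title and the example.com address are placeholders invented for illustration, and the helper names (CanonicalFinder, find_canonical) are hypothetical, not part of any Google tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical syndicated copy; the URL is a placeholder.
syndicated_page = """
<html>
  <head>
    <title>Choosing waterproof material</title>
    <link rel="canonical" href="https://www.example.com/waterproof-material">
  </head>
  <body><p>Article text...</p></body>
</html>
"""

print(find_canonical(syndicated_page))
```

The line that matters is the link tag inside the head of the page. Everything else in the sketch exists only to find it.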
It sounds almost counter-intuitive to want to instruct Google to index only one, not all, versions of a specific page, but it actually makes sense from a user-experience perspective. Plus, it’s Google’s rule, so you don’t have a lot of choice.
Also, if you’re driving traffic to what are effectively secondary or non-primary pages via social media, advertising, memberships, etc., then you needn’t worry that Google is ignoring them as you are creating the audience directly.
The only constant in technology is change ~ Marc Benioff
Solving the problem of duplicate content
Canonical URLs (or links) were introduced by Google in 2009 to solve the issues surrounding duplicate content.
Acting like a signpost, canonical links tell Google which is the ‘master’ version of the content. And, by extension, which version, or versions, to ignore and never show in search engine results.
Yes, you read that right; ‘ignore’. More on this later.
Are you wondering why the best or right version of content, not any version, is even an issue? Well, it is, mainly because Google decided it is. But here’s why.
Google considers content to be a duplicate when the same or very similar content appears on two or more websites. This can happen with syndicated content, such as when another publisher uses your content on their website.
‘Duplicate’ means it’s either identical to the original or it’s not different enough.
Confused? It simply means that if a piece of content is rewritten and published elsewhere, most of it must differ from the original for it not to be considered a duplicate. A working rule of thumb is at least 89%, although Google does not publish an official threshold.
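How Google actually measures ‘different enough’ is not public, so the following is only a rough self-check, not Google’s method. It is a sketch that uses Python’s standard difflib module to compare two texts word by word and return a similarity ratio between 0 (nothing in common) and 1 (identical). The function name and the three sample sentences are invented for illustration.

```python
from difflib import SequenceMatcher


def word_similarity(original, rewrite):
    """Return a 0..1 ratio of how much two texts overlap, word by word."""
    a = original.lower().split()
    b = rewrite.lower().split()
    # autojunk=False stops difflib from discarding very common words
    # when the texts are long.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical sample sentences.
original = "Waterproof fabric protects your sofa from spilled drinks"
light_edit = "Waterproof fabric protects your sofa from spilled liquids"
rewrite = "Garden cushions need weatherproof covers that shrug off heavy rain"

for label, text in [("light edit", light_edit), ("rewrite", rewrite)]:
    print(label, word_similarity(original, text))
```

A light house-style edit scores close to 1, while a genuine rewrite for a different audience scores close to 0. The nearer a syndicated version sits to 1, the more likely it is to be treated as a duplicate.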
Google likes clear, defined content and sees duplicates as diluting the value of the original, even though each version is present in a different context (i.e. on a different website).
Let’s say your website is about interior design, and you’ve written some content in the form of an article about waterproof material.
You get approached by an outdoor living magazine that wants to use your content, and you agree.
They will likely edit the content to fit their house style, but it’s fundamentally the same as your original (otherwise, what would be the point?).
When Google indexes both pieces of content, it does a Google ‘hmmm’ – hand on chin, looking up to the left, waggling its finger.
Google knows they are two versions of the same content, but it doesn’t yet know which is the original. And Google always wants to know which is the original or is the “…best and most important…”.
If it chooses the article on your website, that’s great news for you, but it will be at the expense of the magazine article, which could be penalised and pushed further down the rankings.
However, if Google chooses to index the magazine article instead, it will be driving traffic towards their website, not yours, and your website version is at risk of being penalised.
Good website developers understood how this worked. Content owners were arguably better off because of it. Google was happy. Readers were none the wiser but saw better, more relevant results without realising why.
So, what’s changed?
Enter stage left: the canonical URL.
Content is king, but engagement is queen, and the lady rules the house! ~ Mari Smith
Why was not using canonical links a problem?
If Google found two or more similar pieces of content, and there was no canonical URL present in any of them, it would decide for itself which it considered to be the most important, based on several factors.
These include:
1. Page titles: The way page titles are written, i.e. how descriptive or keyword-rich they are.
2. Content differentiation: Whether and how different the content is.
3. Domain authority: Higher-authority domains are given more weight and more relevance. (Google’s own link-based measure is PageRank; ‘Domain Authority’ is a third-party estimate of the same idea.)
4. Website configuration, including:
Consider the following: you wrote an article and published it on your own website. You then published an identical copy on other platforms such as Medium, Vocal Media, Substack, etc. You didn’t use a canonical URL on the others, and you consider your website to be the main ‘hub’ for all your content.
Google visits your website and indexes the page containing the article. But it also visits the other platforms and indexes your article there, too.
At some point, Google realises it has multiple identical copies of the same content and, based on the factors mentioned above, selects one to keep and deletes the others from its database.
The question is, which one did it keep?
It’s not a trick question, as there’s no right answer. We simply don’t know.
Let’s now play the same scenario again, but this time, you added a canonical URL to each of the copies on the other platforms. The canonical URL points directly to the version on your website.
In this new scenario, Google visits your article on the other platforms and sees the canonical URLs on each. It instantly knows the version on your website is the ‘original’ – the “...best and most important…”.
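If you syndicate to several platforms, it is worth checking that every copy really does point back to your master version. The sketch below assumes you have already collected the canonical URL each copy declares (None where there is none) and simply compares them with the master. All of the addresses are placeholders and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Lower-case scheme and host and drop a trailing slash, for comparison.

    Query strings and fragments are ignored in this simplified sketch.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def copies_missing_master(master_url, declared):
    """Return copy URLs whose canonical is absent or points somewhere else."""
    target = normalise(master_url)
    return [
        copy_url
        for copy_url, canonical in declared.items()
        if canonical is None or normalise(canonical) != target
    ]


# Placeholder data: copy URL -> canonical URL that copy declares.
master = "https://www.example.com/waterproof-material"
declared_canonicals = {
    "https://medium.example/p/waterproof": "https://www.example.com/waterproof-material/",
    "https://vocal.example/waterproof": None,
    "https://substack.example/p/waterproof": "https://substack.example/p/waterproof",
}

print(copies_missing_master(master, declared_canonicals))
```

Any copy that appears in the output either has no canonical URL at all or names itself as the original, and needs fixing on that platform.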
Job done, and everyone’s happy.
True enough. At least they would have been until Google changed its mind again.
The secret of change is to focus all of your energy not on fighting the old, but on building the new ~ Socrates (via Dan Millman)
Google sees things differently now.
On May 2, 2022, Google updated how it treats canonical URLs. And ‘how it treats them’ now means it ignores them.
The decision would go on to negatively impact all syndicated content. And, while Google was previously happy that the canonical URL forced it to ignore duplicate versions of content, it now wants every piece of content to be unequivocally unique.
In fairness, that’s not a bad call.
In the example above, the content about waterproof materials should, ideally, be rewritten for the outdoorsy magazine. It should specifically talk to the needs of those looking for external waterproof/weatherproof material, whereas the original version talks about the benefits in terms of spilled liquids within the home.
Now that you know this, there should be no problem in the future. You simply create context-relative content. Right?
Yes, that’s correct. But what about the duplicate content that’s already out there on different platforms/websites? What do you think Google will do with those in light of its new way of thinking?
Let’s first explain why the change in Google’s approach is a problem.
Keeping up with Google
The main problem with keeping up to date with Google’s capricious nature is that it’s mostly only those involved in SEO (Search Engine Optimisation) who hear about the changes and can interpret what they actually mean.
Many SEO experts will publish content on their own websites about the changes as they happen.
But on the basis that we don’t know what we don’t know, why would anyone even go looking to find the information about canonical URLs?
And, if you don’t have an SEO expert on a retainer, there is no one to tell you.
It truly is a conundrum.
There are a great many subject experts out there, many of whom run their own businesses and who write fantastic content for their own websites.
There are also plenty of platforms that are not only looking for excellent, subject-specific content but are willing to pay for it too. And it’s the latter that often increases the risk of duplicate content.
One such platform is Medium, “a social publishing platform that is open to all and home to a diverse array of stories, ideas, and perspectives.”
Medium is as much a platform for readers as it is for writers, hence a paid membership is required to be able to read content. The revenue generated from membership fees is shared among writers based on the popularity of their content.
Medium therefore prefers unique content (most ‘publications’ within Medium insist upon it).
However, it is possible to publish content that exists elsewhere (duplicate content) because Medium conveniently provides a way to add a canonical URL to every published article.
That all worked fine until Google changed tack. And now it’s causing a very real and significant problem for the ‘master’ version of content published elsewhere, such as your own website.
Let’s assume you run a business, and the primary goal of your website is to build credibility, trust and authority.
Your aim is for your website to always be the master source of your content so that when people search in Google for what you do, your website will be presented in the search results.
You write a quality piece of content and publish it on your website. And, not wishing to ignore your loyal readers on Medium, you publish the same piece of content on Medium and include the canonical URL that points to the master version on your website. All good.
Well, yes, but not quite.
From Google’s perspective, the perceived quality of content is based not only on its uniqueness and thoroughness but also on the authority of the domain on which it is published.
Domain Authority is a score that represents how successful a website is in terms of search engine results.
So, despite all of your best efforts, now that Google is ignoring canonical URLs, it chooses the version based (in this example) on the domain authority. Medium, with its long-standing global presence and millions of users, will win hands-down over your own website.
The result? Google chooses to index your content on Medium’s website.
The version on your own website might as well be invisible.
Google’s decision to ignore canonical URLs (or ‘tags’ as they’re sometimes called) is rooted in its ongoing commitment to enhancing the overall user experience and the quality of search results. They want to ensure that users receive the most relevant and valuable content when conducting searches.
It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change ~ Charles Darwin
What can you do with existing duplicate content that’s published on other platforms?
If you have created interesting, engaging, and unique content that you want to be found when people are searching with Google, you first have to accept that this can only happen in one location.
And you have to decide which location that will be. Is it your own website or another platform, such as Medium?
Once you’ve calmed down and accepted the way things are now, you have two options.
Option one is to completely rewrite the original content that will be published elsewhere.
It will be challenging to maintain the same voice and messaging, especially if you have to do this more than once for the same article.
If it’s really important to you to have this content in multiple locations, the best advice is to find a professional copywriter. They will be able to reinterpret your content and will understand the rules Google imposes about how different the content needs to be.
Option two is to keep the primary article (remember, the “…best and most important…”) on your website and create summaries of the content on other platforms.
If you do this, you should include a statement at the foot of the article, such as, ‘This is a summary of the main article, the full version of which appears here: [your website]’.
Google will eventually pick up on the fact that these pages are completely different (no duplicate content) and will see the version on your website as the one to index.
However, you will need to be patient.
Key Takeaways
Google will change its policies: Google’s approach to canonical URLs has evolved over time. Initially, canonical URLs were introduced to resolve duplicate content issues. However, the shift in Google’s policy means they are now ignored.
Duplicate content is dead: You cannot create duplicate content – the exact same content in multiple locations – and expect to be able to control which version is indexed in Google.
The impact on content strategy: If you earn money from identical, syndicated content published on platforms such as Medium and Vocal Media, Google may index only one version of the content and ignore the others. If the platform allows access to content only via a paid membership, Google will be driving visitors to a location where they cannot actually read your content.
The importance of originality: Remember, Google prioritises original, context-specific content. If you know enough about the subject you’re writing about, it shouldn’t be too difficult to create another ‘original’ version. And, if you can’t do that, seek help from a professional copywriter.
More insight at The Marketing Alliance