What is Duplicate Content?

Duplicate Content in SEO

Duplicate content is a fairly common problem in the world of SEO. Many websites are penalized by search engines because they contain a large number of duplicate pages. Let's look at the different types of duplicate content, their dangers, and the solutions for getting rid of them.

 

Definition of Duplicate content

When the same piece of text is copied and pasted in multiple places on a site, the result is what we call duplicate content. Content repeated this way is a problem for users and for search engines alike. It is rarely useful, it looks more like spam than anything else, and it harms both comprehension and the user experience on a website.

 

There are two types of duplicate content: internal and external duplicate content.

 

Internal duplicate content

Internal duplicate content is content duplicated within a single site. It can result from a configuration problem, a problem with how pages are crawled and indexed, or other causes. We speak of partial duplicate content when a small portion of the site is copied, and of total duplicate content when the entire site is duplicated.

 

External duplicate content

External duplicate content is often stolen or excessively reused content that sits on a different domain than the site in question. It is somewhat harder to manage because we do not necessarily have control over what has been copied. Moreover, search engines are not always able to identify the original author.

 

For many people, this comes down to misinformation or ignorance, but some still think they can game the system by stealing content from the best sites and reusing it. Today the rules have changed, and it is no longer as easy to manipulate a search engine, especially Google.

 

The dangers of duplicate content

Today, excessive duplicate content is sanctioned by search engines. Google in particular released a specific algorithm to filter good content from bad: Google Panda. It is tied into the indexing process and analyzes your pages to verify that they are of good quality, that they are not stolen and, at the same time, that they are not duplicated.

 

Put simply, if Google Panda scans your site and finds a large number of copied-and-pasted passages, it will penalize you. You may lose rankings or, in rare cases, even be removed from the index for spamming.

 

Another Google penalty is the manual penalty. In Google Webmaster Tools (now Google Search Console), you can be notified that a penalty for duplicate content has been applied to your site. It can be partial or total, generally depending on the severity of the situation.

 

Then, of course, there is a psychological penalty from your users. If they realize that your site is a duplicate content factory, they will ignore you and give you a bad name. You will not see these users again very often: they get lost in the maze of meaningless, repetitive content.

 

Finally, I can't help but warn you about the concept of "negative SEO". If one or more people steal your content or reuse much of it elsewhere on the web (which creates external duplicate content), Google may end up penalizing you. Google does not comment on this subject, since it appears to be a flaw in its engine. Negative SEO consists of lowering a site's rankings by sending negative signals about it to Google. This can be punishable by law. Do not try it.

 

Delete duplicate content

As a first step, we will see how to identify duplicate content; then we will look at some of the main reasons why a site may be at risk.

 

Identify duplicate content

First of all, we need to know whether we are facing a penalty and/or dealing with duplicate content. We can start by looking at our Google Webmaster Tools account, which often indicates whether there is a problem or an error with the site. You may receive an alert message saying that your site has been reported as partially or totally spam. If you see this message, assume that you will have to review your content.

 

You can also use tools such as Siteliner and Copyscape, not to mention Google itself.

 

Siteliner scans your site and reports your percentage of internal duplicate content. The tool can also help you understand your content and the architecture of your site. It is generally used for internal duplicate content. Note that Siteliner has a free version and a paid version with no limit on the number of pages scanned.

 

Siteliner: internal duplicate content detection tool

Copyscape is Siteliner's big brother. This tool, well known among SEO practitioners, identifies whether someone else has the same content as you on the web. It is up to you to determine whether you are the original author or not. It is mostly used for external duplicate content. Note that this service is also free with limitations, and you can buy a paid license that can even alert you to duplicate content.

 

Then, with a simple Google search on a few sentences of your content, you can see whether the search giant has indexed similar pages. This can also help you find out who Google, or another engine, considers to be the original author.
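
As an illustration of the search described above, here is a minimal Python sketch that builds a Google search URL for one sentence wrapped in double quotes, so that the engine looks for the exact phrase. The function name is hypothetical, and the sketch assumes the sentence itself contains no double quotes.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(sentence: str) -> str:
    """Build a Google search URL that looks for the sentence as an exact phrase."""
    # Collapse runs of whitespace, then wrap the sentence in double quotes.
    phrase = '"' + " ".join(sentence.split()) + '"'
    return "https://www.google.com/search?q=" + quote_plus(phrase)


print(exact_phrase_search_url("Duplicate content is a fairly common problem"))
```

Opening the printed URL in a browser shows the pages Google has indexed that contain this sentence word for word.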

 

Finally, a quick word on the obvious. If, while exploring your site, you often find the same thing and your content appears on several pages, you have a duplicate content problem. Personally, I now expect a site to have less than 20% duplicate content.
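
To give an idea of how a duplicate percentage can be estimated, here is a rough Python sketch based on word "shingles" (overlapping groups of consecutive words). It is only one possible approach and is not how Siteliner or Copyscape are known to work; the function names and the shingle size of 5 are assumptions chosen for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping groups of `size` consecutive words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat it as a single group.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def duplicate_ratio(page: str, other: str, size: int = 5) -> float:
    """Share of the page's shingles that also appear in `other` (0.0 to 1.0)."""
    page_shingles = shingles(page, size)
    if not page_shingles:
        return 0.0
    return len(page_shingles & shingles(other, size)) / len(page_shingles)
```

With such a measure, a page whose ratio against another page of the site is above 0.2 would exceed the 20% threshold mentioned above.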

 

Delete internal duplicate content

Now that we know how to identify duplicate content, we just have to get rid of it. Since the solution has to be adapted to each site, I will simply cover myself by saying that the following points are general and that it is better to think twice before applying them as they are. Remember that SEO consulting is a profession and, as in any business, there are subtleties that cannot all fit in an article.

 

A URL for each content

Obvious, but easier said than done. When you add URL parameters, IDs, or anything else that changes the URL of the same content, you create duplicate content. For example, https://www.domain.com/page.php and https://www.domain.com/page.php?post=2 are duplicates if they serve the same content. So, I suggest you:

 

Avoid using session IDs

Avoid using URL parameters (or indicate them in Google Webmaster Tools, especially in e-commerce)

Use either the www subdomain or the bare domain, not both (with a 301 redirect from one to the other)

Use only one protocol, either HTTP or HTTPS (note that there are exceptions, especially in e-commerce)

Set up your CMS or other URL routing system to produce clean, unique URLs

This list is not exhaustive, and many special cases may arise depending on the technology used and the structure and design of the site. Do look into your own case.
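
The suggestions above can be sketched as a small URL normalization function in Python. This is only an illustration: the list of ignored parameters, the choice of HTTPS, and the www.domain.com host are assumptions to adapt to your own site, and parameters such as post are kept because they may change the content served.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change the content served.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid",
                  "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str, preferred_host: str = "www.domain.com") -> str:
    """Map the variants of a URL to a single form (one protocol, one host)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = preferred_host.removeprefix("www.")
    # Treat the bare domain and the www subdomain as the same site.
    if host in (bare, "www." + bare):
        host = preferred_host
    # Drop session and tracking parameters, keep the others in a stable order.
    query = sorted((key, value)
                   for key, value in parse_qsl(parts.query, keep_blank_values=True)
                   if key.lower() not in IGNORED_PARAMS)
    # Assumption: HTTPS is the single protocol chosen for the site.
    return urlunsplit(("https", host, parts.path or "/", urlencode(query), ""))
```

Two URLs that normalize to the same string are candidates for a 301 redirect or a canonical tag pointing to that single form.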

 

Do not index all content

Since the advent of dynamic sites, it has become much easier to create duplicate content. It is therefore up to you to decide which pages should not be indexed. If you set noindex, follow on your duplicate pages, you avoid similar entries in search engine results and thus remove suspicion of duplicate content.
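
The noindex, follow directive is usually set with a robots meta tag in the page head, in the form <meta name="robots" content="noindex, follow">. As a minimal sketch, the following Python code checks whether a page's HTML carries that directive, using only the standard library. The class and function names are hypothetical, and the sketch does not look at HTTP headers such as X-Robots-Tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Running is_noindex over the HTML of your duplicate pages is a quick way to confirm that the directive is actually present where you expect it.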

 

Avoid copy and paste

This point is also obvious: avoid copying and pasting your content. Reusing portions of code, text or anything else creates a redundancy that search engines do not like because it signals low quality. So always try to offer something unique. I am thinking especially of product sheets, which take a long time to write and are often very similar. The next point helps solve some of these problems.

 

Set up a canonical tag

For some time now, there has been a tag that indicates the reference page for a piece of content: the canonical tag. It takes this form:

 

<link rel="canonical" href="https://www.domain.com/page.php" />

This tag tells the engines: "This page B looks a lot like another page, which is the reference on this site. This page is not necessarily the best version, and it is better to index the other one."

 

In other words, you can designate a "master" page that acts as the reference for a group of pages. For example, if I have an e-commerce site selling a phone that comes in 10 colors, with one page per color, I can indicate on each of these pages that the reference page, the canonical page, is the black model.
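
The phone example can be sketched in Python as follows. The URLs and the function name are hypothetical; the point is only that every color variant receives the same canonical tag, pointing to the black model.

```python
from html import escape


def canonical_tag(reference_url: str) -> str:
    """Build the canonical tag to place in the <head> of a variant page."""
    return f'<link rel="canonical" href="{escape(reference_url, quote=True)}" />'


# Hypothetical URLs: one page per color, the black model being the reference.
REFERENCE_URL = "https://www.domain.com/phone-black"
variant_urls = [f"https://www.domain.com/phone-{color}"
                for color in ("black", "white", "red")]

for url in variant_urls:
    # Every variant, including the reference itself, points to the same page.
    print(url, "->", canonical_tag(REFERENCE_URL))
```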

 

Note that this tag should not be abused and that there are other ways to avoid the stated problem.

 

Avoid repeated features

Another thing to watch for is features, modules, widgets, or code that repeat the same thing many times. If a system on your site displays the same content everywhere, you are creating duplicate content again, so avoid this. A few examples: PDF generators, print-view systems, text and comment modules, and many others. Check all of this before installing a feature permanently.

 

Set the default domain in Google Webmaster Tools

It can be useful to send Google another signal about your main domain, so that it treats the domain you indicate, the one with the quality content, as the reference for the site. To do this, go to the site settings in your webmaster tool.

 

Watch out for all other forms of duplicate content

As stated in the introduction, I cannot list everything. Depending on the CMS, the framework, the code, the theme, the products, and the way the site is designed, there are many ways to create duplicate content. It is up to you or an SEO consultant to set up an anti-duplicate-content strategy.

 

Delete external duplicate content

As mentioned above, external duplicate content is content located outside your website. The problem with this kind of content is that we do not always have control over it, so it is harder to eradicate. However, there are methods that let you avoid the penalties and consequences of acts like negative SEO (which does not exist according to the engines, but that is a long debate).

 

Avoid using the same content on multiple sites in a network

Are you a webmaster with several sites? Under a mountain of content to produce, it is tempting to reach for the easy tool: CTRL+C, CTRL+V. When there are hundreds of pages to write, we prefer to recycle existing content. Unfortunately, this is not good practice. Unless you reuse only a few sentences alongside 90% original content, your content will be considered poor quality. So be careful not to use the same content across your sites.

 

Pay attention to migrations and redesigns

When we change our domain name or rebuild our website, we often have to change the structure and set up redirects in all directions. When this is not done properly, the old site can cause problems. For example, it is common to see an old blog at one address and the new one at another: this is duplicate content, and a 301 redirect is needed.
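
As a rough sketch of the redirect logic during a migration, the following Python function computes the 301 response for a URL of the old site. The host names and the moved_paths mapping are hypothetical, and in practice this rule usually lives in the web server or CMS configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_response(old_url, new_host, moved_paths=None):
    """Return the status code and headers that send an old URL to its new home."""
    parts = urlsplit(old_url)
    # Paths whose structure changed are looked up; the others are kept as-is.
    path = (moved_paths or {}).get(parts.path, parts.path)
    location = urlunsplit(("https", new_host, path, parts.query, ""))
    # 301 means "moved permanently".
    return 301, {"Location": location}


status, headers = redirect_response(
    "http://old-blog.example/post-1?page=2",
    "www.new-domain.example",
    moved_paths={"/post-1": "/blog/post-1"},
)
print(status, headers["Location"])
```

Keeping the path and query string in the target means each old page is redirected to its own equivalent rather than to the new home page.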

 

Do not use indexed content

This point is very broad, but it lets us lay down a strict rule. Whether you are buying content, inviting a guest author to write on your site, or hosting a PDF, Word, or PowerPoint file, you risk creating duplicate content if that content is already indexed elsewhere. The best approach is probably to request that this content not be indexed, or simply not to put it online.

 

Aggregators and other platforms

Sometimes people use content aggregators to have all the news centralized in one place. In addition, some sites host your content in order to improve your site's SEO. Pay close attention to all these tools. Make sure that externally hosted content links back to its source, that an RSS feed offers only a snippet of your content, and so on.
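
To illustrate the idea of a feed that exposes only a snippet, here is a small Python sketch that truncates an article to its first words and appends the source. The function name, the 40-word limit, and the output format are assumptions; most CMSs offer an equivalent "excerpt only" setting for their RSS feeds.

```python
def feed_snippet(text: str, source_url: str, max_words: int = 40) -> str:
    """Return the first words of an article followed by a pointer to its source."""
    words = text.split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        # Signal that the text continues on the original page.
        excerpt += "..."
    return f"{excerpt} (Source: {source_url})"
```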

 

Request the deletion or deindexing of your content

Clearly, enforce your copyright. A person who takes back your writings or pictures without your consent
