SEO 13 min read May 15, 2017

How to Identify and Fix Duplicate Content Issues


Elena K.
Editor at Serpstat
Duplicate content means that the same or substantially similar content is accessible at multiple URLs. As a result, search engines don't know which page to show in the results. At first glance it may seem there's nothing to worry about: your website is still ranking, so the "duplicate content myth" must be busted, right? In reality, duplicate content can lead to a significant drop in rankings, so the problem is well worth your attention.

In this article, I'll cover what duplicates are, why they appear, how to remove them, and which tools to use.
Let's see what search engines face when they encounter several similar pages. Imagine you're driving and come to a fork in the road, and the signs say both roads lead to your destination. Which one do you choose? That's roughly Google's position when it runs into duplicates: the search engine has to pick one of two or more similar pages to show in the results.


Why duplicate content matters

We've already established that duplicates hurt your SEO. Now let's find out why. The rankings drop usually stems from three main problems:

1. Visibility dilution due to the way search engines work. Since a search engine's main task is to return the most relevant results, it rarely shows two identical pages. It is forced to choose just one of your duplicates, the one it considers most relevant (the fork in the road again), so the visibility of each duplicate suffers.

2. Dilution of backlink value. The importance of backlinks for ranking is hard to overstate. Imagine the same article is available at two different URLs: links and shares get split between them, as some readers link to the first URL and others to the second. Since backlinks are one of the main ranking factors, this hurts search visibility.

3. Wasted crawl budget. Crawl budget is the number of URLs Googlebot can and wants to crawl over a certain period; in other words, the number of pages the crawler visits per pass is limited. If your website has lots of duplicates, there's a good chance Googlebot will spend that budget crawling them instead of the valuable pages you actually want indexed.


The most common types of duplicates

There are dozens of reasons why duplicates appear; covering all of them here would take you days to read. So I'll focus on the most common ones.

"www" vs non-"www" and "http" vs "https" versions

If your site is accessible at both the www and non-www versions, you effectively have two identical websites, each duplicating every page of the other. The same applies to the http and https versions: if search engine crawlers index both, you again face a duplicate content issue.

Filters and sorting

Elements such as sorting and filters commonly create duplicates. Their results are rendered on separate pages with dynamic URLs, and the combinations of different filter and sorting parameters produce numerous automatically generated pages. If you ignore this, the crawler indexes all of them, and that's how duplicates appear.

Pagination

Pagination also creates a duplicate issue, because the titles and descriptions of all the paginated pages are the same. You need to set up pagination correctly; how to do that is covered at the end of this article.

Pages with and without a trailing slash

Another common situation I've come across: the website's pages exist in two versions, with and without a trailing slash in the URL. Here's what I mean:

mysite.com/stores/

mysite.com/stores

Printer friendly version

If your site's CMS creates printer-friendly versions of pages, that can also produce duplicates when multiple versions get indexed.

Session IDs

To track visitor activity, a site's server assigns a unique identifier to every user for the duration of their visit, with every visit being a new session. The classic example is an online store saving the items a user has added to the cart. Some sites put session IDs in URLs, creating a separate URL for every new session, which again causes a duplicate issue.

This issue is becoming obsolete, as most websites now store sessions in cookies. If you still keep session data in URLs, solve the problem by moving the session info into a cookie.

Referral link duplicates

When a user arrives at the website via a referral link that looks like "?ref=…", they should be automatically redirected to the canonical URL. Unfortunately, developers often forget to do this, and duplicates appear.
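Both the session-ID and referral-parameter duplicates can be handled the same way on the server: strip the known tracking parameters and 301-redirect to the clean URL. Here's a minimal sketch of the URL-cleaning step in Python; the parameter names in TRACKING_PARAMS are just examples, so use whatever your site actually appends:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Example query parameters that create duplicate URLs; extend for your site.
TRACKING_PARAMS = {"phpsessid", "sid", "sessionid", "ref"}

def clean_url(url):
    """Return the URL with session/referral parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

print(clean_url("https://mysite.com/stores?ref=partner42&page=2"))
# https://mysite.com/stores?page=2
```

If the cleaned URL differs from the requested one, answer with a 301 redirect to it instead of serving the page.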


How to identify duplicate content issues

We've already discussed what duplicate content is, why it's worth paying attention to, and why it appears. Now let's learn how to find duplicates and which tools to use.

Google Search Console

The basic way to find duplicate content is Google Search Console (formerly Google Webmaster Tools). Go to the Search Appearance section and click HTML Improvements. If your website has duplicates, they'll show up under Duplicate meta descriptions and Duplicate title tags. Unfortunately, this tool doesn't surface every type of duplicate.

If the same item appears in two different categories, its title and description may differ slightly, and such duplicates are not shown in Google Search Console. So this method is great for confirming that a duplicate content issue exists, but it's not the best choice for in-depth analysis.

Desktop Crawler programs

Such programs help specialists run a fast, comprehensive technical website audit. They crawl your website the way a search engine robot does, which makes them the best way to detect SEO issues, duplicates included. Netpeak Spider and Screaming Frog SEO Spider are among the best-known crawler tools.

I used Netpeak Spider to show how such programs work. Enter your website, set the crawl parameters, and after a while you'll see a list of all the errors the crawler found. Duplicates are treated as critical errors, so they're shown in red. Click an error type to get the full list of duplicates on your website.

Serpstat

You can use Serpstat's Audit module as well. Create a project and set the desired audit parameters. After a while, you'll get a list of errors grouped by error type and priority level. Go to the Meta tags section of the Audit module to see the list of pages with identical title or description tags.

Manually

Well, you can also search for duplicates manually, especially if your website is quite small. Several search operators can help.

Use site:yourwebsite.com to get the list of your website's pages indexed by Google. Then check the results manually to spot duplicates. To check whether a particular page has duplicates, use the following operator: site:mysite.com intitle:the title you're checking

As we've already noted, search engines mostly show unique results, so be sure to click "repeat the search with the omitted results included" at the bottom of the SERP.
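If you can export a list of URLs with their titles (from a crawler or your CMS), a few lines of Python will group exact title duplicates for you. A sketch, assuming a simple list of (url, title) pairs:

```python
from collections import defaultdict

def find_duplicate_titles(pages):
    """Group URLs by title; return only titles shared by 2+ pages."""
    by_title = defaultdict(list)
    for url, title in pages:
        by_title[title.strip().lower()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}

pages = [
    ("mysite.com/stores", "Our Stores"),
    ("mysite.com/stores/", "Our Stores"),
    ("mysite.com/about", "About Us"),
]
print(find_duplicate_titles(pages))
# {'our stores': ['mysite.com/stores', 'mysite.com/stores/']}
```

Every group in the output is a candidate duplicate cluster worth checking by hand.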


How to fix the duplicate content issues

When it comes to duplicate content, "fixing" means showing Google which of several identical pages is the original. Here are the most common ways to do it:

Set 301 redirect

In most cases, the best way to indicate the original page is a good old-fashioned 301 redirect from the duplicate to the original. A 301 says, in effect: "This page has moved permanently; remove it from your index and pass its link authority and relevance to the new page." Use it when the duplicate page is completely useless and should go away. It's the best solution for canonicalizing:

  • the www and non-www site versions;
  • the http and https versions;
  • pages with and without "/" at the end;
  • other useless duplicates.
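The exact redirect rules depend on your server (Apache, Nginx, or the application itself), but the logic is the same everywhere: map every variant URL to one canonical form and answer with a 301. A sketch of that mapping in Python, assuming you've chosen https, non-www, and no trailing slash as canonical (swap the choices for your own site):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_target(url):
    """Return the canonical URL a 301 should point to, or None if already canonical."""
    parts = urlsplit(url)
    # Drop the "www." prefix and force https.
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    # Drop the trailing slash, but keep the bare root slash.
    path = parts.path.rstrip("/") or "/"
    target = urlunsplit(("https", host, path, parts.query, parts.fragment))
    return None if target == url else target

print(canonical_target("http://www.mysite.com/stores/"))
# https://mysite.com/stores
print(canonical_target("https://mysite.com/stores"))
# None
```

A request whose URL yields a non-None target gets a 301 to that target; everything else is served normally.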

Use the rel="canonical" tag

Another way to fix duplicates is the rel="canonical" tag. Unlike a 301 redirect, the canonical tag is used when the duplicate page is needed and cannot be removed. Say we have several similar pages listing sandals: the original is the page with no sorting applied, while the duplicates are the pages with sandals sorted from low price to high and vice versa. All of these pages are necessary, but as we already know, duplicates hurt SEO. That's where the canonical tag comes in handy. It's the best fit for:

  • sorting pages;
  • filters pages;
  • utm pages;
  • other necessary pages.

Let's get back to our sandals page example. You need to put

<link rel="canonical" href="https://onlinestore.com/shoes/sandals/" />

on the page

https://onlinestore.com/shoes/sandals/?sort_min_price

Now when the crawler visits the page where sandals are sorted "from low to high price," it understands that the category page is preferred, and you avoid the duplicate content issue.


Choose the preferred domain in Google Search Console

In Google Search Console you can choose the preferred domain: with or without www. It's an alternative to a 301 redirect, but remember that everything you set in Google Search Console works only for Google. It's better to set a 301 redirect so that all search engines know which site version is preferred.

Meta robots

Last but not least, you can fix duplicate content with the robots meta tag: <meta name="robots" content="noindex, follow">. It tells search engines they may crawl a particular page but must not index it, while the "follow" part ensures they won't ignore the links on the duplicate page. It's the best solution for printer-friendly pages, for example.

Set rel="prev" and rel="next" tags for pagination

In 2011 Google introduced the rel="prev" and rel="next" link tags to indicate the relationship between component URLs in a paginated series. Use them to help Google understand that a page is part of a pagination sequence, not a duplicate. rel="prev" points to the previous page, rel="next" to the next one. Here's how it should look:

In the <head> of http://site.ru/category/:

<link rel="next" href="http://site.ru/category/2/">


In the <head> of http://site.ru/category/2/:

<link rel="prev" href="http://site.ru/category/">

<link rel="next" href="http://site.ru/category/3/">
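If your category pages are templated, these tags can be generated rather than hand-written. A small sketch of that logic (the URL pattern, where page 1 is the bare category URL, is an assumption; adapt it to your site):

```python
def pagination_links(base, page, total_pages):
    """Build the rel="prev"/rel="next" tags for one page of a paginated series."""
    def url(n):
        # Page 1 lives at the bare category URL, later pages at base + "N/".
        return base if n == 1 else f"{base}{n}/"

    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{url(page - 1)}">')
    if page < total_pages:
        tags.append(f'<link rel="next" href="{url(page + 1)}">')
    return tags

for tag in pagination_links("http://site.ru/category/", 2, 3):
    print(tag)
# <link rel="prev" href="http://site.ru/category/">
# <link rel="next" href="http://site.ru/category/3/">
```

The first and last pages of the series get only one tag each, which is exactly what Google expects.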


Summary

Duplicate content is a significant issue that leads to a rankings drop and therefore to traffic loss. Even though it doesn't result in a penalty, it's 100% worth paying attention to. Duplicates appear for a bunch of different reasons, and it's crucial to remove them promptly.

Please note that duplicate content is a huge subject; this article covers just the basics. Moreover, all the fixes above are recommendations, because much depends on the particular situation. I hope you've found this piece useful.