How to Identify and Fix Duplicate Content Issues


In this article, I'll cover what duplicates are, why they appear, how to remove them, and what tools to use.
Why duplicate content matters
Because the main task of search engines is to return the most relevant results, they rarely show two identical pages in the results. They have to pick only one page from your duplicates, the one they consider most relevant (the fork in the road, remember?). As a result, the visibility of each duplicate suffers.
The most common types of duplicates
"www" and non-"www" versions, "http" and "https" versions
Filters and sorting

Pagination

Pages with and without a trailing slash
mysite.com/stores/
mysite.com/stores
Printer friendly version
Session IDs
This issue is becoming obsolete, as most websites now store sessions in cookies. If you still pass session data in URLs, you can solve the problem by moving the session info into a cookie.
Referral link duplicates
How to identify duplicate content issues
Google Search Console
If the same item is placed in two different categories, its title and description can partly differ, and such duplicates are not shown in Google Search Console. So this method is great for spotting that a duplicate content issue exists, but it's not the best one for in-depth analysis.

Desktop Crawler programs
I used Netpeak Spider to show you how such programs work. Enter your website, set the crawl parameters, and after a while you'll see a list of all the errors the crawler found on your website. Since duplicates are treated as critical errors, they're shown in red. Click an error type to get the full list of duplicates on your website.
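To give a rough idea of what a crawler does when it flags exact duplicates, here is a minimal Python sketch that groups pages by a fingerprint of their visible text. It is not how Netpeak Spider or any other tool is actually implemented. The function names, the crude tag stripping, and the input format (a dict mapping each URL to its already downloaded HTML) are all assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html: str) -> str:
    """Hash the visible text of a page (hypothetical helper)."""
    # Crude tag stripping; a real crawler would use a proper HTML parser.
    text = re.sub(r"<[^>]+>", " ", html)
    # Ignore case and whitespace differences between copies.
    text = " ".join(text.lower().split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict) -> list:
    """Return groups of URLs whose text fingerprints are identical."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Made-up pages, using the URL pair from the trailing-slash example.
pages = {
    "mysite.com/stores/": "<h1>Stores</h1><p>Find a store.</p>",
    "mysite.com/stores": "<h1>Stores</h1> <p>Find a store.</p>",
    "mysite.com/about/": "<h1>About</h1>",
}
duplicate_groups = find_duplicates(pages)
```

A sketch like this only catches pages whose text is identical. Near-duplicates, such as the same item with a slightly different title in another category, need fuzzier comparison.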

Serpstat

Manually
Search Google for site:yourwebsite.com to get the list of your website's pages that Google has indexed.

As we've already clarified, search engines show mostly unique results, so be sure to click "repeat the search with the omitted results included" at the bottom of the SERP.

How to fix the duplicate content issues
Set a 301 redirect
- the www and non-www versions of the site;
- the http and https versions;
- pages with and without "/" at the end;
- other useless duplicates.
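The redirect cases listed above can be sketched as a small Python function that decides where a 301 should point. This is an illustration of the logic, not a server configuration. The preferred scheme (https), the choice of the non-www host, and the trailing-slash policy are assumptions, and `redirect_target` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

PREFERRED_SCHEME = "https"  # assumption: https is the main version
PREFERRED_WWW = False       # assumption: the non-www host is the main version
TRAILING_SLASH = True       # assumption: page URLs end with "/"


def redirect_target(url: str) -> Optional[str]:
    """Return the URL a 301 should point to, or None if url is already preferred."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if PREFERRED_WWW else bare

    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Leave file-like paths such as /price.pdf without a trailing slash.
    if TRAILING_SLASH and not path.endswith("/") and "." not in last_segment:
        path += "/"

    target = urlunsplit((PREFERRED_SCHEME, host, path, parts.query, parts.fragment))
    return None if target == url else target
```

With these assumed settings, `redirect_target("http://www.mysite.com/stores")` returns `https://mysite.com/stores/`, while the preferred URL itself returns `None`, so no redirect is issued and no redirect loop is created.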
Use the rel="canonical" tag
- sorting pages;
- filter pages;
- pages with UTM parameters;
- other necessary pages.
Let's get back to our sandals page example. You need to add
<link rel="canonical" href="https://onlinestore.com/shoes/sandals/" />
to the <head> of the page
https://onlinestore.com/shoes/sandals/?sort_min_price
Then, when a crawler visits the page where the sandals are sorted from low to high price, it understands that the category page is the preferred one, and you avoid the duplicate content issue.
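If you generate pages from templates, the canonical URL can be computed rather than typed by hand. The following Python sketch assumes that sorting, filter, and UTM parameters can be recognised by the prefixes `sort`, `filter`, and `utm_`. That convention, like the function names, is hypothetical, so adjust it to the parameters your site really uses.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters starting with these prefixes only create duplicates.
DUPLICATE_PARAM_PREFIXES = ("sort", "filter", "utm_")


def canonical_url(url: str) -> str:
    """Drop duplicate-producing query parameters and any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(DUPLICATE_PARAM_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> tag to place in the <head> of the given page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'
```

For the sandals page sorted by price, `canonical_link_tag("https://onlinestore.com/shoes/sandals/?sort_min_price")` produces the same tag as in the example above. Parameters that do change the content, for instance a page number, are kept.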
Choose a preferred domain in Google Webmaster Tools

Meta robots
Set rel="prev" and rel="next" tags for pagination
In the <head> of http://site.ru/category/
<link rel="next" href="http://site.ru/category/2/">
In the <head> of http://site.ru/category/2/
<link rel="prev" href="http://site.ru/category/">
<link rel="next" href="http://site.ru/category/3/">
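For long paginated categories, these tags are easier to generate than to write by hand. Here is a minimal Python sketch, assuming the URL scheme from the example (the first page lives at the category URL and page N at `<category>/N/`). `pagination_links` is a hypothetical helper name.

```python
def pagination_links(base_url: str, page: int, last_page: int) -> list:
    """Return the rel="prev"/rel="next" tags for one page of a paginated category."""

    def page_url(number: int) -> str:
        # Assumption: page 1 is the category URL itself, page N is <base>/N/.
        return base_url if number == 1 else f"{base_url}{number}/"

    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{page_url(page - 1)}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{page_url(page + 1)}">')
    return tags
```

Calling `pagination_links("http://site.ru/category/", 2, 3)` returns both a prev tag pointing to the category page and a next tag pointing to page 3. The first page gets only a next tag, and the last page only a prev tag.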
Summary
Please note that duplicate content is a huge subject, and in this article I've covered just the basics. Moreover, all the fixes described here are only recommendations, because with duplicates much depends on the particular situation. I hope you've found this article useful.