Crawling & Indexing: Exclude Low-Value Pages
Indexation of low-value pages was one of the findings from our correlative study of the November 2019 Google update. As a result, one of our recommendations is that websites (blogs) cut down on low-value pages by developing an indexing strategy that keeps only their most important, up-to-date pages, the ones relevant to searchers, in Google’s index.
What exactly is an indexing strategy and why does my site need one?
Think of it this way: it’s a map and a set of on-page instructions for search engine crawlers to understand and follow as they crawl through your website looking for new information.
But Arsen, that’s what my sitemap is for!
Yes, this is true! But Google and other search engines also discover and crawl pages on your website via multiple points of entry, not just your sitemap.
Based on our observations in the study, it’s important to handle sequential pages according to SEO best practices.
Let’s quickly define what pagination is. It’s the sequence of numbers assigned to pages in a document. On a blog, it’s most commonly used in your category and tag archives, or to break up a long run of comments on a post. By numbering and properly organizing each page, you’re essentially helping Google understand which posts, and how many, are within each category.
Indexed Pagination: SEO Best Practice for Handling Multiple Pages of Comments
If you have comments enabled on your posts, it’s conceivable that a popular post could result in multiple pages of comments from readers. “Great,” you think, “because comments equal user engagement, something Google likes!”
You’re not incorrect. But you’re also probably not handling all of those pages correctly in the eyes of Google.
The good news is that you are not alone. Many large sites must address the same kinds of issues when handling reviews and comments (AKA user-generated content). So Google has, over time, become well aware of these common issues and has gotten really good at understanding which pages are in a sequence (paginated) and which are not.
The bad news: it is also very easy for Google to get confused. During our study, we observed multiple domains where Google did not respect the canonical suggestion, especially when it came to indexing comment pages. A typical practice is to canonicalize page 2 and beyond of your sequenced pages to the first page. But a canonical is just a suggestion, so you’re not really telling Google what to do with those pages.
In this case, the best, SEO-friendly way for bloggers to handle multiple pages of comment-related content is to noindex page 2 and beyond, meaning that only the first page (the recipe page) should be indexed.
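To make the comment-page rule above concrete, here is a minimal Python sketch. It rests on a few assumptions that are not from our study: that paginated comments live at URLs ending in `/comment-page-N/` (a WordPress-style pattern), that the helper names are purely illustrative, and that you want to pair noindex with "follow" so crawlers can still pass through the links on those pages. Adjust all three to your own setup.

```python
import re

# Assumption: comment pages use a WordPress-style /comment-page-N/ suffix.
# Change this pattern if your platform paginates comments differently.
COMMENT_PAGE_RE = re.compile(r"/comment-page-(\d+)/?$")


def comment_page_number(path: str) -> int:
    """Return the comment page number; the post URL itself counts as page 1."""
    match = COMMENT_PAGE_RE.search(path)
    return int(match.group(1)) if match else 1


def robots_meta_for_comments(path: str) -> str:
    """Index the first page only; noindex comment page 2 and beyond."""
    if comment_page_number(path) >= 2:
        # "follow" is our assumption here; the rule itself only requires noindex.
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'
```

In practice your SEO plugin or theme would emit this tag for you; the sketch only shows the decision being made for each URL.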
Pagination of Tag and Category Archives
If you are using categories to organize your content into topically relevant silos, and tags to help the user find groupings of content, we recommend that you apply a noindex robots directive on the tag pages. In this scenario, your tag pages are low-value to search engines because they do not offer them anything unique.
Continuing with the above scenario, your category pagination should be set up as follows:
- All pages in a sequence should have a self-referencing canonical
- All pages in a sequence should have index/follow robots instructions
- Second page in the sequence should be page-2
- Numbered links should be presented, and a link to the first page should always be present
The point is to create a good visual and organizational structure so that your domain is authoritative on a given topic (or topics) and is organized in a way that users can easily navigate. It is also to keep useless pages out of Google’s index, since they serve no purpose for the reader and create confusion for search engines.
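The category pagination checklist above can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions rather than a drop-in implementation: the `/page-N/` URL shape, the `example.com` address used in the usage note, and the names `PaginationTags`, `category_page_url`, and `pagination_tags` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PaginationTags:
    canonical: str         # self-referencing canonical for this page
    robots: str            # robots instructions for this page
    page_links: List[str]  # numbered links, first page always included


def category_page_url(base_url: str, page: int) -> str:
    """Assumed URL scheme: bare category URL for page 1, /page-N/ after that."""
    base = base_url.rstrip("/")
    return f"{base}/" if page == 1 else f"{base}/page-{page}/"


def pagination_tags(base_url: str, page: int, total_pages: int) -> PaginationTags:
    """Build the canonical, robots value, and numbered links for one archive page."""
    if not 1 <= page <= total_pages:
        raise ValueError("page must be between 1 and total_pages")
    return PaginationTags(
        canonical=category_page_url(base_url, page),  # points at itself, not page 1
        robots="index, follow",
        page_links=[category_page_url(base_url, n) for n in range(1, total_pages + 1)],
    )
```

For example, calling `pagination_tags("https://example.com/category/desserts", 2, 3)` gives a canonical that ends in `/page-2/` and three numbered links, the first of which is the bare category URL.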
One final, closing note…
In the SEO world: Low-value pages are useless
Just like a bonsai tree needs careful pruning and attention to keep it alive and thriving, a website is a living thing (online) that requires a strategy to maintain the best pages and weed out any excess.
This is why keeping a pulse on your website analytics is important. In a given month, you need to be able to clearly identify which pieces of content continue to receive the majority of visits (traffic) and which pieces of content are getting very few visits.
When you know which pieces of content are underperforming, you can quickly take steps to clean up the content (for example, redirect it to a better, higher-quality page) or address the topic again and see how you can make it better and more relevant to your users.
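If you can export monthly visits per URL from your analytics tool, spotting the underperformers described above takes only a few lines. This is a rough sketch: the function name and the 1% cut-off are assumptions chosen for illustration, not thresholds from our study, so tune them to your own traffic.

```python
from typing import Dict, List, Tuple


def underperforming_pages(
    monthly_visits: Dict[str, int], min_share: float = 0.01
) -> List[Tuple[str, int]]:
    """Return (url, visits) pairs below min_share of total visits, lowest first."""
    total = sum(monthly_visits.values())
    if total == 0:
        # No traffic at all: every page is a candidate for review.
        return sorted(monthly_visits.items())
    low = [
        (url, visits)
        for url, visits in monthly_visits.items()
        if visits / total < min_share  # share of the month's total visits
    ]
    return sorted(low, key=lambda item: item[1])
```

The pages this returns are candidates for a closer look, not an automatic delete list. Some may deserve a rewrite, others a redirect to a stronger page.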