A Guide to Understanding 410s, 404s, and Soft 404s

Note: For the purposes of this post, please assume that “non-existent URLs” refers to both 404s and 410s.

Every SEO has grappled, or eventually will grapple, with non-existent URLs or some variation of them. In recent years, it has become clearer how to deal with these pages; however, even today, a lot of SEOs struggle to figure out what to do with these errors.

In fact, if I may be so bold, it’s not that SEOs do not know; it’s that someone on the executive team notices these errors without necessarily having the SEO know-how to understand the why. In my experience, the biggest bother for an executive is a high number of errors in Google Search Console. Seeing 1,000 404 errors in red apparently becomes a significant red flag, and in their minds, you as an SEO have solved those issues only when the errors go away. What they fail to realize is that these issues are likely trivial in nature.

Hopefully, this post makes clear to everyone why non-existent URLs are extremely common, why 410s and 404s matter, and, if anything, why the only real area of concern should be soft 404s.

Let’s get started!

Everything You Need to Know About 410 URLs

410 is an official HTTP response code that indicates a URL has been permanently removed. It is “gone,” and will not be available anymore. It’s very similar to a 404 HTTP response code (more below); however, what differentiates 410 from a 404 is that it’s more deliberate and intentional.

For example, I once had a Cookie Policy page at https://feedthecuriosity.com/cookie-policy/. After some time, I decided to delete this page, and configured the URL to return a 410 code. Technically, I could have done nothing, and by definition, it would have become a 404 page (a non-existent page). However, I had a reason to mark it as a 410 (more below). But first, let’s uncover where the subtlety lies between these two:

  1. 404 is the official response code for a page not found. So, for instance, if you go to any random URL on my site (or any site), such as https://feedthecuriosity.com/kjfhkjdshgkjsdhgkjhdfkjghfdhg, you’ll get a 404 error. Why? You guessed it: because it doesn’t exist. And technically, any non-existent page (even if it used to exist before) will become a 404 unless you indicate otherwise or take some kind of redirect action.
  2. 410 commonly comes into the picture only when a URL used to be live at some point, but isn’t anymore. Taking the same random URL from my example above, I wouldn’t normally apply a 410 to it because it never-ever was alive in the first place. Well, super-technically, it can be done (under the right circumstances), but the default state would be a 404.
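
To make the distinction concrete, here is a minimal sketch in Python of how a server might choose between the two codes. The `LIVE_PATHS` and `REMOVED_PATHS` sets and the `status_for` function are hypothetical names invented for this illustration; a real site would typically handle this in its CMS or web server configuration instead.

```python
from http import HTTPStatus

# Hypothetical data: pages that exist today, and pages removed on purpose.
LIVE_PATHS = {"/", "/blog/"}
REMOVED_PATHS = {"/cookie-policy/"}


def status_for(path: str) -> HTTPStatus:
    """Return the status code a server could send for a requested path."""
    if path in LIVE_PATHS:
        return HTTPStatus.OK         # 200: the page exists
    if path in REMOVED_PATHS:
        return HTTPStatus.GONE       # 410: deliberately and permanently removed
    return HTTPStatus.NOT_FOUND      # 404: the default for everything else


for path in ("/blog/", "/cookie-policy/", "/kjfhkjdshgkjsdhgkjhdfkjghfdhg"):
    print(path, int(status_for(path)))
```

Notice that the 410 requires an explicit list of removed pages, while the 404 is simply what happens when nothing else matches. That is the “deliberate and intentional” difference in practice.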

Okay, Big Deal. But Why, or Rather, How, Do 410s Help SEO Efforts?

Google’s official stance on both 410s and 404s is that in the mid to long term, in their eyes, they’re almost equivalent. The key here is “long term.” What about the short term?

That’s precisely where the real value of a 410 comes into play. Typically, Google will slow its crawling of non-existent URLs (404s or 410s); nevertheless, it will still crawl them in order to eventually drop them from its index.

With 410s, you can help Google drop these URLs from its index much more quickly. If the same URLs were 404s, it’d take a few more days; for a massive website with millions of pages, though, a few days can be vital for ROI.

For URLs that used to exist but don’t anymore, marking them as 410s is the right call if:
[A] Those URLs do not have a clear replacement; in other words, you do not have an appropriate redirect target. Be careful of blanket redirects too. There are some nuances to consider (although they are not the focus of this post).
[B] They have no SEO value.
[C] There are a huge number of them.

This way, Google will drop them from its index more quickly than if they were just 404s.
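
The checklist above can be sketched as a small triage helper. This is only an illustration: the `RemovedUrl` class and `triage` function are hypothetical, the “review manually” outcome is my own assumption for URLs that still carry SEO value but lack a replacement, and criterion [C] (volume) is not modeled at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemovedUrl:
    url: str
    replacement: Optional[str] = None  # a clearly equivalent live URL, if any
    has_seo_value: bool = False        # e.g. it still earns links or traffic


def triage(page: RemovedUrl) -> str:
    """Suggest a response for a URL that used to exist but no longer does."""
    if page.replacement:
        return f"301 -> {page.replacement}"  # a clear replacement exists
    if page.has_seo_value:
        return "review manually"             # assumption: decide case by case
    return "410"                             # [A] and [B] both hold


print(triage(RemovedUrl("/cookie-policy/")))
print(triage(RemovedUrl("/old-guide/", replacement="/new-guide/")))
```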

On a concluding note about 410s, there are two more use cases for them:

  1. If you need Google to remove URLs sooner rather than later. It doesn’t necessarily have to be in great volumes; say there is something sensitive, or a branding concern, then marking those URLs as 410s would be more helpful.
  2. If you were hacked and ended up with tons of spam pages.

Everything You Need to Know About 404 URLs

First and foremost, let’s tackle the elephant in the room. Executives, if you’re reading this: 404s are an extremely common occurrence, especially for big websites. In fact, Google has said that, in general, they do not impact your search performance, and that you don’t need to do anything about them if you’re sure they don’t play any role whatsoever on your site.

How to Handle 404 URLs?

Now that that’s been addressed, let’s talk about some of the best practices on how 404s should be handled.

  1. The only reason for you to investigate 404s is if you know those URLs should exist, but don’t.
  2. Otherwise, you can ignore them (Google says that the report in Search Console will stop showing them in about a month) if:
    • You do not have clear replacements.
    • They have no SEO value.
  3. Do not block 404 URLs from being crawled via robots.txt because:
    • Google remembers that a page used to exist, so it likes to check on it sporadically to see if it came back.
    • Crawling them won’t affect your crawl budget. If anything, it’s a healthy sign that Google is able to crawl your 404 URLs.
  4. For faster de-indexation, you can submit an XML sitemap of the 404 URLs, and it may do the trick. But again, you needn’t worry about it. You can also manually submit a few 404s through the URL Inspection tool in Search Console to potentially give Google a bit of a nudge, but again, this is almost always unnecessary.
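
I read the XML file in item 4 as a standard XML sitemap that lists the removed URLs, which is an assumption on my part. A minimal sketch of generating one with Python’s standard library might look like this; the `build_sitemap` function and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a bare-bones XML sitemap string for the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical list of URLs that now return 404.
gone = ["https://example.com/old-page/", "https://example.com/retired-offer/"]
print(build_sitemap(gone))
```

You would then save the output to a file and submit it in Search Console like any other sitemap. As noted above, this is a nudge at best, so treat it as optional.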

In summation, 404s are usually not a cause for concern. If they are, that would mean that somewhere in the process, SEO considerations were not taken into account, and as a result, your essential URLs are 404-ing.

Remember: 404 URLs do not impact your organic performance (all things being the same).

Everything You Need to Know About Soft 404s

It should be pointed out that a soft 404 is not an official HTTP response code. I believe it’s more of a term that has become popular because of SEO.

Soft 404s (as the name implies, or does not) are URLs that, on the technical side, tell search engines that the page exists (in other words, they return a 200 OK response code), but on the front end, on the page itself, there’s nothing, or nothing of significance to the site’s users.

For example, say you have a blank page, or a page with just 2 sentences as thin body content. Instead of returning an ideal 404 or a 410, you’re telling search engines that it exists. That’s bad all-around.
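
If you want to hunt for candidates on your own site, a rough heuristic can be sketched as follows. To be clear, this is not how Google detects soft 404s; the `MIN_WORDS` threshold, the `ERROR_PHRASES` list, and both function names are assumptions I picked for illustration, and the tag stripping is deliberately crude.

```python
import re

# Assumed values; tune them for your own site.
MIN_WORDS = 50
ERROR_PHRASES = ("page not found", "no longer available", "nothing here")


def visible_text(html: str) -> str:
    """Crudely strip scripts, styles, and tags to approximate visible text."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    return re.sub(r"(?s)<[^>]+>", " ", html)


def looks_like_soft_404(status: int, html: str) -> bool:
    """Flag 200 responses whose body is near-empty or reads like an error page."""
    if status != 200:
        return False  # real 404s, 410s, and redirects are not soft 404s
    text = visible_text(html).lower()
    too_thin = len(text.split()) < MIN_WORDS
    says_missing = any(phrase in text for phrase in ERROR_PHRASES)
    return too_thin or says_missing


print(looks_like_soft_404(200, "<html><body><p>Page not found.</p></body></html>"))
print(looks_like_soft_404(404, "<html><body><p>Page not found.</p></body></html>"))
```

The first call is flagged because the page says it is missing while returning 200; the second is not, because a genuine 404 is exactly what you want in that situation.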

Why Are Soft 404s Bad for SEO?

Most commonly, there are two main reasons:

  1. Soft 404s waste your crawl budget: Instead of telling search engines that these URLs do not exist (via a 404 or 410), or that they’ve moved (via a 301 or 302), you’re signaling that your inconsequential URLs are real. This will force search engines to crawl them and, not to mention, conclude that your site doesn’t provide great value (because of the thin content).
  2. Soft 404s can typically be indexed (unless you indicate otherwise): As a continuation of the above, these same URLs could be indexed. Subsequently, users will see thin or useless content from your site in the search results. Now, you could put directives in place telling search engines not to index them, or disallow crawling, but that takes time and effort. You’re better off converting them into non-existent URLs, or redirecting them if it makes sense SEO-wise and in general too.

How to Fix Soft 404s

Already touched on this, but at a high level, the accepted solutions are:

  1. Marking them as 404s or 410s.
  2. Redirecting them.
  3. Transforming these pages into real, value-driven pages. More content, more everything. An accepted page, like any other.

If you think your URLs are being incorrectly declared as soft 404s, the advice would be to inspect these URLs and see what Google sees. Maybe run your URLs through the mobile-friendly test, and take note of resources that are being blocked.

It may very well turn out that you’re accidentally disallowing crawling of the scripts and CSS files that Google needs in order to understand your content. In that scenario, unblock them if feasible.
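
One quick way to spot such accidental blocks is to test your resource URLs against your robots.txt rules. The sketch below uses Python’s built-in `urllib.robotparser`; the robots.txt content, the example.com URLs, and the `blocked_resources` helper are hypothetical. Keep in mind that this parser does simple prefix matching and may not mirror every nuance of how Googlebot interprets rules (wildcards, for instance), so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks an assets directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/
"""


def blocked_resources(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


resources = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/images/logo.png",
]
for url in blocked_resources(ROBOTS_TXT, resources):
    print("blocked:", url)
```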

Conclusion

There are subtle yet distinguishable differences between 410s, 404s, and soft 404s, and while they ordinarily aren’t a cause to ring any alarm bells (okay, maybe for soft 404s; you win!), it’s always a best practice to keep an eye on them for efficient crawling and, in the case of 404s, to ensure your link equity is not going to waste.

Google does have a URL removal request tool, but keep in mind that using it won’t reduce the number of 404 errors in GSC.