5 Methods for Permanently Removing URLs From Google Search

Every now and then, you might find a need to remove a URL, or a set of them, from Google Search. While the entire process can take some time to complete, there are things you can do to speed it up. One of them comes straight from Google in the form of a Removals tool that lets you request temporary deindexation of URLs. The benefit of this tool is speed and more direct communication with Google.

Coming back to the premise, though, below are the 5 methods you can employ to permanently remove URLs from Google Search.

Method 1: Make Your URLs 404s

This method is, by far, probably the most utilized one. By definition, 404 URLs are those that do not exist. For instance, if you go to some garbage-looking URL on my site, say something like https://feedthecuriosity.com/garbageurls404swgdsgsg, you'll see a message that says the page was not found.

[Image: an example of a page not found (404)]
Notice the page-not-found message when you go to a URL that doesn't exist. Conceptually, this is true for any website on the Internet. The front end might look different, but technically speaking, it would still be a 404 page (more on that below).

What the Heck Is a 404 URL?

The number 404 has nothing to do with how your URL is named and everything to do with the HTTP status code. Without going into too many details, know this: every time a URL on the Internet is requested, the server responds with an HTTP status code.

For example, at a high level, when a URL exists and works the way it's supposed to, the HTTP status code is 200. Similarly, when a URL doesn't exist, as you may have guessed, the status code is 404.
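To make that concrete, here is a minimal sketch of how a server might decide between the two. It uses Python with the Flask framework purely for illustration (it's not something from this post); the page names and routes are hypothetical, and in practice your CMS or server handles this logic for you:

from flask import Flask

app = Flask(__name__)

# Hypothetical set of page slugs that actually exist on the site.
EXISTING_PAGES = {"about", "contact"}

@app.route("/<slug>")
def serve_page(slug):
    if slug in EXISTING_PAGES:
        # The URL exists and works as intended: respond with HTTP 200.
        return f"<h1>{slug}</h1>", 200
    # The URL does not exist: respond with HTTP 404 (Not Found).
    return "Page not found", 404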

How Google Treats 404s

Realize that Google has to be able to crawl your 404 URLs to know that they are indeed 404s. So yes, it has to crawl them first. With that stated, here are a couple of nuances and interesting facts you need to remember:

  1. Typically, webmasters get concerned that when Google crawls 404 pages, it's wasting crawl budget. However, Google has confirmed that it doesn't, and also made a few extra notes about it:
    • If anything, the fact that Google can crawl a bunch of 404 URLs is a positive sign, hinting that its bot has more than enough capacity for your website.
    • It's also a way for Google to check whether the URLs have remained 404s or have come back to life (with 200 status codes). You see, Google remembers that these 404 pages used to exist, so it's merely double-checking to confirm the status.
  2. Eventually, though, if the 404 status persists, Google will drop these URLs from its index, which, for all intents and purposes of this blog post, is the end goal.
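If you want to verify what a given URL returns (the same status code Googlebot sees), here is a quick, hedged sketch using Python's requests library; the URL is just a placeholder:

import requests

# Placeholder URL: swap in the page you want to verify.
url = "https://example.com/some-removed-page"

# allow_redirects=False shows the URL's own status code,
# not the status of wherever it might redirect to.
response = requests.get(url, allow_redirects=False)
print(url, "returned HTTP", response.status_code)  # e.g. 200, 404, or 410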

Method 2: Mark Your URLs as 410s

Comparable to a 404, the 410 HTTP status code ultimately serves the same purpose. The differentiating factor, however, is what the status code actually means. Put plainly, 410 is a more intentional HTTP status code. A one-word summary of it is "gone." Meaning, hey, we know that these URL(s) existed before; but now, we're telling you that they're gone.
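If your setup lets you control status codes directly, returning a 410 looks almost identical to returning a 404. Here is a short sketch, again in Flask purely for illustration, with hypothetical retired paths:

from flask import Flask

app = Flask(__name__)

# Hypothetical URLs that have been intentionally retired.
GONE_PATHS = {"old-campaign", "discontinued-product"}

@app.route("/<slug>")
def serve_page(slug):
    if slug in GONE_PATHS:
        # 410 tells crawlers the page existed before and is gone on purpose.
        return "This page has been permanently removed.", 410
    return f"<h1>{slug}</h1>", 200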

How Google Treats 410s

Google's stance on 410s is almost the same as that on 404s in the long term. In the short term, though, if we're talking about a large set of URLs, 410s can, in theory, help with faster deindexation because of the intent they signal. As stated, a 410 is a more deliberate action. A 404, on the other hand, can be accidental, which is one of the secondary reasons Google comes back to crawl such URLs, to make sure.

If you want a more detailed breakdown of what separates 404s, 410s, and soft 404s, feel free to check out this post, where I go into the nitty-gritty.

Method 3: Leverage the Noindex Meta Tag

If configuring things on the HTTP status code side is not viable, you can add a line of code to the page's HTML, within the <head></head> section, that achieves the same result. The syntax is:

<meta name="robots" content="noindex">

If you want to restrict this directive to Googlebot only, you can use:

<meta name="googlebot" content="noindex">

How Google Treats the Noindex Directive

Googlebot will respect this line and interpret it as, okay, I need to remove this URL from my index. The only caveat here is that if the same URL(s) are blocked via robots.txt, chances are Google won't even be able to get to this line of code.

And because it won't be able to access the tag, your pages may still be indexed. Another note is that the URL doesn't become invalid. People can still navigate to URLs that are marked as noindex; there is simply a separate line telling Google not to index them. Simply put, these URLs can return a status code of 200 and still be kept out of Google SERPs.
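One rough way to QA this (again just a sketch, using Python's requests library and a placeholder URL) is to fetch the page the way a crawler would and look for the noindex directive in the returned HTML:

import requests

url = "https://example.com/page-to-check"  # placeholder
response = requests.get(url)

# Naive string check for illustration; a real QA script would parse the HTML.
has_noindex = 'content="noindex"' in response.text.lower()
print("Status code:", response.status_code)   # can be 200 and still be noindexed
print("Noindex tag present:", has_noindex)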

You can learn more about the Robots meta tags from Google Developers.

Method 4: Use the X-Robots-Tag HTTP Header

A good rule of thumb worth knowing is that anything you can do with the meta robots tag can also be done via the X-Robots-Tag HTTP header.

As you might have hypothesized, you can issue a noindex directive with this method as well. The process is a bit different, though: unlike the meta robots tag, you will not find this line anywhere in your page's source code. X-Robots-Tag is a response header, so the only way to QA for its presence (or at least one of the ways) is to use something that can extract the response header information.
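For illustration only, here is one way the header can be attached to a response, sketched once more in Flask with a hypothetical route; on most sites this would instead be configured at the web server, CMS, or CDN level:

from flask import Flask, make_response

app = Flask(__name__)

@app.route("/some-page")  # hypothetical route
def some_page():
    response = make_response("<h1>Hello</h1>")
    # The directive travels in the response header, not in the HTML,
    # which is why you won't see it when viewing the page source.
    response.headers["X-Robots-Tag"] = "noindex"
    return response

To check that it's there, inspecting the response headers is enough; for example, requests.head(url).headers.get("X-Robots-Tag") (with a placeholder url) will return the directive if the header is present.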

I've previously written a post discussing the meta robots tag and X-Robots-Tag, where I also demonstrate how to QA for each using your web browser (and nothing else; no extensions or anything of that sort). If you get a chance, check it out, as I am confident you'll get some value out of it.

One advantage that X-Robots-Tag has over the meta robots tag is that it can be used on non-HTML URLs, such as those that end in .pdf.

Method 5: Block Google’s Access to URLs’ Content

A prime example of this is password-protecting your pages. That way, only users who know the password can access the URLs. Googlebot, on the contrary, won't be able to.

Google has confirmed that it won't index such URLs, for the very fundamental reason that it cannot access them.
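As a rough illustration of the idea, here is a minimal HTTP Basic Auth gate, sketched in Flask with placeholder credentials; real sites would typically handle this through their login or membership system, or at the server-config level:

from flask import Flask, Response, request

app = Flask(__name__)

# Placeholder credentials for illustration only.
USERNAME = "editor"
PASSWORD = "secret"

@app.route("/private-page")
def private_page():
    auth = request.authorization
    if not auth or auth.username != USERNAME or auth.password != PASSWORD:
        # Googlebot (or anyone without the password) gets a 401
        # and never sees the content, so there is nothing to index.
        return Response(
            "Authentication required.",
            401,
            {"WWW-Authenticate": 'Basic realm="Private"'},
        )
    return "<h1>Members-only content</h1>"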

How to Implement These Methods

Up to this point, I've talked about how to permanently remove URLs from Google Search. Now, I will briefly cover how you can put these methods into action.

For all 5, a development, network engineering, or IT team can almost always help. Nevertheless, if you don't have those resources at your disposal, marking URLs as 404s should, in most cases, be doable from your CMS, your domain registrar, or your CDN provider (if you use one). 404s are traditionally the most common form.

As for the rest, being on WordPress can undoubtedly help, as plugins can do these jobs for you. For instance, the “Yoast SEO Premium” plugin lets you tag URLs as 410s.

That being laid out, you should have a few options available from your hosting provider, CMS, etc. (even if paid) for managing URLs. In fact, something similar should be available for implementing custom meta robots tags too.

In my mind, the trickiest of all is implementing the X-Robots-Tag HTTP header. Still, if you have dev resources, this should be doable.

Conclusion

Whatever your reason may be for wanting to permanently remove URLs from Google Search, any of these 5 methods should help you out. Nevertheless, if you want something speedier to get you started, you can make use of the temporary URL removal option provided directly inside Google Search Console.

This way, you can expect your URLs to be quickly removed from Google SERPs (as a stopgap only), while in the background, you work on a permanent fix.