3 Basic URL Structure Best Practices for Avoiding Negative SEO Performance

Non-technical people, and even technical people (including developers), don't always have the time, knowledge, or accountability to stay on top of URL structure best practices for SEO. That's understandable, but if it gets out of hand while you're growing, you're setting yourself up for SEO failure. And if you're already a big organization, I can bet that your SEO is not performing as well as it should be.

Plus, as the marketer responsible for website performance, you would be gravely mistaken to rely on them instead of putting systems in place that catch errors (and fix them automatically), or that avoid errors in the first place. In my 6+ years of SEO experience, one of the most common yet rarely discussed mistakes I've seen everyone make is inconsistent URL structure. In fact, it's a problem most big organizations don't want to touch (which makes sense in a lot of cases).

In any case, before we get started, let’s go over some very basic context below.

Did You Know That in the Eyes of Search Engines:

1. https://feedthecuriosity.com/seo/
2. is different from https://feedthecuriosity.com/seo (notice the absence of a trailing slash after "seo"),
3. which is different from https://www.feedthecuriosity.com/seo/ (notice the inclusion of www),
4. which is different from https://www.feedthecuriosity.com/seo (the same combination as above, minus the trailing slash).
5. All of the above are different from https://www.feedthecuriosity.com/SEO/ (notice the capitalization of "SEO"),
6. which in turn is different from https://FeedTheCuriosity.com/seo/ (notice the capital F, T, and C).
7. And so on and so forth; to add to the mix, throw in http vs. https and www vs. non-www. The possibilities are endless (see the short sketch after this list).
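To make this concrete, here is a tiny Python sketch (the list below is only illustrative). Treated as plain strings, which is effectively how a crawler first encounters them, every variant is a unique URL until your redirects or canonicals say otherwise:

```python
# A minimal sketch: to a crawler, each of these strings is a distinct URL
# unless the server (via redirects or canonicals) says otherwise.
variants = [
    "https://feedthecuriosity.com/seo/",
    "https://feedthecuriosity.com/seo",
    "https://www.feedthecuriosity.com/seo/",
    "https://www.feedthecuriosity.com/seo",
    "https://www.feedthecuriosity.com/SEO/",
    "https://FeedTheCuriosity.com/seo/",
    "http://feedthecuriosity.com/seo/",
]

print(len(set(variants)))  # 7 -- seven "different" URLs for one and the same page
```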

Why Do These Inconsistencies Occur?

Typically, an organization has different departments and divisions. Say, for instance, that the Editorial Department decides, for whatever reason, that the best practice for URLs is to capitalize the first letter of each word.
Now, let's say the developers decide that, to ship things quickly, they'll stick to all-lowercase URLs.
In other cases, the appropriate redirects to https, or to the correct www version, are never set up.
One last common reason is that you cannot control how other people link to you. Maybe Company X has its own "best practice," so to speak, where any time they link out to another site, they capitalize the first letter of each word.

So What Are These 3 Best Practices, After All?

The biggest argument against these (usually from someone who doesn't fully understand the SEO repercussions) is: why does it even matter? Isn't the user still landing on the exact same page?

The answer to the first question is that it matters in a big way, and the answer to the second is: yes! Let's dig a little deeper into the first one.

How Search Engines Treat URLs

All things being equal, every distinct URL string is treated as a unique URL in the eyes of search engines. The biggest drawbacks, in my mind, of not having appropriate systems in place are:

  1. Duplicate content issues: you end up with X number of URLs serving the exact same content. That's a serious issue, because all of them fight each other to rank, and by competing in the first place, each one lowers the others' chances of ranking.
  2. Crawl budget: Google allocates a crawl budget to your website. By letting it crawl, and asking it to index, every variation of the same URL (the same user destination), you're wasting that budget. Imagine you have 10 variations of the same URL: in effect, you've just wasted the crawl budget of 9 URLs, because Google would have been better off spending that time crawling 9 other unique URLs.
  3. Missing out on internal and external link authority: as you probably know by now, links are a ranking factor. Now imagine that a capitalized version of your URL earned 5 links from unique domains/websites, and the all-lowercase version earned another 5. That's nice, but it's not great, because you could instead have one URL that gets the credit for all 10 unique domains, increasing your website's fighting chance to rank.
  4. Remember, all three of these factors can occur at a very large scale, too. So the bigger your website is, the more detrimental it is not to have URL structure best practices and systems in place. (See the audit sketch after this list.)
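If you want a rough sense of how bad the problem already is on your site, a quick audit script can group exported URLs (from a crawl or analytics) by a normalized key. This is only an illustrative sketch; the normalization rules and sample URLs are assumptions, not your "decided version":

```python
from collections import defaultdict
from urllib.parse import urlsplit

def variant_key(url: str) -> str:
    """Collapse common variations (scheme, www, case, trailing slash)
    into one key so duplicates of the same destination group together."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")  # Python 3.9+
    path = parts.path.lower().rstrip("/") or "/"
    return f"{host}{path}"

# Sample export from a crawl or analytics tool (illustrative only).
crawled = [
    "https://www.example.com/seo/",
    "https://example.com/SEO",
    "http://example.com/seo",
    "https://example.com/about/",
]

groups = defaultdict(list)
for url in crawled:
    groups[variant_key(url)].append(url)

for key, urls in groups.items():
    if len(urls) > 1:
        print(f"{key} has {len(urls)} competing variants: {urls}")
```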

Best Practice 1 (Come Up With a “Decided Version”)

First and foremost, you need to decide on the following 4 basic rules:

  1. Whether or not to have www.
  2. Whether or not to have https (in this day and age, the answer is obviously yes).
  3. Whether or not to have trailing slashes on path names (it doesn't matter for the homepage; for instance, example.com is treated the same as example.com/).
  4. The case of your URLs: camel case, all caps, all lowercase, etc. I would highly recommend simply going with all lowercase to avoid headaches. (A normalization sketch based on these four decisions follows this list.)
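To illustrate, here is a minimal Python sketch of what encoding those four decisions looks like. It assumes a decided version of [https + www + all lowercase + no trailing slash], which is just one possible choice:

```python
from urllib.parse import urlsplit, urlunsplit

def decided_version(url: str) -> str:
    """Normalize any variation to the assumed decided version:
    https + www + all lowercase + no trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path.lower().rstrip("/")  # the homepage path becomes "", which is fine
    return urlunsplit(("https", host, path, parts.query, ""))

print(decided_version("http://Example.com/SEO/"))  # -> https://www.example.com/seo
```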

Whatever you decide for all four, stick to it! If you're just starting your website, it's very easy to pick from the get-go. If you're already established, you'll have to look at your analytics, traffic, ROI, backlinks, etc. to decide. Where it gets complicated is getting buy-in from the entire organization. Proceed with caution here: in the pursuit of fixing things, you may end up making them worse.

Best Practice 2 (Set Up 301 Redirect + Canonical Rules)

Once the decision has been made on best practice 1, set up a 301 redirect rule that sends all variations to the "decided version." Say, for instance, you decided on [https + www + all lowercase + without trailing slash], i.e., something like https://www.example.com/seo. A correctly set up 301 rule should achieve the following:

  1. https://www.example.com/seo/ (with a trailing slash) will be redirected to the decided version.
  2. The same goes for versions without https, without www, with different capitalization, or any other odd combination you can think of. An experienced developer can help you set up regex rules here. (A server-side sketch follows this list.)
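To give you an idea of what that rule can look like in application code, here is a hedged sketch using Python and Flask, with the same hypothetical decided version as above. In practice, many teams implement this at the web server or CDN level with regex rewrite rules instead:

```python
from flask import Flask, redirect, request

app = Flask(__name__)

CANONICAL_HOST = "www.example.com"  # hypothetical decided host

@app.before_request
def enforce_decided_version():
    """301-redirect any variation (scheme, www/host case, path case,
    trailing slash) to the single decided version of the URL."""
    # Note: behind a proxy or CDN, check the forwarded scheme instead of request.scheme.
    host = request.host.lower()
    path = request.path.lower().rstrip("/") or "/"
    if request.scheme != "https" or host != CANONICAL_HOST or path != request.path:
        query = "?" + request.query_string.decode() if request.query_string else ""
        return redirect(f"https://{CANONICAL_HOST}{path}{query}", code=301)
```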

Setting up 301 redirects this way also covers you for factors outside of your control. Say, for example, another website links to you with an uppercase URL. Because the 301 is in place, the correct version, a.k.a. the decided version, still gets the link juice from that website.

One caveat here concerns 302 redirect rules. Google has become sophisticated about this and can eventually work out which version of the URL matters to you (depending on the situation), but it can still cause confusion. Go with 301s from the beginning!

Another way to achieve almost the same result is to implement canonical logic. To understand what canonicals are, you can learn more from Moz.

Essentially, you can implement the canonical in two ways:

  1. the "decided version" carries a self-referential canonical pointing at itself
  2. all other variations carry a canonical pointing at the "decided version"

Canonicals are a great signal to Google for avoiding duplicate content issues. At a high level, you're saying: hey, I know I have all these duplicate URLs, but please give all the credit to my decided version. Where canonicals can fall short is that they cannot account for every variation. (A small example of the tag follows below.)
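For reference, the canonical itself is just a link tag in the page's head. Here is a tiny sketch of generating it, reusing the same hypothetical decided version:

```python
def canonical_tag(decided_url: str) -> str:
    """Return the <link> tag that every variant of the page should carry,
    pointing at the single decided version."""
    return f'<link rel="canonical" href="{decided_url}" />'

print(canonical_tag("https://www.example.com/seo"))
# -> <link rel="canonical" href="https://www.example.com/seo" />
```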

So what's a good rule of thumb? You might have guessed it: set up both 301 and canonical rules!

Best Practice 3 (Query Parameters)

We've all come across URLs such as example.com?page=2&color=red. Everything from the question mark onward is the query string, and each key-value pair within it (page=2, color=red) is a query parameter.

Query parameters are quite useful, especially for big e-commerce sites. For instance, they can be used to filter by price, apply sorting such as high-to-low, or move to the next page. Those are the common ones; there are tons of use cases for query parameters out there.

And while all of that is great, these parameters end up creating unique URLs on your website that are open to being crawled and indexed. Without going into too many technical details, you can use robots.txt to disallow crawling of certain query parameters. This tells search engine crawlers not to crawl such URLs, which is extremely useful for crawl budget reasons. To learn about robots.txt more thoroughly, head over to Yoast. (A rough sketch follows below.)
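As a rough illustration, the directives could look something like the string below; the parameter names are made up, and Google-style wildcard support is assumed, so always verify how crawlers interpret your rules before deploying them. The Python part simply sanity-checks which URLs carry query parameters at all:

```python
from urllib.parse import parse_qs, urlsplit

# Illustrative robots.txt directives (hypothetical parameter names).
ROBOTS_TXT = """
User-agent: *
Disallow: /*?*color=
Disallow: /*?*sort=
"""

# Quick sanity check: which crawled URLs carry query parameters at all?
urls = [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?color=red&sort=price-desc",
    "https://www.example.com/shoes?page=2",
]

for url in urls:
    params = parse_qs(urlsplit(url).query)
    if params:
        print(f"{url} -> parameters: {sorted(params)}")
```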

(Image: URL structure best practices, robots and crawling. Screenshot from https://yoast.com/ultimate-guide-robots-txt/)

Expert Resources

  1. Moz’s guide on URLs.
  2. What Google has to say about canonicals.
  3. SpyFu's guide on www vs. non-www.