
What Is Crawler Hints From Cloudflare, and Why Is It So Significant?

Nearly everything we do has an energy cost, directly or indirectly. So it is not surprising that ethical organizations at the forefront of innovation are always looking for ways to reduce the environmental harm caused by our activities, whether everyday or on a grander scale.

As citizens of the internet, one of the most common things we do, for ourselves or for work, is run web searches on a search engine, be it Google, Bing, or any other.

To make that possible, search engines have to index the websites out there. The organizations that own these search engines each have their own indexing methodology, but in some form, they all run an automated system that scours the internet for URLs to index. In more technical terms, this scouring is called crawling, and it is an ongoing operation carried out by bots. Every search engine has its own bot, or crawler.

Sidenote: Search engine bots are considered good bots.

Now crawling, as you may have guessed, requires energy, and that means a carbon footprint.

However large you imagine the internet to be, it is far larger than that, AND it is ever-growing! Now imagine a search engine having to sift through such an enormous dataset to index it. Not only that, it has to work out what is spam, what is malicious, what is worthwhile, and what isn't. A great deal goes on behind the scenes before a search engine decides which results to show for your query.

Where Does Cloudflare’s Crawler Hints Come Into the Picture, and What Does It Do?

The way search engines work is quite sophisticated, and they’re already pretty energy efficient. Even so, they crawl more than is strictly necessary, because providing accurate, relevant, and contextual results depends on having fresh data. This isn’t to say that organizations aren’t trying to get better; it’s just that stopping over-crawling isn’t as simple as flipping a switch.

Cloudflare recognized this conundrum, and as a significant step toward making the internet greener, it introduced Crawler Hints.

What Problem Is Crawler Hints Trying To Solve, Exactly?

One thing search engines continuously strive for is ensuring that they’re indexing the most up-to-date content. It’s not real-time, but they try to get close. For that reason, they crawl the same URL (a page, a blog post, whatever you want to call it) again and again to make sure the information in their index isn’t outdated. Similar constant crawling also takes place to discover new content, meaning anything that was previously unknown. Search engines do this on an extremely large scale. Now, you may quickly surmise that in many instances the content of a website has not changed since the last visit, so that crawl was wasted and, in theory, wasn’t needed. Precisely, and as mentioned before, this extra crawling is a big problem for energy consumption, now and in the future.

Cloudflare found exactly that. It observed that 53% of these crawls are wasted, at least where revisits are concerned. That’s a huge number, considering the vastness of the internet. To put things into perspective (and mind you, the comparison is not straightforward or 1:1), Cloudflare cites a report from the U.S. Environmental Protection Agency saying that cutting out that 53% of excessive crawling can be equivalent to growing 31 million acres of forest or taking 5.5 million vehicles off the road.
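To make the idea of a "wasted" revisit concrete, here is a minimal Python sketch. It assumes a revisit counts as wasted when the page body is byte-for-byte identical to the previous visit. The function name, the log format, and the sample data are all hypothetical and chosen for illustration; this is not how Cloudflare arrived at its 53% figure.

```python
import hashlib


def wasted_revisit_share(crawl_log):
    """Return the fraction of revisits whose content matched the previous visit.

    crawl_log is an iterable of (url, body) pairs in crawl order.
    (Hypothetical log format, used only for this illustration.)
    """
    last_digest = {}  # url -> digest of the body seen on the previous visit
    revisits = 0
    wasted = 0
    for url, body in crawl_log:
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if url in last_digest:
            revisits += 1
            if last_digest[url] == digest:
                wasted += 1  # nothing changed since the last crawl
        last_digest[url] = digest
    return wasted / revisits if revisits else 0.0


# Made-up sample data: four revisits, two of which found no change.
sample_log = [
    ("/home", "v1"),
    ("/blog", "post A"),
    ("/home", "v1"),      # revisit, unchanged
    ("/blog", "post B"),  # revisit, changed
    ("/home", "v1"),      # revisit, unchanged
    ("/home", "v2"),      # revisit, changed
]
print(f"{wasted_revisit_share(sample_log):.0%} of revisits were wasted")
```

First visits are not counted as revisits, which matches the caveat above that the wasted share applies to revisits specifically.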

At its core, Crawler Hints helps the internet as follows: it lets search engines know when content has changed on websites that use Cloudflare. The benefit? Search engines can time their crawling to when something actually changes and avoid wasteful crawls. Put another way, Crawler Hints mitigates the core problem of search engines over-crawling the internet, which demands immense amounts of energy to no avail. The outcome? A reduced carbon footprint.
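The difference between crawling blindly and crawling on a hint can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the crawler receives a plain list of URLs that have changed. The function name `plan_crawl` and the data shapes are hypothetical and do not reflect Cloudflare's or any search engine's actual interface.

```python
def plan_crawl(known_urls, changed_hints):
    """Split a crawler's work into URLs to fetch now and URLs to skip.

    known_urls:    URLs already in the index (hypothetical input).
    changed_hints: URLs reported as new or changed (hypothetical input).
    """
    hinted = set(changed_hints)
    known = set(known_urls)
    # Revisit only the indexed pages that were reported as changed.
    to_fetch = [url for url in known_urls if url in hinted]
    # Hinted URLs the crawler has never seen are newly discovered content.
    to_fetch += sorted(hinted - known)
    # Everything else is left alone, which is where the energy saving comes from.
    skipped = [url for url in known_urls if url not in hinted]
    return to_fetch, skipped


to_fetch, skipped = plan_crawl(
    known_urls=["/home", "/blog", "/about"],
    changed_hints=["/blog", "/new-post"],
)
print("fetch:", to_fetch)
print("skip:", skipped)
```

Without hints, the crawler would have to revisit all three known pages just to learn that two of them had not changed.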

It cannot be emphasized enough how remarkable this feat is! The solution sounds simple enough, but it isn’t by any means.


How To Enable Crawler Hints for Your Site

The beauty of being on Cloudflare is that the company does all the hard work. In practice, that often means using one of its services or features is just a matter of turning something on or off.

To enable Crawler Hints for your website, log in to Cloudflare and navigate to Caching > Configuration.

Then, scroll down to the card with the heading “Crawler Hints.” You may notice that it is still in the beta phase.

The card has a toggle switch for enabling or disabling Crawler Hints. If it’s off, click it to activate the feature, and you’re all set!

Expert source: Cloudflare.