Unexpected Results of My Google Crawling Experiment

Some people might call me crazy.

I call myself curious.

That’s why I did a thing:

I disabled the crawling of my main website. This website.

(don’t rush to check though, the experiment is over)

I decided to do this crazy experiment to illustrate how Google will treat an established website after it’s disallowed in the robots.txt. 
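For context, disallowing crawling takes just a couple of lines in robots.txt. I won’t reproduce my exact file here, but a minimal version that blocks all crawlers looks like this (to block only Google, you’d use `User-agent: Googlebot` instead of the asterisk):

```
# A minimal, hypothetical robots.txt that disables crawling site-wide
User-agent: *
Disallow: /
```

Keep in mind that this blocks crawling, not indexing: pages Google already knows about can stay in the index. That distinction is exactly what this experiment plays with.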

By the way, this is not the first technical SEO experiment I’ve run. You can check out the previous ones:

The Curious Case Of a Page Being Noindexed

When Rel=canonical Doesn’t Work: a Tale of Not Similar Content and Noindex

My Experiment Video

In addition to this post, I also documented everything in a video (which is pretty cool, so definitely check it out, especially the end 😀 )

The post and video are a little different, and there are things mentioned here but not in the video and vice versa.

(P.S. I decided to use a new tool for this video. At some point, I realized the video quality wasn’t perfect, but it was too late to change things. Sorry about that)

Why My Crawling Experiment is Crazy

MarketingSyrup is my main website. I haven’t updated it for a while, but it still drives traffic to:

  • My blog posts
  • My free Technical SEO Audit Checklist
  • My free SEO Pro extension

Monthly traffic stats: 20,000 views on average. 


Why Am I Doing This Experiment?

I’m doing this robots.txt experiment and risking my website so that you don’t have to.

I love PRACTICAL learning over anything else (and always structure my courses accordingly).

So I want to give you as much value from this experiment as I can.


People’s reaction

It was great to see so much support coming from people!

experiment-feedback
experiment-feedback
experiment-feedback

TL;DR:

– After you disable crawling of a website, it will take time for Google to start decreasing the website’s positions and traffic (depending on the industry and initial trust – that’s my assumption; there’s no specific metric to measure it)

– Google might remove a favicon from the search results

– You’ll see an increase in reported indexed pages in Google Search Console as now even noindexed pages might appear in this report (since Google can’t crawl and see the noindex tag)

– You will see an influx of alerts sent by Google Search Console: “Indexed, though blocked by robots.txt”, “Blocked by robots.txt”

– You will not, though, be notified that your robots.txt is blocking crawling (there’s a slight difference between this and the previous point). The new tester tool will show that your disallowing robots.txt is perfectly fine (which makes sense because it might be your intention).

– Your image or video search results might be impacted differently from the text search results (my video results plummeted)

My Hypothesis About This Crawling Experiment

Once I disallow access to the website for Google, Googlebot will lose the ability to see what’s going on with my pages and content there. 

Based on my previous experience with clients accidentally disallowing their websites, and on my overall technical SEO experience, I think the following will happen:

Hypothesis 1:

Most of the pages will stay indexed, but Google won’t be able to see them, so it will deprioritize them in the search results.

Hypothesis 2:

My website pages will slowly stop ranking, so I’ll see a decrease in impressions and clicks in the Google Search Console.

Hypothesis 3:

The pages that will stay in the index will have a meta description saying that no information is available about the page.

Pre-experiment Benchmarks 

Google Search Console Traffic (3 months)

Here’s a snapshot of the website performance for the last 3 months before the experiment started:

GSC performance - before experiment

Analytics traffic (last 3 months)

The website gets about 20k visitors monthly on average. You can see that the traffic has been pretty stable:

Before experiment - Analytics

Keyword positions

I thought that it would be really interesting for this experiment to track keywords. So I used SERanking for that.

I’m tracking 15 keywords in Google Canada and 15 keywords in Google US. The average position is 2, and the keywords are actually pretty diverse. Most of them drive traffic, but I really wanted to focus on keywords that are specific to my website rather than general ones.

seranking-positions-before

Google Search results

Before I start the experiment, I see that my homepage is ranking first for my name:

branded-search-crawling-experiment-before

And then another page – SEO Pro Extension – is ranking second after the Google Chrome Store:

before-google-results-seo-pro-extension-crawling-experiment

Aaaaaaaand it’s done:

crawling-experiment-done-gsc

Crawling Experiment: a Week Later

A few days after I started my crawling experiment, Google Search Console sent me a few alerts:

screenshot from google search console - pages are not being indexed

It’s great that Google did it… But when I looked into the report, only 4 pages showed up, some of which shouldn’t be crawled and indexed anyway:

screenshot from google search console - report from blocked by robots.txt

The first page is even noindexed. So now Google can’t see its meta robots tag as it can’t crawl the page. 
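As a reminder, the noindex directive lives in the page’s HTML (or in an X-Robots-Tag HTTP header), so Googlebot has to actually fetch the page to see it. Something like this was sitting in that page’s head, invisible to Google for the whole experiment:

```html
<!-- A meta robots tag only works if crawlers can fetch the page. -->
<!-- With the page disallowed in robots.txt, Googlebot never sees this. -->
<meta name="robots" content="noindex">
```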

Google Search Console Traffic

The traffic from Google hasn’t changed a lot. I don’t see any decrease yet, even at the page level.

1-week-later-gsc-traffic-update



Google Search Console – Indexed pages 

The number of indexed pages has decreased by just 1, which is odd but insignificant:

1-week-later-gsc-indexed-pages

Google Search Console – Crawl stats

The crawl stats report shows the biggest change: the number of daily requests went from 200 on average to just 1. It makes total sense to me since crawling is disabled:

1-week-later-crawl-stats-report-gsc

Analytics Traffic 

There’s no traffic loss from Google after 1 week of my crawling experiment:

screenshot from google analytics traffic after 1 week

Google Search results – Branded 

When I looked up my name, I saw an interesting change: my LinkedIn profile, which used to rank after my website, jumped to the #1 position.

And the first result from the MarketingSyrup site is now the About page instead of the homepage:

1-week-later-brand-search-results

Google Search results – SEO Pro extension

No changes here at this point:

1-week-later-seo-pro-extension-search-result

Keyword positions

What’s interesting: after a week of the MarketingSyrup website being disallowed from crawling, there are no significant position changes in Canada (1 keyword is even up, which I find funny):

1-week-later-positions-canada

And there’s a little bit more keyword movement in the US:

1-week-later-positions-usa

1-Week Summary

From what I can see, Google still believes in my website and doesn’t want to disregard it. That’s why there are no significant traffic decreases or page movements in the search results. 

But Google can’t see what’s going on on my pages. So let’s see how long my website lasts on its previously established authority and Google’s goodwill.

Crawling Experiment: 2 Weeks Later 

Not a lot has changed…

  • I got more alerts from Google Search Console
  • My favicon disappeared

But other than that, it’s smooth sailing.

marketing syrup website's favicon disappeared

Loved this comment, I 100% agree here 😀

crawling-experiment-alek-twitter-post

Crawling Experiment: 53 Days Later

It feels like a long time since I started this crawling experiment. 

I initially thought I’d wrap it up within 3 weeks. But here we are 53 days later. 

This is the last experiment findings check before I get my website back to the crawling party. 

A fun thing that happened during this time is that a few people have reached out to me, concerned that my website crawling was disallowed. Great catch; I’m happy people are paying attention. 

So let’s see where it all got us.

Google Search Console Traffic

There’s a slight decrease in traffic after 53 days of the experiment:

google search console traffic statistics

But honestly, I’ve seen much worse scenarios when crawling was disallowed by mistake. 

(I guess the intention behind disallowing crawling is a ranking factor 😀 )

Google Search Console – Indexed pages 

google search console - indexed pages

According to Google Search Console, the number of indexed pages has increased. This is interesting since I’ve added only 1 new page, and it’s noindexed. (I’m sharing more info about this below)

Google Search Console – Crawl stats

Unsurprisingly, the crawl stats show a minimum number of crawl requests:

google search console - crawl stats

I also noticed that the “By purpose” report mimics the above stats: 

google search console - crawl requests

It makes total sense because Google can’t refresh my content in its index as it can’t see my content. 

Analytics Traffic 

There is no significant traffic decrease according to Analytics either:

no significant decrease in traffic

Google Search results – Branded 

LinkedIn and Twitter started ranking in positions 1 and 2, respectively, for my name:

google search results of Kristina's name

My website is in position 3, and it’s missing the favicon:

about page is ranking in google

Google Search results – SEO Pro extension

Google Search results - SEO Pro extension

10 Days After the Crawling is Re-enabled

Notifications from Google Search Console

Google Search Console is still sending me lots of different notifications with errors like “Blocked by robots.txt”, “Indexed, though blocked by robots.txt”, and “Page with redirect”.

Obviously, Google Search Console is confused about what’s going on. 

gsc-alerts-crawling-experiment

Video performance

In the previous update, I also mentioned that there was a decrease in video impressions and clicks. It has not returned to the pre-experiment level yet. So we’ll see how long that takes.

crawling-experiment-video-search-results-after

New pages are being ‘indexed’

Earlier, I noticed that the number of pages marked as indexed in Google Search Console increased after I started my experiment, which was very surprising.

Luckily, I had downloaded a list of indexed pages from Google Search Console before the experiment, and then again after it. Here is the list of the ‘newly indexed’ pages:

new pages are indexed during the crawling experiment

I added their status here too, so you can see that each of these pages is either noindexed or returning a 301 redirect. So the pages are not exactly meant to be in Google’s index.
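If you want to run the same sanity check on your own export, a tiny script will do. Here’s a minimal sketch (the URL is a hypothetical placeholder, and the noindex check is a naive substring match rather than a proper HTML parse):

```python
import requests

# Hypothetical placeholder - swap in the URLs exported from GSC
urls = [
    "https://example.com/some-page/",
]

for url in urls:
    # Don't follow redirects, so a 301 shows up as a 301
    resp = requests.get(url, allow_redirects=False, timeout=10)
    if 300 <= resp.status_code < 400:
        print(f"{url} -> {resp.status_code} redirect to {resp.headers.get('Location')}")
    elif "noindex" in resp.text:
        # Naive: assumes 'noindex' only appears in the robots meta tag
        print(f"{url} -> {resp.status_code}, noindexed")
    else:
        print(f"{url} -> {resp.status_code}, indexable")
```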

I also checked the impressions for each of these pages in GSC, and one had a few:

crawling-experiment-page-impressions

The impressions came from the query ‘site:marketingsyrup.com’. To be fair, my website gets a lot of impressions for this query, so it’s not only me running this check. But the experiment was the first time these noindexed pages had impressions reported in Google Search Console.

Also, I think it’s important to clarify one thing here: even though these pages are marked as ‘indexed’, it doesn’t mean they would rank for any non-branded query. I think it’s more about reporting vs the actual performance.

Just don’t be surprised if the number of indexed pages reported by GSC increases when you disable crawling of your website.

As you can see below, after I re-enabled the crawling, the number of indexed pages decreased and almost got back to the pre-experiment number:

page-indexing-changes-gsc-crawling-experiment

The search performance of the homepage has been impacted

Another thing that I noticed is that impressions and clicks of the homepage plummeted during the experiment, even though the average position didn’t decrease.

10 days after re-enabling crawling, the numbers almost got back to normal.

10-days-later-ms-homepage-impressions-dip

Crawl stats

I re-enabled the crawling on November 27th, and that’s exactly when the spike in crawling requests happened:

10-days-later-crawl-stats-gsc

It seems like I opened the gates for Googlebot to go and crawl everything: before the experiment, there were around 200 requests per day, and on the 27th, there were 680 – more than three times the usual. Now it looks like it’s getting back to normal.
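For the record, re-enabling crawling was just the reverse of the change from the start of the experiment. An empty Disallow value (as in this hypothetical sketch) allows everything; deleting the rule entirely works too:

```
# Hypothetical robots.txt after re-enabling crawling
User-agent: *
Disallow:
```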

Overall website performance

This is the overall website performance. You can see that it started to drop slightly at the end of November; the dip is much easier to observe now. So there was a roughly 1.5-month delay before the decrease became easy to spot.

gsc-performance-crawling-experiment

Currently, the number of impressions and clicks is picking up, which is great.

Website rankings change

Overall, my website rankings were more volatile in Google Canada:

overall-search-results-canada-crawling-experiment

And in Google US, the rankings were more stable, which is surprising to me as I’d expect the US to be more competitive.

overall-search-results-us-crawling-experiment

Search traffic

The biggest ‘loser’ page here is one of my most popular articles – How to Verify Domain Property in Google Search Console via DNS.

The crawling experiment started on October 5th. Then, on November 7th, the traffic to this page decreased significantly. And it came back after I re-enabled the crawling:

crawling-experiment-post-traffic

I’m sure that if I were tracking rankings for something more competitive and more frequently searched, the results would show much faster: the page would drop to page two or three of Google’s search results much sooner, or even disappear from them entirely.

My Google searches

What I see for my name search now is that LinkedIn is ranking first, and my website is ranking second. So it is up again, plus the favicon is back.

crawling-experiment-brand-search-after

The search for the SEO Pro Extension is back with the favicon as well:

crawling-experiment-seo-extension-search-after

Crawling Experiment Hypothesis Check

Hypothesis 1: Partially true

The first hypothesis was that most of the pages would stay indexed by Google, but Google would decrease their positions. I would say it is partially true: most of the pages stayed indexed, but I expected a much more drastic decrease in positions.

Hypothesis 2: Partially true

The second hypothesis was that the pages would slowly lose rankings. I marked it as partially true as well because some of the pages did lose rankings, and some didn’t. It was funny to see one of the pages actually climb to the top of the search results and then drop to page two or three.

Hypothesis 3: Partially true

The next hypothesis was that a ‘no information is available’ message would show up in meta descriptions. I also marked it as partially true: some pages did get these descriptions, but most didn’t. I think this would be much more evident for a freshly launched website, where most of the pages would get these placeholder descriptions.

But since Google already had some information about my website, it decided not to replace my meta descriptions with ‘no information is available’.

The Biggest Changes Caused by the Crawling Experiment

Biggest change 1: Favicon removed

The biggest change in this experiment is that the favicon was removed. I did not expect that – I just didn’t think about the favicon. But it was interesting to see it disappear after I disabled the crawling.

Biggest change 2: Negative video performance

The next thing is negative video search performance. I didn’t think about videos either, to be honest, because I don’t rely on them a lot, but some of my pages do have videos. I noticed that the performance of video results was much worse during this experiment than that of the regular text search results. As of now, video performance hasn’t gotten back to the pre-experiment numbers; I assume it’ll take more time.

The most unexpected change:

And the most unexpected change is mind-boggling to me: some of the pages got ‘indexed’ even though they had noindex tags (I covered this earlier in the post).

Another thing that happened: I lost my knowledge graph. I’m not sure it’s connected to the experiment, but I still wanted to mention it.

And last but not least, I started seeing lots of search spam reported in Google Search Console:

search-spam-wordpress-example

I’ll be honest: this one is something I’m the least happy about.

The search pages are noindexed. However, Google couldn’t see that during the experiment, so the number of such pages reported increased drastically.

Final Thoughts

What can I say? Don’t try this experiment at home 😀

Based on the experiment outcome, I believe that when Google has prior information about a website, there’s trust there, and the website will not be wiped off Google’s page 1 after crawling is disabled (for a short time).

With that being said, I think a bigger website in a competitive industry would see negative outcomes of disallowing crawling much faster than I did.

Overall, I enjoyed doing the experiment (though it took about 50 hours and 2 months to put everything together – a time commitment I didn’t anticipate). Hope you enjoyed reading about the test results too!

Check your technical SEO knowledge with a free quiz!

Answer just 11 questions and see your results!