Nov 12 2009

GoogleBot Follows URLs in JavaScript

Published by Jon at 10:32 pm under Uncategorized

I was very surprised the other day when I received a notification telling me that GoogleBot was having trouble executing JavaScript on my new app, Good Camel Games. The app is built with GWT and has tons of JavaScript, so I have implemented a JavaScript error handler on the client that sends me a notification whenever an error occurs while executing JavaScript. This is actually quite simple to implement:

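<script type="text/javascript">
// Report any uncaught JavaScript error to the server.
window.onerror = function(message, url, line) {
  // Redirect to /error with the error details in the "msg" parameter.
  window.location.href = '/error?msg=' + encodeURIComponent("Error: " + message + '\nUrl: ' + url + '\nLine: ' + line);
  return true; // suppress the browser's default error handling
};
</script>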

Then, on the server side, I just have a servlet that handles “/error” and sends me a notification with the content of the request parameter “msg” plus some other info about the client.
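As a rough illustration of that server-side step, here is a minimal sketch of how the notification text could be assembled. It is written in JavaScript rather than as a Java servlet, and the function name `buildErrorNotification` and the lowercase-keyed `headers` object are assumptions made for this example, not the actual servlet code.

```javascript
// Hypothetical sketch: format the notification body for a hit on "/error".
// requestUrl is the path plus query string; headers is a plain object with
// lowercase header names (an assumption of this example).
function buildErrorNotification(requestUrl, headers) {
  // A base is needed to parse a relative URL; the host is only a placeholder.
  const parsed = new URL(requestUrl, 'http://localhost');
  const msg = parsed.searchParams.get('msg');
  return [
    'From: ' + (headers['from'] || '(none)'),
    'user-agent: ' + (headers['user-agent'] || '(none)'),
    // Brackets make an empty or missing value visible in the notification.
    'msg: [' + (msg === null ? '' : msg) + ']'
  ].join('\n');
}

console.log(buildErrorNotification('/error?msg=Error%3A%20x%20is%20undefined', {
  'user-agent': 'ExampleBrowser/1.0'
}));
```

Wrapping the value in brackets is a small design choice: it makes a request that carries the “msg” parameter with no value easy to spot.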

So I received that error notification (just pasting the relevant parts here):

From: googlebot(at)googlebot.com
user-agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
msg: []

After looking at it carefully, I realized that GoogleBot does not actually execute JavaScript. You can tell because the “msg” parameter is present but empty, which means GoogleBot sent a request to “/error?msg=” without the parameter value (“Error: …”). So GoogleBot just scans the JavaScript, picks out whatever resembles a URL, and tries to index that URL.
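To make that reasoning concrete, here is a minimal sketch of what such a scan might look like. It is only a guess at the general idea and not how GoogleBot actually works: the helper `extractUrlCandidates`, its regular expressions, and the shortened handler source are all assumptions made for illustration.

```javascript
// Hypothetical helper: collect quoted string literals that look like URLs
// from JavaScript source text, without ever running the script.
function extractUrlCandidates(source) {
  // Single- or double-quoted literals; escapes are not handled (sketch only).
  const literal = /'([^'\\\n]*)'|"([^"\\\n]*)"/g;
  const candidates = [];
  let match;
  while ((match = literal.exec(source)) !== null) {
    const value = match[1] !== undefined ? match[1] : match[2];
    // Assumed heuristic: a candidate starts with "/" or "http(s)://".
    if (/^(\/|https?:\/\/)\S+$/.test(value)) {
      candidates.push(value);
    }
  }
  return candidates;
}

// A shortened version of an onerror handler, held as plain text.
const handlerSource = [
  "window.onerror = function (message, url, line) {",
  "  window.location.href = '/error?msg=' + encodeURIComponent('Error: ' + message);",
  "  return true;",
  "};"
].join('\n');

const found = extractUrlCandidates(handlerSource);
console.log(found); // [ '/error?msg=' ]

// Resolving the candidate against the site shows "msg" present but empty.
const crawled = new URL(found[0], 'http://games.goodcamel.com');
console.log(crawled.searchParams.get('msg') === ''); // true
```

Because the error text is only concatenated at run time, a scanner that never executes the code sees just the literal prefix, which matches the empty “msg” in the notification.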

After some research on the web, I found I was not the only one.

To be sure, I added another test to the home page of Good Camel Games:

<script type="text/javascript">
function thisFunctionIsNeverCalled() {
  window.location.href = '/this_page_does_not_exists';
}
</script>

If GoogleBot hits the URL “/this_page_does_not_exists”, I will receive a 404 error notification. I added that code today and will update this post when I receive the notification. Hopefully, Google will someday have a true JavaScript engine to crawl JavaScript-heavy applications.

UPDATE – November 20 2009: I confirm that GoogleBot tried to access the page “/this_page_does_not_exists”. This confirms that GoogleBot does not execute the JavaScript; it just looks for URLs in it. Here is the 404 error notification I received (just pasting the relevant parts here):

From: googlebot(at)googlebot.com
user-agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Request URL: http://games.goodcamel.com/this_page_does_not_exists
