Bots/Spiders


casualcottage
11-17-2004, 10:12 PM
Is it possible to tell from anything in the stats if a bot or spider, etc. is crawling my website?

webado
11-17-2004, 11:46 PM
If a lot of hits are logged in very quick succession, that's probably a bot. It usually has JavaScript disabled, too. StatCounter doesn't seem to track the well-known bots and spiders, so you'll only see the smaller ones.
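The "lots of hits very fast" rule of thumb can be turned into a simple sliding-window check. Below is a minimal Python sketch, assuming you can export your hits as (visitor, timestamp) pairs. The 10-second window, the 5-hit threshold and the `find_bursty_visitors` name are all hypothetical; StatCounter uses nothing like this as far as anyone here knows.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical thresholds, not StatCounter settings: tune for your own traffic.
WINDOW = timedelta(seconds=10)
MAX_HITS_PER_WINDOW = 5


def find_bursty_visitors(hits, window=WINDOW, max_hits=MAX_HITS_PER_WINDOW):
    """Return the visitors (e.g. IP addresses) that logged more than
    `max_hits` page views within any sliding window of length `window`.

    `hits` is an iterable of (visitor_id, datetime) pairs.
    """
    by_visitor = defaultdict(list)
    for visitor, ts in hits:
        by_visitor[visitor].append(ts)

    suspects = set()
    for visitor, times in by_visitor.items():
        times.sort()
        start = 0
        # Two-pointer sliding window over this visitor's sorted timestamps.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 > max_hits:
                suspects.add(visitor)
                break
    return suspects


if __name__ == "__main__":
    t0 = datetime(2004, 11, 17, 22, 0, 0)
    hits = [("10.0.0.1", t0 + timedelta(seconds=i)) for i in range(8)]   # 8 pages in 8 s
    hits += [("10.0.0.2", t0 + timedelta(minutes=i)) for i in range(8)]  # 8 pages in 8 min
    print(find_bursty_visitors(hits))  # {'10.0.0.1'}
```

A human reading one page a minute never trips the threshold, while something fetching a page every second does. A fast human clicking around can still produce false positives, so treat the result as "worth a look", not proof.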

CaribbeanChoice
11-19-2004, 04:09 AM
What about bots that don't load images or JavaScript? I'm guessing they wouldn't get counted by StatCounter. Or would they?

webado
11-19-2004, 04:13 AM
Probably not those.

CaribbeanChoice
11-19-2004, 04:21 AM
I think the only way to track those kinds of bots is server-side.

grbeneke
11-20-2004, 10:43 AM
StatCounter can track some bots/spiders but not others, and there are a number of variables!
"I think the only way to track those kind of bots is server side."
This is correct: the only GUARANTEED way of tracking bots/spiders is through your server logs.
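Server logs record every page request whether or not the visitor runs JavaScript or loads images, which is why they are the reliable source. As a minimal Python sketch, assuming your host writes Apache-style "combined" format logs, you can count requests from user agents that look like bots. The `BOT_MARKERS` list and `count_bot_hits` name are hypothetical, and matching on the user agent only catches bots that identify themselves honestly.

```python
import re
from collections import Counter

# Apache/NCSA "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of user-agent substrings.
# Real bot lists are much longer and change over time.
BOT_MARKERS = ("bot", "spider", "crawler", "slurp")


def count_bot_hits(lines):
    """Count requests per user agent for agents that look like bots."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that aren't in combined format
        agent = m.group("agent")
        if any(marker in agent.lower() for marker in BOT_MARKERS):
            counts[agent] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '66.249.66.1 - - [20/Nov/2004:10:43:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
        '192.0.2.7 - - [20/Nov/2004:10:44:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    ]
    for agent, n in count_bot_hits(sample).items():
        print(n, agent)
```

Only the Googlebot request is counted; the Internet Explorer request is treated as a normal visitor. Note that the Googlebot line appears here even though, as explained below, StatCounter would never see it.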

There are two main ways StatCounter records a hit: through its JavaScript code, and through the request to the StatCounter server for the counter image (this includes the transparent image used for the invisible counter).
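To make the image half of this concrete, here is a self-hosted stand-in for an invisible counter. It is not StatCounter's code; it is a minimal Python sketch using the standard library's `http.server`, where the `/count.gif` path, the in-memory `HITS` list and the `run` helper are hypothetical. A hit is recorded only when a client actually requests the image, so a client that never fetches images is never counted.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# 1x1 transparent GIF (43 bytes), the usual "invisible counter" image.
TRANSPARENT_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00"
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;"
)

HITS = []  # hypothetical in-memory hit log: (client IP, User-Agent)


class CounterHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.split("?")[0] != "/count.gif":
            self.send_error(404)
            return
        # The hit is recorded only because the client asked for the image.
        HITS.append((self.client_address[0], self.headers.get("User-Agent", "")))
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(TRANSPARENT_GIF)))
        self.send_header("Cache-Control", "no-cache, no-store")  # ask browsers to re-fetch each time
        self.end_headers()
        self.wfile.write(TRANSPARENT_GIF)

    def log_message(self, format, *args):
        pass  # silence the default console log; HITS is the log here


def run(port=8000):
    """Serve the counter image on http://127.0.0.1:<port>/count.gif."""
    HTTPServer(("127.0.0.1", port), CounterHandler).serve_forever()
```

A page would embed it as `<img src="http://127.0.0.1:8000/count.gif" width="1" height="1" alt="">`. A bot that reads the HTML but never requests that URL leaves no trace in `HITS`, even though your web server's own log still shows the page request.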
So whether hits are recorded depends a lot on how thoroughly the bot goes through your site.
Take, for example, a big search engine (SE) like Google or Yahoo: they are indexing billions of sites per day, so they have tried to make their bots as lightweight and efficient as possible. That means they don't run ANY JavaScript they find (it wastes CPU time on their servers), and they (generally) don't care about your pictures either, so they never even request them from your (or StatCounter's) servers. Therefore no hits are counted by StatCounter: there is just one hit per HTML page in your server logs.
However, there are some bots (including some private bots that people run on their personal machines to scour the web) that spend a little more time and resources and look a bit more carefully.
One example is the W3C validator. I think its server processes all the JavaScript in each page it validates, so StatCounter is able to record a hit from it.

So, after that rather detailed (and probably boring to most people) explanation: don't be surprised to find bots in your StatCounter logs, but don't rely on them as complete information.
And a little aside: if someone disables JavaScript AND turns off image display in their browser, you probably won't track them either. But I don't think there are too many of those paranoid types around :D
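The reasoning in this post boils down to one rule: your server log sees every page request, while a page-tag counter sees only clients that run its script or fetch its image. Here is a toy Python model of that rule. The `Client` fields and the `tally` helper are hypothetical, and real crawlers vary (the post itself only guesses about the W3C validator).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Client:
    name: str
    runs_javascript: bool
    loads_images: bool


def counter_records_hit(client: Client) -> bool:
    # A page-tag counter sees a visit only if the client executes the
    # tracking script or requests the counter image (visible or invisible).
    return client.runs_javascript or client.loads_images


def tally(client: Client, pages_fetched: int) -> Tuple[int, int]:
    """Return (server_log_hits, counter_hits) for one client's visit."""
    server_log_hits = pages_fetched  # every HTML request reaches your server log
    counter_hits = pages_fetched if counter_records_hit(client) else 0
    return server_log_hits, counter_hits


if __name__ == "__main__":
    clients = [
        Client("lightweight search-engine bot", runs_javascript=False, loads_images=False),
        Client("bot that runs scripts (e.g. a validator)", runs_javascript=True, loads_images=False),
        Client("browser with JavaScript and images off", runs_javascript=False, loads_images=False),
        Client("ordinary browser", runs_javascript=True, loads_images=True),
    ]
    for c in clients:
        print(f"{c.name}: {tally(c, pages_fetched=10)}")
```

The first and third clients show ten pages in the server log and zero counter hits, which is exactly the gap described above.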

webado
11-20-2004, 04:07 PM
I love this explanation. I had a vague notion of something like this, but I couldn't explain it so clearly. Thank you. :D

I really think this post should be made part of the FAQ file.

CaribbeanChoice
11-20-2004, 11:26 PM
About the point that big search engines like Google and Yahoo don't run any JavaScript or request your pictures: this is also good to note for another reason. Google and other search engines index your page based on the TEXT on your page and in the HTML, and ignore JavaScript-generated content. Keep that in mind if you want to rank highly in search engines.
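To see roughly what a crawler that doesn't run JavaScript has to work with, you can reduce a page to the text present in its raw HTML. This is a minimal Python sketch using the standard library's `html.parser`; the `StaticTextExtractor` and `visible_static_text` names are hypothetical, and real search engines parse pages in far more involved ways.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collect the text a non-JavaScript crawler would see in raw HTML.

    Anything inside <script> or <style> is skipped, and text that a script
    would write into the page at runtime never appears at all.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # >0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_static_text(html: str) -> str:
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = """<html><body>
<h1>Welcome</h1>
<p>Our rooms and rates.</p>
<script>document.write("Special offers!");</script>
</body></html>"""
    print(visible_static_text(page))  # Welcome Our rooms and rates.
```

The "Special offers!" text only exists once a browser runs the script, so a crawler that skips JavaScript never sees it. Putting important wording directly in the HTML avoids that problem.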