Web traffic, Amazon Web Crawlers, and Ad CTR Anomalies

Status
Not open for further replies.
#1
I have a few very strange things happening on my site, and I'm hoping someone might have some insight. I use both StatCounter and Analytics to monitor my traffic. One thing I've noticed on StatCounter is that I have a ridiculous amount of Amazon.com (Ashburn, Virginia) traffic that appears to be crawler traffic. In recent weeks I've been seeing over 40% of my traffic from web crawlers, and today in particular it's at 73%. What could be going on?

Also, and I feel like it might be related: I use AdSense and have been seeing a large number of invalid clicks that boost my CTR but are not registered for revenue. The odd thing is that both AdSense and Analytics register the clicks, but StatCounter doesn't show exit link activity on the Google ads to match what AdSense reports. Could this ad clicking anomaly be related to the web crawler traffic reported by StatCounter? And what should be done about this web traffic situation?

I appreciate any advice that can be offered. My website reports will also appreciate it. :)
 
#2
Statcounter cannot track all Google Adsense clicks, only some of them (depends on the type of ads).

Analytics cannot track hits from visitors with JavaScript disabled (that includes image-enabled robots, which StatCounter does track).
 
#3
I am fairly certain I have gotten to the bottom of things. I noticed that an increased amount of Amazon AWS Crawler traffic seemed to coincide with the increase in invalid CTR. I added a line in my .htaccess file to deny the Amazon AWS Crawler and it appears to have resolved the issue. Very odd that Amazon's crawler would be causing such issues.
Here's what I added to .htaccess

Code:
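# Order takes a single argument, so there is no space after the comma
Order Deny,Allow
# A partial domain name matches any host whose name ends in it
Deny from amazonaws.com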

Of course, this may not get all of the Amazon crawler traffic but it will get the majority. If you run into the issue and want to find the other networks to block I suggest searching for those IP ranges. There are quite a few.
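
For anyone hunting down those ranges, here is a rough Python sketch of turning a list of CIDR prefixes into deny lines. It assumes a JSON document shaped like the ip-ranges.json file that AWS publishes (a "prefixes" list with "ip_prefix" and "service" keys). The sample data and the deny_rules helper are made up for illustration, and the addresses are documentation ranges, not real AWS networks.

```python
import ipaddress

# Hypothetical sample shaped like AWS's ip-ranges.json.
# These are documentation ranges, NOT real AWS networks.
SAMPLE = {
    "prefixes": [
        {"ip_prefix": "203.0.113.0/24", "service": "EC2"},
        {"ip_prefix": "198.51.100.0/25", "service": "EC2"},
        {"ip_prefix": "198.51.100.128/25", "service": "EC2"},
        {"ip_prefix": "192.0.2.0/24", "service": "S3"},
    ]
}


def deny_rules(data, service="EC2"):
    """Return Apache 2.2-style 'Deny from' lines for one service's prefixes."""
    networks = [
        ipaddress.ip_network(entry["ip_prefix"])
        for entry in data["prefixes"]
        if entry["service"] == service
    ]
    # Merge adjacent or overlapping ranges so the .htaccess file stays short.
    merged = ipaddress.collapse_addresses(networks)
    return ["Deny from " + str(net) for net in merged]


if __name__ == "__main__":
    for line in deny_rules(SAMPLE):
        print(line)
```

The two adjacent /25 ranges in the sample collapse into a single /24, so the output has two lines instead of three. The printed lines would be pasted under the Order directive shown above.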
 
#4
Blocking amazonaws.com works, but at a cost

I've found that the amazonaws network is too large to block outright. There seem to be plenty of legit sites hosted within the ec2-cloud. One that I rely on a good deal is Pinterest. If I keep amazonaws.com blocked, then pins can no longer be made of my content.

For now I've pulled all of the AWS IP addresses that have crawled through recently and created deny rules for them in my .htaccess file. This is helping somewhat, but I still get several dozen invalid ad clicks from AWS crawlers per day. The only method that has solved the traffic spikes and invalid ad clicks is blocking the entire amazonaws network.
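
One possible middle ground between single-IP rules and blocking the whole network is to group the logged crawler addresses into small networks and deny only the busiest ones. This is a sketch of that idea, not something tested on the site: the busiest_networks helper, the /24 grouping, and the sample addresses are all assumptions.

```python
import ipaddress
from collections import Counter


def busiest_networks(ips, prefix_len=24, min_hits=2):
    """Group IPv4 addresses by enclosing network and keep the repeat offenders."""
    counts = Counter(
        # strict=False lets a host address stand in for its enclosing network.
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in ips
    )
    return [(str(net), hits) for net, hits in counts.most_common() if hits >= min_hits]


if __name__ == "__main__":
    # Hypothetical crawler IPs copied out of a stats report.
    seen = ["203.0.113.5", "203.0.113.77", "203.0.113.5", "198.51.100.9"]
    for net, hits in busiest_networks(seen):
        print(f"Deny from {net}  # {hits} hits")
```

A network that shows up only once stays unblocked, which may spare legitimate services hosted nearby, at the cost of letting some crawler traffic through.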

There are plenty of resources out there relating to people having similar problems and there have been some (many outdated) posts here and there with IP lists for aws crawlers. This article sums it up well:

http://www.seo-theory.com/2012/03/14/amazon-web-services-is-slowly-crushing-the-independent-web/

Is anyone else struggling with aws crawlers? What are you doing about it?

Thanks.
 
#6
Evidently they do

I agree that robots don't normally click ads. However, denying traffic from the AWS crawler stopped the invalid ad clicks, and when I allow the traffic again the bad ad clicks start up again immediately. Unfortunately, I can't track the exact IP addresses of the crawlers that are the culprits, since only Analytics shows the clicks and Google, per its ToS, doesn't disclose IP addresses. However, I can see the networks that generated the invalid ad clicks and they are all amazonaws, and going back to StatCounter I can see that all of the amazonaws traffic has been crawler traffic.

So, in this case, for some reason, crawlers are clicking ads. What's worse, they are clicking ads housed in a directory that is disallowed in the robots.txt file.
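
For reference, this is roughly the check a well-behaved crawler is supposed to make before fetching anything, sketched with Python's standard urllib.robotparser. The robots.txt content, the "/ads/" directory name, the "ExampleBot" user agent, and the allowed helper are placeholders, not my actual setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/ads/" stands in for the disallowed ad directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ads/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("ExampleBot", "http://example.com/ads/banner.html"))
    print(allowed("ExampleBot", "http://example.com/index.html"))
```

Nothing enforces this check on the server side. robots.txt is advisory, so a crawler that skips it can still request anything under the disallowed directory.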
 
#7
I guess they are rogue robots, with super powers if they can click javascript ads.

Actually they may be some sort of browser plugins rather than robots.
 
#8
That's precisely what's going on. Based on the seo-theory article I linked and other info out there, it seems that these are rogue robots that don't follow robots.txt files or properly identify themselves in their user agent.
 