Check Your Logs for Spammers and Splogs using AWstats and Excel

Status
Not open for further replies.
#1
This subject has been discussed often here, so I thought there would be interest in this:

http://www.maxpower.ca/how-to-check...gs-using-awstats-and-excel-part-1/2006/10/25/

Quoted from the site:

"One thing I know most webmasters (including myself) don’t do enough of is sift through our weblogs. Website logs collected at the server level can tell us a myriad of things but most importantly, they can help us pinpoint who is stealing our bandwidth (via out of control robots and comment spammers) and who is stealing our content.

A tiny bit of hard work, the ability to follow directions, some familiarity with excel, and a keen eye for the out of place is all you need to figure out who is stealing from you by looking at your logs. Let me show you."
 
#2
That tool looks as if it'll do a lot of useful stuff. For my own part, I already use Excel. I've posted on here before about downloading raw logs and using autofilters to sift through for error codes and suspicious activity. Those plugins should lift the whole process to a new level and I'll have a better look at them first chance I get.

Thanks CJ.:wink:
 
#4
Yes... I knew you did that, and, at first, I was going to ask you to report the whole process, but, knowing you were very busy at the moment, I decided to go search the www... and that's when I found these articles.

Note that he has another article, "what to do when you've found a thief". (lower case to indicate it's from memory, and may not be the actual title)

I'll be very interested in hearing your conclusions.

As I posted under another thread today, my hits have suddenly zoomed, and I'm hoping to find out why.
 
#5
Maybe it's because spring has come and summer will be here soon, and people are thinking more about RVs and things like that. Hopefully, anyway ;)
 
#6
Well I've spent a little time looking at the Awstats Analysis Workbook and here are a few observations, for what they're worth.

This is a more formal, simpler version of what I already do using Excel. Rather than download the entire Raw Access Logs as I do, this spreadsheet relies on a simple copy/paste of the Hosts section of your AWStats. The Hosts table is pasted into an existing 'template', and associated Excel worksheets then extract and work on that data. Simple filters allow you to find the highest bandwidth users and identify possible spammers/bots by looking at Page/Hit ratios. A section that creates .htaccess rules to block bad IPs is useful, and a link to WhoIs helps make sure you don't ban the good guys like Google.
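For anyone who prefers a script to a spreadsheet, here is a minimal Python sketch of the Page/Hit check and the rule generation. It is not the workbook's own logic: the sample rows, the thresholds, and the function names are all assumptions made for illustration. The idea it assumes is that a browser loads images and stylesheets along with each page (many hits per page), while a bot that fetches only pages has a ratio close to 1. The `Deny from` lines use the older Apache 2.2 style syntax.

```python
# Hypothetical sample of an AWStats "Hosts" table: (host, pages, hits).
HOSTS = [
    ("192.0.2.10", 480, 500),    # nearly every hit is a page: bot-like
    ("198.51.100.7", 12, 140),   # pages plus images/CSS: browser-like
    ("203.0.113.5", 3, 3),       # too few hits to judge
]

def suspicious_hosts(rows, min_hits=50, ratio=0.9):
    """Return hosts whose pages/hits ratio is at or above `ratio`.

    Both thresholds are illustrative guesses, not values from the workbook.
    """
    flagged = []
    for host, pages, hits in rows:
        if hits >= min_hits and pages / hits >= ratio:
            flagged.append(host)
    return flagged

def deny_rules(hosts):
    """Build .htaccess lines (Apache 2.2 style) for the given IPs."""
    return ["Deny from " + host for host in hosts]

flagged = suspicious_hosts(HOSTS)
rules = deny_rules(flagged)
print("\n".join(rules))
```

As with the workbook, it is worth running any flagged address through WhoIs before pasting the rules into .htaccess.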

One problem I encountered was with the copy/paste of the statistics. The instructions tell you to copy and paste to Notepad first and then open the resulting text file in Excel. This is then copy/pasted into the template. In reality I found this didn't work correctly, and pasting directly into the template without first going to Notepad worked just fine. This may be down to a difference in Excel versions.

I shall continue to analyse my entire logs as I do now but for anyone who wants a bit of automation this is a very useful tool.
 
#7
OK... I've got my Raw Access Logs pasted into Excel, but it's not readable yet, as the fields are not separated properly.

Is that my next step... go through and figure out the field lengths and what each field contains and give it an appropriate name... or should I be able to find a ready-made worksheet?

Thanks, John!
 
#8
CJ, if you want to use the ready-made spreadsheet in your original post then simply copy/paste from the Hosts section of your AWStats. Their method involves pasting the Host stats into Notepad and saving as a text file, opening the text file in Excel (making sure the import wizard is set to 'Tab' delimited), and then pasting the resulting spreadsheet into the Analysis template. Follow their instructions but if, like me, you find that opening the text file in Excel results in all the data going into the first column, try skipping the Notepad step and paste directly from the Stats page into the template.
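The "everything lands in the first column" symptom is what happens when the delimiter chosen at import does not match the one in the file. A small Python sketch shows the effect; the pasted text and its column order are assumptions, not a copy of a real Hosts table.

```python
import csv
import io

# Hypothetical text as it might look after pasting the Hosts table
# into Notepad; the column order is an assumption.
PASTED = "Host\tPages\tHits\n192.0.2.10\t480\t500\n198.51.100.7\t12\t140\n"

def read_hosts(text, delimiter="\t"):
    """Split pasted text into rows; the first row is treated as the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    return header, [row for row in reader if row]

header, rows = read_hosts(PASTED)
print(header)
print(rows)

# With the wrong delimiter the whole line stays in a single column.
wrong_header, _ = read_hosts(PASTED, delimiter=",")
print(len(wrong_header))
```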

On the other hand, if you are trying to read the entire Raw Access Log as I do, then the method is completely different. Download the log file and extract it from the zip. Open the file in Excel and the import wizard will kick in. It is not necessary to define every field size. In step 1 of the wizard select "Delimited". Click Next. In step 2, select Delimiter type "Space" ... not Tab. Click Finish. (You could click next and go on to step 3 to be more selective over which fields are imported, but it's not necessary).
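The same space-delimited split can be sketched in Python with the standard `csv` module. The sample line is hypothetical and assumes the Apache "combined" log format; real logs from your host may differ. Quoted parts such as the request are kept together, while the bracketed date and its time zone end up in two separate fields, just as they do in two spreadsheet columns.

```python
import csv

# Hypothetical line in Apache "combined" log format.
LINE = ('192.0.2.10 - - [25/Apr/2007:10:15:32 -0600] '
        '"GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"')

def split_log_line(line):
    """Split one log line on spaces, keeping quoted parts together."""
    return next(csv.reader([line], delimiter=" ", quotechar='"'))

fields = split_log_line(LINE)
ip, request, status = fields[0], fields[5], fields[6]
print(ip, request, status)
```

Lines whose user agent contains escaped quotes may not split cleanly this way, so treat it as a starting point rather than a full parser.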

That gives you the data in rows and columns but I go two steps further. Highlight across a few columns and click on Format/Column/Autofit. That makes the data more readable. Then go to A1. Click on Insert/Rows and a blank row should open up at the top of the sheet. Into A1 type a short heading... I type "IP". Then click on Data/Filter/Autofilter. (You could skip the row insert and heading steps, but if you do, the Autofilter will be on the first line of your data.)

Now experiment with the Autofilters and you'll discover the power of Excel. The 'Custom' settings are particularly useful for pulling out those records involving particular pages or images.
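For comparison, an Autofilter amounts to keeping only the rows that match each chosen criterion. A rough Python equivalent, using hypothetical rows and made-up field names (`ip`, `request`, `status`), might look like this:

```python
# Hypothetical rows, already split into fields as in a spreadsheet.
ROWS = [
    {"ip": "192.0.2.10", "request": "GET /index.html HTTP/1.1", "status": "200"},
    {"ip": "192.0.2.10", "request": "GET /missing.html HTTP/1.1", "status": "404"},
    {"ip": "198.51.100.7", "request": "GET /images/photo.jpg HTTP/1.1", "status": "200"},
]

def filter_rows(rows, status=None, request_contains=None):
    """Keep rows matching every criterion given, like stacked Autofilters."""
    kept = []
    for row in rows:
        if status is not None and row["status"] != status:
            continue
        if request_contains is not None and request_contains not in row["request"]:
            continue
        kept.append(row)
    return kept

errors = filter_rows(ROWS, status="404")
images = filter_rows(ROWS, request_contains="/images/")
print(len(errors), len(images))
```

The `request_contains` argument plays the role of a 'Custom' filter with a "contains" condition.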

I hope this helps but if not just holla. ;)
 
#9
Thanks, John, that is a great help.

You just told me things I didn't know about Excel, mostly that it's a lot easier to use than I thought.

I thought it was necessary to count columns to separate out the fields.

Now, I understand that you are working with "Raw Access Logs", rather than Awstats logs... and that they are different.

I thought they were the same thing and that Awstats was just doing analysis on the Raw Access Logs.

I see how to get the "Hosts" file from the Awstats page, but I still can't find a place to download the entire Awstats log as a zip. Does Awstats, in fact, offer that, or am I looking for something that doesn't exist?

Thanks again... this whole thing is getting more fun all the time!
 
#10
CJ, Awstats IS analysing the Raw Access Logs. I'm still unsure if you're trying to use that Analysis Workbook you posted about, or are trying to do things my way. If it's the former then you don't need the log file; if it's the latter then you do.

To download the whole raw file in a zip (if that's what you want to do) you need to use cPanel. Go to the Logs section ... click on Raw Access Logs ... and up should come a list of log files (with a gz extension) with the instruction "click on a log file to download it".
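Files with a gz extension are gzip-compressed, and Python's standard `gzip` module can read them directly without a separate extraction step. The sketch below is only an illustration: the file name is hypothetical, and it writes a tiny stand-in log first so that it runs without a real download.

```python
import gzip
import os
import tempfile

def read_gz_lines(path):
    """Return the lines of a gzip-compressed text log, without newlines."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle]

# Build a tiny stand-in file so the sketch runs anywhere; a real log
# downloaded from cPanel would be passed to read_gz_lines() instead.
sample_path = os.path.join(tempfile.mkdtemp(), "example.com.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
    handle.write('192.0.2.10 - - [25/Apr/2007:10:15:32 -0600] "GET / HTTP/1.1" 200 512\n')

lines = read_gz_lines(sample_path)
print(len(lines), "line(s) read")
```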
 
#11
I'm trying to do things your way, but, in my efforts to figure out the whole thing, was asking an additional question.

In my search (mentioned earlier) for a way to look at data as you are doing (this was before I got your "how to" message), I was trying to test the trial version of a program, aws2xls, which I found here:

http://www.internetofficer.com/awstats/excel/

I thought it was supposed to convert the Raw Access Logs, but it didn't work.

In response to my request for help, the programmer (of aws2xls) answered this way:

=================================================================
aws2xls does not convert the web server log files to Excel files. aws2xls converts AWStats data base files to Excel files. The AWStats data base files are generally called awstats042007.internetofficer.com.txt or something similar.

I hope this clarifies the matter.

Jean-Luc
InternetOfficer SPRL
=================================================================

I was trying to use the file that Bluehost Control Panel, cPanelX, calls "Raw Access Logs", with a gz extension (as described by you above).

I can't find the files Jean-Luc describes as "AWStats data base files generally called awstats042007.internetofficer.com.txt or something similar".

Thanks for your patience!
 
#12
Look in the home directory for a tmp folder. You may find an awstats folder in it. If you have that, you will find two kinds of files in it: configuration files (one for each domain and subdomain of the account) and the regular monthly awstats data files for each of the conf files.
 
#13
CJ, I've looked at aws2xls and have been successful in using it. As Chris states, you need to locate your tmp folder, and inside that is one called awstats. In there you will find a number of files called awstats032007.yourdomain.net.txt where '032007' could be any month/year. I downloaded mine to my desktop using my usual ftp program. Running aws2xls then converts the file to awstats032007.yourdomain.net.xls, which will obviously open in Excel. The new spreadsheet presents the data in a readable form but, to be honest, is little better than Awstats itself.

If you decide to use this program let me know how you get on and what you think of the result as compared with what awstats already displays. I may have missed the point. :)
 
#14
That's it!

My files are under /tmp/awstats/

And as Jean-Luc said, they are named:

awstats042006campingandrving.com.txt
awstats042007campingandrving.com.txt

And... for anyone else looking at this for the first time, notice that the files are sorted in numerical order, not date order.
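Because the month comes before the year in the file name, a plain alphabetical sort mixes the years together. A minimal Python sketch of sorting by date instead; the third file name is a hypothetical addition to make the difference visible, and the function name is made up.

```python
import re

# File names as listed in the post above, plus one hypothetical extra
# (awstats122006...) to make the ordering problem visible.
NAMES = [
    "awstats042006campingandrving.com.txt",
    "awstats042007campingandrving.com.txt",
    "awstats122006campingandrving.com.txt",
]

def month_year_key(name):
    """Return (year, month) parsed from an awstatsMMYYYY... file name."""
    match = re.match(r"awstats(\d{2})(\d{4})", name)
    if match is None:
        raise ValueError("unexpected file name: " + name)
    month, year = match.groups()
    return int(year), int(month)

by_name = sorted(NAMES)
by_date = sorted(NAMES, key=month_year_key)
print(by_name)
print(by_date)
```

Sorted by name, April 2007 lands between April 2006 and December 2006; sorted with the key, the files come out in calendar order.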

Thanks very much, Chris!
 
#15
If you can't find the gz files under a heading "Raw Access Logs" in cPanel, then you could look for them using your usual ftp program. They are in a folder called 'logs' which is at the same level as the tmp folder previously discussed.
 
#16
JWJ said:
CJ, I've looked at aws2xls and have been successful in using it. As Chris states, you need to locate your tmp folder, and inside that is one called awstats. In there you will find a number of files called awstats032007.yourdomain.net.txt where '032007' could be any month/year. I downloaded mine to my desktop using my usual ftp program. Running aws2xls then converts the file to awstats032007.yourdomain.net.xls, which will obviously open in Excel. The new spreadsheet presents the data in a readable form but, to be honest, is little better than Awstats itself.

If you decide to use this program let me know how you get on and what you think of the result as compared with what awstats already displays. I may have missed the point. :)
OK.. I am going to at least check it out, and will let you know what I find.

Thanks, John and Chris!
 
#17
I haven't completely checked out the programs we have been discussing, but at this moment, I think I have determined that the information I am looking for is in StatCounter's Recent Visitors.

It identifies the visitor by Location and HostName, and gives the Entry Page, the Exit Page, and the Referring URL.

That's what I need to know, I think, to figure out how they found my site, what they were looking for, and whether they browsed around looking at any other pages.

I still haven't found out why I'm getting so many hits all of a sudden. (I really like Chris and John's idea that maybe it's because it's Spring and that's when a lot of people get interested in RVing, but I think there's more to it than that.)

I created a page for Sharron's Bluebonnet photos (with her permission, of course), and it's been getting a lot of hits, almost all from message boards or "no referring link".

I don't know why so many hits unless someone else posted it on a high traffic message board.

I'm not sure what "No referring link" means when it's such a new page. I thought that meant they had it marked as a Bookmark or Favorite... but they still just look at that one page, so they didn't come back to browse.

Or at least, I don't think they are browsing... if the "Entry" and "Exit" are the same... wouldn't that indicate they didn't leave that page?

Anyone who wants to see Sharron's page, look at my website: http://www.campingandrving.com/

One of her Bluebonnet photos is about halfway down on the home page, then under the photo is the link to "Texas Bluebonnets and Easter Snow".
 
#18
A test of an older mind!!

Thanks for the wonderful information. I will have to see if my "old" mind can follow the directions. :roll: I am wondering if this is what is happening to my website? I have a proxy hostname that has camped on ONE page of my website for over a week now and I have no clue why. I only know my sales from that one page have died off. Any clues or comments? I am a computer idiot so please be gentle with me if you have ideas. Thanks and enjoy the wonderful weather.8)
 
#19
Nancie, why are you using an https connection to your site and why are you redirecting http to https?

This is useless (and doesn't even work well) except on a page that collects information from a user, such as completing an order. Such pages should not even be allowed to be indexed by search engines.

Right now the site is not working properly because of the mix of https and http on the same page.

Get rid of https for all but your pages that require it.

I bet your visitors get put off by the warnings about secure and unsecure items on the same page and are running off.
 
#20
Thanks for this very useful information.
I was not aware of these problems until now.
From now on, I too will analyse my logs and find out whether there is any such spam activity.
 