Webalizer hacking

February 4th, 2009

On my websites, I use webalizer to generate statistics about my visitors. I know there are other packages available, such as Google Analytics, but I am too lazy to look into that :) Anyway, I sometimes change the settings of webalizer, and if you are also a user of this program, you know that with the Incremental option on (which is the default and, I think, sane) your config changes do not take effect on already collected statistics. This kind of sucks, because you’ll have to wait for a month to see if your new setting for grouping/hiding user agents works.

Or not? I couldn’t find a website on how to regenerate the statistics, but I found a way to do it. You’ll only need the apache logs for the month you want to regenerate. If you have them, you can follow my instructions, but make backups of your files before you do. Also, this works for me and it’ll probably work for you, but YMMV.

So then, first we have to collect the apache logs of one month (the one we want to regenerate). Make sure you do this when webalizer is not running (for me, it runs at 06:00am). Copy the log files to some place and unzip them if necessary. I catted them together so I only have 1 file:

mkdir /tmp/hack
cd /tmp/hack
cp /var/log/apache2/<site>/access.log.{1,2.gz,3.gz} .
gzip -d access.log.2.gz
gzip -d access.log.3.gz
cat access.log.{3,2,1} > all.log
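
If you have more rotated files than this, the unzip-and-cat steps can be folded into one. This is only a sketch: concat_logs is a name I made up, and it relies on gzip -cdf passing files that are not compressed through unchanged, so zipped and plain logs can be mixed on one command line.

```shell
# Concatenate log files in the order given, decompressing .gz files on the fly.
# concat_logs is a hypothetical helper name, not part of webalizer or apache.
# Usage: concat_logs access.log.3.gz access.log.2.gz access.log.1 > all.log
concat_logs() {
    # -c writes to stdout, -d decompresses, -f copies non-gzip input as-is
    gzip -cdf -- "$@"
}
```

The files still have to be listed oldest first; the function does not sort anything for you.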

Now edit all.log and delete all the entries at the beginning (or end) that are not in the month that you want to regenerate. I regenerated 2 months at the same time, January and February, so I deleted everything from before January. Double check that only these dates are in the all.log file.
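
If all.log is too big to edit comfortably, the trimming and the double check can be done with grep and sed instead. This is a rough sketch that assumes the usual apache timestamp format, like [04/Feb/2009:06:00:00 +0100]; keep_months and list_months are names I made up for this example.

```shell
# Keep only log lines whose timestamp falls in the given month(s) of a year.
# Assumes apache-style timestamps such as [04/Feb/2009:06:00:00 +0100].
# Usage: keep_months 'Jan|Feb' 2009 < all.log > filtered.log
keep_months() {
    months="$1"   # English month abbreviations separated by |, e.g. 'Jan|Feb'
    year="$2"
    grep -E "\[[0-9]{2}/(${months})/${year}:"
}

# Print each run of consecutive lines from the same month with its line count.
# Usage: list_months < filtered.log
list_months() {
    sed -n 's|^[^[]*\[[0-9]\{2\}/\([A-Za-z]\{3\}/[0-9]\{4\}\):.*|\1|p' | uniq -c
}
```

Because list_months does not sort, a month that shows up twice in its output is a hint that the log files were concatenated in the wrong order.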

Also, take care of the order in which you cat the files together. I used access.log 3, 2, and 1, because that’s the order in which they were created; access.log.3 has the oldest data, access.log.1 has the newest. Please note that I don’t use access.log, because that one is not rotated yet. Webalizer uses the dates from the logs, so they have to be in the proper order.

When you have done this, you delete webalizer.current in your site’s OutputDir (no no, you move it, because you want backups!). For me, the OutputDir is /var/www/webalizer/. This file holds the incremental data of the last month, so if you regenerate earlier months, you’ll have to regenerate the last month too, I think :)
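
Moving the file somewhere safe could look like the sketch below. The function name backup_current and the backup directory in the usage line are made up for this example; use your own OutputDir.

```shell
# Move webalizer.current out of the output directory into a backup directory.
# backup_current is a hypothetical helper; both paths are passed in by the caller.
# Usage: backup_current /var/www/webalizer /tmp/hack/backup
backup_current() {
    outdir="$1"
    backup="$2"
    mkdir -p "$backup" &&
    mv "$outdir/webalizer.current" "$backup/webalizer.current"
}
```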

The last step before we regenerate the statistics is editing the webalizer.hist file. This shows the grand total of each month. Because we regenerate statistics for the whole month, we just delete the appropriate line. The first number of each line is the month, the second is the year. So if you want to regenerate January and February of this year, delete the ones that start with “1 2009” and “2 2009” :)
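
If you would rather not edit the file by hand, awk can drop those lines. This sketch assumes the layout described above (month first, year second, separated by whitespace); drop_months is a made-up name. It writes to stdout, so the original webalizer.hist stays untouched until you replace it yourself.

```shell
# Print a webalizer.hist without the lines for the given year and months.
# Assumes field 1 is the month and field 2 is the year.
# Usage: drop_months 2009 1 2 < webalizer.hist > webalizer.hist.new
drop_months() {
    year="$1"; shift
    awk -v year="$year" -v months="$*" '
    BEGIN {
        # Remember every month number that should be removed
        n = split(months, m, " ")
        for (i = 1; i <= n; i++) drop[m[i] + 0] = 1
    }
    # Print only the lines that are not in the given year and month list
    !($2 == year && (($1 + 0) in drop))
    '
}
```

Compare webalizer.hist.new with the original before you move it into place.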

Then we run the normal command:


webalizer -c /etc/webalizer/webalizer-<site>.conf -Q /tmp/hack/all.log

This should do the trick. Checking the webalizer.hist file, there should be entries for the months you regenerated, and the webalizer.current file should’ve been created as well. More important, on the website, you can see your new report and all the old ones as well! I used this to test my settings and I finally got a satisfactory combination.
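
The check on webalizer.hist can be scripted too. A small sketch, again assuming the month-then-year layout of the file; has_month is a made-up name.

```shell
# Succeed if the history file has a line for the given month and year.
# Usage: has_month /var/www/webalizer/webalizer.hist 1 2009 && echo "January is back"
has_month() {
    awk -v m="$2" -v y="$3" '
    $1 == m && $2 == y { found = 1 }
    END { exit !found }
    ' "$1"
}
```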
