Webalizer

Webalizer is an open source program for server log analysis. It examines server logs and generates HTML reports based on their contents. Paste the following configuration sections into webalizer.conf to extend Webalizer's search engine recognition and site grouping.


Configuration

search.txt - Webalizer can pick out the search terms that are left behind in the referrer field of your logs when a visitor arrives from a search engine. The default configuration only picks out terms from Yahoo. This list covers several dozen of the largest search engines, along with their names and search string keywords, so that Webalizer recognizes search terms from far more referrers.
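To illustrate the idea of pairing a search engine with its search string keyword, here is a minimal Java sketch. It is not Webalizer's implementation. The class name `SearchTermSketch`, the two-entry `ENGINES` table and the choice to match on a host substring are all assumptions made for the example, and it assumes Java 10 or later.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class SearchTermSketch {
    // Hypothetical table: host substring -> name of the query parameter
    // that carries the search terms. A real list would be much longer.
    static final Map<String, String> ENGINES = new LinkedHashMap<>();
    static {
        ENGINES.put("yahoo.", "p");
        ENGINES.put("google.", "q");
    }

    /** Returns the decoded search terms, or empty if the referrer is not a known engine. */
    static Optional<String> searchTerms(String referrer) {
        URI uri;
        try {
            uri = new URI(referrer);
        } catch (URISyntaxException e) {
            return Optional.empty(); // malformed referrer: nothing to extract
        }
        String host = uri.getHost();
        String query = uri.getRawQuery();
        if (host == null || query == null) {
            return Optional.empty();
        }
        for (Map.Entry<String, String> engine : ENGINES.entrySet()) {
            if (!host.contains(engine.getKey())) {
                continue;
            }
            String prefix = engine.getValue() + "=";
            for (String pair : query.split("&")) {
                if (pair.startsWith(prefix)) {
                    String raw = pair.substring(prefix.length());
                    // URLDecoder turns '+' into a space and resolves %xx escapes.
                    return Optional.of(URLDecoder.decode(raw, StandardCharsets.UTF_8));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String referrer : args) {
            System.out.println(referrer + " -> " + searchTerms(referrer).orElse("(none)"));
        }
    }
}
```

Each entry in the table plays the same role as one line of the list: without an entry for a given engine, its referrers are simply not recognized as searches.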
Updated: February 23, 2005

sites.txt - If you have hostname data about the visitors to your website, you can get information on which parts of the world and which ISPs most of the people visiting come from. The default configuration for Webalizer only groups AOL users. This is not an exhaustive list, but should group most of the top ISPs as well as a few known spiders.
Updated: September 27, 2002

If you have corrections or additions for these lists, please send an email to Stephen.


Log Sorting

Webalizer does not deal well with log files that are not completely sorted. If you find that Webalizer is ignoring a lot of records, unsorted logs are one likely culprit. Not finding a log sorting program, I wrote one that sorts web logs that are in combined log format.
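As a rough way to see whether a log is out of order, the following Java sketch pulls the bracketed timestamp out of each combined log format line and counts the entries that are older than the latest timestamp seen so far. It is not part of the log sorter and does not reproduce Webalizer's own checks; the class name `LogOrderCheck` and its method names are made up for this example, and it assumes Java 8 or later.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

final class LogOrderCheck {
    // Timestamp layout used in common/combined logs, e.g. 10/Oct/2000:13:55:36 -0700
    static final DateTimeFormatter CLF_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    /** Extracts the timestamp between the first '[' and the following ']'. */
    static OffsetDateTime timestampOf(String line) {
        int open = line.indexOf('[');
        int close = line.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("no timestamp in: " + line);
        }
        return OffsetDateTime.parse(line.substring(open + 1, close), CLF_TIME);
    }

    /** Counts entries that are older than the newest entry seen before them. */
    static int countOutOfOrder(List<String> lines) {
        int count = 0;
        OffsetDateTime latest = null;
        for (String line : lines) {
            OffsetDateTime t = timestampOf(line);
            if (latest != null && t.isBefore(latest)) {
                count++;      // this entry arrived after a newer one
            } else {
                latest = t;   // new high-water mark
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Reads a log from standard input and reports the count.
        List<String> lines = new ArrayList<>();
        try (Scanner in = new Scanner(System.in)) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
        }
        System.out.println(countOutOfOrder(lines) + " out-of-order entries");
    }
}
```

A count of zero means the log is already in chronological order; a large count suggests the log needs sorting before it is analyzed.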

Log files are assumed to be semi-sorted. If too many log entries are out of order, an error message will be printed and you should try again with a larger look ahead pool size.

If P is the look ahead pool size, N is the number of log entries and F is the number of files, this program will run in O(N*F*log(P)) time. (Notice that it is not optimized for a large number of files and that it will run faster with a smaller look ahead pool.)
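The look ahead pool idea can be sketched in Java with a bounded min-heap. This is an illustration under assumptions, not the source of the log sorter: the class name `LookAheadSort`, its `sort` method and the use of `java.util.PriorityQueue` are choices made for this example, and it handles a single input only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class LookAheadSort {
    /**
     * Sorts a semi-sorted list while holding at most poolSize entries back.
     * Throws IllegalStateException if an entry is too far out of order
     * for the pool to fix.
     */
    static <T> List<T> sort(List<T> input, int poolSize, Comparator<? super T> order) {
        if (poolSize < 1) {
            throw new IllegalArgumentException("poolSize must be at least 1");
        }
        PriorityQueue<T> pool = new PriorityQueue<>(poolSize + 1, order);
        List<T> out = new ArrayList<>(input.size());
        for (T entry : input) {
            pool.add(entry);                  // O(log P)
            if (pool.size() > poolSize) {
                emit(pool.poll(), out, order); // smallest entry in the pool
            }
        }
        while (!pool.isEmpty()) {
            emit(pool.poll(), out, order);     // drain what is left
        }
        return out;
    }

    // Appends next to out, failing if it sorts before the previous output entry.
    private static <T> void emit(T next, List<T> out, Comparator<? super T> order) {
        if (!out.isEmpty() && order.compare(next, out.get(out.size() - 1)) < 0) {
            throw new IllegalStateException(
                    "entry too far out of order; try a larger look ahead pool");
        }
        out.add(next);
    }

    public static void main(String[] args) {
        // Integers stand in for log entries ordered by timestamp.
        List<Integer> semiSorted = Arrays.asList(2, 1, 3, 5, 4, 6);
        System.out.println(sort(semiSorted, 2, Comparator.<Integer>naturalOrder()));
    }
}
```

In this sketch each entry costs one heap insertion and one heap removal, so a single input takes on the order of N*log(P) operations, and an entry displaced further than the pool can absorb triggers the error instead of producing unsorted output.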

Downloading:

Download and install the Java 2 Software Development Kit (SDK) from java.sun.com if you do not already have it installed.

Download the source code for the log sorter.

Usage:

First compile the source code:
javac LogSorter.java

This program reads either from standard input (when given no arguments) or from each of the files specified as arguments. Output is always written to standard output, and errors to standard error. If you wish to sort logs and remove duplicate entries, or if you want to bump up the look ahead pool size, simply recompile the program with those options.

Examples:

Sorting a single log file:

java LogSorter access.log > sorted_access.log

Merging log files from several servers:

java LogSorter server1_access.log server2_access.log server3_access.log > sorted_access.log

Merging rotated log files:

cat access1999.log access2000.log access2001.log | java LogSorter > sorted_access.log
(Concatenating all the log files and sending them to standard input has the effect of running this program with one log file rather than F files.)

Sending the output directly to Webalizer:

java LogSorter access.log | webalizer

Sending the output to gzip:

java LogSorter access.log | gzip > sorted_access.log.gz

Sorting gzipped log files:

gunzip -c access.log.gz | java LogSorter | gzip > sorted_access.log.gz

License

Copyright (c) 2001-2002 by Stephen Ostermiller

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.