search.txt - Webalizer can pick out the search terms that are left behind in the referrer field of your logs when a visitor comes from a search engine. The default configuration only picks out terms from Yahoo. This list of several dozen of the largest search engines, along with their names and search string keywords, makes Webalizer much more powerful and accurate.
Updated: February 23, 2005
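To illustrate the idea of a search engine paired with a search string keyword, here is a minimal Java sketch. It is not how Webalizer itself is implemented, and the class name, method name, and example referrer are all made up for illustration. It assumes Java 10 or later and uses a deliberately simple host suffix check.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; Webalizer does this internally in its own way.
class SearchTermSketch {

    // Returns the decoded search term, or null if the referrer does not
    // come from the given engine or does not carry the given keyword.
    static String searchTerm(String referrer, String engineHost, String key) {
        URI uri;
        try {
            uri = URI.create(referrer);
        } catch (IllegalArgumentException e) {
            return null; // malformed referrers are common in real logs
        }
        String host = uri.getHost();
        String query = uri.getRawQuery();
        // Simplification: a plain suffix match on the host name.
        if (host == null || query == null || !host.endsWith(engineHost)) {
            return null;
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith(key + "=")) {
                // URLDecoder turns '+' into a space and resolves %XX escapes.
                return URLDecoder.decode(pair.substring(key.length() + 1), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up referrer; prints "web log sorter"
        System.out.println(searchTerm("http://search.yahoo.com/search?p=web+log+sorter", "yahoo.com", "p"));
    }
}
```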
sites.txt - If you have hostname data about the visitors to your website, you can get information on which
parts of the world and which ISPs most of the people visiting come from. The default configuration
for Webalizer only groups AOL users. This list is not exhaustive, but it should group most of the top ISPs as well as a few known spiders.
Updated: September 27, 2002
If you have corrections or additions for these lists, please send an email to Stephen.
Webalizer does not deal well with log files that are not completely sorted. If you find that Webalizer is ignoring a lot of records, unsorted logs are one possible culprit. Not finding a log sorting program, I wrote one that sorts web logs that are in combined log format.
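Sorting entries in combined log format comes down to ordering them by the timestamp in square brackets. The following is a minimal sketch of pulling that timestamp out of a line, not an excerpt from LogSorter. The class and method names are hypothetical, and it assumes the first bracketed field on the line is the timestamp.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical illustration only; not taken from the LogSorter source.
class CombinedLogTimestampSketch {

    // Timestamps look like 10/Oct/2000:13:55:36 -0700
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Assumes the first [...] field on the line is the timestamp.
    static Instant parseTimestamp(String line) {
        int start = line.indexOf('[');
        int end = line.indexOf(']', start + 1);
        if (start < 0 || end < 0) {
            throw new IllegalArgumentException("No timestamp found: " + line);
        }
        return OffsetDateTime.parse(line.substring(start + 1, end), FORMAT).toInstant();
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /index.html HTTP/1.0\" 200 2326 \"-\" \"Mozilla/4.08\"";
        // prints 2000-10-10T20:55:36Z
        System.out.println(parseTimestamp(line));
    }
}
```

Comparing the resulting Instant values orders entries correctly even when different servers log in different time zones.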
Log files are assumed to be semi-sorted. If too many log entries are out of order, an error message will be printed and you should try again with a larger look ahead pool size.
If P is the look ahead pool size, N is the number of log entries, and F is the number of files, this program runs in O(N*F*log(P)) time. (Note that it is not optimized for a large number of files and that it runs faster with a smaller look ahead pool.)
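One way to picture a look ahead pool is as a bounded priority queue: entries go into the pool, and once it holds more than P of them the oldest one is written out. The sketch below shows that general technique for a single input. It is an assumption about the approach, not the actual LogSorter source, and it works on bare timestamps rather than full log lines. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not taken from the LogSorter source.
class LookAheadSortSketch {

    // Sorts semi-sorted timestamps using a pool of at most poolSize entries.
    // Each add/poll on the pool costs O(log P), giving O(N*log(P)) overall.
    static List<Long> sort(List<Long> timestamps, int poolSize) {
        PriorityQueue<Long> pool = new PriorityQueue<>();
        List<Long> out = new ArrayList<>();
        long lastEmitted = Long.MIN_VALUE;
        for (long t : timestamps) {
            if (t < lastEmitted) {
                // An older entry arrived after a newer one was already written.
                throw new IllegalStateException(
                    "Entry too far out of order; try a larger look ahead pool");
            }
            pool.add(t);
            if (pool.size() > poolSize) {
                lastEmitted = pool.poll(); // oldest entry currently in the pool
                out.add(lastEmitted);
            }
        }
        while (!pool.isEmpty()) {
            out.add(pool.poll()); // drain the remaining entries in order
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, 4, 5, 6]
        System.out.println(sort(Arrays.asList(1L, 3L, 2L, 5L, 4L, 6L), 2));
    }
}
```

With a pool of 1, the input 5, 6, 7, 1 fails in this sketch, because 5 and 6 have already been written out by the time 1 arrives. A larger pool tolerates entries that are further out of place, at the cost of more memory and a slightly higher log(P) factor.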
Download and install the Java 2 Software Development Kit (SDK) from java.sun.com if you do not already have it installed.
Download the source code for the log sorter.
First compile the source code:
javac LogSorter.java
This program reads either from standard input (with no arguments) or from each of the files specified as arguments. Output is always written to standard output, and errors to standard error. If you wish to sort logs and remove duplicate entries, or if you want to bump up the look ahead pool size, simply recompile the program with those options.
java LogSorter access.log > sorted_access.log
java LogSorter server1_access.log server2_access.log server3_access.log > sorted_access.log
cat access1999.log access2000.log access2001.log | java LogSorter > sorted_access.log
java LogSorter access.log | webalizer
java LogSorter access.log | gzip > sorted_access.log.gz
gunzip -c access.log.gz | java LogSorter | gzip > sorted_access.log.gz
Copyright (c) 2001-2002 by Stephen Ostermiller
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.