Compressed / Rotated Logfiles
   

As the web server writes transactions to the logfile, the file grows and can become unmanageably large over time. Many web servers are configured to rotate the logfile periodically, which keeps its size manageable without losing any data. Rotation typically happens once per month for smaller sites and weekly or daily for higher-traffic websites.

A typical web server writes to a logfile named access.log or similar. When the logfile is rotated, it is given a name that reflects the period of time it spans, such as access_april.log. After renaming the file, the server creates a new file under the access.log name and starts writing the latest data to it (in this example, data for May). In this way, the older data is rotated into archive files.
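
The rename-and-recreate step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular web server implements rotation; the function name rotate_logfile and the period_label parameter are hypothetical.

```python
import os
import tempfile

def rotate_logfile(log_dir, period_label, active_name="access.log"):
    """Rename the active logfile to an archive name, then start a fresh one.

    Hypothetical helper: real servers usually rely on their own rotation
    tools rather than a script like this.
    """
    active_path = os.path.join(log_dir, active_name)
    stem, ext = os.path.splitext(active_name)          # "access", ".log"
    archive_path = os.path.join(log_dir, f"{stem}_{period_label}{ext}")
    os.rename(active_path, archive_path)               # access.log -> access_april.log
    open(active_path, "w").close()                     # new, empty access.log
    return archive_path

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as log_dir:
    with open(os.path.join(log_dir, "access.log"), "w") as f:
        f.write("(a request logged in April)\n")
    rotate_logfile(log_dir, "april")
    print(sorted(os.listdir(log_dir)))
    # ['access.log', 'access_april.log']
```

After the call, the April data lives in access_april.log and the new access.log is empty, ready for May's requests.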

The directory containing the logfiles on the server might have several older files with names indicating the periods they span, as well as a file containing data for the current period, which is updated with each client request as it's received.

In addition, the web server often compresses the older data as it is rotated into the archive. On UNIX systems, the format most often used is .gz (gzip, short for GNU Zip). Compression is very effective because logfiles contain a great deal of repetitive data, so the compressed files are often only about 10% of the original size.
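
To see why repetitive log data compresses so well, the following sketch uses Python's standard gzip module on made-up log lines. The sample line, the IP address, and the helper name compression_ratio are assumptions for illustration only. Synthetic data like this is far more repetitive than a real logfile, so expect it to shrink considerably more than real logs do.

```python
import gzip

def compression_ratio(data):
    """Return compressed size divided by original size for a bytes object."""
    return len(gzip.compress(data)) / len(data)

# Made-up log lines that differ only in the seconds field (illustrative only).
line = ('192.0.2.10 - - [01/Apr/2001:00:00:{:02d} +0000] '
        '"GET /index.html HTTP/1.0" 200 1043\n')
raw = "".join(line.format(i % 60) for i in range(5000)).encode("ascii")

ratio = compression_ratio(raw)
print(f"{len(raw)} bytes compress to {ratio:.1%} of the original size")
```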

Thus, a very common scenario is to have the following files on the web server:

access.log  
april_2001.gz  
march_2001.gz  
february_2001.gz  
 
In this example, access.log contains all web requests from May 1 onward. The other three files each contain the data for one month, as indicated by their names.
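
A script that processes a directory like this one has to handle both plain and compressed files. The sketch below shows one way to do that with Python's standard gzip module; the helper names open_logfile and count_lines are hypothetical, and the sketch assumes that a .gz extension reliably marks a gzip-compressed file.

```python
import gzip
import os
import tempfile

def open_logfile(path):
    """Open a logfile as text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")

def count_lines(paths):
    """Return a dict mapping each file's name to its number of log lines."""
    counts = {}
    for path in paths:
        with open_logfile(path) as f:
            counts[os.path.basename(path)] = sum(1 for _ in f)
    return counts

# Demonstration with one archive and one current logfile.
with tempfile.TemporaryDirectory() as log_dir:
    archive = os.path.join(log_dir, "april_2001.gz")
    current = os.path.join(log_dir, "access.log")
    with gzip.open(archive, "wt", encoding="utf-8") as f:
        f.write("april request 1\napril request 2\n")
    with open(current, "w", encoding="utf-8") as f:
        f.write("may request 1\n")
    print(count_lines([archive, current]))
    # {'april_2001.gz': 2, 'access.log': 1}
```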

When you use ClickTracks, you do not need to worry about reading the same data twice. If you import a logfile that ClickTracks has read part of before, it will simply ignore the lines it has previously read.
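
How ClickTracks recognizes previously read lines is not described here. Purely as an illustration of the general idea, the sketch below remembers how many lines were already imported from a file and identifies the file by a fingerprint of its first line, so a later, longer copy yields only the new lines. The function import_new_lines and the state dictionary are hypothetical and are not part of ClickTracks.

```python
import hashlib

def import_new_lines(lines, state):
    """Return only the lines not returned by an earlier import of this file.

    `state` maps a fingerprint of a file's first line to the number of lines
    already imported from that file. This assumes that logfiles only grow by
    appending and that the first line identifies the file.
    """
    lines = list(lines)
    if not lines:
        return []
    key = hashlib.sha256(lines[0].encode("utf-8")).hexdigest()
    already_read = state.get(key, 0)
    new_lines = lines[already_read:]            # skip what was imported before
    state[key] = max(already_read, len(lines))
    return new_lines

state = {}
first = import_new_lines(["req 1", "req 2"], state)
second = import_new_lines(["req 1", "req 2", "req 3"], state)
print(first)    # ['req 1', 'req 2']
print(second)   # ['req 3']
```

Counting lines, rather than comparing their content, keeps legitimate repeated entries, such as two identical requests logged in the same second.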

Virtual Servers / Multiple Domain Logfiles