How ClickTracks Determines a Visitor Session
A fundamental concept in ClickTracks is the way a visitor session is determined. This process greatly influences almost every statistic that ClickTracks calculates.
The following applies almost entirely to ClickTracks when used with logfiles. ClickTracks Hosted bypasses many of the complications described.
Definition of a session
The raw logfile contains requests from IP addresses for different pages, each on a separate line and in chronological order. A fundamental problem in logfile analysis is therefore working out which pages were seen by which visitors. This is complicated by the fact that requests from different visitors are mixed together in the log. A method is needed to extract the pages for visitor A and associate them in the correct order, while doing the same for every other visitor at the same time. This process is known as sessionizing the data. A session is the sequence of pages seen by a single visitor during their visit to the site.
What is a visitor?
A visitor cannot be identified from the IP address alone. The IP address can change during the course of a visit (a problem with AOL and with some visitors originating at large companies). A single IP address may also represent many different visitors, either at different times or at the same time. For example, a large company may route all browser requests through a single firewalled IP. ClickTracks therefore uses several techniques to join pages, as they are read from the logfile, to the correct visitor session:
| 1. | A session cookie present in the logfile guarantees accurate sessionization for the second and later requests. There is no cookie in the first request, so sessionization of that request falls back to the no-cookie case and uses the heuristics described below. ClickTracks Analyzer uses PHP, ASP or JSP session cookies automatically. ClickTracks Pro can be configured for any custom cookie. |
| 2. | Without a session cookie, ClickTracks uses a combination of partial IP address (to account for changing IPs with AOL etc.), user agent, time since last request and referrer to select the session the request most likely belongs to. These factors are fed into a heuristic algorithm that also takes into account how busy the site is. The accuracy of the heuristics ranges from 100% for medium sites to 95% for very busy sites that have no cookies and large numbers of users from AOL. |
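The no-cookie matching in item 2 can be illustrated with a minimal sketch. It is not the ClickTracks algorithm, whose details are not published here. The names `Request`, `Session`, `partial_ip`, `score` and `most_likely_session`, the choice of two leading octets, and the 30-minute timeout are all assumptions made for illustration. The adjustment for how busy the site is has been left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    ip: str
    user_agent: str
    timestamp: float      # seconds
    referrer: str = ""


@dataclass
class Session:
    partial_ip: str
    user_agent: str
    last_seen: float      # timestamp of the most recent page in the session
    last_page: str = ""


def partial_ip(ip: str, octets: int = 2) -> str:
    """Keep only the leading octets (assumed: 2) so that a visitor whose
    address changes within a proxy pool can still be matched."""
    return ".".join(ip.split(".")[:octets])


def score(request: Request, session: Session, timeout: float = 1800.0) -> float:
    """Hypothetical likelihood score; 0.0 means 'cannot be this session'."""
    gap = request.timestamp - session.last_seen
    if gap < 0 or gap > timeout:
        return 0.0
    if request.user_agent != session.user_agent:
        return 0.0
    if partial_ip(request.ip) != session.partial_ip:
        return 0.0
    value = 1.0 - gap / timeout          # recent activity scores higher
    if request.referrer and request.referrer == session.last_page:
        value += 1.0                     # referrer continues this session
    return value


def most_likely_session(request: Request, sessions: List[Session]) -> Optional[Session]:
    """Return the best-scoring open session, or None if nothing matches."""
    best = max(sessions, key=lambda s: score(request, s), default=None)
    if best is None or score(request, best) <= 0.0:
        return None
    return best
```

In this sketch a matching referrer outweighs recency, so a request whose referrer is the last page of an older session is attached to that session rather than to a more recently active one.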
Sessionization from start to finish
Logfile reading
| • | Graphics files are stripped. Since ClickTracks doesn't calculate technical stats like server bandwidth there's no need to count them. |
| • | HEAD requests are ignored. If it's a valid page request it will be followed up with a GET later. |
| • | The status code is checked: 200-206 and 304 are successes. 300-307, except 304, are redirects and are counted. All other codes are failures and the request is dropped. |
| • | The user agent is checked. If it's a known robot/spider like Googlebot the request is dropped (a separate report uses this data, however). |
| • | Check whether this logfile line has been read before (i.e. it already exists within the dataset). Duplicates are dropped. This prevents accidental re-importing of the same logfile from causing double counting. It also permits overlapping logfiles to be imported, with only the overlapping range dropped. |
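The filtering steps above can be sketched as a single predicate. This is an illustrative sketch only. The `LogLine` fields, the extension list, the robot list and the use of the raw line text as the duplicate key are assumptions, not ClickTracks internals.

```python
from dataclasses import dataclass
from typing import Set

# Assumed lists, for illustration only.
GRAPHICS_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")
KNOWN_ROBOTS = ("googlebot",)


@dataclass(frozen=True)
class LogLine:
    method: str
    path: str
    status: int
    user_agent: str
    raw: str          # the unparsed line, used here as the duplicate key


def classify_status(status: int) -> str:
    """200-206 and 304 are successes; 300-307 (except 304) are redirects."""
    if 200 <= status <= 206 or status == 304:
        return "success"
    if 300 <= status <= 307:
        return "redirect"
    return "failure"


def keep_line(line: LogLine, seen: Set[str]) -> bool:
    """Return True if the line should be passed on to sessionization."""
    if line.path.lower().split("?")[0].endswith(GRAPHICS_EXTENSIONS):
        return False                      # graphics files are stripped
    if line.method.upper() == "HEAD":
        return False                      # a real page view arrives as a GET
    if classify_status(line.status) == "failure":
        return False
    if any(bot in line.user_agent.lower() for bot in KNOWN_ROBOTS):
        return False                      # robots are reported separately
    if line.raw in seen:
        return False                      # already imported
    seen.add(line.raw)
    return True
```

Because `seen` persists between imports in this sketch, importing the same file twice, or two files that overlap, adds each line only once.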
Building sessionized visits
| • | Strip requests that are excluded via the options dialog. |
| • | Is the referrer 'external' (i.e. not a domain that's part of the alternate domain names)? If so, a new visit session is started. This ensures PPC tracking is accurate: a visitor coming through in rapid succession on several PPC ads (clicking back to the search engine between them) is the same visitor refining their search, yet each click must be counted distinctly. |
| • | Is a session cookie present? If so, find the corresponding visitor data and add this page to the session. |
| • | If there is no session cookie, fall back to heuristics based on partial IP, user agent, time, etc. Find the most likely visit session to attach the page to. |
| • | If no existing session is found, a new session is started using this page as the opening page. |
| • | Has the session reached the maximum duration, number of pages, etc. as determined by the options settings? If so, the session is deemed to be complete. |
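The order of these decisions can be sketched as follows, assuming excluded requests have already been stripped. This is a simplified illustration, not the ClickTracks implementation. The names `PageRequest`, `VisitSession` and `assign`, the `fingerprint` field (standing in for the partial IP, user agent and time heuristics) and the `max_pages` and `max_duration` defaults are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set
from urllib.parse import urlparse


@dataclass
class PageRequest:
    path: str
    timestamp: float
    referrer: str = ""
    cookie: Optional[str] = None
    fingerprint: str = ""     # stand-in for the partial IP / user agent heuristics


@dataclass(eq=False)          # compare sessions by identity, not by field values
class VisitSession:
    fingerprint: str
    cookie: Optional[str]
    pages: List[str] = field(default_factory=list)
    started: float = 0.0
    last_seen: float = 0.0
    complete: bool = False


def is_external(referrer: str, site_domains: Set[str]) -> bool:
    host = urlparse(referrer).hostname
    return host is not None and host not in site_domains


def assign(req: PageRequest, open_sessions: List[VisitSession],
           site_domains: Set[str], max_pages: int = 100,
           max_duration: float = 1800.0) -> VisitSession:
    session = None
    if not is_external(req.referrer, site_domains):
        # 1. Cookie match, if the request carries a session cookie.
        if req.cookie is not None:
            session = next((s for s in open_sessions if s.cookie == req.cookie), None)
        # 2. Otherwise fall back to the (simplified) heuristic match.
        if session is None:
            candidates = [s for s in open_sessions if s.fingerprint == req.fingerprint]
            session = max(candidates, key=lambda s: s.last_seen, default=None)
    # 3. External referrer or no match: this page opens a new session.
    if session is None:
        session = VisitSession(req.fingerprint, req.cookie, started=req.timestamp)
        open_sessions.append(session)
    session.pages.append(req.path)
    session.last_seen = req.timestamp
    if session.cookie is None:
        session.cookie = req.cookie       # cookie first appears on a later request
    # 4. Close the session once it reaches the configured limits.
    if len(session.pages) >= max_pages or req.timestamp - session.started >= max_duration:
        session.complete = True
        open_sessions.remove(session)
    return session
```

The external-referrer test comes first so that every click arriving from a search engine opens its own session, even when the cookie or heuristics would otherwise match an open session.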
ClickTracks Pro persistent cookie tracking
The cookie database in ClickTracks Pro is also examined to determine whether this visitor session belongs to a visitor seen previously. If so, the unique visitor count accounts for this, and the original campaign referrer and landing page are recovered from the database and placed into the dataset. Visitor conversions that take place long after a campaign has ended are therefore still tracked correctly.
Adding sessions to the dataset
Once a session is complete according to the timeout parameters, additional sanity checks are performed before the session is committed to the dataset. Some robots do not provide a valid user agent so ClickTracks examines the session to see if a repeating pattern of pages is being requested. If this is found the session is dropped and not counted.
Valid sessions are updated with the session length (up to the last page) and exit pages etc. and then committed to the dataset.
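One possible reading of the repeating-pattern check for robots without a valid user agent is sketched below. The source does not say how ClickTracks defines a repeating pattern, so the function name `has_repeating_pattern`, the rule that the whole session must be one block repeated, and the threshold of three repeats are assumptions.

```python
from typing import Sequence


def has_repeating_pattern(pages: Sequence[str], min_repeats: int = 3) -> bool:
    """True if the whole page sequence is one block of pages repeated
    at least min_repeats times, e.g. /a /b /a /b /a /b."""
    pages = list(pages)
    n = len(pages)
    # Only block sizes small enough to fit min_repeats times are tried.
    for size in range(1, n // min_repeats + 1):
        if n % size == 0 and pages == pages[:size] * (n // size):
            return True
    return False
```

A session for which this returns True would be dropped in this sketch. A real check would probably need to tolerate partial repeats and occasional extra pages.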