URL Pruning

   

URL Pruning lets ClickTracks temporarily remove a portion of a URL that you don't want to analyze. It is typically used when the same page is referenced by multiple URLs. In these situations, part of the URL path often says nothing meaningful about the page. For example, if /products/product1.html is really the same page as /product1.html, you could prune the '/products' part of the URL, and all visitors to either of these two URLs will be seen as having visited the same page.

 

Another common reason to use URL Pruning is that URLs contain a session ID or a similar string that makes pages look different when they really aren't. Pruning is particularly useful for sites with very complex URL structures and many different pages.
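
The effect of pruning on page counts can be illustrated with a short Python sketch. It is not ClickTracks code: the prune helper and the list of requests are hypothetical, and the sketch assumes pruning behaves like plain removal of a literal string.

```python
from collections import Counter


def prune(url: str, fragment: str) -> str:
    """Remove every occurrence of a literal fragment from a URL (assumed behavior)."""
    return url.replace(fragment, "")


# Hypothetical page requests; both paths serve the same page.
requests = [
    "/products/product1.html",
    "/product1.html",
    "/products/product1.html",
]

# Without pruning, the same page is counted under two different URLs.
raw_counts = Counter(requests)

# With '/products' pruned, all three requests count toward one page.
pruned_counts = Counter(prune(url, "/products") for url in requests)

print(raw_counts)
print(pruned_counts)
```

Before pruning, the visits are split across two entries; after pruning, they are combined under /product1.html.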

 

 

[Screenshot: Datasets, Edit Pruning]

 

Pruning with Masking

 

The simplest way to prune URLs is to enter the string you want removed into the Mask field.

 

For example, suppose some URLs still contain a path segment left over from a prior version of the website:

 

/megasite/OLD/customer.html   (old version of URL)

/megasite/OLD/areacode.html

....

/megasite/customer.html       (new version of URL)

/megasite/areacode.html

 

In the Mask field, enter:

 

/OLD

 

then select Save Changes.

 

This removes that string from every URL, so the examples above would all reduce to:

 

/megasite/customer.html

/megasite/areacode.html
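
As a rough illustration of what the mask does, the following Python sketch applies the same /OLD mask to the example URLs. The apply_mask function is hypothetical, and the sketch assumes the mask is treated as a literal string that is removed wherever it occurs.

```python
MASK = "/OLD"


def apply_mask(url: str, mask: str = MASK) -> str:
    """Remove a literal mask string from a URL (assumed substring removal)."""
    return url.replace(mask, "")


urls = [
    "/megasite/OLD/customer.html",  # old version of URL
    "/megasite/OLD/areacode.html",
    "/megasite/customer.html",      # new version of URL
    "/megasite/areacode.html",
]

pruned = [apply_mask(url) for url in urls]

# Old and new URLs collapse to the same two distinct pages.
for url in sorted(set(pruned)):
    print(url)
```

If the mask really is matched as a plain substring, as assumed here, it is worth choosing a string that cannot appear inside a longer path segment you want to keep.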

 

 

Pruning with regex

 

URLs can also be pruned using the more advanced regular expression syntax. Regular expressions are often known as regexps and are familiar to web developers through programming languages such as Perl and PHP.

 

Regular expressions are useful if you want to remove a variable string, such as a session ID, from your URLs. Suppose your URLs look like

 

http://www.example.com/catalog/pineapple.html/102-0590433-8620953

 

where the last part is a session ID that you want to remove. Then you could specify the regular expression

 

/[0-9]{3}-[0-9]{7}-[0-9]{7}$

 

to remove the session ID from the end.
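
The expression can be tried out before it is entered into ClickTracks. The sketch below uses Python's re module as a stand-in; the regular expression engine used by ClickTracks may differ in details, and the prune_session_id function is hypothetical.

```python
import re

# Three digits, seven digits, seven digits, anchored to the end of the URL.
SESSION_ID = re.compile(r"/[0-9]{3}-[0-9]{7}-[0-9]{7}$")


def prune_session_id(url: str) -> str:
    """Strip a trailing session ID of the form /NNN-NNNNNNN-NNNNNNN, if present."""
    return SESSION_ID.sub("", url)


url = "http://www.example.com/catalog/pineapple.html/102-0590433-8620953"
pruned = prune_session_id(url)
print(pruned)

# A URL without a session ID is left unchanged.
unchanged = prune_session_id("http://www.example.com/catalog/pineapple.html")
print(unchanged)
```

Because the expression ends with $, only a session ID at the very end of the URL is removed.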

 

More information on regular expression syntax can be found through Google. A good primer is at

 

       http://aspn.activestate.com/ASPN/docs/Expect_for_Windows/1.0/regex.html 

       

 

Note: URL Pruning does not perform find/replace operations. It only removes strings from URLs; it cannot add or substitute other strings.