**Before I begin, I’ve decided to make this a multi-part post. The post below will include some conceptual information and implementation requirements. I’ll be adding the remaining parts as I experiment with the concept over the next few weeks so stay tuned.**
In today's world, web analytics play an important role in developing and maintaining any successful web presence.
As more and more organisations begin to rely on their web analytics data to drive strategic decision making, the importance of verifying your website’s web analytics code becomes paramount.
From a testing perspective, web analytics testing is typically done manually. In fact, I've never heard of it being done any other way. The process involves scouring your website with a browser-based analytics profiler like WASP and verifying that the data being sent to your analytics provider is accurate. While this type of testing is relatively simple (and tedious), its repetitive nature makes it the perfect candidate for automation.
Fresh from integrating YSlow performance analysis into our Selenium tests, the next step was to analyse what part Selenium could play in validating our onsite analytics code.
The concept for automated web analytics testing is relatively simple. As Selenium drives GUI-based tests through your web application, we intercept any outgoing analytics data, capture it and store it in a database for further processing. The process isn't actually all that different to what's already happening under the hood, with the exception of the "intercept" component – typically, the data hits your analytics provider where it's stored and processed in a similar manner. Once you've captured this data, it's simply a matter of comparing it against a production baseline and identifying any differences every time you execute a test run.
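To make the comparison step concrete, here's a minimal sketch of what "diff captured data against a production baseline" could look like. This is an illustration only – the function name and the Omniture-style variable names are hypothetical, not part of any real implementation:

```python
# Sketch: compare one captured set of analytics name/value pairs against a
# production baseline. All names here are illustrative examples.

def diff_analytics(baseline, captured):
    """Return variables that are missing, unexpected, or changed."""
    missing = {k: v for k, v in baseline.items() if k not in captured}
    unexpected = {k: v for k, v in captured.items() if k not in baseline}
    changed = {k: (baseline[k], captured[k])
               for k in baseline.keys() & captured.keys()
               if baseline[k] != captured[k]}
    return missing, unexpected, changed

baseline = {"pageName": "home", "channel": "retail", "events": "event1"}
captured = {"pageName": "home", "channel": "sale", "prop1": "extra"}

missing, unexpected, changed = diff_analytics(baseline, captured)
```

In a real test run you'd execute a comparison like this per page (or per beacon hit) and fail the run on any non-empty result.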
In order to implement the above, here’s a list of what I needed:
1) A website… with web analytics code embedded.
In my specific context, the target site includes a significant implementation of Omniture web analytics code. The concept will work with other analytics providers too so this solution isn’t confined to a single flavour of web analytics. Ultimately, the bigger your target site and the more analytics code it includes, the more value that you’ll derive from this exercise.
2) A machine that runs your Selenium based automation against your target site.
My guess is you already have something in place here, so I won't go into any further detail.
For the sake of transparency, I run in a Selenium Grid environment where I’ve configured multiple clients with this setup.
3) A custom analytics host – essentially, a machine that will collect your analytics data.
This machine will impersonate your actual analytics provider. It needs to be running the Apache web server and a database (MySQL or similar). PHP is recommended but optional (you'll need it for (5) listed below, but you can replace it with whatever technology you prefer).
NOTE: I'd strongly recommend implementing this on the same machine running your ShowSlow server, as the prerequisites are the same. In fact, if you've already got ShowSlow set up, this should be a pretty simple extension for you.
4) A hosts file redirect which hijacks your analytics traffic and directs it to the custom analytics host from step 3. This needs to be done on the machine performing your automated tests. Analysing your target site's HTML should reveal your analytics host, for example analytics.company.com. Once you have this information, redirect all traffic accessing this host to your custom analytics host.
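As a concrete illustration, the redirect is a single hosts file entry on the test machine. The IP address below is hypothetical – use the address of your own custom analytics host:

```
# /etc/hosts (Windows: C:\Windows\System32\drivers\etc\hosts)
# Send all Omniture traffic to the custom analytics host instead.
10.0.0.50    analytics.company.com
```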
5) An analytics beacon to capture, parse and store your analytics data on your custom analytics host.
You need a little bit of development knowledge for this bit. Similar to how YSlow / ShowSlow work, your analytics data is usually sent to your analytics provider in a query string containing a series of name/value pairs. These pairs hold the analytics variable name and its associated value. In order to strip these values out, you'll need to set up a page on your custom analytics host that you can direct traffic to, which strips the parameters from the query string and logs them to the database. I'd strongly recommend having a look at how ShowSlow's YSlow beacon is set up, as the concept is practically the same. Also, I have some code in this area that I could possibly share (after I clean it up a little), so feel free to get in touch if you need some assistance. Once you've set up the beacon, ensure you configure Apache to load the beacon page automatically when you hit its location in your browser.
Finally, you need to verify the format being used to send analytics data to your analytics host. If the URL includes subdirectories, for example "http://analytics.company.com/site/data?test=test&test1=test1&", you'll need to set a rule in Apache's httpd.conf (alias_module) so it can redirect any traffic including the "/site/data" component of the URL. This will ensure your analytics data is directed to the right place.
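For example, assuming a PHP beacon (the filesystem path below is hypothetical), a single mod_alias directive in httpd.conf is enough to map the analytics path onto the beacon script:

```apache
# httpd.conf – route the "/site/data" path the analytics tags use
# to the beacon script. Adjust the path to wherever your beacon lives.
Alias "/site/data" "/var/www/beacon/beacon.php"
```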
Once you've configured the above, you should have most of the pieces in place to capture and log your analytics traffic. Part 2 of this post will include details on the pros and cons of this implementation, as well as some important factors which need to be considered before entirely eliminating your manual web analytics test effort.