In part one of my “Automated Web Analytics Testing with Selenium” series, I described the steps necessary to configure a custom web analytics beacon to enable automated testing of your site’s web analytics code. If you missed part one, you can find it here:
http://stdev.wordpress.com/2010/01/15/selenium-automation-webanalytics/
Since I established the beacon, it has collected over 50,000 unique variables across countless pages and multiple test environments, from a combination of automated and manual test clients. While performance was a consideration in the original solution design, exposing the beacon to multiple test clients and introducing the possibility of parallel result processing quickly revealed a scalability problem. Given this (and for maintainability reasons), I decided to port the beacon from PHP, MySQL and Apache to .Net, MSSQL and IIS6. If you’re interested in the details, let me know and I can email you some info.
Below is a summary of some of my findings from the project with some important considerations for anyone wishing to develop their own custom analytics beacon.
Be Selective
In the beginning, it’s likely that your beacon will be logging ALL analytics data being sent to it. This isn’t a problem, but it can turn into one if left too long. Logging duplicate or irrelevant data will unnecessarily increase the size of your database and potentially increase processing times.
After you’ve gathered a good amount of data, perform some analysis to determine whether there’s anything being logged that you don’t care about. For example, it’s very likely that your beacon is storing analytics variables which detail page access times. In terms of testing your analytics code, these dynamic variables probably don’t reveal much and can be ignored.
In addition, ensure that any URLs you’re using to identify pages are stripped of any variable query string data before logging. You’ll likely be using these URLs to identify unique pages, so this variable data can mislead your beacon into logging another instance of what is actually an existing page.
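As a rough illustration of both points, here is a minimal Python sketch (not the beacon's actual code) that strips the query string from a page URL and drops variables you have decided to ignore. The function names and the contents of IGNORED_VARIABLES are hypothetical; substitute whatever your own analysis shows to be irrelevant.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical names of dynamic variables (e.g. page access times) to skip.
IGNORED_VARIABLES = {"t", "timestamp"}


def normalize_page_url(url):
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def filter_variables(variables):
    """Drop variables that are not useful for testing analytics code."""
    return {
        name: value
        for name, value in variables.items()
        if name not in IGNORED_VARIABLES
    }
```

With this in place, two requests for the same page that differ only in their query string would be logged against the same page URL.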
Manage Undefined Variables
Undefined variables might show up in your analytics query string data. You won’t know where and you won’t know when, so it’s important to handle this data appropriately within your beacon code. Before deciding whether or not to store undefined variables (in some fashion), think of the situations where they’ll be used. For example, if the undefined variable is a host, URL or page name field, it’s very likely that these core attributes will be used to query the database at a later stage. Ultimately, analytics data doesn’t mean much if you don’t know where it came from or what page it relates to, but use your own judgement as to whether to keep or discard these records.
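One possible way to approach this, sketched in Python under some assumptions: the core field names (host, url, pageName) and the strings treated as "undefined" are hypothetical, and values are assumed to arrive as strings from the query string. The function only reports what is missing; whether to keep or discard the record is left to the caller.

```python
# Hypothetical core attributes that later database queries depend on.
CORE_FIELDS = ("host", "url", "pageName")

# Hypothetical string values treated as "undefined".
UNDEFINED_MARKERS = {"", "undefined", "null"}


def is_undefined(value):
    """True if the value is absent or one of the undefined markers."""
    return value is None or value.strip().lower() in UNDEFINED_MARKERS


def triage_record(variables):
    """Split a record into its defined variables and its missing core fields."""
    missing_core = [f for f in CORE_FIELDS if is_undefined(variables.get(f))]
    defined = {k: v for k, v in variables.items() if not is_undefined(v)}
    return defined, missing_core
```

A record with a non-empty list of missing core fields could then be discarded, or stored in a separate table for later review.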
Single Page, Multiple Instances
In the web analytics world, a single page can have several different instances based on the path taken through the site to reach it. In order to log data for each of these instances, ensure that you log based on the unique combination of URL and referrer; otherwise your beacon will overwrite the same instance of the page with each new set of variables that it receives.
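The idea can be sketched in Python with an in-memory dictionary standing in for the database. The function name and the dictionary store are hypothetical; the point is only that the key combines URL and referrer rather than URL alone.

```python
def log_page_instance(store, url, referrer, variables):
    """Store variables under a (url, referrer) key and return that key."""
    key = (url, referrer or "")  # treat a missing referrer as an empty string
    store[key] = dict(variables)
    return key
```

Two visits to the same URL from different referrers then produce two separate entries, while a repeat visit along the same path updates the existing one.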
Database Indexing
Naturally, exposing your beacon to multiple test clients increases the amount of data being processed and written to the database. Combine this with an all-inclusive logging approach (at least in the early stages of data collection) and chances are you’ll be storing thousands of records. To illustrate the importance of indexing: my performance tests revealed that once the database grew to 100,000+ records, beacon processing and response times jumped from < 1 second to approximately 10 seconds. In my specific example, this ultimately resulted in our test automation timing out while waiting for the analytics request to complete. In this particular case, indexing was a simple solution.
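To make the indexing step concrete, here is a small sketch using Python's built-in sqlite3 module purely as a stand-in; the beacon described in this post ran on MSSQL, and the table, column and index names below are hypothetical. It creates an index on the columns used to look up a page instance and asks SQLite for its query plan to confirm the index is picked up.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical variables table plus a lookup index."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS page_variables (
               id INTEGER PRIMARY KEY,
               url TEXT NOT NULL,
               referrer TEXT NOT NULL,
               name TEXT NOT NULL,
               value TEXT
           )"""
    )
    # Index the columns the beacon filters on when looking up a page instance.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_page_variables_lookup "
        "ON page_variables (url, referrer, name)"
    )


def lookup_plan(conn, url, referrer):
    """Return SQLite's query plan text for a page-instance lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT name, value FROM page_variables "
        "WHERE url = ? AND referrer = ?",
        (url, referrer),
    ).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " ".join(str(row[-1]) for row in rows)
```

Checking the plan text for the index name is a quick way to verify that lookups no longer scan the whole table. Other database engines offer their own tools for inspecting execution plans.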
For those considering automating their manual analytics testing effort, the above steps should help you develop an effective and efficient analytics logging solution. In the last post of this series, I’ll detail the final steps required to fully automate your analytics testing, including baselining your resultset and displaying your test results.
Tags: automation, google analytics, omniture, selenium, web analytics

Hi there,
I would like to learn more about how you ported the beacon from PHP, MySQL and Apache to .Net, MSSQL and IIS6.
Thanks
Antonio
I would like to know how you built the beacon in a .NET framework environment.
Thanks
Raman