Under the Hood: Preventative care against visit/visitor inflation

Previously, I’ve discussed how visit and visitor measurement works in SiteCatalyst. However, there are some nuances that I haven’t mentioned, and they can wreak havoc on an otherwise solid implementation. I have seen numerous cases of good development hampered by minor errors that cause visit and visitor inflation, so I’d like to briefly discuss a few ways to ensure that you aren’t inflating these metrics.

Make sure that key configuration variables are always consistent, even across collection methods

This is a common mistake which is, unfortunately, easy to make. You’ve implemented a first-party data collection domain in JavaScript (in your s_code.js file) using the s.trackingServer and s.trackingServerSecure variables. For example:
s.visitorNamespace="awesomesite" s.dc="112" s.trackingServer="metric.myawesomesite.com" s.trackingServerSecure="smetric.myawesomesite.com"

This will cause the s_vi cookie to be set on, and read from, the “myawesomesite.com” domain. Now let’s say you begin to implement SiteCatalyst in a Flash application that will live on your site. You download the ActionSource code and component from SiteCatalyst. In your excitement, you forget to carry over the two tracking server variables discussed above, so your Flash configuration contains only:

s.visitorNamespace="awesomesite" s.dc="112"

This will cause your Flash application to use a different data collection domain than your JavaScript code. The result? One visitor ID value for JavaScript image requests and a separate visitor ID value for Flash image requests, which means that you’ve duplicated visits and unique visitors for anyone who touches both your Flash and your non-Flash implementation.
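The fix is to keep the two configurations in lockstep. As a minimal sketch (assuming your ActionSource configuration uses the same variable names as the JavaScript file, and reusing the example values from above), the Flash implementation should also carry the tracking server variables:

s.visitorNamespace="awesomesite"
s.dc="112"
s.trackingServer="metric.myawesomesite.com"
s.trackingServerSecure="smetric.myawesomesite.com"

With both collection methods pointing at “metric.myawesomesite.com”, the JavaScript and Flash image requests will read and write the same s_vi cookie and therefore share a single visitor ID.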

The same thing applies (although a bit differently) when mixing JavaScript and Data Insertion (XML) API implementations. There, you would want to read and parse the visitor ID cookie value in your API implementation and pass it using the visitorID element. If you pass a different value, or leave the element out of your implementation entirely, SiteCatalyst will not tie the data passed via the API to the data passed using JavaScript, even if all of this data came from the same user.
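To make that concrete, here is a rough, abbreviated sketch of what the body of a Data Insertion API request might look like; the element set is trimmed down, the values are invented for the example, and you should consult the Data Insertion API documentation for the exact requirements of your implementation:

<request>
   <sc_xml_ver>1.0</sc_xml_ver>
   <reportSuiteID>your report suite ID</reportSuiteID>
   <visitorID>the value parsed from this user's s_vi cookie</visitorID>
   <pageName>purchase: confirmation</pageName>
   <pageURL>http://www.myawesomesite.com/checkout/confirmation</pageURL>
</request>

The important part is the visitorID element: it must carry the same value that the user’s browser already sends with its JavaScript image requests, or SiteCatalyst will count the API data under a second visitor.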

Leave that s.visitorNamespace variable alone

This applies primarily to those implementations based on third-party data collection domains. As described briefly above, the data collection domain determines where the visitor ID is set; as such, it must be consistent from page view to page view and from visit to visit. The s.visitorNamespace variable controls the subdomain for data collection. Because it frequently matches or suggests the name of your organization, you may feel tempted to change it when your company’s name changes or when you are introducing a new brand.

s.visitorNamespace="myawesomesite" s.dc="112"

These two variables (given the absence of the s.trackingServer and s.trackingServerSecure variables) will cause the data collection domain to be “myawesomesite.112.2o7.net.” Changing the visitorNamespace variable will not prevent data collection, so it’s possible that a change here would go undetected. But since the data collection domain has changed, every visitor will be treated as brand new. This is commonly called “visitor cliffing.” Since every visitor becomes a new visitor, unique visitor metrics will spike; “lifetime” reports, such as Return Frequency, Visit Number, and Original Referring Domain will be “reset.”

Within a single site, stick to one global JavaScript file

I cannot think of a good reason to have multiple s_code.js files on a single web site, but I do see this from time to time. I understand that certain pages may require certain variables to be set in different ways (e.g., you use “cid=” in the query string to capture campaign tracking codes on pages owned by your team, but another team uses “campaign=” and cannot change this), but this is still possible within a single file by setting the variables dynamically, as sketched below. Using multiple s_code.js files only increases the likelihood of a discrepancy between key configuration variables, such as those discussed above. For example, a first-party data collection domain may be set in one file but not in the other, leaving the second file to fall back on a third-party domain. The effect would be the same as in the Flash example above: because the two JavaScript files would not contain the same configuration variables, you would end up with multiple visitor IDs and, thus, inflated visit and visitor counts.
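Here is a minimal sketch of how a single s_code.js file could accommodate both query parameters. The getCampaignParam helper is hypothetical and written out only to keep the example self-contained; in practice you would more likely use the getQueryParam plug-in available from Omniture.

s.usePlugins=true;
function s_doPlugins(s) {
	// Populate s.campaign from whichever parameter this page uses:
	// "cid" on pages owned by your team, "campaign" on the other team's pages.
	if (!s.campaign) {
		s.campaign = getCampaignParam("cid") || getCampaignParam("campaign");
	}
}
s.doPlugins=s_doPlugins;

// Hypothetical helper: return the value of a query string parameter, or "" if it is absent.
function getCampaignParam(name) {
	var match = window.location.search.match(new RegExp("[?&]" + name + "=([^&]*)"));
	return match ? decodeURIComponent(match[1]) : "";
}

Because both teams’ pages reference the same file, the configuration variables that control the data collection domain stay identical everywhere, and only the campaign logic varies.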

So what can you do to detect these problems when they occur (so that you can correct them using the information above)?

1. Always debug using visits and visitors—not just page views
Checking your Pages report and your Custom Traffic reports is important, but they may not always tell the whole story, at least when showing the Page Views metric. A single visit should normally account for several page views on a given line item in a Custom Traffic report, so if the number of visits is nearly identical to the number of page views, you may have a problem with visit and visitor measurement. Also, make sure to compare visits and visitors to previous time periods. Some spikes are natural and good (the result of marketing efforts), but sudden spikes in visits and visitors that correspond to implementation changes made on your web site can be a sign of trouble.

2. Check “lifetime” reports
If you typically see 60% first-time visits, and that number jumps to 90% all of a sudden, it probably isn’t that you’re driving away your loyal customers. Rather, you may be seeing visitor cliffing in action.

3. Use a packet monitor while browsing your site
There are a number of debugging tools that will show you the data collection domain that your SiteCatalyst implementation uses. You can use these tools to detect changes—both against historical data collection, and from image request to image request on your site. For example, Tamper Data for Firefox will show all image requests and will update with new ones as you browse. Simply look at the beginning of each request URL to confirm that the data collection domain is consistent. For bonus credit, you can double-click the “Cookies” details for each request to ensure that the s_vi value is consistent from request to request.
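As a purely illustrative example (the report suite ID placeholder is made up, but “/b/ss/” is the standard path prefix for SiteCatalyst image requests), every request from the first-party implementation described earlier should begin the same way:

http://metric.myawesomesite.com/b/ss/[report suite ID]/...
https://smetric.myawesomesite.com/b/ss/[report suite ID]/...

A stray request beginning with a different collection domain, such as http://myawesomesite.112.2o7.net/b/ss/..., is a strong hint that one of the configuration mismatches described above has crept into part of your site.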

Fortunately, issues involving visit and visitor inflation are rare. These issues probably aren’t affecting your data, but when they do arise, they can be difficult to troubleshoot and can give even the best analysts severe headaches. I don’t want that to happen, so I hope this helps you understand the process that you might follow in validating an implementation against visit and visitor inflation, as well as what you can do if you are seeing strange things in your data.

As always, please leave a comment with any questions, thoughts, or suggestions that you may have! I’m also available on Twitter, FriendFeed, LinkedIn, or by e-mailing omniture care [at] omniture dot com.