If you put your Sitecore site behind a load-balancing proxy, you will run into a problem with analytics: the only visitor IP address the Engagement Analytics app will ever report is your own proxy's. Needless to say, that is a fairly big loss, as you can no longer reap many of the benefits of the Sitecore CEP, such as page personalization, automation and reporting.

While our website was running on Sitecore 6.4 this problem was already solved, and we had implemented something akin to the solution from Jeroen's blog. Meanwhile, Sitecore has reimplemented its analytics engine from the ground up in version 6.5, and I must say they did a great job on that. However, in the process the analytics API changed to a degree that made our existing solution obsolete. Basically, this is what we got logged:

[Screenshot: BrokenAnalytics]


Unfortunately, and I suspect it's because 6.5 is not yet a recommended version, this is not very well documented. Standing on the shoulders of giants, I found a hint towards the solution on John West's blog. However, after implementing it the default report still showed the same thing. No luck, so I started looking in the analytics database and found that the data is in fact kept not at the Page Request level but at the Visit level. Even better(!), because there is no need to set it on every request to the server like before. Instead, I can hook into the improved analytics pipeline and plug in only once, when the visitor makes the first request to the server. For that purpose I need to inherit from the CreateVisitProcessor class and override the following method:

public override void Process(CreateVisitArgs args)

The body of the method is somewhat similar to John's implementation; however, for the report to show the proper visitor ISP/company name you also need to overwrite the RDNS field of the VisitsRow object. Setting IP/GeoIP alone is not enough.

The body of my class looks as follows:

// needs System.Linq (First, Select) and System.Web (HttpRequest), plus the Sitecore
// analytics namespaces declaring CreateVisitProcessor, CreateVisitArgs, VisitorDataSet and Tracker
private static readonly string[] headers = { "true-client-ip", "X-Forwarded-For" };

public override void Process(CreateVisitArgs args)
{
  VisitorDataSet.VisitsRow visit = args.Visit;
  HttpRequest request = args.Request;

  if (visit == null || visit.Ip == null)
  {
    return;
  }

  string forwardedFor = string.Empty;
  string headerUsed = string.Empty;

  // look through all the headers in which the true client IP might be sent;
  // the headers are ordered from highest to lowest priority
  foreach (string header in headers)
  {
    forwardedFor = request.Headers[header];
    if (!string.IsNullOrEmpty(forwardedFor))
    {
      headerUsed = header;
      break;
    }
  }

  if (!string.IsNullOrEmpty(forwardedFor))
  {
    // best-effort parsing of X-Forwarded-For: the header can hold multiple
    // comma-separated addresses when there are several proxies between
    // the client and the server; this may not be 100% reliable, as some
    // proxies and CDNs don't append their IPs at the end but put themselves at the front
    string rdns = forwardedFor.Split(',').First().Trim();
    // note: byte.Parse throws a FormatException unless the entry is a plain
    // dotted IPv4 address (no IPv6, no ":port" suffix)
    byte[] bForwardedFor = rdns.Split('.').Select(p => byte.Parse(p)).ToArray();

    var proxyIp = visit.Ip;
    visit.GeoIp = Tracker.Visitor.DataContext.GetGeoIp(bForwardedFor);
    visit.Ip = bForwardedFor;
    // RDNS must be set as well, so that the reverse DNS lookup
    // done by Sitecore is performed on the right IP
    visit.RDNS = rdns;

    // feel free to remove this, or change it to log at Debug level,
    // to stop it from polluting your logs
    Sitecore.Diagnostics.Log.Info(String.Format(
      "Proxy overridden IP address: original [{0}.{1}.{2}.{3}] " +
      "replaced by [{4}] from header '{5}'.",
      proxyIp[0], proxyIp[1], proxyIp[2], proxyIp[3],
      forwardedFor, headerUsed), visit);
  }
}
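The Split('.') / byte.Parse conversion in the processor throws a FormatException whenever the chosen header entry is not a plain dotted IPv4 address, for example an IPv6 address, an "address:port" pair, or the literal "unknown" that some proxies send, and the processor does not catch it. A minimal sketch of a more tolerant conversion follows. It assumes a hypothetical helper class, ForwardedIpParser, that is not part of Sitecore, and .NET 4.0 or later. It keeps the same "left-most entry wins" rule and returns the same 4-byte array shape the processor assigns to visit.Ip, or null when nothing usable is found.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper (not part of Sitecore): a tolerant replacement for the
// Split('.') / byte.Parse conversion used in the CreateVisit processor.
public static class ForwardedIpParser
{
    // Takes the left-most entry of a comma-separated header value and returns
    // its 4 address bytes if it is an IPv4 address, otherwise null.
    public static byte[] ParseFirstIPv4(string headerValue)
    {
        if (string.IsNullOrWhiteSpace(headerValue))
        {
            return null;
        }

        // Same rule as the processor: the left-most entry is treated as the client.
        string candidate = headerValue.Split(',')[0].Trim();

        // Strip a ":port" suffix from an IPv4 entry. An IPv6 literal contains
        // several colons, so it is left as-is and rejected below.
        int colon = candidate.IndexOf(':');
        if (colon > 0 && candidate.IndexOf(':', colon + 1) < 0)
        {
            candidate = candidate.Substring(0, colon);
        }

        IPAddress address;
        if (IPAddress.TryParse(candidate, out address) &&
            address.AddressFamily == AddressFamily.InterNetwork)
        {
            return address.GetAddressBytes();
        }

        // IPv6, "unknown", empty entries and other junk end up here.
        return null;
    }
}
```

In the processor you would then call ForwardedIpParser.ParseFirstIPv4(forwardedFor) in place of the Split/Select line, return early (leaving the visit untouched) when it yields null, and derive the RDNS string from the parsed bytes, e.g. new IPAddress(bytes).ToString(), so that a port suffix never ends up in that field.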

Now you may wonder why I check more than one header. The problem is that both CDNs and proxy servers tend to abuse the "X-Forwarded-For" header; for example, some put their IP at the beginning rather than appending it at the end. This, together with potentially having a number of proxies along the way, is why we usually configure our clients' CDN instances to put the real user IP address in an additional header that we can reliably parse.
Pawel Cegielski (Cognifide's local CDN guru) will be writing more about that on his blog.
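The header priority described above can be combined with a different way of reading X-Forwarded-For: instead of trusting the left-most entry, walk the list from the right and skip addresses of proxies you operate or trust. As long as every trusted intermediary is in that set, it matters less whether it prepended or appended itself, and a value the client sent in its own X-Forwarded-For header is not taken at face value. This is a sketch under stated assumptions, not the code we run: ClientIpResolver and the trustedProxies set are hypothetical, and the headers are passed as a case-insensitive dictionary rather than an HttpRequest so the logic can be exercised outside ASP.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Sitecore) illustrating header priority
// plus a right-to-left walk over X-Forwarded-For.
public static class ClientIpResolver
{
    // headers: request headers, expected to use a case-insensitive comparer.
    // trustedProxies: addresses of load balancers/CDN nodes you trust (assumption).
    public static string Resolve(
        IDictionary<string, string> headers,
        ISet<string> trustedProxies)
    {
        string value;

        // 1) A dedicated header set by our own CDN carries a single address.
        if (headers.TryGetValue("true-client-ip", out value) &&
            !string.IsNullOrWhiteSpace(value))
        {
            return value.Trim();
        }

        // 2) Fall back to X-Forwarded-For.
        if (!headers.TryGetValue("X-Forwarded-For", out value) ||
            string.IsNullOrWhiteSpace(value))
        {
            return null;
        }

        // Walk from the right: entries added by trusted infrastructure are
        // skipped; the first untrusted address is the one that connected to it.
        string[] hops = value.Split(',');
        for (int i = hops.Length - 1; i >= 0; i--)
        {
            string hop = hops[i].Trim();
            if (hop.Length > 0 && !trustedProxies.Contains(hop))
            {
                return hop;
            }
        }

        // Every hop was trusted: no client address can be determined.
        return null;
    }
}
```

For example, with no dedicated header, X-Forwarded-For set to "198.51.100.1, 203.0.113.7, 10.0.0.5" and 10.0.0.5 in the trusted set, Resolve returns "203.0.113.7". The trade-off is that when that address belongs to an untrusted proxy rather than the visitor, you get the proxy's IP, which is exactly why a dedicated header populated by your own CDN is the more reliable source.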

Now, to integrate it with the Sitecore createVisit pipeline, add a reference to your class as the first processor in the createVisit section of the sitecore.analytics.config file, similar to the following:

<createVisit>
  <processor type="Cognifide.SiteCore.Framework.Facilities.Analytics.CreateVisitForProxy,Cognifide.SiteCore.Framework"/>
  ...
</createVisit>

Your analytics should now work as if the proxy or CDN was never there…




This entry was posted on Wednesday, October 5th, 2011 at 8:52 pm and is filed under .Net Framework, ASP.NET, C#, Code Samples, Sitecore, Software Development, Solution.