Sitecore as a BI Source: An ode to the Version 8-era xDB
November 15, 2017
Disclaimer: This post refers to Sitecore 8.1. With xConnect coming in version 9, many of the ETL components below will simplify
So you’ve heard some great things about the Sitecore xDB and the behavioural data it collects. If you are undertaking a Business Intelligence or Data Warehousing initiative, Sitecore xDB can contribute a lot of value.
I write this article not as a BI expert but from my experience helping organizations leverage Sitecore for this scenario. Whether you have a BI team, or are simply designing dashboards within a tool such as Tableau or PowerBI, I hope you can benefit from our learnings and recommendations.
If your organization has planned a digital data initiative, that’s great! It means you’re probably on track to climb a step on the maturity ladder in building the elusive “one view of the customer”. It probably also means your organization is moving towards more data-driven decision making.
If you’ve been tasked with investigating the role Sitecore can play, I would recommend first familiarizing yourself with the following. It’s important to understand a) what Sitecore collects and b) required tagging to enrich Sitecore data.
- Google Analytics and Sitecore Analytics: A side by side approach
- Quick wins for a richer Sitecore experience database
- Bonus article: Visualizing xDB and CRM data in Power BI, where Jason Wilkerson demonstrates a simple pull of Interaction and Contact records with a custom interface.
A little bit about the project featured in this story
Our client is a membership association on Sitecore 8.1. They asked for our help in building a KPI measurement framework and implementing related business intelligence dashboards. We saw this as a great opportunity to demonstrate the value of xDB and how it can fit inside a data warehouse.
Luckily, our client was starting at a great place. They had just revised and re-issued fresh corporate objectives to which we could map the KPIs.
Part 1: The KPI framework
As part of our project, we did a “Step 1” where we ran several business stakeholder workshops to build consensus around a set of KPIs mapped back to the newly revised corporate objectives. It was a very valuable exercise for multiple departments to sit down together to understand what was most important to measure.
A primary objective was to maximize usage of Sitecore-specific metrics and filters so they could be easily implemented and measured. Engagement value and content profiling were front and center.
The client stakeholders found the Engagement Value Pyramid exercise (something we do with all our clients) especially valuable in ranking the relative importance of onsite behaviours and actions.
Our final deliverable was a robust framework that identified 19 key performance indicators each specifying multiple segments and filters including profile/pattern matching. We also identified how these could be implemented both in Sitecore Analytics and Google Analytics for maximum flexibility.
Below is a simplified representation of one of the defined KPIs. We defined a total of 19.
xDB Field Mapping
Thought Leadership [redacted]
Build on the influence of [organization] as a voice of the profession on public interest issues
New vs. Returning
Part 2: Choosing the xDB as a data source
There was a legitimate question of whether to use Google Analytics (GA) or Sitecore Analytics (SA) as the source for the BI dashboards. The organization, like many, had established maturity in using Google Analytics and we discussed using Google Data Studio for short-term drill-down reports for specific campaigns.
However, GA came with some limitations. For example, the data queries would begin to apply sampling after a certain threshold to keep performance in check, meaning only a subset of collected data is used in the reporting.
The advantages of using Sitecore were clear:
- Full, unrestricted access to all behavioural data and ability to do a nightly pull into the data warehouse
- Consistency between use of engagement value both as a KPI and later for onsite personalization (as opposed to using only GA goals)
- Extensibility of contact data model, useful later with a future ecommerce phase
Next up was to check in with the client’s Business Intelligence team. They needed to understand two things:
- The dashboard requirements as set by the KPI framework
- How the data could be queried from the xDB and which transformations would be required
The BI platform in use was Cognos using SQL Server 2012 for the data warehouse. They’d face the challenge of transforming flat Mongo data back into relational data; luckily, a Mongo connector was found and we entered a biweekly rhythm of requirements analysis and education sessions. My goal was to impart as many of my learnings from my xDB data mining pilot as possible.
Part 3: Accessing xDB data
Much like I had done in the pilot, I assessed the different ways to access xDB data:
- Option 1: Connecting to the SQL Reporting Database (aggregated facts/dimensions) using native connectors in Excel or Tableau
- The way Sitecore aggregates data serves Sitecore’s out-of-the-box Analytics reports very nicely. Unfortunately, given the client’s specific dashboard requirements, the way that Sitecore aggregates the data into those tables did not support the pivoting and filtering of data that the dashboards needed. For example, the tables aggregated around specific metrics, or they removed too many other fields that we wanted in the dataset.
- Option 2: Using the Reporting API to fetch data
- This API will deprecate once xConnect is released in Sitecore 9
- I would not recommend investing in a code/ETL layer using this older API
- Option 3: Waiting for xConnect. Unfortunately this was not an option for this project timeframe.
- Option 4: (Winner!) Querying Mongo directly. Yes, we’ll invest in the 8.1 Mongo spec which will evolve with upgrades, and we’ll need to do some customizations/API calls to get the full data we need, but this option emerged as the clear winner.
- We had a need for more freedom in manipulating and pivoting the entire dataset in multiple ways and found we could not derive all data from the aggregated tables in the way we needed. We settled on making as much of the raw data available in the data warehouse so the BI team could have complete freedom in building the dashboards and joining to other business data.
Check your version of xDB:
- On-premise – You should be able to query directly; you’ll need to be cautious of your replica sets and how you’ve configured the redundancy. Those details are out of scope for this article but be aware that non-primary sets may have a lag in replication and as such, potential data loss if you query them.
- xDB Cloud Edition or xDB Managed Cloud or third-party Mongo hosting – you should be able to perform direct queries to the Mongo cluster depending on your SLA. A couple of considerations:
- Check whether your Mongo host has configured all members of the replica set to be data-bearing and eligible to be Primary.
- Mongo recommends that Read connections be made to the Primary member in the replica set, because replication is asynchronous. That said, if your Mongo host keeps both nodes as “primary-eligible” they seem to have a solid failover process that minimizes any “write” data loss.
Ensure a proper testing environment and data integrity
As with any healthy DevOps, the BI team should have a development xDB environment with realistic data from which to build a specification and perform testing. As best you can, make sure they are similar and you have the same services (such as GeoIP) and tagging (such as engagement value) in both.
A second recommendation involves a data integrity check between Google Analytics and Sitecore Analytics, if both are in use; organizations will often be using both to make decisions.
The manner in which these platforms collect data is different so we can’t expect an exact match, however a delta of 7-10% between volumes of sessions or visits seems to be a consistent, achievable heuristic across the clients I’ve worked with. It’s important that we compare apples to apples. If you observe major differences in traffic volumes, it is often due to:
- Different ways bot and internal traffic is excluded
- Different ways logical website are segmented (for example, the websites configured in Google Analytics vs the logical sites hosted on one or more Sitecore instances)
- Sitecore is not collecting properly
- Web API calls to Sitecore might be counting as interactions
Field Mappings and Organic Iterations
I would recommend allowing for several weeks of iterating on the technical specification for xDB translation to KPIs in the data warehouse. Questions will come up organically and answers will need to be researched. In particular, the implementation will challenge the KPI definitions and clarifications will likely be needed (“what decisions will be made with this KPI? Why did we prioritize it as important?”)
To support the BI team, we created a field mappings document that indicated exactly how the expected dashboard data would map to Mongo fields. A partial screenshot is shown below.
Part 4: A word on non-native xDB data
In our KPI framework we did our best to map to Sitecore fields as we knew them. You and your team will need a combination of understanding how the metric will be used and Sitecore data structure to ensure the formulas are accurate. (This will definitely change with Sitecore 9, xConnect and other providers)
A key aspect of our transformation design was deciding how to pull in a few extra pieces of information not natively stored in the xDB. We’ve had good success in other scenarios using Page Events to extend the xDB model to store additional data.
One specific example was that we decided to use a Page Event to discern which visitors had looked at content tagged with a particular category. We couldn’t do it reliably by template type, URL or any other piece of data natively stored in the xDB. Ultimately we decided to capture the tag in a Page Event when the interaction was recorded so we’d have the snapshot of the tag string at the moment of capture instead of looking it up retroactively. This solved our problem nicely!
The end result
The final solution involved a pull from Mongo into the data warehouse every 24 hours, via the transformation pipeline that had been designed and built against the KPI Framework requirements. The Cognos dashboards have been designed and deployed to production; a sample visual is below.
Whether you pursue using xDB as part of your data strategy on Sitecore 8 or Sitecore 9, I hope this post has been helpful in framing an approach that has worked well for us. Best of luck in your xDB adventures!