Whitepaper: Planning your Sitecore xDB infrastructure

Vice President, Marketing Science
Valtech

July 27, 2015

Embarking on a Sitecore xDB implementation? Learn what you need to know about infrastructure and other technical considerations:

What is the xDB?

The xDB is Sitecore’s new analytics database that drives the next-generation marketing features available in Sitecore 7.5 and 8. The xDB is based on MongoDB, an infrastructure change made so that high volumes of collected experience data can be scaled without sacrificing performance.

Strategically, the xDB is the engine powering the Sitecore Experience Platform, empowering marketers to optimize experiences based on data-driven insights and a 360-degree view of the customer.

Tactically, the xDB is a key architectural component of Sitecore: it collects customer interaction data and drives personalization, optimization and testing, campaigns and engagement plans.

If you are upgrading or building on Sitecore 8, the xDB will be a critical consideration in your delivery, deployment and infrastructure planning. Read on to find out more.

The xDB infrastructure footprint and flow

The xDB logically consists of several components that need to be taken into consideration. These include some or all of the following:

  • MongoDB collection database
  • xDB index (Solr)
  • Possible dedicated session state servers (MongoDB or SQL)
  • Possible processing and/or aggregation servers (Sitecore instances)
  • Reporting database (dedicated SQL Server instance)
  • Reporting services hosted on one of the Sitecore instances

The logical data flow is represented in the diagram below. MongoDB was introduced to handle the collection of significant volumes of website interactions; the aggregation function transforms raw MongoDB data and writes it into a SQL Server reporting database, which is made available to reporting applications via a reporting service.

Infrastructure footprint and flow

Image credit: Sitecore

 

Please see Step 2 – Calculate Your Scaling Needs below for further detail and a visual representation of these components.

Planning your xDB installation

Our number one piece of advice is to plan early and schedule an in-depth infrastructure planning session, ideally with a partner. Do not leave this until deployment time! The Sitecore 8 infrastructure footprint has expanded considerably, as have the options for on-premise deployments, cloud deployments or hybrids of the two.

1) Firstly, understand contact and interaction definitions

In order to configure, scale and customize your xDB properly, you will need to understand some new concepts and terminology.

  • An interaction is a browsing session or visit; for sizing purposes, it is measured as net new interactions created in a given month
  • A contact is a visitor that can be identified by a unique identifier such as UserID or email address

This Sitecore article outlines a full list of new xDB-related terminology.

One of the key enhancements in the xDB is contact identification that persists across sessions. In previous versions, the DMS used the concept of a cookie-based visitor, which made it difficult to reconcile sessions on different devices with the same person.

Within the xDB, an individual can now be identified across devices and sessions via a unique ID such as email address. The xDB architecture supporting this is the Session State Server, and it is especially important in scaled environments with multiple CD nodes (see step 2).

Gather your contact counts and your current and projected website traffic data; these are the inputs to the next step.

2) Calculate your scaling needs

If you choose to use Sitecore’s xDB Cloud offering, your subscription will be priced on the number of interactions or contacts, and all scaling is handled within the cloud. You can discuss this further with your Sitecore rep (see step 3).

If you choose to host the xDB on-premise, you’ll need to do some additional work to plan correctly. Read on.

Whiteboard your topology

We strongly recommend starting with an interactive whiteboarding session with a knowledgeable partner and a cross-section of your team, including representatives from IT, infrastructure, development and marketing. 

The objective is to map out your Sitecore 8 and xDB topology in sufficient detail to anticipate procurement requirements, traffic, growth, usage and maintenance such that the correct Sitecore xDB license model can be applied.

While you are whiteboarding, it is helpful to project a starting point diagram on the wall for discussion and walk through each area as outlined below.

Ask critical questions

As you walk through your whiteboarding session and the topics in this whitepaper, there are several key questions that will drive your decision-making about scaled architecture.

  • What are your annual peaks and lows of traffic, specifically visits/month? How is this projected to grow year over year?
  • How many contacts do you have in your CRM? How is this projected to grow?
  • What sort of analytics reporting do you anticipate doing? Will you be using the out-of-the-box reporting interface, or extracting the raw data for use in a broader BI initiative? How intensively will the reports be called upon, on a daily or weekly basis?
  • Will you be hosting multiple websites per Sitecore instance in a multi-tenant architecture?
  • What are your high-availability, failover and disaster recovery requirements and policies?
  • Are you hosting your infrastructure in multiple data centers?
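The traffic questions above feed directly into sizing. As a minimal sketch, assuming a constant year-over-year growth rate compounded once per year, the following Python projects monthly interaction volume forward. The function name and sample figures are hypothetical, and the projection is best applied to your peak month rather than an average.

```python
from __future__ import annotations


def project_monthly_interactions(current_per_month: int, annual_growth: float, years: int) -> list[int]:
    """Project interactions/month for year 0 (today) through `years`.

    Assumes a constant year-over-year growth rate (0.25 means 25%),
    compounded once per year and rounded to whole interactions.
    """
    if current_per_month < 0 or years < 0:
        raise ValueError("current_per_month and years must be non-negative")
    return [round(current_per_month * (1 + annual_growth) ** year) for year in range(years + 1)]


if __name__ == "__main__":
    # Hypothetical peak month: 400,000 interactions, growing 25% per year.
    for year, volume in enumerate(project_monthly_interactions(400_000, 0.25, 3)):
        print(f"Year {year}: {volume:,} interactions/month")
```

The projected peak for the final year of your planning horizon is a sensible input to the collection database and processing server estimates that follow.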

The key areas where the xDB infrastructure scales are outlined below. The diagram shows a “large” configuration; smaller configurations scale down from there.

1 - MongoDB collection database and session state

The collection database is a mandatory part of your infrastructure, and the key questions to ask relate to high availability requirements.

  • If you do not require high availability, you can get by with one instance
  • If you do require high availability, you will need three instances to create a MongoDB replica set. This is recommended if you are planning strategic use of Sitecore XP features; however, you can always scale up from one instance over time
  • As a basic rule of thumb, Sitecore projects disk space using 5 KB per interaction and 2.5 KB per identified contact; together these two items make up 80% of the disk space
  • See these two articles for further collection database performance and hardware recommendations
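The disk space rule of thumb can be turned into a quick estimate. This Python sketch assumes interactions accumulate for a chosen retention period with no purging, that the remaining 20% of disk space is proportional to the interaction and contact data, and that all three replica set members are data-bearing (each holds a full copy of the data). The function name, parameters and sample inputs are hypothetical.

```python
from __future__ import annotations

KB_PER_INTERACTION = 5.0   # Sitecore rule of thumb quoted above
KB_PER_CONTACT = 2.5       # per identified contact
CORE_SHARE_OF_DISK = 0.80  # interactions + contacts make up ~80% of disk space
KB_PER_GIB = 1024 ** 2


def collection_db_gib(
    interactions_per_month: int,
    months_retained: int,
    identified_contacts: int,
    replica_members: int = 1,
) -> tuple[float, float]:
    """Return (GiB per data-bearing member, GiB across the replica set).

    Assumes interactions accumulate for `months_retained` months with no purging,
    and that every replica set member stores a full copy of the data.
    """
    interactions = interactions_per_month * months_retained
    core_kb = interactions * KB_PER_INTERACTION + identified_contacts * KB_PER_CONTACT
    # Scale up so that interaction + contact data represents 80% of the total.
    per_member_gib = core_kb / CORE_SHARE_OF_DISK / KB_PER_GIB
    return per_member_gib, per_member_gib * replica_members


if __name__ == "__main__":
    # Hypothetical inputs: 500,000 interactions/month kept for a year,
    # 200,000 identified contacts, three-member replica set.
    per_member, total = collection_db_gib(500_000, 12, 200_000, replica_members=3)
    print(f"{per_member:.1f} GiB per member, {total:.1f} GiB across the replica set")
```

Treat the result as a floor for procurement: indexes, growth headroom and backups sit on top of it.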

You will also need to consider session state storage. It is necessary to configure both private and shared session states for the xDB. The following table gives a high-level overview of recommended session state modes:

Private session state
  • Purpose: holds information private to a session (visit info, page views, goals, etc.)
  • Stand-alone Sitecore instance: InProc
  • Content delivery cluster: InProc (sticky load balancer) or out of proc (non-sticky load balancer)
  • Store options: MongoDB (on-premise)* or SQL (cloud)

Shared session state (not supported on CMS)
  • Purpose: holds information that may be shared by multiple visits on the same cluster (contacts/devices)
  • Stand-alone Sitecore instance: InProc
  • Content delivery cluster: InProc (sticky load balancer) or out of proc (non-sticky load balancer)
  • Store options: MongoDB or SQL; the provider must support the SessionEnd event**

* Per Sitecore, if you are running an on-premise solution with a MongoDB database as your collection database, you should use MongoDB as your session store.

**The standard SQL Server session state provider shipped with ASP.NET does not support SessionEnd and cannot be used with the xDB.
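The session state recommendations above can be read as a small decision function. This Python sketch encodes only what the table and its footnotes state; the function and parameter names are hypothetical, and it is a planning aid rather than a Sitecore configuration API.

```python
def recommended_session_store(
    shared: bool,
    cd_cluster: bool,
    sticky_load_balancer: bool,
    xdb_cloud: bool,
    content_management: bool = False,
) -> str:
    """Suggest a session state mode/store, following the table above.

    shared: True for shared session state, False for private.
    cd_cluster: True for a content delivery cluster, False for a stand-alone instance.
    sticky_load_balancer: only meaningful when cd_cluster is True.
    xdb_cloud: True for xDB Cloud, False for an on-premise MongoDB collection database.
    content_management: True when planning a CM server.
    """
    if shared and content_management:
        raise ValueError("Shared session state is not supported on CMS servers")
    if not cd_cluster or sticky_load_balancer:
        # Stand-alone instances and sticky clusters keep session state in process.
        return "InProc"
    # Non-sticky cluster: any node may serve the next request, so state lives out of proc.
    # On-premise MongoDB collection database -> MongoDB session store; cloud -> SQL.
    store = "SQL" if xdb_cloud else "MongoDB"
    # Shared state needs a provider that raises SessionEnd (the stock ASP.NET
    # SQL Server session provider does not).
    note = " (provider must support SessionEnd)" if shared else ""
    return f"Out of proc: {store}{note}"


if __name__ == "__main__":
    # Hypothetical on-premise CD cluster behind a non-sticky load balancer.
    print(recommended_session_store(shared=False, cd_cluster=True, sticky_load_balancer=False, xdb_cloud=False))
    print(recommended_session_store(shared=True, cd_cluster=True, sticky_load_balancer=False, xdb_cloud=False))
```

Whatever the function returns, remember that a MongoDB session store is a separate instance from the collection database.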

Important points:

  • As a rule of thumb, allow 30 KB of session database storage per visit: multiply the maximum expected number of concurrent visits by 30 KB.
  • For further details and a precise algorithm to estimate your storage requirements for a session state store, please refer to this article.
  • The MongoDB instance for your session store is separate from your collection database.
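The hard part of the 30 KB rule is estimating peak concurrency. One common approach, used here as an assumption rather than Sitecore guidance, is Little's law: concurrent sessions equal the arrival rate multiplied by the time each session occupies the store, which is the visit duration plus the session timeout (20 minutes by default). The function names and sample figures in this Python sketch are hypothetical.

```python
import math

KB_PER_SESSION = 30  # rule of thumb quoted above


def peak_concurrent_sessions(
    peak_visits_per_hour: float,
    avg_visit_minutes: float,
    session_timeout_minutes: float = 20,
) -> int:
    """Estimate peak concurrent sessions with Little's law (L = arrival rate x time in system).

    A session stays in the store from its first request until the timeout
    expires after its last request.
    """
    arrivals_per_minute = peak_visits_per_hour / 60
    return math.ceil(arrivals_per_minute * (avg_visit_minutes + session_timeout_minutes))


def session_store_kb(concurrent_sessions: int, kb_per_session: float = KB_PER_SESSION) -> float:
    """Apply the 30 KB-per-visit rule to a concurrency estimate."""
    return concurrent_sessions * kb_per_session


if __name__ == "__main__":
    # Hypothetical peak hour: 6,000 visits averaging 5 minutes each.
    sessions = peak_concurrent_sessions(6_000, 5)
    print(f"{sessions:,} concurrent sessions -> {session_store_kb(sessions) / 1024:.1f} MB")
```

Note that lowering the session timeout reduces concurrency (and store size) but also ends visits sooner, so size with the timeout you actually intend to run in production.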

Further diagrams, details and instructions on configuring session state are available at these links.

2 - xDB processing servers

You can add xDB processing servers to your infrastructure in high-traffic scenarios to offload the aggregation and reporting operations.

  • Sitecore recommends a minimum of one processing server for traffic volumes of one million interactions/month
  • Any production server defined as part of your Sitecore license can be used as a processing server
  • Websites with smaller volumes of traffic (< 500,000 interactions/month) can get by without any processing servers; in this case, xDB aggregation, indexing and reporting operations occur on the Content Management node
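The two published thresholds can be encoded as a simple check. Note that the guidance above does not say what to do between 500,000 and one million interactions/month, so this Python sketch flags that band for review with your partner rather than inventing a rule. The enum and function names are hypothetical.

```python
from enum import Enum

SMALL_SITE_LIMIT = 500_000               # below this, processing can stay on the CM node
PROCESSING_SERVER_THRESHOLD = 1_000_000  # at this volume, Sitecore recommends >= 1 processing server


class ProcessingAdvice(Enum):
    RUN_ON_CM = "No dedicated processing server; aggregation, indexing and reporting run on the CM node"
    REVIEW_WITH_PARTNER = "Between the published thresholds; decide with your partner"
    DEDICATED_SERVER = "At least one dedicated xDB processing server recommended"


def processing_advice(interactions_per_month: int) -> ProcessingAdvice:
    """Map monthly interaction volume onto the thresholds quoted above."""
    if interactions_per_month < SMALL_SITE_LIMIT:
        return ProcessingAdvice.RUN_ON_CM
    if interactions_per_month < PROCESSING_SERVER_THRESHOLD:
        # Not covered by the published guidance: flag rather than guess.
        return ProcessingAdvice.REVIEW_WITH_PARTNER
    return ProcessingAdvice.DEDICATED_SERVER


if __name__ == "__main__":
    for volume in (300_000, 750_000, 1_200_000):
        print(f"{volume:,}/month: {processing_advice(volume).value}")
```

Feeding in your projected peak month, rather than today's average, avoids having to re-architect shortly after launch.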

3 - Reporting operations

Sitecore’s reporting architecture is designed to make xDB data available to any consumer via a Reporting Services layer that pulls from both the raw MongoDB data and the aggregated SQL Server data. Sitecore’s own reporting applications, such as Experience Analytics, use this services layer.

Key points to know:

  • Sitecore recommends that the reporting database be installed on a dedicated server for performance
  • The Reporting Services must run on a Sitecore instance and can run alongside other functions such as processing or content management. Reporting Services requires specific Sitecore configuration settings.
  • You can read further detail on the reporting architecture and configuration here and here.

4 - xDB Solr index

If you do scale out to separate processing servers, the processing service aggregates xDB data into an index that is physically located on your processing server. Your content management environment needs access to this index for several functions; if your content management server is separate from your processing server, you will need a Solr instance to provide this access.

  • You do not need Solr if you are using xDB Cloud Edition, or if you are running processing operations on your content management instance (not recommended for any scenario beyond Primary Edition licensing)
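Pulling the four areas above together, the xDB components you would provision yourself can be sketched as a checklist. This Python example is a simplified illustration: the class and function names are hypothetical, session state stores are left out (size them separately), and xDB Cloud is treated as covering all xDB components, in line with the licensing notes in step 3.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class XdbPlan:
    xdb_cloud: bool             # xDB Cloud subscription instead of on-premise xDB
    high_availability: bool     # requires a three-instance MongoDB replica set
    dedicated_processing: bool  # processing runs on separate server(s), not the CM node


def self_hosted_components(plan: XdbPlan) -> list[str]:
    """List the xDB components to host yourself, following the four areas above."""
    if plan.xdb_cloud:
        # xDB Cloud covers the xDB components, and no Solr instance is needed.
        return []
    mongo_nodes = 3 if plan.high_availability else 1
    components = [
        f"MongoDB collection database ({mongo_nodes} instance{'s' if mongo_nodes > 1 else ''})",
        "Reporting database on a dedicated SQL Server",
    ]
    if plan.dedicated_processing:
        components += [
            "xDB processing server(s)",
            "Reporting Services on a Sitecore instance (processing or CM)",
            "Solr instance so the CM server can reach the xDB index",
        ]
    else:
        components.append("Processing, indexing and Reporting Services on the CM instance")
    return components


if __name__ == "__main__":
    for item in self_hosted_components(XdbPlan(xdb_cloud=False, high_availability=True, dedicated_processing=True)):
        print(f"- {item}")
```

A checklist like this is most useful as a whiteboarding prop: each line maps to a procurement, licensing or skills question.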

3) Check your Sitecore license

Depending on when you purchased your Sitecore license, you may need to move to the new xDB subscription model offered by Sitecore, or you may be grandfathered in. Either way, you will need to talk to your Sitecore sales rep at this point.

In brief, you will fall into one of two categories. The details of how the subscription models are sized and priced can be discussed with your Sitecore rep and partner and should take into account the sizing and scaling considerations outlined in all steps of this whitepaper.

New customers

  • Option 1: xDB Cloud subscription license*
  • Option 2: xDB On-Premise perpetual license

Existing customers (pre-January 2015; license must include DMS)

  • A la carte: unlimited contacts/interactions
  • xDB processing and aggregation server purchases may be required

*The xDB Cloud subscription license covers all xDB components as indicated in the right-hand side of the diagram above, and includes maintenance. Subscriptions are based on interactions/month or CRM contact volume.

4) Decide whether you will deploy on-premise or in the cloud

There are a number of deployment configurations for Sitecore 8. On-premise remains the most common configuration, although cloud deployments are gaining ground, especially with smaller firms that have less in-house IT capacity.

Wherever you are deploying your Sitecore environments, you will have an important decision to make regarding where your xDB infrastructure resides.

xDB Cloud
  • Pro: xDB infrastructure is completely managed; all components are included for one monthly subscription price
  • Pro: no in-house MongoDB management skills required
  • Con: not yet compliant with privacy standards such as HIPAA
  • Con: cannot implement extensions to xDB processing or aggregation; modules such as Experience Extractor are not yet compatible

xDB On-Premise
  • Pro: data storage can be kept completely on-premise for data residency policies
  • Pro: management of, and extensions to, the entire infrastructure footprint can be centralized in-house
  • Pro: perpetual licensing available
  • Con: in-house infrastructure planning, procurement and skill sets are required for the xDB footprint; many organizations lack MongoDB expertise and resources

To help customers make this decision, Sitecore MVP Jason St-Cyr created this decision tree. Learn more in the accompanying blog post.

xDB Decision Tree

5) For on-premise deployments, choose your Mongo license approach

Customers have the option of obtaining Mongo in several different ways with different support implications. Several options are listed here; for more details see our blog post.

Open Source (DIY)

MongoDB is an open source project. Anyone can download the fully functioning database. This option is best if your IT team is comfortable with the product and supporting open source systems – you can avoid the related licensing costs.

Another cost-effective option is to combine Mongo licensing models; that is, purchase commercial support for your production instances and leverage the open source license for non-prod environments.

Commercial License purchased through Sitecore

A commercial MongoDB license purchased through Sitecore will offer clients production support for both Sitecore and MongoDB via their Sitecore support channel. This production support offers a 2-hour, 24x365 SLA under AGPL license.

Commercial License purchased directly from MongoDB Inc.

MongoDB Inc. offers two commercial support options:

  • MongoDB Enterprise Advanced with an annual per server (512 GB RAM max) price of $10,000 USD. Support is 24/7 and 365 days a year; the SLA time is one hour.
  • MongoDB Management Service or MMS, which is a cloud service and provisioning tool (currently, the best supported cloud available is Amazon AWS).

MongoLab MongoDB-As-A-Service

Provisions MongoDB on-demand on AWS, Azure or Google. Fully managed with high availability, backup and scaling options. With this option, customers don't require MongoDB operational teams, but the Sitecore components of xDB (processing, reporting) would still need to be hosted on-premise, and privacy issues would need to be assessed.

6) Set up your GeoIP lookup service

The xDB uses a GeoIP lookup service to supplement each interaction record with information about geographical location, including city, country, region, ISP, company and other fields. This data can be used to personalize the end user experience. 

Prior to summer 2015, Sitecore had partnered with MaxMind to provision IP Geolocation services. Customers purchased packages of lookups directly from MaxMind and configured the license key within a configuration file.

This integration is now being phased out and replaced with a service purchased directly from the Sitecore App Center, which is accessed via the Start button inside Sitecore. We recommend that clients start with the Sitecore service to avoid having to switch from MaxMind later.

The required steps to purchase the new service are as follows. 

  1. Purchase “Sitecore IP Geolocation Service” from the Sitecore App Center, found in the Sitecore menu via the Start button
    • If you don’t have App Center access set up, please see the link above for instructions
  2. Download and install the appropriate version of the Sitecore IP Geolocation Service package from SDN here.
    • Be sure to follow the instructions in the Installation Guide. This package will overwrite the pipeline and set up the configuration as required.

7) Prepare for operational maintenance and performance optimization

If you’ve chosen to host MongoDB on-premise, you’ll be responsible for regular operational maintenance. For Sitecore as a whole, we generally recommend the following as part of your plan:

  • Follow Sitecore’s CMS Performance Tuning Guide
  • Follow MongoDB’s Configuration, Maintenance and Analysis principles
  • A monthly or quarterly monitoring schedule including:
    • Monitoring and optimization of MongoDB per the published Operations Best Practices
    • Monitor cache for effective use. As memory is an expensive resource, optimal usage has a direct impact on cost.
    • Ensure all unnecessary Sitecore services and jobs are disabled to free processing and memory resources
    • Scan for unused content; removing it saves disk space and reduces the cost of primary and backup storage
    • Archive old versions of content, freeing up space and ensuring ongoing performance of the content authoring environment

Frequently Asked Questions

My xDB is not collecting data. Help!

If you’ve set up your MongoDB and Sitecore configurations per Sitecore’s documentation, and you are not seeing traffic and interactions being saved to your Mongo instance, there are a few troubleshooting tips we’ve found useful.

  • Make sure MongoDB is running. It seems obvious, but particularly in development environments this is easy to forget if you did not configure Mongo as a service, because Mongo then has to be started manually from a command prompt. We use MongoVUE to watch traffic collection in real time.
  • Make sure your main layout includes the <sc:VisitorIdentification runat="server" /> control in the <head> section.
  • In a testing scenario, you can shorten the session timeout so that data is flushed to Mongo sooner than the 20-minute default allows (interaction data is written to the collection database when the session ends). The setting is in web.config: <sessionState mode="InProc" cookieless="false" timeout="20"/>
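For the first tip, a quick way to confirm that something is listening on MongoDB's default port (27017) is a plain TCP connection attempt. This minimal Python sketch assumes a local, default-port installation; adjust the host and port to match your Sitecore connection strings. It only proves that a listener exists, not that it is MongoDB or that Sitecore can write to it.

```python
import socket


def mongo_port_open(host: str = "localhost", port: int = 27017, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if mongo_port_open():
        print("Something is listening on localhost:27017")
    else:
        print("Nothing listening on localhost:27017; is mongod running?")
```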

Can I use a content delivery network (CDN) with Sitecore 8 XP and xDB?

Content delivery networks such as Akamai can help accelerate your website’s performance, particularly for geographically dispersed visitors, by caching heavy assets such as images, videos and front-end code at local points of presence. Faster-loading websites can also rank higher in search results.

You can absolutely use a CDN with Sitecore 8, but for Sitecore XP features (analytics, personalization, etc.) to function properly, all page requests must be served by Sitecore itself. Edge-caching the entire website means page requests never reach Sitecore, so those benefits are lost; the recommended approach is to publish only assets to your CDN.

For more detail on this approach, see our blog post Integrating Sitecore with your CDN.
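The asset-only split can be expressed as a simple routing rule. This Python sketch assumes a hypothetical list of static file extensions; anything else, including extensionless Sitecore page URLs, stays on Sitecore so that analytics and personalization still run. Your CDN vendor's own rule syntax will differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical set of static asset extensions to publish to the CDN.
CDN_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp",
    ".mp4", ".css", ".js", ".woff", ".woff2", ".pdf",
}


def serve_from_cdn(url: str) -> bool:
    """True for static assets; False for page requests that Sitecore must serve."""
    path = urlparse(url).path  # drops the query string, e.g. "?v=3"
    return PurePosixPath(path).suffix.lower() in CDN_EXTENSIONS


if __name__ == "__main__":
    for url in ("https://www.example.com/-/media/images/hero.JPG",
                "https://www.example.com/products/widgets"):
        print(url, "-> CDN" if serve_from_cdn(url) else "-> Sitecore")
```

Routing by extension keeps the rule conservative: an unknown file type falls through to Sitecore, which costs some performance but never breaks tracking.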

Do I need to launch with the xDB enabled?

The Sitecore Experience Platform is designed to work holistically with xDB to deliver full customer experience management capability, and we recommend a full xDB deployment whenever possible. However, some organizations may need to stagger their xDB deployment due to budget, skillset or resource constraints.

Sitecore offers the option to disable Analytics, meaning you can deploy your Sitecore solution without the xDB part of the equation in place. However, this comes with loss of functionality and caveats which are outlined in Sitecore’s Knowledge Base article How to use Sitecore XP 8 without xDB.

What if I need to conform to privacy legislation?

With an out-of-the-box xDB on-premise deployment, anonymous session information is stored including page visits, interactions, tests and engagement plan states. You have control over what identifying information is linked to them (for example, geolocation or an email address from a lead generation form), or whether the xDB data model is extended to pull in CRM data.

For an xDB Cloud deployment, the licensing agreement states that Sitecore cannot comply with HIPAA as it was not designed to do so. Any customers who wish to store HIPAA/PHI type data should use the xDB on-premise model.

If I was using a custom GeoIP solution with DMS, can I continue to use it with Sitecore 8 XP?

You should be able to, but regression testing with the new Sitecore version is recommended and if any issues are encountered, you should contact Sitecore support. The service that Sitecore sells will include regular enhancements that may or may not be tied closely to XP functionality, so this is something to be aware of as you progress with use of the xDB.

How do I link sessions with a unique identifier and merge visitors?

The Sitecore 8 analytics API has been updated: if you can capture a visitor’s unique identifier, such as an email address, you can call Sitecore.Analytics.Tracker.Current.Session.Identify(<UniqueID>) and Sitecore will merge the visit data with any existing data for that unique ID. The contact is then locked in shared session state, so the same session data is available and changeable across active interactions and devices.

If you are looking to merge sessions and provide a consistent session experience across devices, you will need to choose how to uniquely identify contacts and write the appropriate code to do so. 
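To make the merge behavior concrete, here is a toy in-memory model in Python. It is not Sitecore's API and does not reproduce how the xDB stores contacts; it only illustrates the idea that anonymous interactions recorded per device are folded into one contact once a unique identifier is known. Lowercasing email addresses before using them as identifiers is a hypothetical normalization choice; without some such rule, two spellings of the same address would produce two contacts in this model.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Interaction:
    device_id: str
    site: str
    pages: list[str] = field(default_factory=list)


class ToyContactStore:
    """Conceptual model only: NOT how Sitecore stores or merges contacts."""

    def __init__(self) -> None:
        self._anonymous = defaultdict(list)  # device_id -> anonymous interactions
        self._owner = {}                     # device_id -> normalized identifier
        self._contacts = defaultdict(list)   # identifier -> merged interactions

    @staticmethod
    def _normalize(identifier: str) -> str:
        # Hypothetical rule so "Jane@Example.com" and "jane@example.com" match.
        return identifier.strip().lower()

    def record(self, interaction: Interaction) -> None:
        owner = self._owner.get(interaction.device_id)
        if owner is None:
            self._anonymous[interaction.device_id].append(interaction)
        else:
            self._contacts[owner].append(interaction)

    def identify(self, device_id: str, identifier: str) -> None:
        """Link a device to a unique identifier and fold in its anonymous history."""
        key = self._normalize(identifier)
        self._owner[device_id] = key
        self._contacts[key].extend(self._anonymous.pop(device_id, []))

    def history(self, identifier: str) -> list[Interaction]:
        return list(self._contacts.get(self._normalize(identifier), []))


if __name__ == "__main__":
    store = ToyContactStore()
    store.record(Interaction("laptop", "brand-a.com", ["/home"]))
    store.record(Interaction("phone", "brand-b.com", ["/offers"]))
    store.identify("laptop", "Jane@Example.com")
    store.identify("phone", "jane@example.com")
    for visit in store.history("jane@example.com"):
        print(visit.device_id, visit.site, visit.pages)
```

Because the model keys only on the identifier, visits from different sites end up on the same contact, which is also why the choice of identifier matters in multi-site setups.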

What are the xDB considerations if I have a multi-site / multi-tenant architecture in Sitecore?

Interactions from all sites will be captured in the xDB and can be filtered by site within the reporting interface. The key point to consider is that xDB processing will merge your visitors (as described above) regardless of which site they’re browsing, so you will have a single view of that customer across all sites. If you’ve split your sites apart into separate Sitecore XP instances, you will also be splitting your view of the customer, which may or may not be desirable depending on how closely tied your sites and brands are.

When should I consider extending or customizing the xDB?

Part of the grand vision of the xDB is extending it as an aggregation engine to include a breadth of customer data for the 360-degree view of the customer in the experience profile. This data could be pulled from a CRM, from public data sets, or from any number of other relevant business / marketing applications. Please note that this involves custom development work.

The scope of this whitepaper does not include further detail on xDB extension, but first and foremost, organizations must decide and plan for their strategic use of this data and where it is best surfaced and utilized.

  • A simple scenario may sync CRM data back to the xDB for an organization doing individual-level lead nurturing, such that sales reps can refer to an individual’s behaviours.
  • A more sophisticated scenario may involve xDB raw data extracted and aggregated with other data as part of a broader business intelligence strategy (the Experience Extractor module is particularly helpful here).

Please see Sitecore’s documentation for further detail on extending aggregation.
