Definitive Guide to Sitecore Infrastructure and Licensing – Part 3: Scaling of the Database and Indexing Tiers

VP, Technology and Solution Delivery
Valtech

December 08, 2016

The Sitecore Experience platform supports a wide range of deployment options. In the opening post of this series we covered basic terminology, provided an inventory of solution components, explored Sitecore roles and methods of horizontal scaling. We now turn to a more in depth view of how Sitecore’s scale at the database (SQL Server, Oracle, Mongo) and indexing (Lucene, Solr) tier.

See Part 1 and Part 2.

Understanding Data Flows

There are two key data flows to understand in Sitecore.

The first content flow is the management and publication of content:

  1. Content authors will access content management to create, edit, update, delete and otherwise modify content. This activity impacts the master database.
  2. When content is approved, content publication will cause the publication role to read changes in the master database and update the web database. Publication can be triggered by content authors with permission, by workflow or scheduled system activities.

This is a pretty common scenario for any CMS. You will want to keep in mind that publication is a very writing  intensive activity for the master database and Sitecore publishes via direct communication with the relational database in use.

The second flow is the collection, processing and eventual reporting of xDB analytics:

  1. As visitors browse the site, interaction data is held in session (keep in mind this is SQL or Mongo).
  2. When the user session ends it is flushed to the Mongo collections.
  3. The Processing and Aggregation function takes the data from Mongo and suitably transforms it for inclusion in the reporting database.
  4. The reporting database is accessed by the Reporting Service.
  5. The Reporting Service powers the reporting interfaces in Content Management.

Databases and Collections

The purpose of this post is not to cover availability strategies for SQL Server, Oracle or Mongo. The respective vendors already provide an immense amount of documentation on options. Rather, here we focus on the different databases, by what Sitecore role they are used cover any Sitecore specific recommendations.

The following table lists all of the databases in the solution, the supported storage technologies, default recommendations for dedicated instances and the Sitecore feature roles that require access to the database.

Database
Storage
Sitecore Roles Requiring Access
Recommend Dedicated Server/Cluster (Y/N)
Core SQL / Oracle All, and shared N
Master SQL / Oracle Content Management, Processing, Aggregation N
Web ** SQL / Oracle

Content Management, Content Delivery, Reporting

N
Reporting SQL / Oracle Content Management, Processing, Aggregation, Reporting Y
Analytics Mongo All N
Tracking.Live Mongo All N
Tracking.History Mongo Content Management, Processing, Aggregation, Reporting N
Tracking.Contact Mongo Content Management, Processing, Aggregation, Content Delivery N
Session SQL / Mongo All N

** The web database is a bit of a special case in Sitecore. As the web database is the target of publication, a Sitecore solution will commonly have multiple web databases (they will have different names). Multiple web databases are most commonly used to support:

  • Preview targets; where content in a non-final workflow state can be sent to a special site for (p)review
  • Content delivery nodes that are too geographically dispersed to allow for sharing of a single instance of the web database.

If you take the simplest approach, there is a single relational database cluster (SQL or Oracle) and a single Mongo replica set with which all Sitecore instances communicate with. We’ve seen this model in simple solutions, but security policies, network latency and performance requirements will often necessitate a more robust deployment.

As you plan your database tier topology, keep in mind:

  1. Core is a single database and used by every instance in the environment. If latency or security prevents access by all instances, replication to alternative locations is your only fallback.
  2. The Reporting database can be quite large. Sitecore’s default recommendation is to place this on its own dedicated server/cluster.

If we return to our common 2 content delivery 1 management instance, the simplest database deployment would look like the following.

Definitive Guide to Sitecore Infrastructure and Licensing  Part 3 1

For organizations that want to scale Reporting or do not want to share database servers between content delivery, which is typically in a DMZ and the content management instance behind the firewall, a deployment might look more like one of the following. We’ll cover more on the network zones in the next post.

Definitive Guide to Sitecore Infrastructure and Licensing  Part 3

Continue to Part 4: Network zones and geographic distribution

Contact us

We would love to hear from you! Please fill out the form and the nearest person from office will contact you.

Let's reinvent the future