December 08, 2016
The Sitecore Experience platform supports a wide range of deployment options. In the opening post of this series we covered basic terminology, provided an inventory of solution components, explored Sitecore roles and methods of horizontal scaling. We now turn to a more in depth view of how Sitecore’s scale at the database (SQL Server, Oracle, Mongo) and indexing (Lucene, Solr) tier.
Understanding Data Flows
There are two key data flows to understand in Sitecore.
The first content flow is the management and publication of content:
- Content authors will access content management to create, edit, update, delete and otherwise modify content. This activity impacts the master database.
- When content is approved, content publication will cause the publication role to read changes in the master database and update the web database. Publication can be triggered by content authors with permission, by workflow or scheduled system activities.
This is a pretty common scenario for any CMS. You will want to keep in mind that publication is a very writing intensive activity for the master database and Sitecore publishes via direct communication with the relational database in use.
The second flow is the collection, processing and eventual reporting of xDB analytics:
- As visitors browse the site, interaction data is held in session (keep in mind this is SQL or Mongo).
- When the user session ends it is flushed to the Mongo collections.
- The Processing and Aggregation function takes the data from Mongo and suitably transforms it for inclusion in the reporting database.
- The reporting database is accessed by the Reporting Service.
- The Reporting Service powers the reporting interfaces in Content Management.
Databases and Collections
The purpose of this post is not to cover availability strategies for SQL Server, Oracle or Mongo. The respective vendors already provide an immense amount of documentation on options. Rather, here we focus on the different databases, by what Sitecore role they are used cover any Sitecore specific recommendations.
The following table lists all of the databases in the solution, the supported storage technologies, default recommendations for dedicated instances and the Sitecore feature roles that require access to the database.
Database |
Storage |
Sitecore Roles Requiring Access |
Recommend Dedicated Server/Cluster (Y/N) |
Core | SQL / Oracle | All, and shared | N |
Master | SQL / Oracle | Content Management, Processing, Aggregation | N |
Web ** | SQL / Oracle |
Content Management, Content Delivery, Reporting |
N |
Reporting | SQL / Oracle | Content Management, Processing, Aggregation, Reporting | Y |
Analytics | Mongo | All | N |
Tracking.Live | Mongo | All | N |
Tracking.History | Mongo | Content Management, Processing, Aggregation, Reporting | N |
Tracking.Contact | Mongo | Content Management, Processing, Aggregation, Content Delivery | N |
Session | SQL / Mongo | All | N |
** The web database is a bit of a special case in Sitecore. As the web database is the target of publication, a Sitecore solution will commonly have multiple web databases (they will have different names). Multiple web databases are most commonly used to support:
- Preview targets; where content in a non-final workflow state can be sent to a special site for (p)review
- Content delivery nodes that are too geographically dispersed to allow for sharing of a single instance of the web database.
If you take the simplest approach, there is a single relational database cluster (SQL or Oracle) and a single Mongo replica set with which all Sitecore instances communicate with. We’ve seen this model in simple solutions, but security policies, network latency and performance requirements will often necessitate a more robust deployment.
As you plan your database tier topology, keep in mind:
- Core is a single database and used by every instance in the environment. If latency or security prevents access by all instances, replication to alternative locations is your only fallback.
- The Reporting database can be quite large. Sitecore’s default recommendation is to place this on its own dedicated server/cluster.
If we return to our common 2 content delivery 1 management instance, the simplest database deployment would look like the following.
For organizations that want to scale Reporting or do not want to share database servers between content delivery, which is typically in a DMZ and the content management instance behind the firewall, a deployment might look more like one of the following. We’ll cover more on the network zones in the next post.
Continue to Part 4: Network zones and geographic distribution