Sitecore Publishing Target Replication

If you need to work with a publishing target that is placed remotely from your Sitecore CM instance, you have to consider how much you will publish and whether publishing time is an issue. Publishing in Sitecore is in fact a granular operation that sends each item over the network separately – there is no “bulk” send as of now. This means that the publishing time increases as latency grows and network throughput drops. Let’s have a look at how we can solve this.

With a latency of ~300 ms and a network throughput of 200 Mbit/s, the publishing time for one item can get to a whopping 6–7 seconds. The time required to publish 1,000 items then amounts to more than 1.5 hours. While this can be a fit for some use cases, in the majority of cases it is necessary to have the lowest possible publish times. The best way to achieve that is to replicate the publishing target using SQL Server replication.
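To put those numbers in perspective, here is the back-of-the-envelope arithmetic behind the 1.5-hour figure (assuming ~6.5 s per item, the midpoint of the range above):

    1,000 items × 6.5 s/item ≈ 6,500 s ≈ 108 minutes ≈ 1.8 hours

Because the items are sent one by one, every additional thousand items adds roughly the same amount of time again.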

Currently (up to Sitecore 8.1), Merge Replication is the supported way to set up SQL Server replication for Sitecore. The replication is set up to transfer a complete publishing target to one or more locations. Replicating content over a WAN is more effective than publishing, as the operation is a bulk process rather than an atomic one. After the content is published to the publishing target, which also acts as the Publication in the SQL Server replication, the Subscriber checks for any changes and synchronizes them.

For the above-mentioned case of a WAN with ~300 ms latency and 200 Mbit/s throughput, and items without large binary content, the replication is able to synchronize the changes in no more than 5 minutes.

Architecture layouts

There are two main scenarios that can be considered for replicating the web target – the universal publishing target and the dedicated publishing target.

Universal publishing target

The universal publishing target is a single publishing target (web database) where all content is published at all times. The content can then be replicated to one or more additional locations. The replicated web databases behave exactly like the standard ones, and Sitecore is configured with them as publishing targets.

Figure: Universal publishing target replication

The only difference compared to a traditional publishing target is that the replicated publishing targets are not configured in the Master database. They are therefore available in the desktop environment for users to switch to and access (e.g. to check whether a given piece of content has already been replicated), but they are not available in the publishing dialog as selectable targets.
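If you would rather script that “has this content replicated yet?” check than switch databases in the desktop, a minimal sketch along the following lines can do it. This is only an illustration under a few assumptions: it uses the pyodbc package, the connection string and credentials are placeholders for your own environment, and it relies on Sitecore storing items in the dbo.Items table, whose Updated column reflects the last modification. The item ID shown is the default /sitecore/content/Home item.

    # check_replicated.py - does a given item exist (and how fresh is it) in the
    # replicated web database? Connection string and item ID are placeholders.
    # Requires the pyodbc package (pip install pyodbc).
    import pyodbc

    CONN_STR = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=remote-sql.example.com;DATABASE=Sitecore_Web_Replica;"
        "UID=replication_check;PWD=secret"
    )

    # GUID of the item you expect to see on the subscriber side
    # (this one is /sitecore/content/Home in a clean installation).
    ITEM_ID = "110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9"

    def item_replicated(conn_str, item_id):
        """Return (name, updated) if the item exists in the target database, else None."""
        with pyodbc.connect(conn_str) as conn:
            row = conn.cursor().execute(
                "SELECT [Name], [Updated] FROM [dbo].[Items] WHERE [ID] = ?", item_id
            ).fetchone()
        return (row.Name, row.Updated) if row else None

    if __name__ == "__main__":
        result = item_replicated(CONN_STR, ITEM_ID)
        if result:
            print("replicated:", result)
        else:
            print("not replicated yet")

Running the same query against the publication and the subscriber and comparing the Updated values also gives you a rough idea of the replication lag.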

Dedicated publishing target

The dedicated publishing target is a standalone web database placed close to the CM. This web database is only used to achieve fast publish times; it does not serve content to any CD. It does, however, replicate the published content to a remote web database, which is used to serve content to the nearby CDs.

Figure: Dedicated publishing target replication

In the scenario sketched in the figure, there are two web databases on the first database server and a single one on the second database server. In this use case, content might be centrally managed and published in one region (e.g. Europe), with a second publishing target representing a remote consumption destination (e.g. Asia).

Setup

The setup is described very well in the SQL replication guide available on the SDN (it is still relevant for the 8.x series). However, there are a few gotchas that are worth mentioning:

  1. Core DB – the replication guide says that the Core DB can optionally be replicated. However, if the Core DB is a standalone instance, it will not be cleaned by Sitecore’s cleanup jobs – especially the EventQueue – unless it is directly linked to an application instance that performs these jobs. Since only one Core database can be linked to an instance at any time, and only one instance should perform the cleanup jobs (multiple instances performing them must be avoided to prevent table deadlocks), replication of the Core database should be considered the only way to go when you need multiple Core DBs.
  2. Emptying the EventQueue – remember to truncate the EventQueue table immediately before initializing the snapshot. The EventQueue contains a timestamp column that can cause problems during the initialization (see the sketch after this list).
  3. Starting a replication on a larger database – starting a replication on a new, empty database is very quick and easy. If you need to start it on a larger database (several GB) over a slower wire, you might want to increase the initial size of the database to accommodate the content fully, without needing to expand. Also increase the autogrowth of the log and master data files from 1 MB to something bigger – 500 MB, for example (see the sketch after this list). It is worth remembering that on larger databases the initial snapshot synchronization (over SQL Server) is a bit flaky, so if you can, plan for the replicated database earlier in the project rather than later.
  4. Replicating the views – the Sitecore replication guide does not tell you to replicate the views. If you do decide to replicate them (currently, the web database contains only one view, dbo.Fields), remember to reinitialize the snapshot as soon as the view changes. Check this on every Sitecore update.
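
For gotchas 2 and 3, the preparation can be scripted. The sketch below is only an illustration under stated assumptions: it uses the pyodbc package; the server names, database names, sizes and logical file names are placeholders for your environment (check the real file names with SELECT name FROM sys.database_files); and ALTER DATABASE runs with autocommit because it cannot execute inside a user transaction. Note also that SQL Server refuses TRUNCATE on a table that is already published for replication, so run that step before the table becomes an article, or fall back to DELETE.

    # prepare_replication_init.py - hypothetical housekeeping run immediately
    # before generating the initial snapshot. All names below are placeholders.
    import pyodbc

    # Publication side: the web database next to the CM.
    PUBLICATION = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=cm-sql.example.com;DATABASE=Sitecore_Web;Trusted_Connection=yes"
    )

    # Subscriber side: the new, still empty replica that will receive the snapshot.
    SUBSCRIBER = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=remote-sql.example.com;DATABASE=Sitecore_Web_Replica;Trusted_Connection=yes"
    )

    # Gotcha 2: empty the EventQueue on the publication right before the snapshot;
    # its timestamp column can otherwise cause problems during initialization.
    # (If the table is already published, TRUNCATE is refused - use DELETE instead.)
    with pyodbc.connect(PUBLICATION, autocommit=True) as conn:
        conn.cursor().execute("TRUNCATE TABLE [dbo].[EventQueue]")

    # Gotcha 3: pre-size the subscriber and raise autogrowth from the 1 MB default
    # so the initial synchronization is not slowed down by constant file growth.
    # Logical file names and sizes are assumptions - verify with sys.database_files.
    with pyodbc.connect(SUBSCRIBER, autocommit=True) as conn:
        cur = conn.cursor()
        cur.execute(
            "ALTER DATABASE [Sitecore_Web_Replica] MODIFY FILE "
            "(NAME = N'Sitecore_Web_Replica', SIZE = 20GB, FILEGROWTH = 500MB)"
        )
        cur.execute(
            "ALTER DATABASE [Sitecore_Web_Replica] MODIFY FILE "
            "(NAME = N'Sitecore_Web_Replica_log', FILEGROWTH = 500MB)"
        )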

Closing thoughts

By replicating the publishing target, you can save a tremendous amount of time while publishing: you decrease the number of publishing targets that you physically publish to, you cut down the network communication, and the content is moved around in a bulk fashion.

2 thoughts on “Publishing target replication”

  1. Hi,

    We are running into the mentioned issues, but using Azure SQL SaaS. There, geo-replication is possible, but Sitecore seems to want a read-write database for web. If that requirement could be removed, that would also be a nice way of doing replication.
    Is there a way to remove the writes to the web database from the CD nodes?

    One other thing to consider is to use MongoDB for the web databases; that also takes replication out of the way, as MongoDB handles it beautifully… (same write problems…)

    /M

    1. Hi Martin,

      This is an interesting comment – indeed the CDs need write access to the web database, as there are a few tables (e.g. EventQueue and Properties) that need to be updated (timestamps) by the jobs carried out by the CD itself. Although this is not mission-critical DB access, those writes would have to be redirected somewhere else in order to have a read-only type of replication.
      However, since the currently supported Merge Replication is a bi-directional type of replication, write access is not necessarily a problem. With other (currently unsupported) methods, such as AlwaysOn Availability Groups, where only read access is granted, this does represent a problem.

      Using MongoDB for the web databases would, I believe, not help in this scenario, as replication in MongoDB relies on a single primary server that takes all writes. Only reads can be served from secondaries, so the scenario would be similar to the other read-only kinds of replication…
