A Generic Managed Cache Layer for Sitecore XM/XP That Almost Writes Itself Into Your Code

Sitecore XM/XP solutions accumulate expensive recurring computation over time. Index queries, link generation, lists that need to be sorted and filtered, custom item resolvers, multi-source aggregation. Each of these recomputes the same answer on every request, and most of the time the inputs have not changed since the previous publish. The opportunity is obvious: cache the output and invalidate it when the underlying content actually changes.

In our case the symptom that finally forced the issue was Solr query quotas. The client’s solution leans heavily on Solr for content aggregation, faceted search, and product listings, and was hammering it hard enough to bump against the allocated limits on a regular basis. Scaling Solr is neither cheap nor instant. But Solr was just the loudest example. The same recompute-every-request pattern applied to several other paths in the solution that were not yet in pain but would have been the next bottleneck.

The fix was equally clear: cache the results. The harder question was how to introduce caching across dozens of services, each with its own data shapes and invalidation requirements, without turning the codebase into a maintenance headache.

It called for a caching layer that is so transparent to use that adding it to an existing service is practically a one-line change. And removing it (or disabling it for tests) is even simpler - you pass null instead of a cache reference and everything keeps working. No if (cache != null) guards scattered through your code, no special test doubles, no ceremony.

The Problem

Our Sitecore XP solution uses Solr extensively, not just for search but for product listings, category filtering, attribute faceting, and content aggregation. Every page render was triggering multiple Solr queries, many of which returned the same results for the same content that had not changed since the last publish.

The Solr queries themselves were well-optimized. The problem was volume. Hundreds of identical queries per minute across multiple CD instances, all hitting the same Solr cluster.

Solr was the most visible offender, but it was not alone. Around the same hot path were several other recompute-on-every-request patterns that the same cache layer would eventually wrap as well:

  • Generated links where the resolved URL depends on context (language, site, version) and the link provider does non-trivial work to produce it.
  • Filtered and sorted lists built from sets of items pulled out of the content tree, where the source items rarely change but the filter and sort logic runs per request and the result set is identical across requests.
  • Custom item resolvers that walk relationships and apply business rules to choose a winning item for a given input.
  • Aggregation queries that fan out across multiple databases or providers and reduce the results into a single answer.

All of these share one property: the output is a pure function of content that only changes on publish. Cache the output, invalidate it when the right templates change, save the work. Solr was the headline cost; the others were the next bottlenecks waiting their turn.

We needed a caching layer, but the solution had to satisfy several constraints:

  • Easy to add - developers should be able to wrap existing service calls with caching without restructuring anything
  • Easy to remove - for unit tests, we do not want the cache involved at all
  • Serialization-aware - Sitecore’s cache infrastructure requires every entry to report its size, which means storing serialized strings rather than raw objects. This turns out to be more feature than constraint: serializing forces a clean break between the cached value and any live Sitecore object graph it came from, which keeps volatile objects (items, fields, providers) out of the cache where they would otherwise risk memory leaks or stale references
  • Template-sensitive flushing - when a content author saves a Product item, the product cache should clear automatically, but the unrelated menus cache should not
  • Custom flushing - some services need finer control over what gets flushed and when
  • Multi-instance aware - cache clears on the CM should propagate to all CD instances

The Core: ManagedCache and IManagedCacheService

The foundation is a ManagedCache class that extends Sitecore.XA.Foundation.Caching.DictionaryCache. It adds two capabilities that the stock SXA cache does not have: a list of template IDs that trigger automatic clearing, and a pluggable ManagedCacheInvalidationDelegate for custom flushing logic.

using System;
using System.Collections.Generic;
using Sitecore.Data;
using Sitecore.Data.Items;

public class ManagedCache : Sitecore.XA.Foundation.Caching.DictionaryCache, IManagedCache
{
    // Templates that trigger a full clear when a changed item inherits from them.
    public List<ID> InvalidatingTemplateIds { get; }

    // Optional custom flushing logic; null means template-based clearing only.
    private readonly ManagedCacheInvalidationDelegate _cacheClearer;

    public ManagedCache(string name, long maxSize,
        ManagedCacheInvalidationDelegate cacheClearer = null) : base(name, maxSize)
    {
        _cacheClearer = cacheClearer;
        InvalidatingTemplateIds = new List<ID>();
        LastCleared = DateTime.UtcNow;
    }

    // Returns true when either the template check or the custom delegate flushed something.
    public bool Clear(Item item, ItemEventType eventType, bool remote)
    {
        bool result = false;

        // Step 1: template-based clearing.
        if (InvalidatingTemplateIds.Count > 0 &&
            InvalidatingTemplateIds.Exists(id => CheckItemInheritance(item, id)))
        {
            result = true;
            Clear();
        }

        // Step 2: the custom delegate always runs, even if step 1 already cleared.
        if (_cacheClearer != null)
        {
            result = _cacheClearer(this, item, eventType, remote) || result;
        }

        return result;
    }
}

The Clear(Item item, ...) method is the decision point. When a Sitecore item event fires (save, delete, rename, publish), the event handler calls this method on every managed cache. The cache first checks its template list - if the changed item inherits from any of the registered templates, it clears itself. Then it calls the custom delegate if one was provided. Both mechanisms can coexist on the same cache.
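
The CheckItemInheritance helper is not shown above. As a rough illustration of the kind of check it performs, here is a minimal, dependency-free sketch that walks a base-template graph. TemplateNode and TemplateInheritance are hypothetical stand-ins invented for this example, not Sitecore types, and the real helper may well work differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a template: an ID plus its direct base templates.
public sealed class TemplateNode
{
    public Guid Id { get; }
    public List<TemplateNode> BaseTemplates { get; } = new List<TemplateNode>();

    public TemplateNode(Guid id) { Id = id; }
}

public static class TemplateInheritance
{
    // True when 'template' is the target template itself or inherits from it,
    // directly or through any chain of base templates.
    public static bool InheritsFrom(TemplateNode template, Guid targetId)
    {
        if (template == null) return false;

        var visited = new HashSet<Guid>();
        var pending = new Stack<TemplateNode>();
        pending.Push(template);

        while (pending.Count > 0)
        {
            var current = pending.Pop();
            if (!visited.Add(current.Id)) continue; // already checked; also guards against cycles
            if (current.Id == targetId) return true;

            foreach (var baseTemplate in current.BaseTemplates)
            {
                if (baseTemplate != null) pending.Push(baseTemplate);
            }
        }

        return false;
    }
}
```

The point of checking inheritance rather than an exact template match is that a cache registered against a base Product template also reacts to items built on more specific templates derived from it.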

The IManagedCacheService manages the lifecycle of all caches. You ask it for a cache by name, and it either returns an existing one or creates a new one with your specified size, flushing delegate, and template IDs:

public interface IManagedCacheService
{
    IManagedCache GetCache(string cacheName, string defaultMaxSize = "50MB",
        ManagedCacheInvalidationDelegate clearer = null, ID[] invalidatingTemplateIds = null);

    bool ClearCaches(Item item, ItemEventType eventType, bool remote);
    List<string> GetCacheNames(bool includeUnmanaged = false);
    IManagedCache GetCacheByName(string cacheName, bool includeUnmanaged = false);
    void ClearCache(string cacheName, bool includeUnmanaged = false);
    void RaiseClearCacheEvent(string cacheName, string userName);
    void RaiseClearCacheEventOnRemotes(string cacheName, string userName);
    bool IsManagedCache(IManagedCache cache);
}

The service is registered as a singleton via Sitecore’s DI container. Every service that needs caching takes IManagedCacheService as a constructor parameter, calls GetCache(...) once in the constructor, and then uses the returned IManagedCache throughout its lifetime.

The Secret Sauce: ManagedCacheExtensions

Here is where the design gets interesting. The GetOrSetIfNotCached<T> extension method is what makes the cache essentially invisible to the consuming code:

public static T GetOrSetIfNotCached<T>(this IManagedCache cache, string cacheKey,
    Func<T> getObject, Func<T, string> objectToString, Func<string, T> stringToObject)
    where T : class
{
    // A null cache skips the lookup entirely and falls through to getObject().
    if (cache != null)
    {
        var cacheValue = cache.Get(cacheKey);
        if (cacheValue != null)
        {
            var sCachedObject = cacheValue.Value;
            // An empty cached string is returned as null rather than deserialized.
            if (sCachedObject.IsNullOrEmpty())
                return null;

            return stringToObject(sCachedObject);
        }
    }

    var value = getObject();

    // Null results are never stored, so they are recomputed on every call.
    if (cache != null && value != null)
    {
        string cachedValue = objectToString(value);
        cache.Set(cacheKey, new DictionaryCacheValue { Value = cachedValue });
    }

    return value;
}

Notice the if (cache != null) checks. When the cache reference is null, this method simply calls getObject() and returns the result. No caching, no serialization, no overhead. This is what makes it transparent:

  • In production, IManagedCacheService.GetCache(...) returns a real cache, and every call goes through the cache-check-then-fetch-and-store flow.
  • In unit tests, you can pass null for the cache (or simply not register IManagedCacheService), and all your service logic runs without caching. You test your business logic, not the caching layer.
  • When cache is disabled via config, GetCache(...) returns null, and the same transparency applies.
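
To make the null-transparency concrete, here is a small self-contained sketch that reproduces the same control flow without any Sitecore dependency. ISimpleCache, InMemoryCache, SimpleCacheExtensions, and TransparencyDemo are hypothetical names invented for this illustration; the real layer works against IManagedCache and DictionaryCacheValue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, dependency-free stand-in for IManagedCache: string in, string out.
public interface ISimpleCache
{
    string Get(string key);               // returns null on a miss
    void Set(string key, string value);
}

public sealed class InMemoryCache : ISimpleCache
{
    private readonly Dictionary<string, string> _entries = new Dictionary<string, string>();

    public string Get(string key) => _entries.TryGetValue(key, out var value) ? value : null;
    public void Set(string key, string value) => _entries[key] = value;
}

public static class SimpleCacheExtensions
{
    // Same control flow as the extension method in the post, minus DictionaryCacheValue.
    public static T GetOrSetIfNotCached<T>(this ISimpleCache cache, string key,
        Func<T> getObject, Func<T, string> objectToString, Func<string, T> stringToObject)
        where T : class
    {
        if (cache != null)
        {
            var cached = cache.Get(key);
            if (cached != null)
                return cached.Length == 0 ? null : stringToObject(cached);
        }

        var value = getObject();
        if (cache != null && value != null)
            cache.Set(key, objectToString(value));
        return value;
    }
}

public static class TransparencyDemo
{
    // Calls the cached path 'calls' times and reports how often the expensive delegate ran.
    public static int CountFetches(ISimpleCache cache, int calls)
    {
        int fetches = 0;
        for (int i = 0; i < calls; i++)
        {
            cache.GetOrSetIfNotCached("listing",
                () => { fetches++; return "payload"; },
                s => s,
                s => s);
        }
        return fetches;
    }
}
```

With a null cache, TransparencyDemo.CountFetches(null, 3) runs the delegate three times; with new InMemoryCache() it runs once and serves the other two calls from the cache. The calling code is identical in both cases, which is the whole point.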

The serialization pair (objectToString / stringToObject) is the developer’s responsibility. For simple objects, ToString() and a parse method work. For complex objects, JSON serialization does the job. There are two reasons the layer pushes you toward serialized strings rather than letting you stash arbitrary objects.

First, Sitecore’s cache infrastructure needs to know the size of each entry in order to enforce MaxSize and evict cleanly under memory pressure. Strings are trivial to size; arbitrary object graphs are not.

Second, and more importantly in practice, the serialized contract forces a decision about what actually belongs in the cache. Sitecore Item instances are the classic trap: they hold references back to the database, to lazily fetched fields, and to language/version state that can mutate underneath you. Caching them whole risks memory leaks (the cache pins a live object graph indefinitely) and stale-reference bugs (the cached item points at a Database that was later reset). The serialized contract sidesteps both. You pick a slim representation (an ID, a URI, a plain DTO) and re-hydrate from the database on the read side. For Sitecore items in particular, re-fetching by ID is cheap thanks to the prefetch cache; the work you saved was the upstream sorting, filtering, or index query that produced the ID in the first place.

Usage Example 1: Template-Sensitive Flushing

Here is how a product listing provider uses the cache with automatic template-based flushing. The product cache auto-clears whenever any item inheriting from the Product or ProductCategory template is saved, deleted, or published; the category cache reacts to ProductCategory items only:

public class CachedProductListingProvider
{
    private readonly IManagedCache _productCache;
    private readonly IManagedCache _categoryCache;

    public CachedProductListingProvider(IManagedCacheService cacheService)
    {
        // These caches auto-flush when Product or ProductCategory items change
        _productCache = cacheService?.GetCache("ProductSolrCache", "50MB",
            null, new[] { Product.ItemTemplateId, ProductCategory.ItemTemplateId });

        _categoryCache = cacheService?.GetCache("CategorySolrCache", "50MB",
            null, new[] { ProductCategory.ItemTemplateId });
    }

    public List<ProductSearchResultItem> GetProducts(Database database)
    {
        return _productCache.GetOrSetIfNotCached(
            $"ProductSolrCache::{database.Name}",
            () => QuerySolrForProducts(database),
            JsonConvert.SerializeObject,
            JsonConvert.DeserializeObject<List<ProductSearchResultItem>>
        );
    }
}

The null passed as the clearer parameter means no custom flush logic. The template ID array does all the work. When a content author publishes a Product item, the event handler fires, the ManagedCache.Clear(Item, ...) method checks template inheritance, finds a match, and clears the cache.

Notice how JsonConvert.SerializeObject and JsonConvert.DeserializeObject<T> are passed directly as the serialization pair. For most use cases, that is all you need.

Usage Example 2: Custom Flushing Delegate

Sometimes template-based flushing is not enough. An output cache service, for example, needs to selectively flush only the cache entries that are affected by a specific content change, rather than clearing everything:

public class OutputCacheService
{
    private readonly IManagedCache _cache;

    public OutputCacheService(IManagedCacheService managedCacheService)
    {
        // The null-conditional call keeps the service usable when no cache service is supplied.
        _cache = managedCacheService?.GetCache("OutputCache",
            clearer: (cache, item, eventType, remote) =>
            {
                // Custom logic: only flush entries affected by this item
                FlushEntriesForItem(item);
                return true;
            });
    }
}

The ManagedCacheInvalidationDelegate receives the cache reference, the changed item, the event type (save, delete, rename, etc.), and whether the event is remote. You return true if you handled the clearing, false if the cache should remain untouched for this particular change.

This delegate runs after the template check, so you can combine both approaches: register invalidating templates for broad flushing and a delegate for fine-grained control.

Usage Example 3: Selective Template Check in the Delegate

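A product resolver service uses a custom delegate that checks the changed item's template itself, giving it full control over when to flush. Note that the comparison below is an exact template match: items whose templates merely inherit from Product do not trigger a flush.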

public class ProductResolverService
{
    private readonly IManagedCache _cache;

    public ProductResolverService(IManagedCacheService cacheService)
    {
        // The null-conditional call keeps the service usable when no cache service is supplied.
        _cache = cacheService?.GetCache("ProductResolveCache", "50MB",
            (cache, item, eventType, remote) =>
            {
                // Exact template match only; inheriting templates are ignored here.
                if (item.TemplateID == Product.ItemTemplateId)
                {
                    cache.Clear();
                    return true;
                }
                return false;
            });
    }

    public Product ResolveProduct(string productId, Database database)
    {
        return _cache.GetOrSetIfNotCached<Product>(
            $"ProductResolveCache::{database.Name}::{productId}",
            () => GetProductByIdInternal(productId, database),
            // Store only the item ID, never the live item.
            product => product?.ID?.ToString(),
            // On a hit, re-fetch the item by the parsed ID.
            sCachedItemId =>
                ID.TryParse(sCachedItemId, out ID itemId)
                    ? database.GetItem(itemId)
                    : null
        );
    }
}

Notice the serialization approach here. Instead of serializing the entire Product object, we store just the Sitecore item ID as a string. On cache hit, we resolve the item from the database. This is the recommended pattern whenever the cached value is (or wraps) a Sitecore item. The expensive part of ResolveProduct is the search-index call that locates the correct item; once you have the ID, database.GetItem(id) is a fast prefetch-cache hit. Storing the ID rather than the item itself also keeps the cache free of live object references, so you do not have to worry about a cached entry pinning a stale Database or holding lazily loaded fields in memory forever.

The same pattern generalizes beyond single items. For a filtered-and-sorted list of items, cache the ordered list of IDs (or URIs) rather than the items themselves. For a resolved link, cache the URL string rather than the LinkField that produced it. For an aggregation, cache the reduced result, not the providers that produced it. The rule of thumb: cache the inexpensive identifier that took expensive work to compute, and re-hydrate to a live object only at the very edge.
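
For the list case, the serialization pair can be very small. The following is a minimal sketch, assuming item IDs are represented as plain Guid values and joined with a pipe character; IdListCodec is a hypothetical helper made up for this example and is not part of the published module.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns an ordered list of item IDs into a cacheable string and back.
public static class IdListCodec
{
    // Serializes IDs in order; the order is the expensive result worth caching.
    public static string Serialize(IEnumerable<Guid> ids) =>
        string.Join("|", ids.Select(id => id.ToString("D")));

    // Parses the cached string back into the same ordered list of IDs.
    public static List<Guid> Deserialize(string cached) =>
        string.IsNullOrEmpty(cached)
            ? new List<Guid>()
            : cached.Split('|').Select(part => Guid.Parse(part)).ToList();
}
```

One detail to keep in mind: an empty list serializes to an empty string, and the GetOrSetIfNotCached<T> method shown earlier returns null for an empty cached string. Callers that use a codec like this should therefore treat a null result as an empty list.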

Configuration: Enabling, Disabling, and Sizing Caches

Every cache in the layer is steered from a Sitecore config patch. Neither the on/off state nor the size of a cache is fixed in C#: the defaultMaxSize = "50MB" you saw in the GetCache(...) calls above is only a fallback. Config wins.

The IManagedCacheService implementation reads three families of settings every time you call GetCache(cacheName, ...):

<sitecore>
  <settings>
    <!-- Global kill switch. When false, every GetCache(...) call returns null. -->
    <setting name="PerformantSitecore.ManagedCache.Enabled" value="true" />

    <!-- Per-cache on/off. When false, GetCache("ProductSolrCache", ...) returns null. -->
    <setting name="PerformantSitecore.ManagedCache.ProductSolrCache.Enabled" value="true" />
    <setting name="PerformantSitecore.ManagedCache.CategorySolrCache.Enabled" value="true" />
    <setting name="PerformantSitecore.ManagedCache.OutputCache.Enabled" value="false" />

    <!-- Per-cache size override. Overrides the defaultMaxSize passed to GetCache(...). -->
    <setting name="PerformantSitecore.ManagedCache.ProductSolrCache.MaxSize" value="200MB" />
    <setting name="PerformantSitecore.ManagedCache.CategorySolrCache.MaxSize" value="20MB" />
  </settings>
</sitecore>

The resolution order is the obvious one. If the global Enabled flag is false, every call returns null, full stop. Otherwise the service checks the per-cache Enabled flag (defaulting to true if not set) and, when the cache is on, looks for a per-cache MaxSize override before falling back to the defaultMaxSize argument from the C# call. Sizes use Sitecore’s standard human-readable form (50MB, 1GB, 512KB) parsed via Sitecore.StringUtil.ParseSizeString.
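
The resolution order can be sketched in a few lines. This is an illustrative, dependency-free model and not the module's actual code: CacheSettingsResolver is a hypothetical class, settings are passed in as a plain dictionary instead of being read from Sitecore configuration, the global flag is assumed to default to true when absent, and ParseSize is a simplified stand-in for the platform's size parser that assumes 1024-based units.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CacheSettingsResolver
{
    private const string Prefix = "PerformantSitecore.ManagedCache.";

    // Returns the effective max size in bytes, or null when the cache is disabled.
    public static long? ResolveMaxSize(IDictionary<string, string> settings,
        string cacheName, string defaultMaxSize)
    {
        // 1. Global kill switch (assumed to default to true when not configured).
        if (!GetBool(settings, Prefix + "Enabled", true)) return null;

        // 2. Per-cache switch, defaulting to true when not set.
        if (!GetBool(settings, Prefix + cacheName + ".Enabled", true)) return null;

        // 3. Per-cache MaxSize override, falling back to the value from the C# call.
        string size;
        if (!settings.TryGetValue(Prefix + cacheName + ".MaxSize", out size))
            size = defaultMaxSize;
        return ParseSize(size);
    }

    private static bool GetBool(IDictionary<string, string> settings, string key, bool fallback)
    {
        string raw;
        bool parsed;
        return settings.TryGetValue(key, out raw) && bool.TryParse(raw, out parsed)
            ? parsed
            : fallback;
    }

    // Simplified parser for values such as "512KB", "50MB", "1GB" (1024-based, an assumption).
    public static long ParseSize(string text)
    {
        var value = text.Trim().ToUpperInvariant();
        long multiplier = 1;
        if (value.EndsWith("KB", StringComparison.Ordinal)) multiplier = 1024L;
        else if (value.EndsWith("MB", StringComparison.Ordinal)) multiplier = 1024L * 1024;
        else if (value.EndsWith("GB", StringComparison.Ordinal)) multiplier = 1024L * 1024 * 1024;

        if (multiplier != 1) value = value.Substring(0, value.Length - 2);
        return long.Parse(value.Trim(), CultureInfo.InvariantCulture) * multiplier;
    }
}
```

Returning null for a disabled cache mirrors how GetCache(...) returns a null cache reference: the caller needs no separate "is it enabled" check.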

Because the consuming code already treats null as “no cache” thanks to the GetOrSetIfNotCached<T> design, flipping PerformantSitecore.ManagedCache.ProductSolrCache.Enabled to false in a config patch is genuinely all you need to do to disable just that cache. No code changes, no rebuilds, no special branches in the services that depend on it. They keep calling _productCache.GetOrSetIfNotCached(...), the cache reference is null, and the getObject() lambda runs every time exactly as it would have in a unit test.

A few patterns this enables that turn out to matter in practice:

  • Per-environment tuning. Production gets large MaxSize values, the CM gets smaller caches than the CDs (less render traffic, more authoring churn), and lower environments may disable specific caches entirely to make publish-to-display issues easier to reproduce without cache interference.
  • Incident triage. When a cache is suspected of serving stale data and you cannot yet explain why, disabling it via a config patch deploys faster than reverting code, and it lets you confirm or rule out the cache as the cause within one app pool recycle.
  • Targeted A/B. Because each cache is independently switchable, you can disable one Solr-backed cache to measure its actual contribution to throughput and Solr query volume, without affecting any other cached path.
  • Per-role config. The patch file lives wherever Sitecore config patches live, so you can scope it with role rules (role:require="ContentDelivery" etc.) when you want CD instances to behave differently from the CM.

The config patch file is also where the event handler wiring lives (the ManagedCacheInvalidationHandler discussed next), so the whole layer ships as a single XML patch alongside the assembly.

The Event Plumbing

For the automatic flushing to work, Sitecore item events need to reach the ManagedCacheService. That is the job of the ManagedCacheInvalidationHandler. It hooks into all relevant Sitecore item events (saved, deleted, moved, renamed, copied, published) and calls ClearCaches(item, eventType, remote) on the service. The service iterates all managed caches and lets each one decide whether to flush.

The event handler supports both local and remote events. In a multi-instance deployment (CM + multiple CDs), when a cache is manually cleared on the CM, you can propagate that clear to all CDs via RaiseClearCacheEvent(...). This uses Sitecore’s event queue, so it works across instances without any custom messaging infrastructure.

The whole thing is wired in via a single Sitecore config patch file. No code changes needed to your existing event handlers.

Remote Flushing: How It Actually Travels Between Instances

Template-driven invalidation handles the publish-time case, but there is a second path that matters just as much in a CM + multi-CD topology: somebody clears a cache on one machine and every other machine needs to drop the same cache. Two methods on IManagedCacheService cover this, and they are deliberately different:

void RaiseClearCacheEvent(string cacheName, string userName);
void RaiseClearCacheEventOnRemotes(string cacheName, string userName);

RaiseClearCacheEvent clears the cache locally and queues an event for every other instance. RaiseClearCacheEventOnRemotes only queues the event and does not touch the local cache, which is the right behavior when you have already cleared in-process (for example, immediately after a manual flush via an admin tool) and just need the remaining instances to follow.

Under the hood, both methods rely on the same two-step mechanism:

  1. They serialize a tiny payload (cache name plus the user name that initiated the clear) and call Sitecore.Eventing.EventManager.QueueEvent("managedcache:clear:remote", ...).
  2. The matching event subscriber (registered alongside the item event handlers in the config patch) deserializes the payload on each remote instance, resolves the cache via IManagedCacheService.GetCacheByName(cacheName), and calls Clear() on it if it exists.

public void RaiseClearCacheEventOnRemotes(string cacheName, string userName)
{
    var args = new ManagedCacheClearRemoteEventArgs(cacheName, userName);
    EventManager.QueueEvent("managedcache:clear:remote", args);
}

// Subscriber, registered via the config patch:
public void OnRemoteClearCache(object sender, EventArgs args)
{
    var clearArgs = ((ManagedCacheClearRemoteEventArgs)args).ToArgs<ManagedCacheClearRemoteEventArgs>();
    if (clearArgs == null || string.IsNullOrEmpty(clearArgs.CacheName))
        return;

    Log.Info($"Remote clear-cache event received for '{clearArgs.CacheName}' " +
             $"(raised by {clearArgs.UserName})", this);

    var cache = _managedCacheService.GetCacheByName(clearArgs.CacheName);
    cache?.Clear();
}

A few things worth flagging because they shape what you can and cannot do with this:

  • The payload carries a cache name, not a key pattern. Remote subscribers always do a full Clear() of the named cache. If you need surgical per-entry invalidation across instances, you cannot ride this channel. You have to call into each instance directly.
  • Delivery is asynchronous. Sitecore’s EventQueue poll interval (EventQueue.SecurityInvalidationInterval and friends) governs latency. In a typical configuration remotes catch up within a few seconds, but the local call returns immediately and you should not assume the rest of the farm has cleared by the time your code moves on.
  • Subscribers only act on managed caches. The remote handler routes through GetCacheByName(cacheName) on IManagedCacheService, which does not see Sitecore’s built-in CacheManager caches by default. That is intentional: the platform already has its own cache-clearing events for those.
  • It is naturally idempotent. Receiving two “clear ProductSolrCache” events back to back just clears an already-empty cache the second time. There is no need to deduplicate at the call site.
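
The asynchronous and idempotent behavior described above can be modeled without any Sitecore infrastructure. The following is a toy simulation only: Instance and Farm are hypothetical classes, the in-memory queue stands in for Sitecore's event queue, and Poll() stands in for the remote instances processing that queue on their next interval.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one Sitecore instance holding named caches.
public sealed class Instance
{
    public string Name { get; }
    public Dictionary<string, Dictionary<string, string>> Caches { get; } =
        new Dictionary<string, Dictionary<string, string>>();

    public Instance(string name) { Name = name; }

    // Full clear of a named cache; a no-op if the cache is missing or already empty.
    public void Clear(string cacheName)
    {
        Dictionary<string, string> cache;
        if (Caches.TryGetValue(cacheName, out cache)) cache.Clear();
    }
}

// Hypothetical stand-in for the shared event queue: messages wait until remotes poll.
public sealed class Farm
{
    private readonly List<Instance> _instances = new List<Instance>();
    private readonly Queue<KeyValuePair<Instance, string>> _queue =
        new Queue<KeyValuePair<Instance, string>>();

    public void Add(Instance instance) => _instances.Add(instance);

    public void RaiseClearCacheEvent(Instance sender, string cacheName)
    {
        sender.Clear(cacheName);                          // local clear happens immediately
        RaiseClearCacheEventOnRemotes(sender, cacheName); // remotes follow later
    }

    public void RaiseClearCacheEventOnRemotes(Instance sender, string cacheName) =>
        _queue.Enqueue(new KeyValuePair<Instance, string>(sender, cacheName));

    // Simulates every other instance processing the queued messages.
    public void Poll()
    {
        while (_queue.Count > 0)
        {
            var message = _queue.Dequeue();
            foreach (var instance in _instances)
            {
                if (!ReferenceEquals(instance, message.Key)) instance.Clear(message.Value);
            }
        }
    }
}
```

In this model, a CD instance still holds its entries between the RaiseClearCacheEvent call and the next Poll(), which is the window the "delivery is asynchronous" caveat warns about. Queuing the same clear twice simply clears an empty cache the second time.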

The shape of this design is what makes the Cache Insights API covered in the follow-up post so cheap to build. Once you have a per-cache “clear by name” message bus already running, exposing a propagate=true switch on an HTTP flush endpoint is a one-line extension: do the local clear, then call RaiseClearCacheEventOnRemotes(cacheName, userName) and let the event queue do the rest.

Results

After wrapping the heaviest Solr-backed services with this cache, the results were significant:

  • Solr query volume dropped dramatically - most pages now serve entirely from cache, with Solr queries only firing after publishes or when the cache is cold
  • Page render times improved - eliminating the round-trips to Solr on every request made a noticeable difference
  • Infrastructure costs decreased - lower Solr load meant we could stay within quota allocations without scaling
  • Developer adoption was fast - the transparency of the extension method meant that adding caching to a new service was a 10-minute change, not a half-day refactor

Getting the Code

The full implementation is available in the PerformantSitecore repository on GitHub. The Foundation.ManagedCache project contains everything described here. Drop the DLL and config file into your Sitecore 10.x solution, register the DI service, and start wrapping your expensive calls.

This is for Sitecore XM/XP 10.x running on .NET Framework 4.8. It requires the SXA caching package (Sitecore.XA.Foundation.Caching) since it builds on top of the SXA DictionaryCache class.

The repository also contains a Foundation.SqlDataProvider module that caches GetChildIdsByName calls at the data provider level, reducing SQL round-trips by 7-10x. That is a different optimization targeting a different bottleneck, covered in a separate blog post.
