Consider you have a set of data that changes dynamically with every page request and you need to cache it as fast as possible. You can’t cache dynamic and unpredictable data as a whole, can you? Hence, we put each data entry into the cache separately, so that we can fetch it separately and dynamically. But this means bombarding your cache infrastructure with requests.
Caching Text Elements
Let’s get more concrete. To translate tilllate.com into all its different languages, we use text elements (like gettext). For storage we use MySQL, so each text element is a row in the translation table. While this storage is very easy to maintain, it is quite silly to use in production: with ~100 text elements per page and peaks of 1’500 page requests per second, that would mean 150’000 MySQL queries per second. Don’t even ask – we don’t do that. But even for a highly scalable memcached infrastructure, 150’000 requests per second just isn’t easy to digest.
Let’s talk about a better solution. It consists of three concepts: Two-Tiered Caching, Incremental Caching and Cache Versioning.
Two-Tiered Caching
This is an obvious one – bear with me, it gets more interesting.
To reduce the number of cache retrievals, you should group your data items together, fetch them all at once and store them in a local data structure. For example, an associative array in PHP is always faster and causes less I/O than a memcached call. We call this feature Application Cache.
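The idea can be sketched as follows. This is an illustrative Python sketch (our code is PHP); the class and method names are hypothetical, and memcached is simulated by a plain dict so the example is self-contained:

```python
# Tier 1 is an in-process associative array (the Application Cache),
# tier 2 is memcached -- simulated here by a dict for the example.

class TwoTieredCache:
    def __init__(self, memcached):
        self.memcached = memcached   # tier 2: shared cache, one call per page
        self.local = {}              # tier 1: per-request application cache

    def load_group(self, group_key):
        """Fetch a whole group of items with a single tier-2 lookup."""
        items = self.memcached.get(group_key)
        if items:
            self.local.update(items)

    def get(self, key):
        # Individual lookups hit the local dict only -- no network I/O.
        return self.local.get(key)

# One memcached round trip fills the application cache; after that,
# hundreds of lookups per page are plain dict accesses.
backend = {"translations_memberpage": {"hello": "Hallo", "bye": "Tschüss"}}
cache = TwoTieredCache(backend)
cache.load_group("translations_memberpage")
print(cache.get("hello"))  # -> Hallo
```

The point is the ratio: one cache request per group instead of one per item.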
This means that we have two layers of caching – the Application Cache and memcached – hence the name Two-Tiered Caching. But there’s a problem with this solution: how should we group our data entries?
Incremental Caching
I was talking about dynamically changing data. For tilllate, we never know which text elements are going to be used on a page. There are always random text elements or customized messages and errors that change the set of needed text elements. Because of this, it is hard to predict what to put in the cache. Certainly not all our text elements at once, because they weigh a few megabytes.
Instead, we will exploit the fact that the text elements used on one page are always roughly the same. So if the amount of dynamic (differing) data per page is rather small, it makes sense to just add it to the cache as well. This is what I call Incremental Caching:
- Just retrieve the text elements for a page from memcached and store them in an application cache
- Whenever you have a cache miss, get the data from MySQL and store it in a delta buffer
- At the end of the page, merge the application cache and the delta buffer and write the result back to memcached, if needed
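The three steps above can be sketched like this. Again a hypothetical Python sketch; `memcached` and `db` are simulated by dicts so the example runs as-is:

```python
# A page's text elements are cached as one group; unforeseen elements
# are fetched individually and folded back into the group at the end.

memcached = {"texts_memberpage": {"title": "Members"}}
db = {"title": "Members", "greeting": "Welcome!", "error_x": "Oops"}

def handle_page(page_key):
    # Step 1: fetch the page's known text elements in one cache call.
    app_cache = dict(memcached.get(page_key) or {})
    delta = {}  # step 2 collects cache misses here

    def text(key):
        if key not in app_cache:
            # Cache miss: fall back to the database (here: a dict)
            # and remember the element in the delta buffer.
            delta[key] = app_cache[key] = db[key]
        return app_cache[key]

    text("title")
    text("greeting")  # miss -> goes into the delta buffer

    # Step 3: if new elements were used, write the merged set back,
    # so the next request finds them with the single cache fetch.
    if delta:
        memcached[page_key] = app_cache

handle_page("texts_memberpage")
print(memcached["texts_memberpage"])
```

After a few requests, the cached group converges on the set of elements the page actually uses, and cache misses become rare.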
But Maarten, with Two-Tiered Caching and Incremental Caches you are writing overlapping data to memcached. That’s no good, think about invalidation!
That’s right, overlapping data in the cache isn’t the best of ideas. In this case it certainly makes sense, but we do have to think about invalidation. What if we need to invalidate all our text elements after a translator’s change? Or just the member-related ones? It would be nice to have some tagging functionality for memcached, to be able to invalidate elements by tag. memcached doesn’t offer this – which is good news, because it would probably scale badly.
Fortunately, there is a trick to do it: Cache Versioning. Inspired by Koz, we created a system that lets you have virtual tags, just by adding a version to your cache keys.
For example, we turn the cache key tilllate_translation_for_memberpage into tilllate_translation_for_memberpage_1523 by appending a version. Whenever the text elements change, we bump this version, which invalidates all cache keys for our text elements. The cache version numbers can be stored in MySQL and be cached as well. Problem solved.
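The trick fits in a few lines. A hypothetical Python sketch; in practice the version store would live in MySQL (and be cached itself):

```python
# Versioned cache keys: a version number per namespace is embedded in
# every key, so bumping it orphans all keys built with the old version.

versions = {"translation": 1523}  # namespace -> current version

def versioned_key(namespace, key):
    """Build a cache key that includes the namespace's version."""
    return f"tilllate_{namespace}_{key}_{versions[namespace]}"

def invalidate(namespace):
    """Bumping the version 'tags out' every key in the namespace."""
    versions[namespace] += 1

old = versioned_key("translation", "for_memberpage")
invalidate("translation")
new = versioned_key("translation", "for_memberpage")
print(old, new)
```

Nothing is deleted from memcached: the old entries are simply never read again and expire on their own, which is why this scales where real tag support wouldn’t.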
Sounds great – give me the code!
We have an implementation that enables virtual memcached tags for Zend_Cache. I’ll have to look through the code and see if I can turn it into a proposal.