30. November 2007, 10:20, by Maarten Manders

Caching of Dynamic Data Sets

Suppose you have a set of data that changes dynamically with each page request and you need to cache it as fast as possible. You can’t cache dynamic and unpredictable data as a whole, can you? So we would put each data entry into the cache separately, to be able to fetch it separately and dynamically. But this means bombarding your cache infrastructure with requests.

Caching Text Elements

Let’s get more concrete. To translate tilllate.com into all its different languages, we use text elements (similar to gettext). For storage we use MySQL, so each text element is a row in the translation table. While this storage is very easy to maintain, it would be quite silly to use in production, where you have ~100 text elements per page and peaks of 1500 page requests per second, resulting in 150’000 MySQL queries per second. Don’t even ask – we don’t do it. But even for a highly scalable memcached infrastructure, 150’000 requests per second just aren’t easy to digest.

Let’s talk about a better solution. It consists of three concepts: Two-Tiered Caching, Incremental Caching and Cache Versioning.

Two-Tiered Caching

This is an obvious one – bear with me, it gets more interesting.

To reduce the number of cache retrievals, you should group your data items together, fetch them all at once and store them in a local data structure. For example, a lookup in a PHP associative array is always faster and causes less I/O than a memcached call. We call this feature the Application Cache.
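Here is a minimal sketch of such an application cache, assuming a plain Memcache connection; the class and key names are made up for illustration. The local associative array answers repeated lookups within a request, and only missing groups go to memcached:

```php
<?php
// Sketch only: a request-local "application cache" in front of memcached.
class AppCache
{
    private $memcache;          // Memcache connection (tier 2)
    private $local = array();   // per-request associative array (tier 1)

    public function __construct(Memcache $memcache)
    {
        $this->memcache = $memcache;
    }

    // Fetch a whole group of items with one memcached round trip,
    // then answer further lookups from the local array.
    public function getGroup($groupKey)
    {
        if (!isset($this->local[$groupKey])) {
            $data = $this->memcache->get($groupKey);
            $this->local[$groupKey] = is_array($data) ? $data : array();
        }
        return $this->local[$groupKey];
    }
}

// Usage (hypothetical key):
$memcache = new Memcache();
$memcache->connect('127.0.0.1', 11211);
$cache = new AppCache($memcache);
$texts = $cache->getGroup('tilllate_translation_for_memberpage');
```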

This means that we have two layers of caching – the Application Cache and memcached – hence the name Two-Tiered Caching. But there’s a problem with this solution: how should we group our data entries?

Incremental Caching

I was talking about dynamically changing data. For tilllate, we never know which text elements are going to be used on a page. There are always random text elements or customized messages and errors that change the set of needed text elements. Because of this, it is hard to predict what to put in the cache. Certainly not all our text elements at once, because they weigh in at a few megabytes.

Instead, we will exploit the fact that the text elements used on one page are always roughly the same. Since the amount of dynamic (differing) data per page is rather small, it makes sense to simply include it in the cached set as well. This is what I call Incremental Caching (see the sketch after this list):

  • Retrieve the text elements for a page from memcached and store them in an application cache
  • Whenever you have a cache miss, get the data from MySQL and store it in a delta buffer
  • At the end of the page, merge the application cache and the delta buffer and write the result back to memcached, if needed
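Roughly, the per-request flow could look like this – a sketch only, reusing the $memcache connection from above; the key name is illustrative and getTranslationFromDb() is a hypothetical stand-in for the MySQL lookup:

```php
<?php
// Sketch of incremental caching for one page request.
$pageKey  = 'tilllate_translation_for_memberpage';
$appCache = $memcache->get($pageKey);            // 1. one memcached fetch per page
$appCache = is_array($appCache) ? $appCache : array();
$delta    = array();                             // text elements missing from the cache

function translate($id)
{
    global $appCache, $delta;
    if (!isset($appCache[$id])) {
        // 2. cache miss: fall back to MySQL and remember the element in the delta buffer
        $appCache[$id] = getTranslationFromDb($id);   // hypothetical MySQL helper
        $delta[$id]    = $appCache[$id];
    }
    return $appCache[$id];
}

// ... render the page, calling translate() as needed ...

// 3. at the end of the page, write back only if something new was used
if (!empty($delta)) {
    $memcache->set($pageKey, $appCache, 0, 3600);     // merged set, 1h TTL as an example
}
```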

Cache Versioning

But Maarten, with Two-Tiered Caching and Incremental Caches you are writing overlapping data to memcached. That’s no good, think about invalidation!

That’s right, overlapping data in the cache isn’t the best of ideas. In this case it certainly makes sense, but we have to think about invalidation. What if we need to invalidate all our text elements after a translator makes a change? Or just the member-related ones? It would be nice if memcached had some tagging functionality, so we could invalidate elements by tag. memcached doesn’t offer this – which is good news, because it would probably scale badly.

Fortunately, there is a trick to do it: Cache Versioning. Inspired by Koz, we created a system that lets you have virtual tags, just by adding a version to your cache keys.

For example, we turn the cache key tilllate_translation_for_memberpage into tilllate_translation_for_memberpage_1523 by adding a version. Whenever the text elements change, we bump this version, which invalidates all cache keys for our text elements. The cache version numbers can be stored in MySQL and cached as well. Problem solved.
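A sketch of the key-versioning trick follows; the key and tag names are illustrative, and the version is kept in memcached here, although it could just as well come from MySQL:

```php
<?php
// Sketch: virtual "tags" via versioned cache keys (names are illustrative).
function versionedKey(Memcache $memcache, $baseKey, $tag)
{
    $version = $memcache->get('version_' . $tag);
    if ($version === false) {
        $version = 1;                                   // could also be loaded from MySQL
        $memcache->set('version_' . $tag, $version, 0, 0);
    }
    return $baseKey . '_' . $version;                   // e.g. ..._memberpage_1523
}

// Invalidate every key carrying this tag by bumping its version.
function invalidateTag(Memcache $memcache, $tag)
{
    $memcache->increment('version_' . $tag);
}

// Usage:
$key = versionedKey($memcache, 'tilllate_translation_for_memberpage', 'translations');
$memcache->set($key, $texts, 0, 3600);
```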

Sounds great – give me the code!

We have an implementation to enable virtual memcached tags for Zend_Cache. I’ll have to look through the code and see if I can turn it into a proposal.
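That implementation isn’t published here, but as a rough idea only – not the tilllate code – the same versioned-key trick can sit on top of the standard Zend Framework 1 cache factory with the memcached backend; loadTranslationsFromDb() is again a hypothetical helper:

```php
<?php
// Rough idea only. Standard Zend_Cache Core frontend + Memcached backend.
$cache = Zend_Cache::factory(
    'Core',
    'Memcached',
    array('lifetime' => 3600, 'automatic_serialization' => true),
    array('servers' => array(array('host' => '127.0.0.1', 'port' => 11211)))
);

// Build the cache id from the versioned key, as in the sketch above.
$id   = 'tilllate_translation_for_memberpage_' . $version;
$data = $cache->load($id);
if ($data === false) {
    $data = loadTranslationsFromDb();   // hypothetical helper
    $cache->save($data, $id);
}
```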

Filed under: PHP, Web Development

6 Comments

  1. […] PHP: Caching of Dynamic Data Sets Caching of Dynamic Data Sets (tags: cache php programming technique resource tutorial) […]

    Pingback by links for 2007-12-01 - smalls blogger — 1. December 2007 @ 01:32

  2. I don’t know your specific needs, but since page translations are not that big and do not tend to change much over time, I prefer to store them locally in shared memory (with APC, for example). Of course, everything is loaded with a reasonable TTL, and you get the benefit of not making any request just for translations. So this would be Three-Tiered Caching.

    Also, if you deploy a nice set of classes for Caching, you can move a certain object from one tier to another without any hassle.

    Regards.

    Comment by Mauro — 1. December 2007 @ 20:35

  3. […] the Tilllate Blog, there’s a new post discussing the use of caching in applications, specifically for dynamic data. Consider you have a […]

    Pingback by developercast.com » Tilllate Blog: Caching of Dynamic Data Sets — 5. December 2007 @ 18:36

  4. I really do not understand why you need the translated strings in the database. What is the benefit compared to some plain files in the filesystem (CSV/MO)?

    Comment by me — 3. January 2008 @ 07:47

  5. […] clever caching and proper code we managed to reduce the number of queries on all pages. With a clever design, […]

    Pingback by techblog.tilllate.com » Trevi is online! — 7. January 2008 @ 08:39

  6. @me: At our scale it doesn’t matter whether you save text elements to a text file or a database. You get some benefits from the text file, like speed and lightweight handling. But more complex handling (think distributing it over the cluster), performance issues (processing a few megabytes of text files also needs resources) and backup (backing up the database is free, because we do it anyway) kill these benefits. At our scale you have to cache it anyway, and handling the database is just easier for us because we have to do it anyway.

    Comment by Leo Büttiker — 24. January 2008 @ 10:02
