30. November 2007, 10:20, by Maarten Manders
Suppose you have a set of data that changes dynamically with each page request and you need to cache it as fast as possible. You can't cache dynamic, unpredictable data as a whole, can you? So we put each data entry into the cache separately, to be able to fetch it separately and dynamically. But this means bombarding your cache infrastructure with requests.
Caching Text Elements
Let’s get more concrete. To translate tilllate.com into all its different languages, we use text elements (like gettext). For storage we use MySQL, so each text element is a row in the translation table. While this storage is very easy to maintain, it would be quite silly to query directly in production, where we have ~100 text elements per page and peaks of 1500 page requests per second, resulting in 150’000 MySQL queries per second. Don’t even ask, we don’t do it. But even for a highly scalable memcached infrastructure, 150’000 requests per second just isn’t easy to digest.
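One obvious first step is batching: instead of one cache request per text element, collect the keys for the whole page, do a single multi-get, and fetch only the misses from MySQL in one query. Our code is PHP; the following is a hedged Python sketch of the pattern with made-up names and in-memory stand-ins for memcached and the translation table.

```python
# Hypothetical sketch (Python, not our PHP): batch all text-element lookups
# for a page into one cache multi-get plus at most one DB query for misses.

class DictCache:
    """Stand-in for a memcached client offering get_multi/set_multi."""
    def __init__(self):
        self.store = {}

    def get_multi(self, keys):
        return {k: self.store[k] for k in keys if k in self.store}

    def set_multi(self, mapping):
        self.store.update(mapping)


class FakeTranslationDB:
    """Stand-in for the MySQL translation table."""
    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def select_translations(self, keys):
        self.queries += 1  # one SELECT ... WHERE name IN (...) per call
        return {k: self.rows[k] for k in keys if k in self.rows}


def fetch_text_elements(keys, cache, db):
    """One cache round-trip for the whole page; only misses hit MySQL."""
    found = cache.get_multi(keys)
    misses = [k for k in keys if k not in found]
    if misses:
        rows = db.select_translations(misses)
        cache.set_multi(rows)  # backfill so the next request is a pure cache hit
        found.update(rows)
    return found
```

With ~100 text elements per page, this collapses ~100 cache round-trips into one, which is what makes the request numbers above tractable at all.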
Let’s talk about a better solution. It consists of three concepts: Two-Tiered Caching, Incremental Caching and Cache Versioning.
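The post doesn’t spell these concepts out here, but cache versioning is commonly implemented by embedding a namespace version number in every item key, so a whole group of entries can be invalidated by bumping a single counter. A minimal Python sketch under that assumption (a hypothetical class, not tilllate’s actual PHP code):

```python
# Hypothetical illustration of cache versioning: every key in a namespace
# embeds a version number; bumping the version invalidates the whole
# namespace with one write, without touching the individual entries.

class VersionedCache:
    def __init__(self):
        self.store = {}

    def _version(self, ns):
        return self.store.setdefault('ver:' + ns, 1)

    def _key(self, ns, key):
        return '%s:v%d:%s' % (ns, self._version(ns), key)

    def get(self, ns, key):
        return self.store.get(self._key(ns, key))

    def set(self, ns, key, value):
        self.store[self._key(ns, key)] = value

    def invalidate(self, ns):
        # Old entries become unreachable; in memcached they would
        # simply age out of the LRU later.
        self.store['ver:' + ns] = self._version(ns) + 1
```

The design trade-off is that invalidation is O(1) at the cost of dead entries lingering in the cache until they are evicted.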
21. November 2007, 23:59, by Silvan Mühlemann
After a day of talking about memcached, squid and sharding at the tilllate offices, Peter was still fresh enough to give a talk on Advanced MySQL Query Optimization at the Webtuesday (notably, on a Wednesday).
Even though we announced the talk just three days ago, I’ve never seen the room at namics so packed. Even the Ruby guys around Tristan cancelled their Höck (regular get-together) and joined us! No surprise: Peter is well known through his MySQL Performance Blog.
20. November 2007, 09:47, by Leo Büttiker
Google probably has the biggest web cluster on earth. But sometimes even that is not enough. Let me start from the beginning.
After we launched the new AJAX gallery, it was clear that it was a big success. But can we measure this success? It shouldn’t be a big problem: we have several statistics tools to measure how many pages we deliver. For the Swiss part of tilllate we use WEMF NetMetrix for the official numbers, our own tool (“Prince analytics”) to answer questions we can’t answer with the other tools, and finally Google Analytics for our day-to-day questions.
18. November 2007, 19:29, by Silvan Mühlemann
Busy week for the open source IT pros around Zürich: on Thursday, Vint Cerf will talk at Google, and on Wednesday, Nov 21st, tilllate is happy to announce a presentation by Peter Zaitsev of the MySQL Performance Blog. He will talk about query optimization for high-traffic sites.
Peter Zaitsev was manager of the High Performance Group at MySQL Inc. He specializes in MySQL Server performance as well as in the performance of application stacks using MySQL, especially LAMP. Web sites handling millions of visitors a day, with terabytes of data and hundreds of servers, are the kind of applications he loves most.
11. November 2007, 18:01, by Silvan Mühlemann
After a long day of meetings and other tedious manager work, the perfect way to relax is to code. Best are mini-projects where you see results after an hour or so. I call these tasks “Plausch-Projekte” (“plah-oosh projects” = “fun projects”).
This week my plah-oosh projects were two metric tools for Ganglia. Besides Nagios, Ganglia is the main monitoring tool for our cluster. We monitor some 20 metrics, such as load, memory, disk usage and network activity.
Ciprian and Stefan recently built a script to monitor apache (bytes/sec, hits/sec, idle processes etc.) via the /server-status interface. Based on their work I hacked two scripts:
ganglia_mysql_metrics.php monitors several MySQL parameters such as queries/sec, slow queries/sec and threads connected:
ganglia_squid_metrics.php regularly reports Squid metrics: requests/sec, service time, available file descriptors:
The scripts are quick and dirty code: procedural, not well documented, and they only read the mcast_port from the config file, ignoring the rest. But they might be a good base for use on your cluster too. Just call them every minute via crontab.
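The heart of such a metric script is turning MySQL’s ever-growing status counters (e.g. from SHOW GLOBAL STATUS) into per-second rates before handing them to Ganglia. The real scripts are PHP; here is a hedged Python sketch of just that calculation, with made-up names:

```python
# Sketch (not the actual ganglia_mysql_metrics.php): MySQL exposes absolute
# counters via SHOW GLOBAL STATUS, so a cron-driven script must diff two
# samples taken one interval apart to get per-second rates.

def counter_rates(prev_sample, curr_sample, interval_seconds):
    """Convert absolute counters into per-second rates over one interval."""
    return {
        name: (curr_sample[name] - prev_sample[name]) / float(interval_seconds)
        for name in curr_sample
        if name in prev_sample  # skip counters missing from the older sample
    }
```

Each resulting rate would then be pushed into Ganglia, e.g. by shelling out to gmetric once per metric.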
7. November 2007, 23:04, by Silvan Mühlemann
Every few months at tilllate we play the query optimization game. In this game I use the slow query log to find the queries that put the most load on the servers.
With each query I find, I then either optimize the query or cache its results to avoid running it at all.
I prefer the former, because caching means duplicating data, which is not very DRY.
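For flavour, here is a hedged Python sketch of how such a slow-log ranking could look. This is not tilllate’s actual tooling; the parser is deliberately crude (it assumes single-line queries and normalizes only numeric literals), but it shows the idea of grouping slow-log entries by query shape and ranking by total time.

```python
import re
from collections import defaultdict

# Hypothetical miniature of the "query optimization game": group slow-log
# entries by normalized query text and rank them by total query time.
QUERY_TIME = re.compile(r'# Query_time: ([\d.]+)')

def rank_slow_queries(log_text):
    totals = defaultdict(float)
    query_time = None
    for line in log_text.splitlines():
        m = QUERY_TIME.search(line)
        if m:
            query_time = float(m.group(1))
        elif query_time is not None and not line.startswith('#') and line.strip():
            # Crude normalization: collapse numeric literals so that
            # "id = 42" and "id = 7" count as the same query shape.
            normalized = re.sub(r'\d+', 'N', line.strip())
            totals[normalized] += query_time
            query_time = None
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

The top entries of such a ranking are the candidates for either an index/rewrite or a cache in front of them.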