5. January 2008, 22:16, by Leo Büttiker

Why paging costs a lot of performance

From your database's point of view, paging is one of the worst things your web app can do. Performance-wise, paging is horrible. Let me explain with a little example:

SELECT SQL_CALC_FOUND_ROWS gb.*,
       u.username,
       u.uid,
       u.geschlecht,
       u.mitfoto,
       [... some more fields...]
FROM member_gold_guestbook gb
LEFT JOIN users u ON u.uid=gb.uid_from
[... some more left joins...]
WHERE gb.uid_to='22152'
AND visible='1'
LIMIT 0,10;

That's not too bad by itself, but when you go to page 300 your database server will hate you for it: the query becomes LIMIT 2990,10, and the server has to work through not only the 10 items you want to show but also the nearly 3,000 rows that come before them, only to throw them away.
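
To make that concrete, the paging code that produces such a query might look roughly like this (a minimal sketch; the variable names are made up, only the table and columns come from the example above):

<?php
// How the offset grows with the page number: page 300 with 10 items per
// page means the server has to read past 2990 rows before it can return
// the handful you actually display.
$perPage = 10;
$page    = 300;                      // e.g. taken from $_GET['page']
$offset  = ($page - 1) * $perPage;   // 2990

$sql = "SELECT gb.*, u.username
        FROM member_gold_guestbook gb
        LEFT JOIN users u ON u.uid = gb.uid_from
        WHERE gb.uid_to = '22152' AND visible = '1'
        LIMIT $offset, $perPage";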

Sure, you may argue that nobody will ever go to page 300. Your users probably won't, but Googlebot and its evil brothers will. And the bad thing is that there is nothing you can do about it as long as you need paging. There are just a few tricks that may reduce your server load a bit.

Optimize it!

A query that uses good indexes is simply faster, no matter where you use it. But with paging, where the server may have to work through thousands of rows, it matters a great deal.

Avoid ORDER BY and GROUP BY where you can: depending on the indexes, the server may have to sort or group every row in the table before it can apply the LIMIT, so the LIMIT can't help you here.

Cache it! But the query still has to run at least once, and especially for fulltext searches even that first run can put a lot of load on your servers.

YAGNI (You ain't gonna need it!)

Nobody will ever look at page 300, so why not set an upper limit on your paging? Your users won't mind, and I'm sure the bots will find another way to your content. If you really care about SEO, you can probably serve a separate list for bots that needs fewer database operations (remove unneeded joins, don't sort it).
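
A minimal sketch of such an upper limit, assuming the page number arrives via $_GET and 100 pages is the arbitrary cap:

<?php
// Cap the page number so the OFFSET can never grow without bound.
// The maximum of 100 pages is an arbitrary example.
$maxPage = 100;
$perPage = 10;
$page    = isset($_GET['page']) ? (int)$_GET['page'] : 1;
$page    = max(1, min($page, $maxPage));
$offset  = ($page - 1) * $perPage;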

Don't join tables you don't need. Sure, that might sound obvious, but I've seen it done too many times not to mention it here.

Count is evil

Never, never use SQL_CALC_FOUND_ROWS. It is just about as slow as a separate "SELECT COUNT(*) FROM table", but you pay for it on every single page. The COUNT(*) variant you can at least cache, so you don't have to run it for every page view.
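
A rough sketch of the cached COUNT(*) variant; the PHP Memcache extension and the old mysql_* functions are just stand-ins for whatever your application already uses:

<?php
// Run the COUNT(*) as its own query and cache the result instead of paying
// for SQL_CALC_FOUND_ROWS (or a fresh count) on every single page view.
$memcache = new Memcache();
$memcache->connect('localhost', 11211);

$cacheKey = 'guestbook_count_22152';
$total    = $memcache->get($cacheKey);
if ($total === false) {
    $result = mysql_query(
        "SELECT COUNT(*) FROM member_gold_guestbook
         WHERE uid_to = '22152' AND visible = '1'");
    $total  = (int)mysql_result($result, 0);
    $memcache->set($cacheKey, $total, 0, 300);   // keep it for five minutes
}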

And there is yet another way to avoid the count when paging; it is the way Facebook does paging in some places. Facebook doesn't give you the usual list of pages from 1 to n where you can jump to any page. It just gives you the previous page, the current page and, if there is one, the next page. On the application side it is easy to tell whether there is a page before the current one: if you are on the second page, there is one before it (no surprise here). But what about the next page, if you don't want to run a count? Easy: say you display 10 items per page, then query 11 items instead of 10. That one extra item costs you nearly nothing, and now you can count the returned rows in your application. If you get more than ten rows you know there is a next page, and you can happily throw number eleven away.
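
A minimal sketch of that trick, again with mysql_query() as a stand-in and made-up variable names:

<?php
// Facebook-style prev/next paging without any count: fetch one row more
// than we display and use its existence as the "there is a next page" flag.
$perPage = 10;
$page    = isset($_GET['page']) ? (int)$_GET['page'] : 1;
$offset  = ($page - 1) * $perPage;

$result = mysql_query(
    "SELECT gb.*
     FROM member_gold_guestbook gb
     WHERE gb.uid_to = '22152' AND visible = '1'
     LIMIT $offset, " . ($perPage + 1));

$rows = array();
while ($row = mysql_fetch_assoc($result)) {
    $rows[] = $row;
}

$hasPrevious = ($page > 1);
$hasNext     = (count($rows) > $perPage);
$rows        = array_slice($rows, 0, $perPage);   // throw number eleven away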

This is a cross-post of the blog article "Why your database says paging sucks!" from my private blog.

Filed under: Database, PHP

14 Comments

  1. I don’t think I completely agree with your comments here.

    Sure, if you do LIMITs on a query that is poorly indexed, your performance is going to suffer. That doesn't mean you can't add a more useful index. Also, if you are paging without ORDER BY, you technically lose predictability in your results. Depending on the volatility of the data being reported on, it is possible for some rows to simply get missed because the indexes being used change.

    Another thing to consider with your advice: depending on the architecture of the system in question, it is pretty likely that you don't want to push 3000 records across your network. As far as the database is concerned, it takes longer to send 3000 records than it does to just read past them…

    Comment by Mike Lively — 6. January 2008 @ 02:10

  2. Thanks for your comment Mike.

    Sometimes performance is pretty poor even with indexes, especially when you have to do fulltext search (we will switch to Sphinx soon to get rid of those queries). But you're right, sometimes you can get a lot of speed out of indexes. And sure, not all of the "tricks" above will fit your business requirements. It is often not a good idea to leave the ORDER BY out, because it would make your query pretty useless. But sometimes you can, and it will save you a lot.

    I did not suggest dropping paging and transmitting every record; that would be a (very, very) bad idea as well. I just wanted to explain that paging can be bad for performance even when you don't send all 3000 records over the wire.

    Comment by Leo Büttiker — 6. January 2008 @ 11:30

  3. Another thing that I have done before, specifically with pagination in search results, is to process all the results that would be displayed once, store the primary key IDs somewhere (session data, memcache, something along those lines) and then splice out the IDs you need and do a lookup with WHERE pk IN (list). It's quite effective and saves a lot of hassle at times.

    Comment by Trophaeum — 6. January 2008 @ 12:40
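
    A rough sketch of the idea from the comment above, assuming gb.id is the primary key and the ID list is small enough to keep in the session:

    <?php
    // Run the expensive search once, keep only the primary keys in the
    // session, then page through that list with cheap WHERE id IN (...)
    // lookups. gb.id as the primary key is an assumption.
    session_start();
    $perPage = 10;
    $offset  = 0;    // first page; computed from the page number as usual

    if (!isset($_SESSION['guestbook_ids'])) {
        $result = mysql_query(
            "SELECT gb.id FROM member_gold_guestbook gb
             WHERE gb.uid_to = '22152' AND visible = '1'");
        $ids = array();
        while ($row = mysql_fetch_assoc($result)) {
            $ids[] = (int)$row['id'];
        }
        $_SESSION['guestbook_ids'] = $ids;
    }

    $pageIds = array_slice($_SESSION['guestbook_ids'], $offset, $perPage);
    if ($pageIds) {
        $result = mysql_query(
            "SELECT gb.*, u.username
             FROM member_gold_guestbook gb
             LEFT JOIN users u ON u.uid = gb.uid_from
             WHERE gb.id IN (" . implode(',', $pageIds) . ")");
    }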

  4. If there aren't many writes (per second) on this table, just add indexes as needed!
    Leaving the ORDER BY out is a very bad idea, because the order you get is not deterministic then.

    If you have big LIMITs for paging, use the right keys and try to split it into two queries (depending on how complex your first query is): one to get the ID of the first record you are looking for, and a second one like "SELECT blabla FROM table WHERE id >= [value from query 1] ORDER BY id LIMIT 10". I have found this to be much faster.

    The most important thing is indexes: get them right and your database will do incredible things.

    Comment by Freddy — 7. January 2008 @ 19:18

  5. @Trophaeum: Cool idea!
    @Freddy: Yes, indexes are very helpful, especially when you can use them for the ordering as well. Sometimes a deterministic order is not that important; think of search results or lists that mainly exist for Googlebot to index. I'm not 100 percent sure what you mean by splitting the queries, though; "ORDER BY id" with LIMIT will only work on very simple queries (and unfortunately we don't have any simple queries anymore ;-)

    Comment by Leo Büttiker — 8. January 2008 @ 08:17

  6. I’ve just recently experienced this very issue.

    SQL_CALC_FOUND_ROWS is going to be problematic when paginating large datasets. The reason is that MySQL has to evaluate all the matching rows even though you may limit the returned results to, say, 10. See the mysqlperformanceblog.

    In this case I agree: use COUNT(*) as a separate query to determine the total number of rows matched by a LIMITed query. At least on subsequent pages the query that performs the COUNT(*) can be served from the query cache.

    But you haven't figured out the real reason why ORDER BY/GROUP BY clauses affect performance so severely.

    You need compound indexes. And if you really, really don't need an ORDER BY, use ORDER BY NULL to suppress the implicit sort MySQL performs for a GROUP BY.

    Comment by Quinton — 8. January 2008 @ 19:32
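
    To illustrate the ORDER BY NULL tip from the comment above (the aggregate query itself is invented; only the table and columns come from the article):

    <?php
    // In MySQL, GROUP BY implicitly sorts by the grouped columns; if that
    // order does not matter, ORDER BY NULL suppresses the extra filesort.
    // The aggregate query itself is made up for illustration.
    $sql = "SELECT uid_from, COUNT(*) AS entries
            FROM member_gold_guestbook
            WHERE uid_to = '22152' AND visible = '1'
            GROUP BY uid_from
            ORDER BY NULL";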

  7. […]  Why paging needs a lot of performance (0 visite) […]

    Pingback by Pourquoi la pagination coûte autant de performances — 10. January 2008 @ 10:35

  8. I presume gb.id is the unique ID for every gb entry.
    If you do "SELECT gb.*, […] WHERE gb.uid_to='22152' AND visible='1' LIMIT 0,10;" for page 1, you will probably do something like "SELECT gb.*, […] WHERE gb.uid_to='22152' AND visible='1' LIMIT 2000,10;" or "[…] LIMIT 10 OFFSET 2000" for page 200, which performs VERY badly in MySQL (especially if you use ORDER BY gb.id, which you should to ensure the right order).
    It's much faster to do a "SELECT gb.id FROM […] ORDER BY gb.id DESC LIMIT 2000,1" first (if and only if there is a key on "id") and then do "SELECT gb.*, […] WHERE […] AND id < [value from query 1] ORDER BY id DESC LIMIT 10;". Query 1 only uses the index and is very, very fast. Query 2 is the expensive one, but it no longer has to care about the gb entries from the first 200 pages (because their IDs are higher).

    I hope it's clearer now.

    Comment by Freddy — 10. January 2008 @ 20:25
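
    In PHP, the two-step approach from the comment above might look roughly like this (a sketch only; it assumes gb.id is an indexed primary key and reuses mysql_query() and the article's columns):

    <?php
    // Step 1: a cheap query (ideally answered from the index) finds the
    // boundary ID for the requested page, newest entries first. Handling
    // of pages past the end is omitted.
    $offset = 2000;                   // e.g. skip the first 2000 entries
    $result = mysql_query(
        "SELECT gb.id FROM member_gold_guestbook gb
         WHERE gb.uid_to = '22152' AND visible = '1'
         ORDER BY gb.id DESC
         LIMIT $offset, 1");
    $boundary = (int)mysql_result($result, 0);

    // Step 2: the expensive query with all the joins only has to look at
    // rows from the boundary downwards (<= so the boundary row itself is
    // included as the first entry of the page).
    $result = mysql_query(
        "SELECT gb.*, u.username
         FROM member_gold_guestbook gb
         LEFT JOIN users u ON u.uid = gb.uid_from
         WHERE gb.uid_to = '22152' AND visible = '1'
           AND gb.id <= $boundary
         ORDER BY gb.id DESC
         LIMIT 10");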

  9. Sure. And using a database will cost some performance as well…

    Optimization has to be done on the database level, and some databases are better than others in this regard. I don't buy your arguments. Obviously you are mixing problems related to your fulltext search with problems related to paging, ordering and indexes. And the solutions you provide are anything but usable in real web applications.

    In the end, I wonder what is the point of this article. Querying a database is slow?

    Comment by Bertrand Mansion — 14. January 2008 @ 09:02

  10. @Quinton: ORDER BY/GROUP BY with LIMIT is slow because the sorting can't take advantage of the LIMIT and offset.
    @Freddy: Thanks, I get it now. I'll have to try it out.
    @Bertrand: Yes, querying a database is slow! The point of this article is that ORDER BY/GROUP BY combined with LIMIT can cause very slow queries. You notice it sooner with fulltext search queries, but the same applies to indexed fields, just not as dramatically. As long as paging gives you no problems, great! But when you run into performance issues, you may have to rethink just how unusable my tips are.

    Comment by Leo Büttiker — 14. January 2008 @ 16:11

  11. You write that count is evil... it's not true. Yeah, maybe in MySQL with the funny SQL_CALC_FOUND_ROWS, but in PostgreSQL you won't find that ;)

    Comment by cms — 15. January 2008 @ 11:31

  12. Doing too many counts is always evil! And without a proper caching strategy you run into that very easily, no matter which DB you use. Well, MySQL's SQL_CALC_FOUND_ROWS makes *erhm* running into it easier.

    Comment by Leo Büttiker — 25. January 2008 @ 08:05

  13. You *could* also fiddle with the robots.txt file, to ward off the problems with an over-eager googlebot and his dastardly cousins ;)

    Comment by Mark — 31. March 2008 @ 09:19

  14. Well, that's true, Mark. But then the question is whether you really need those pages at all: if they're not visited by bots, they're probably not visited by any humans either ('cause usually nobody finds the time to click through 1000 pages).

    Comment by Leo Büttiker — 13. April 2008 @ 12:03
