I have a site where a user can upload videos. I want to be able to filter users by the count of videos they have uploaded. For example, there will be a page displaying all users (paginated, of course), and in addition to being able to sort users alphabetically or by most recent, one option is to 'sort user by # of video credits'. These are the two tables related to this that I have related to this -- one for userprofile and one for videoinfo (which includes an FK to userprofile).
I am currently doing a COUNT(*) on the videos, but it is taking forever to do. How would you suggest doing this efficiently? (My thought was to do it programmatically, by adding a field in the userprofile called count and add +1 or -1 to the user after each video is uploaded/deleted, but I'd rather do it in the db). Thank you very much for your help, and am a huge fan of the site! Here is the current query:
asked Jul 04 '11 at 21:57 david542 |
Assuming the "troublesome" query looks like this:
First of all, there is no magic index for this kind of problem. An index can only contain "static" information. However, there are still some ways to improve performance. The fastest way is to make the video count part of the users table/view--as you suggested. Other databases have some extend of native support for that (E.g, Oracle Materialized Views or SQL Servers "Indexed Views"). There is no such feature in MySQL, so you would need to implement that on your own. Either in the application, or in the DB (with a trigger). If that is not possible or wanted, the next solution is to make sure the expensive part (group by/order by) is done as efficient as possible. There are two optimizations you can use:
That allows canning the index, leveraging the index order to sorting and using any temporary memory. Your index select v.uploaded_by_id, count(*) cnt from userprofile_videoinfo v group by v.uploaded_by_id; Note that the execution plan says 'Using index' only, no sort.
The sort itself must be done, but in combination with LIMIT 10 makes it possible to save memory, because the final result set is anyway limited to 10 records. The database must only keep the 10 biggest values, discarding any other rows. Howerver, I'm not exactly sure if MySQL really does that in this particular case: http://dev.mysql.com/doc/refman/5.6/en/limit-optimization.html select v.uploaded_by_id, count() cnt from userprofile_videoinfo v group by v.uploaded_by_id order by count() desc limit 10; You cannot get faster than that. The next step is easy, just add the user details, e.g., using the above query as inline-view:
The important factors are:
All of that, just assuming that your original query looks like above. That is, however, the most efficient query you can get. If that is still too slow, you have to maintain the video count somewhere. answered Jul 05 '11 at 10:08 Markus Winand ♦♦ @Markus, wow is this helpful, thank you! I updated my question to show the query I was using -- basically the same, however I was using a LEFT JOIN because I wanted to show all users (even if count = 0). Anyways, I will study your solution, especially the inline part which I really haven't forayed into with my limited SQL skills. It turns out after doing some tests that it was the ORM that was making it so slow. I am using django, and it was generating a GROUP BY that was really killing the speed, so if I tried a raw SQL call and it goes much faster (at least, compared to the ORM).
(Jul 05 '11 at 10:23)
david542
@david525 Remember the tip from the Book: Get to know your ORM [and take control of joins] The ORM is often, well, very often, generating sub-optimal SQL :(
(Jul 05 '11 at 10:26)
Markus Winand ♦♦
|