I have a site where a user can upload videos. I want to be able to filter users by the count of videos they have uploaded. For example, there will be a page displaying all users (paginated, of course), and in addition to being able to sort users alphabetically or by most recent, one option is to 'sort user by # of video credits'.

These are the two tables related to this that I have related to this -- one for userprofile and one for videoinfo (which includes an FK to userprofile).

CREATE TABLE `userprofile_userprofile` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `full_name` varchar(100) NOT NULL,
   ...
 )

CREATE TABLE `userprofile_videoinfo` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(256) NOT NULL,
  `uploaded_by_id` int(11) NOT NULL,
   ...
  PRIMARY KEY (`id`),
  KEY `userprofile_videoinfo_e43a31e7` (`uploaded_by_id`),
  CONSTRAINT `uploaded_by_id_refs_id_492ba9396be0968c` FOREIGN KEY (`uploaded_by_id`) REFERENCES `userprofile_userprofile` (`id`)
)

I am currently doing a COUNT(*) on the videos, but it is taking forever to do. How would you suggest doing this efficiently? (My thought was to do it programmatically, by adding a field in the userprofile called count and add +1 or -1 to the user after each video is uploaded/deleted, but I'd rather do it in the db). Thank you very much for your help, and am a huge fan of the site!

Here is the current query:

SELECT u.id, COUNT(v.uploaded_by_id)
FROM userprofile_userprofile u
LEFT OUTER JOIN userprofile_videoinfo v
ON v.uploaded_by_id = u.id
GROUP BY u.id
ORDER BY count(v.uploaded_by_id) DESC

asked Jul 04 '11 at 21:57

david542's gravatar image

david542
16114

edited Jul 05 '11 at 10:15


One Answer:

Assuming the "troublesome" query looks like this:

select u.id, u.full_name, count(*)
  from userprofile_userprofile u
  join userprofile_videoinfo v ON (u.id = v.uploaded_by_id)
 group by u.id, u.full_name
 order by count(*) desc
 limit 10;

First of all, there is no magic index for this kind of problem. An index can only contain "static" information. However, there are still some ways to improve performance.

The fastest way is to make the video count part of the users table/view--as you suggested. Other databases have some extend of native support for that (E.g, Oracle Materialized Views or SQL Servers "Indexed Views"). There is no such feature in MySQL, so you would need to implement that on your own. Either in the application, or in the DB (with a trigger).

If that is not possible or wanted, the next solution is to make sure the expensive part (group by/order by) is done as efficient as possible. There are two optimizations you can use:

  1. implementing GROUP BY using an Index. http://dev.mysql.com/doc/refman/5.6/en/group-by-optimization.html

That allows canning the index, leveraging the index order to sorting and using any temporary memory.

Your index userprofile_videoinfo_e43a31e7 is just fine for that, because it allows grouping by the user:

    select v.uploaded_by_id, count(*) cnt
      from userprofile_videoinfo v
     group by v.uploaded_by_id;

Note that the execution plan says 'Using index' only, no sort.

  1. sorting and limiting

The sort itself must be done, but in combination with LIMIT 10 makes it possible to save memory, because the final result set is anyway limited to 10 records. The database must only keep the 10 biggest values, discarding any other rows. Howerver, I'm not exactly sure if MySQL really does that in this particular case: http://dev.mysql.com/doc/refman/5.6/en/limit-optimization.html

select v.uploaded_by_id, count() cnt
      from userprofile_videoinfo v
     group by v.uploaded_by_id
     order by count() desc
     limit 10;

You cannot get faster than that. The next step is easy, just add the user details, e.g., using the above query as inline-view:

select u.id, u.full_name, v.cnt
  from userprofile_userprofile u
  JOIN (select v.uploaded_by_id, count(*) cnt
          from userprofile_videoinfo v
         group by v.uploaded_by_id
         order by count(*) desc
         limit 10
        ) v on (u.id = v.uploaded_by_id)
 order by v.cnt desc;

The important factors are:

  1. avoid accessing all users by limiting on the video table first
  2. Make sure not to access any not-indexed columns from the video table

All of that, just assuming that your original query looks like above. That is, however, the most efficient query you can get. If that is still too slow, you have to maintain the video count somewhere.

answered Jul 05 '11 at 10:08

Markus%20Winand's gravatar image

Markus Winand ♦♦
93651120

@Markus, wow is this helpful, thank you! I updated my question to show the query I was using -- basically the same, however I was using a LEFT JOIN because I wanted to show all users (even if count = 0). Anyways, I will study your solution, especially the inline part which I really haven't forayed into with my limited SQL skills.

It turns out after doing some tests that it was the ORM that was making it so slow. I am using django, and it was generating a GROUP BY that was really killing the speed, so if I tried a raw SQL call and it goes much faster (at least, compared to the ORM).

(Jul 05 '11 at 10:23) david542

@david525 Remember the tip from the Book: Get to know your ORM [and take control of joins] The ORM is often, well, very often, generating sub-optimal SQL :(

(Jul 05 '11 at 10:26) Markus Winand ♦♦