Hi, Great site you have here. My question regards the use of the "order by RAND()", particularly in MySQL. There is often the need for websites to display items on their pages in a random order to give the visitor the sense of "always new stuff", etc. However, on a table with 10 thousand rows, ordering by rand means that the mysql server must read the entire table and because it's not using an index it's quite painfull.

Is there a way to use index and still getting the items ordered by a random function?

asked Jan 31 '11 at 12:01

myke123's gravatar image

myke123
16113


2 Answers:

Hi Myke,

That's an exciting question.

The short answer is: No, ORDER BY Rand() can not be indexed as such.

However, there might be another way to get a random result without ORDER BY RAND(). You didn't post an actual example, so I try to give you a skeleton that you might be able to apply to your case:

instead of

select id, columns
  from your_table
 order by rand()
 limit 1;

try to select a random ID that is indexed and (approximately) evenly distributed:

select id, columns
  from your_table
 where id >= rand() * (select max(id) from your_table) - 1
 order by id
 limit 1

Note the following:

  • The upper bound of the ID is fetched from the table itself, but that's very fast in all databases. In MySQL it should turn out to be 'optimized away' in the execution plan. (Oracle would use a min()/max() scan that is as efficient as an INDEX UNIQUE SCAN).
  • it's no problem if the ID doesn't start at 0 or has "holes" (missing ID's) because the generated random ID is used as starting point only.
  • the index range scan starts at the randomly generated number, but stops after the first record it matches (LIMIT 1).
  • You cannot fetch more than one random record in one shot. if you would increase the LIMIT, you would just receive the subsequent IDs. Edit: but you can use CTEs to fetch more than one in one shot: on github
  • just for paranoia, I subtracted 1 from the random number for that case that floating point arithmetic (rounding) leads to a random ID slightly higher than max(id)--but that shifts the probabilities a little bit.
  • unless your ID range is from 0 - max(id), without duplicates or missing IDs, you don't receive a real random value. However, you said that this is just used to provide a wrong impression to the uses, so I guess that's OK for that case.

answered Jan 31 '11 at 13:08

Markus%20Winand's gravatar image

Markus Winand ♦♦
93651120

edited Feb 23 '15 at 12:06

Hi Markus,

Thank you so much for replying to my question and for giving me the example model, so detailed and well-explained. Usually people want some sort of example to be provided by the one who asks the question and I know I did not gave one but you went the extra mile and provided your example, so tnx.

It is indeed pitty the solution cannot be used to obtain more records in one shot, but it is a great starting point.

I will try your example and make some tests to see what happens.

answered Feb 01 '11 at 07:25

myke123's gravatar image

myke123
16113