We have a table of events (95M rows,growing fast, MS SQL). Simplified columns:
We often start with a simply query:
and then nest it to find the event directly previous to that ones (the Max event id smaller than the current and same user id):
These queries are very slow, my question is how can I learn about why they are slow? Is there a better way to approach this (other DB?) ? Should I look into MapReduce scenario to speed up (query could be done per user and then joined) Update (execution plan): http://pastebin.com/5CdPCe0S From the execution plan I learned, that most of the query is spent on a LIKE condition. I will first try to create an index on that. asked Nov 30 '11 at 13:45 Jan |
I'll post your execution plan here, and add some comments:
You are having three Clustered Index Scans. Remember that SCAN is "bad" in SQL Server Short Reference of Execution Plan operations. These are the obvious points to start with:
The
That is part of a Hash-Join. Without knowing the data, it is hard to tell if it is done because there is no proper index for other join methods, or because the result set is large enought so that hash-join makes sense. There is, of course, also an explaination how to index Hash-Joins.
It is also hard to tell, why exactly. Hard to believe there is no index on s.id? Besides that, I wonder if there is an index on h.user_id? The whole join sequence might better start off there? But that depends on the data distribution as well. The join on Hope that gives you the right hints. Have a look at my Instant Coaching Service as well. That a very efficient way to join your application and data knowledge with my database experience to find the best solution quickly. answered Dec 01 '11 at 07:55 Markus Winand ♦♦ Awesome detail Markus, I feel bad for you to have spent this time before I even fixed all the obvious "missing" index. But now I am well equiped to prepare better next time we have a slower query.
(Dec 09 '11 at 15:01)
Jan
|
Learn how to read the execution plan http://use-the-index-luke.com/sql/explain-plan/sql-server/getting-an-execution-plan Then setup Indexes as needed. "Use the index Luke" >> indeed! answered Nov 30 '11 at 15:31 Jan |
@Jan, the first step is to look at the execution plan. Here is some guide how to do that: http://use-the-index-luke.com/sql/explain-plan/sql-server/getting-an-execution-plan If you add that to your question, I'll have a look at it.
@Markus thanks for the tip, that give me something to investigate. 87% of cost are on a "clustered Index scan" where I select rows like that name LIKE 'Notification %'. I will try to find out how to speed that up