Hello,

I have a big performance problem on an Oracle 11g database.

I have a table with five columns:

uuid as raw(16)
loc_id as varchar2(40)
starttime as timestamp with time zone
endtime as timestamp with time zone
eventtext as varchar2(40)

I have about 12 million records in the table. Now I have a statement like this:

SELECT *
  FROM events
 where loc_id like '960%'
   and starttime <  end_of_analysis
   and endtime   >= start_of_analysis;

The problem is that the performance is much better when the query period lies far in the past than when it lies in the present or near future. My index consists of loc_id, starttime and endtime. Is there any way to improve the performance of this statement?
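
Roughly sketched as DDL (simplified; the table and index names here are made up, the real definitions may differ):

CREATE TABLE events (
  uuid      RAW(16),
  loc_id    VARCHAR2(40),
  starttime TIMESTAMP WITH TIME ZONE,
  endtime   TIMESTAMP WITH TIME ZONE,
  eventtext VARCHAR2(40)
);

-- composite index as described: loc_id first, then the two timestamps
CREATE INDEX events_i ON events (loc_id, starttime, endtime);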

asked Oct 11 '11 at 15:19

Sylo

edited Oct 11 '11 at 16:40

Markus Winand ♦♦


3 Answers:

Hi!

You are running into the chess-board problem, described in this article on index merge.

That means you have more than one independent range condition. Your particular query actually has three independent range conditions (on different columns), because LIKE 'xxx%' is a range condition as well.

There are only two things you could do differently:

  • Index every column on its own and go for BITMAP conversions (briefly described in the article; see the sketch after this list).
  • Re-arrange the columns in the index. This is the only case where the most selective column should be in the first position. It is, however, strange that your query is sensitive to the query period, as loc_id seems to be in the first position of the index. Is the number of rows returned different when you query a period from the past compared to a period in the present?
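
The first option could look like this (a sketch; the index names are made up, the table and column names are taken from your question):

-- one single-column index per range condition; the optimizer can then
-- combine them via BITMAP CONVERSION FROM ROWIDS
CREATE INDEX events_loc_id    ON events (loc_id);
CREATE INDEX events_starttime ON events (starttime);
CREATE INDEX events_endtime   ON events (endtime);

When testing, you can encourage the combination with the INDEX_COMBINE hint.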

Posting an execution plan is always helpful: how to get an Oracle execution plan.
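
For example, with EXPLAIN PLAN and DBMS_XPLAN (the bind variables stand in for your analysis period):

EXPLAIN PLAN FOR
SELECT *
  FROM events
 WHERE loc_id like '960%'
   AND starttime <  :end_of_analysis
   AND endtime   >= :start_of_analysis;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);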

answered Oct 11 '11 at 16:52

Markus Winand ♦♦

Hello!

Thanks for the quick answer.

The number of rows returned is the same, but there are far fewer distinct loc_ids in the table than rows (about 200 distinct loc_ids and 12 million rows), and the loc_ids differ mostly in the last digits (e.g. 00000000000300100001, 00000000000300200001, 00000000000300300001, ...) because it is a location code: 000000000003 means Germany and 000000000001 means USA. So far, only data from Germany has been inserted into the table.
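
That ratio can be verified with a quick count (a sketch, using the table and column names from the plan below):

SELECT COUNT(DISTINCT location_id) AS distinct_location_ids,
       COUNT(*)                    AS total_rows
  FROM location_events;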

Here is the execution plan

Plan hash value: 594245215
| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT            |                    | 21068 |  2983K| 29292   (1)| 00:05:52 |
|   1 |  TABLE ACCESS BY INDEX ROWID| LOCATION_EVENTS    | 21068 |  2983K| 29292   (1)| 00:05:52 |
|*  2 |   INDEX RANGE SCAN          | LOCATION_EVENTS_I7 |   683 |       | 29239   (1)| 00:05:51 |

Predicate Information (identified by operation id):

   2 - access("SE"."LOCATION_ID" LIKE '000000000003%' AND
              SYS_EXTRACT_UTC("EVENT_ENDTIME") >= SYS_EXTRACT_UTC(TO_TIMESTAMP_TZ('11.10.2011 00:00:00,000000')) AND
              SYS_EXTRACT_UTC("EVENT_STARTTIME") < SYS_EXTRACT_UTC(TO_TIMESTAMP_TZ('11.10.2011 06:00:00,000000')))
       filter(SYS_EXTRACT_UTC("EVENT_ENDTIME") >= SYS_EXTRACT_UTC(TO_TIMESTAMP_TZ('11.10.2011 00:00:00,000000')) AND
              SYS_EXTRACT_UTC("EVENT_STARTTIME") < SYS_EXTRACT_UTC(TO_TIMESTAMP_TZ('11.10.2011 06:00:00,000000')) AND
              "SE"."LOCATION_ID" LIKE '000000000003%')

answered Oct 11 '11 at 18:41

Sylo

I guess it is the same execution plan no matter which dates you query? Are your statistics up to date?
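
If not, refreshing them might already help (a sketch; schema and table name assumed from your plan):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'LOCATION_EVENTS',
    cascade => TRUE  -- refresh the index statistics as well
  );
END;
/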

Currently, your LIKE filter selects every row, because your table contains records for Germany only, right? Is that going to change? If yes, can you test with more appropriate test data? Testing with incomplete data very often results in wrong optimizations, so I advise generating appropriate test data.

If I understood your LIKE filter correctly, the INDEX RANGE SCAN will actually scan the entire index and then access the table to fetch the matching rows. In that case, a full table scan could be faster. However, if you add other location codes, only a part of the index will be read, so the full table scan won't make sense anymore.
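
To compare, you could force the full table scan with a hint while testing (a sketch, using the names from your plan; the bind variables are placeholders):

SELECT /*+ FULL(se) */ *
  FROM location_events se
 WHERE location_id LIKE '000000000003%'
   AND event_starttime <  :end_of_analysis
   AND event_endtime   >= :start_of_analysis;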

If your LIKE filter stays unselective, you should put another column in the first index position (since you have only range conditions, you should take the most selective column, the only exception to the myth).
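
For example, if event_starttime turns out to be the most selective column (an assumption you would have to verify against your data), the re-arranged index could look like this (the index name is hypothetical):

CREATE INDEX location_events_i8
    ON location_events (event_starttime, location_id, event_endtime);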

answered Oct 14 '11 at 08:34

Markus Winand ♦♦