When creating an index on a column, should the id/primary key column of that table be included in the Index?

(The primary key serves as a foreign key to another table)

On the one hand I think yes the id column should be included. Because index returns only the 'internal row number' and the database would have to lookup the actual id (foreign key) in the table.

On the other hand I feel like no, because this is the most probable use case for an index an I feel like it is unnecessary.

Update: using MS SQL Server 2008 R2 the primary key does not need to be included in the secondary index, because of the default creation of a primary non-clustered index on the primary key column.

Instead of this "internal row number" the primary key is returned from the seconardy index.

asked Jan 03 '12 at 13:19

Jan's gravatar image

Jan
26226

edited Jan 03 '12 at 15:24


One Answer:

The answer is: May be.

It seems like you wonder about Index-Only Scans in conjunction with joins.

Let's start with a special case: Secondary Index on Index-Organized Tables (aka. Clustered Indexes) ALWAYS include the clustering key (often the primary key) automatically. That means, if you use MySQL with InnoDB, all secondary indexes include the PRIMARY KEY automatically.

On the other hand, the primary key alone might not prevent the table access. E.g.:

SELECT t1.id, t1.a, t1.b, t2.x
  FROM t1
  JOIN t2 ON (t1.id = t2.id)
 WHERE t2.a = ?

That query would probably benefit from an Index on T1.A. Including T1.ID into the index will not improve performance, because T1.B is still fetched from the table itself.

Therefore, the question to be asked is not if you should include the primary key into the index. You should ask if the query would benefit from an Index-Only Scan. If yes, you need to include all required columns to the index to make it work.

Let me know if I misunderstood your question. I'm especially puzzled about the "because this is the most probable use case for an index an I feel like it is unnecessary" part.

answered Jan 03 '12 at 13:48

Markus%20Winand's gravatar image

Markus Winand ♦♦
93651120

Great answer, thanks a ton. I use MS SQL Server 2008. From the chapter in the book, I understand that there should be a default non-clustered on the id/PK. Therefore my secondary index retrieves that id/PK and everything is fine even without including it specifically. Actually now that I understood it, I will test it with the execution plan.

(Jan 03 '12 at 15:18) Jan