Hi guys

I hope someone can help me understand a performance issue I'm having. I regularly have to import many millions of rows into an empty InnoDB table. Somewhere between 50 and 500 million. The tables are always very narrow, and only contain one index - a PK on a varchar(10).

The strategy I have settled upon is removing the PK, importing using LOAD DATA and then adding the PK. My question is why is adding the PK when the data is all imported so much faster than importing while the PK is on the table? What additional work is MySQL doing that causes such a large time difference?

Any help greatly appreciated.

Thanks Andy

asked Oct 01 '12 at 18:23

sponge's gravatar image

sponge
1111


Hi!

Your strategy — drop index, insert, recreate index — is quite common for ETL processes.

Creating an index usually just reads everything, sorts everything and writes the index. If the index is maintained during INSERT, the database has to update the index for every row individually.

See also: http://use-the-index-luke.com/sql/dml/insert

answered Oct 17 '12 at 18:54

Markus%20Winand's gravatar image

Markus Winand ♦♦
5565714

It also has to do checks to fulfill uniqueness constraints. While inserting row 50 000 000, database has to check against 49 999 999 records if there is no duplicates. Of course, it uses the index, but that's still close to exponential progression.

answered Mar 25 '13 at 19:25

Lukas's gravatar image

Lukas
1

Your answer
toggle preview