Walter Vaughan <wvaughan@steelerubber.com> wrote:
> Xiaolei Li wrote:
>
>> I'm in the process of selecting a DBMS for a project. We've narrowed
>> down our choices to MySQL, Oracle, and PostgreSQL. One thing we're not
>> sure is the speed of bulk loading.
>
> Well I can address MySQL 5.0.27 and PostgreSQL 8.1.5 bulk loading on a freeBSD
> server. PostgreSQL is can be about twice as fast in our testing (YMMV) when
> mySQL is using InnoDB tables.
Bulk loading performance varies very much depending on how *exactly*
it is done. The MySQL manual contains a whole chapter on that topic.
In a nutshell:
- turn off AUTO-COMMIT !
- you can COMMIT in batches or just once after loading all data. I
suggest to COMMIT every 10.000 (or so) rows.
- if you import from SQL commands (SQL dump) - using MySQLs proprietary
multi-value INSERT will speedup the load
- prepared statements are good too, but can insert just one row per
execute. Multi-value INSERT and PS are approx. equally fast.
- LOAD DATA INFILE will be even faster, because parsing the raw file is
faster than parsing SQL
- keys should be deactivated for bulk loading
A MySQL issue could be the fact that MySQL uses only one thread - that
is: only one cpu core - per connection. So if you have multiple cpu
cores you should split your data and load in multiple connections.
I did so on a 16-core box and it scales quite nicely.
XL
--
Axel Schwenke, Senior Software Developer, MySQL AB
Online User Manual:
http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums:
http://forums.mysql.com/