Unix Technical Forum

copy from performance on large tables with indexes

This is a discussion on copy from performance on large tables with indexes within the Pgsql Performance forums, part of the PostgreSQL category; --> Hello, Postgres: 8.2 os: Linux 4CPU, 4 GB RAM, Raid 1, 32 bit system work_mem: 600 Mb I have ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 10:57 AM
Marc Mamin
 
Posts: n/a
Default copy from performance on large tables with indexes


Hello,


Postgres: 8.2
os: Linux 4CPU, 4 GB RAM, Raid 1, 32 bit system
work_mem: 600 Mb


I have some tables which may become quite large (currently up to 6 Gb) .
I initially fill them using copy from (files) .

The import is fast enough as I only have a primary key on the table:
about 18 minutes
(over 300 Mb/minute)

Then I need 5 additional indexes on it. Creation time: 30 minutes


subsequently I compute some aggregations which need 4 hours and 30
minutes additional time


And now the problem:

If I get additional data for the table, the import become much more
slower due to the indexes (about 30 times slower !):

The performance degradation is probably due to the fact that all
indexs are too large to be kept in memory.
Moreover I guess that the indexes fill factors are too high (90%)

During this second import, I have about 20% iowait time.



The usual solution is to drop the indexes before the second import and
rebuild them afterwards, but I feel unconfident doing this as I don't
know how the system will react if some SELECT statements occures when
the index are missing. I can hardly avoid this.


So my idea for the second import process:


1) make a copy of the table:

create table B as select * from table A;
alter table B add constraint B_pk primary key (id);


2) import the new data in table B

copy B from file;

3) create the required indexes on B

create index Bix_1 on B..
create index Bix_2 on B..
create index Bix_2 on B..
create index Bix_2 on B..

4) replace table A with table B

alter table A renam to A_trash;
alter table B renam to A;
drop table A_trash;

(and rename the indexes to get the original state)





This seems to work but with side effects:

The only objects that refer to the tables are functions and indexes.

If a function is called within a same session before and after the table
renaming, the second attempt fails (or use the table A_trash if it still
exists). So I should close the session and start a new one before
further processing. Errors in other live sessions are acceptable, but
maybe you know a way to avoid them?)



And now a few questions :-)

- do you see any issue that prevent this workflow to work?

- is there any other side effect to take care of ?

- what is the maximum acceptable value for the parameter work_mem for my
configuration
(see the complete configuration below)

- has anybody built a similar workflow ?

- could this be a feature request to extend the capabilities of copy
from ?



Thanks for your time and attention,

Marc Mamin



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:55 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com