This is a discussion on Insert performance within the Pgsql Performance forums, part of the PostgreSQL category; --> Dear all, After many tests and doc reading, i finally try to get help from you... Here is my ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Dear all, After many tests and doc reading, i finally try to get help from you... Here is my problem. With some heavy insert into a simple BD (one table, no indexes) i can't get better perf than 8000 inserts/sec. I'm testing it using a simple C software which use libpq and which use: - Insert prepared statement (to avoid too many request parsing on the server) - transaction of 100000 inserts My server which has the following config: - 3G RAM - Pentium D - 64 bits, 3Ghz - database data on hardware raid 0 disks - x_log (WAL logs) on an other single hard drive The server only use 30% of the CPU, 10% of disk access and not much RAM... So i'm wondering where could be the bottle neck and why i can't get better performance ? I really need to use inserts and i can't change it to use COPY... Any advice is welcome. Sorry in advance for my bad understanding of database ! Thanks in advance. Regards, Joël.W |
| |||
| hatman wrote: > Dear all, > > After many tests and doc reading, i finally try to get help from > you... > > Here is my problem. With some heavy insert into a simple BD (one > table, no indexes) i can't get better perf than 8000 inserts/sec. I'm > testing it using a simple C software which use libpq and which use: > - Insert prepared statement (to avoid too many request parsing on the > server) > - transaction of 100000 inserts Are each of the INSERTs in their own transaction? If so, you'll be limited by the speed of the disk the WAL is running on. That means you have two main options: 1. Have multiple connections inserting simultaneously. 2. Batch your inserts together, from 10 to 10,000 per transaction. Are either of those possible? -- Richard Huxton Archonet Ltd ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| joël Winteregg wrote: > Hi Richard, > >>> Here is my problem. With some heavy insert into a simple BD (one >>> table, no indexes) i can't get better perf than 8000 inserts/sec. I'm >>> testing it using a simple C software which use libpq and which use: >>> - Insert prepared statement (to avoid too many request parsing on the >>> server) >>> - transaction of 100000 inserts >> Are each of the INSERTs in their own transaction? >> > > No, as said above transactions are made of 100000 inserts... Hmm - I read that as just meaning "inserted 100000 rows". You might find that smaller batches provide peak performance. >> If so, you'll be limited by the speed of the disk the WAL is running on. >> >> That means you have two main options: >> 1. Have multiple connections inserting simultaneously. > > Yes, you're right. That what i have been testing and what provide the > best performance ! I saw that postgresql frontend was using a lot of CPU > and not both of them (i'm using a pentium D, dual core). To the opposit, > the postmaster process use not much resources. Using several client, > both CPU are used and i saw an increase of performance (about 18000 > inserts/sec). > > So i think my bottle neck is more the CPU speed than the disk speed, > what do you think ? Well, I think it's fair to say it's not disk. Let's see - the original figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds plausible to me for a round-trip to process a simple command - are you running the client app on the same machine, or is it over the network? Two other things to bear in mind: 1. If you're running 8.2 you can have multiple sets of values in an INSERT http://www.postgresql.org/docs/8.2/s...ql-insert.html 2. You can do a COPY from libpq - is it really not possible? -- Richard Huxton Archonet Ltd ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| joël Winteregg wrote: > >>> No, as said above transactions are made of 100000 inserts... >> Hmm - I read that as just meaning "inserted 100000 rows". You might find >> that smaller batches provide peak performance. > > Ahh ok ;-) sorry for my bad english... (yeah, i have been testing > several transaction size 10000, 20000 and 100000) Not your bad English, my poor reading :-) >>>> If so, you'll be limited by the speed of the disk the WAL is running on. >>>> >>>> That means you have two main options: >>>> 1. Have multiple connections inserting simultaneously. >>> Yes, you're right. That what i have been testing and what provide the >>> best performance ! I saw that postgresql frontend was using a lot of CPU >>> and not both of them (i'm using a pentium D, dual core). To the opposit, >>> the postmaster process use not much resources. Using several client, >>> both CPU are used and i saw an increase of performance (about 18000 >>> inserts/sec). >>> >>> So i think my bottle neck is more the CPU speed than the disk speed, >>> what do you think ? >> Well, I think it's fair to say it's not disk. Let's see - the original >> figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds >> plausible to me for a round-trip to process a simple command - are you >> running the client app on the same machine, or is it over the network? > > I did both test. On the local machine (using UNIX sockets) i can reach > 18000 insert/sec with 10 clients and prepared statements. The same test > using clients on the remote machine provide me 13000 inserts/sec. OK, so we know what the overhead for network connections is. > Now, with multiple client (multi-threaded inserts) my both CPU are quite > well used (both arround 90%) so i maybe think that disk speeds are now > my bottleneck. What do you think ? or maybe i will need a better CPU ? > >> Two other things to bear in mind: >> 1. If you're running 8.2 you can have multiple sets of values in an INSERT >> http://www.postgresql.org/docs/8.2/s...ql-insert.html > > Yeah, i'm running the 8.2.3 version ! i didn't know about multiple > inserts sets ! Thanks for the tip ;-) Ah-ha! Give it a go, it's designed for this sort of situation. Not sure it'll manage thousands of value clauses, but working up from 10 perhaps. I've not tested it for performance, so I'd be interesting in knowing how it compares to your other results. >> 2. You can do a COPY from libpq - is it really not possible? >> > > Not really but i have been testing it and inserts are flying (about > 100000 inserts/sec) !! What's the problem with the COPY? Could you COPY into one table then insert from that to your target table? -- Richard Huxton Archonet Ltd ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| * Richard Huxton <dev@archonet.com> [070306 12:22]: > >>2. You can do a COPY from libpq - is it really not possible? > >> > >Not really but i have been testing it and inserts are flying (about > >100000 inserts/sec) !! > > What's the problem with the COPY? Could you COPY into one table then insert from that to your target table? Well, there are some issues. First your client needs to support it. E.g. psycopg2 supports only some specific CSV formatting in it's methods. (plus I had sometimes random psycopg2 crashes, but guarding against these is cheap compared to the speedup from COPY versus INSERT) Plus you need to be sure that your data will apply cleanly (which in my app was not the case), or you need to code a fallback that localizes the row that doesn't work. And the worst thing is, that it ignores RULES on the tables, which sucks if you use them Andreas ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Andreas Kostyrka wrote: > * Richard Huxton <dev@archonet.com> [070306 12:22]: >>>> 2. You can do a COPY from libpq - is it really not possible? >>>> >>> Not really but i have been testing it and inserts are flying (about >>> 100000 inserts/sec) !! >> What's the problem with the COPY? Could you COPY into one table then insert from that to your target table? > Well, there are some issues. First your client needs to support it. > E.g. psycopg2 supports only some specific CSV formatting in it's > methods. (plus I had sometimes random psycopg2 crashes, but guarding against > these is cheap compared to the speedup from COPY versus INSERT) > Plus you need to be sure that your data will apply cleanly (which in > my app was not the case), or you need to code a fallback that > localizes the row that doesn't work. > > And the worst thing is, that it ignores RULES on the tables, which > sucks if you use them Ah, but two things deal with these issues: 1. Joel is using libpq 2. COPY into a holding table, tidy data and INSERT ... SELECT -- Richard Huxton Archonet Ltd ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| * Richard Huxton <dev@archonet.com> [070306 13:47]: > Andreas Kostyrka wrote: > >* Richard Huxton <dev@archonet.com> [070306 12:22]: > >>>>2. You can do a COPY from libpq - is it really not possible? > >>>> > >>>Not really but i have been testing it and inserts are flying (about > >>>100000 inserts/sec) !! > >>What's the problem with the COPY? Could you COPY into one table then insert from that to your target table? > >Well, there are some issues. First your client needs to support it. > >E.g. psycopg2 supports only some specific CSV formatting in it's > >methods. (plus I had sometimes random psycopg2 crashes, but guarding against > >these is cheap compared to the speedup from COPY versus INSERT) > >Plus you need to be sure that your data will apply cleanly (which in > >my app was not the case), or you need to code a fallback that > >localizes the row that doesn't work. > >And the worst thing is, that it ignores RULES on the tables, which > >sucks if you use them > > Ah, but two things deal with these issues: > 1. Joel is using libpq > 2. COPY into a holding table, tidy data and INSERT ... SELECT Clearly COPY is the way for bulk loading data, BUT you asked, so I wanted to point out some problems and brittle points with COPY. (and the copy into the holding table doesn't solve completly the problem with the dirty inconsistent data) Andreas ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Hi Andreas, Thanks for the info about COPY !! On Mar 6, 1:23 pm, andr...@kostyrka.org (Andreas Kostyrka) wrote: > * Richard Huxton <d...@archonet.com> [070306 12:22]:> >>2. You can do a COPY from libpq - is it really not possible? > > > >Not really but i have been testing it and inserts are flying (about > > >100000 inserts/sec) !! > > > What's the problem with the COPY? Could you COPY into one table then insert from that to your target table? > > Well, there are some issues. First your client needs to support it. > E.g. psycopg2 supports only some specific CSV formatting in it's > methods. (plus I had sometimes random psycopg2 crashes, but guarding against > these is cheap compared to the speedup from COPY versus INSERT) > Plus you need to be sure that your data will apply cleanly (which in > my app was not the case), or you need to code a fallback that > localizes the row that doesn't work. > > And the worst thing is, that it ignores RULES on the tables, which > sucks if you use them Ok, but what about constraints (foreign keys and SERIAL id) using a copy statement ? do we need to handle auto-generated id (SERIAL) manually ? Thanks for your feedback. Regards, Joël |
| |||
| Hi Richard, > > >>> No, as said above transactions are made of 100000 inserts... > >> Hmm - I read that as just meaning "inserted 100000 rows". You might find > >> that smaller batches provide peak performance. > > > Ahh ok ;-) sorry for my bad english... (yeah, i have been testing > > several transaction size 10000, 20000 and 100000) > > Not your bad English, my poor reading :-) > > > > >>>> If so, you'll be limited by the speed of the disk the WAL is runningon. > > >>>> That means you have two main options: > >>>> 1. Have multiple connections inserting simultaneously. > >>> Yes, you're right. That what i have been testing and what provide the > >>> best performance ! I saw that postgresql frontend was using a lot of CPU > >>> and not both of them (i'm using a pentium D, dual core). To the opposit, > >>> the postmaster process use not much resources. Using several client, > >>> both CPU are used and i saw an increase of performance (about 18000 > >>> inserts/sec). > > >>> So i think my bottle neck is more the CPU speed than the disk speed, > >>> what do you think ? > >> Well, I think it's fair to say it's not disk. Let's see - the original > >> figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds > >> plausible to me for a round-trip to process a simple command - are you > >> running the client app on the same machine, or is it over the network? > > > I did both test. On the local machine (using UNIX sockets) i can reach > > 18000 insert/sec with 10 clients and prepared statements. The same test > > using clients on the remote machine provide me 13000 inserts/sec. > > OK, so we know what the overhead for network connections is. > > > Now, with multiple client (multi-threaded inserts) my both CPU are quite > > well used (both arround 90%) so i maybe think that disk speeds are now > > my bottleneck. What do you think ? or maybe i will need a better CPU ? > > >> Two other things to bear in mind: > >> 1. If you're running 8.2 you can have multiple sets of values in an INSERT > >>http://www.postgresql.org/docs/8.2/s...ql-insert.html > > > Yeah, i'm running the 8.2.3 version ! i didn't know about multiple > > inserts sets ! Thanks for the tip ;-) > > Ah-ha! Give it a go, it's designed for this sort of situation. Not sure > it'll manage thousands of value clauses, but working up from 10 perhaps. > I've not tested it for performance, so I'd be interesting in knowing how > it compares to your other results. Yeah, as soon as possible i will give it a try ! Thanks for the feedback ;-) > > >> 2. You can do a COPY from libpq - is it really not possible? > > > Not really but i have been testing it and inserts are flying (about > > 100000 inserts/sec) !! > > What's the problem with the COPY? Could you COPY into one table then > insert from that to your target table? The main problem comes from our "real time" needs. We are getting information as a data flow from several application and we need to store them in the DB without buffering them too much... I have been testing the COPY using several statement (i mean using copy to add only a few rows to a specific table and then using it on an other table to add a few rows, etc...) and the perf are as bad as an insert ! COPY seems to be designed to add many many rows to the same table and not a few rows to several tables... So that's my main problem. Regards, Joël |
| ||||
| >>> 1. If you're running 8.2 you can have multiple sets of values in an >>> INSERT >>> http://www.postgresql.org/docs/8.2/s...ql-insert.html >> >> >> Yeah, i'm running the 8.2.3 version ! i didn't know about multiple >> inserts sets ! Thanks for the tip ;-) > No kidding --- thanks for the tip from me as well !!! I didn't know this was possible (though I read in the docs that it is ANSI SQL standard), and I'm also having a similar situation. Two related questions: 1) What about atomicity? Is it strictly equivalent to having multiple insert statements inside a transaction? (I assume it should be) 2) What about the issue with excessive locking for foreign keys when inside a transaction? Has that issue disappeared in 8.2? And if not, would it affect similarly in the case of multiple-row inserts? In case you have no clue what I'm referring to: Say that we have a table A, with one foreign key constraint to table B --- last time I checked, there was an issue that whenever inserting or updating table A (inside a transacion), postgres sets an exclusive access lock on the referenced row on table B --- this is overkill, and the correct thing to do would be to set a read-only lock (so that no-one else can *modify or remove* the referenced row while the transaction has not been finished). This caused unnecessary deadlock situations --- even though no-one is modifying table B (which is enough to guarantee that concurrent transactions would be ok), a second transacion would fail to set the exclusive access lock, since someone already locked it. My solution was to sort the insert statements by the referenced value on table B. (I hope the above explanation clarifies what I'm trying to say) I wonder if I should still do the same if I go with a multiple-row insert instead of multiple insert statements inside a transaction. Thanks, Carlos -- ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |