Unix Technical Forum

Insert performance

This is a discussion on Insert performance within the Pgsql Performance forums, part of the PostgreSQL category; --> Dear all, After many tests and doc reading, i finally try to get help from you... Here is my ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 10:17 AM
hatman
 
Posts: n/a
Default Insert performance

Dear all,

After many tests and doc reading, i finally try to get help from
you...

Here is my problem. With some heavy insert into a simple BD (one
table, no indexes) i can't get better perf than 8000 inserts/sec. I'm
testing it using a simple C software which use libpq and which use:
- Insert prepared statement (to avoid too many request parsing on the
server)
- transaction of 100000 inserts

My server which has the following config:
- 3G RAM
- Pentium D - 64 bits, 3Ghz
- database data on hardware raid 0 disks
- x_log (WAL logs) on an other single hard drive

The server only use 30% of the CPU, 10% of disk access and not much
RAM... So i'm wondering where could be the bottle neck and why i can't
get better performance ?
I really need to use inserts and i can't change it to use COPY...

Any advice is welcome. Sorry in advance for my bad understanding of
database !

Thanks in advance.

Regards,


Joël.W

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 10:20 AM
Richard Huxton
 
Posts: n/a
Default Re: Insert performance

hatman wrote:
> Dear all,
>
> After many tests and doc reading, i finally try to get help from
> you...
>
> Here is my problem. With some heavy insert into a simple BD (one
> table, no indexes) i can't get better perf than 8000 inserts/sec. I'm
> testing it using a simple C software which use libpq and which use:
> - Insert prepared statement (to avoid too many request parsing on the
> server)
> - transaction of 100000 inserts


Are each of the INSERTs in their own transaction?

If so, you'll be limited by the speed of the disk the WAL is running on.

That means you have two main options:
1. Have multiple connections inserting simultaneously.
2. Batch your inserts together, from 10 to 10,000 per transaction.

Are either of those possible?

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 10:20 AM
Richard Huxton
 
Posts: n/a
Default Re: Insert performance

joël Winteregg wrote:
> Hi Richard,
>
>>> Here is my problem. With some heavy insert into a simple BD (one
>>> table, no indexes) i can't get better perf than 8000 inserts/sec. I'm
>>> testing it using a simple C software which use libpq and which use:
>>> - Insert prepared statement (to avoid too many request parsing on the
>>> server)
>>> - transaction of 100000 inserts

>> Are each of the INSERTs in their own transaction?
>>

>
> No, as said above transactions are made of 100000 inserts...


Hmm - I read that as just meaning "inserted 100000 rows". You might find
that smaller batches provide peak performance.

>> If so, you'll be limited by the speed of the disk the WAL is running on.
>>
>> That means you have two main options:
>> 1. Have multiple connections inserting simultaneously.

>
> Yes, you're right. That what i have been testing and what provide the
> best performance ! I saw that postgresql frontend was using a lot of CPU
> and not both of them (i'm using a pentium D, dual core). To the opposit,
> the postmaster process use not much resources. Using several client,
> both CPU are used and i saw an increase of performance (about 18000
> inserts/sec).
>
> So i think my bottle neck is more the CPU speed than the disk speed,
> what do you think ?


Well, I think it's fair to say it's not disk. Let's see - the original
figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds
plausible to me for a round-trip to process a simple command - are you
running the client app on the same machine, or is it over the network?

Two other things to bear in mind:
1. If you're running 8.2 you can have multiple sets of values in an INSERT
http://www.postgresql.org/docs/8.2/s...ql-insert.html

2. You can do a COPY from libpq - is it really not possible?

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 10:20 AM
Richard Huxton
 
Posts: n/a
Default Re: Insert performance

joël Winteregg wrote:
>
>>> No, as said above transactions are made of 100000 inserts...

>> Hmm - I read that as just meaning "inserted 100000 rows". You might find
>> that smaller batches provide peak performance.

>
> Ahh ok ;-) sorry for my bad english... (yeah, i have been testing
> several transaction size 10000, 20000 and 100000)


Not your bad English, my poor reading :-)

>>>> If so, you'll be limited by the speed of the disk the WAL is running on.
>>>>
>>>> That means you have two main options:
>>>> 1. Have multiple connections inserting simultaneously.
>>> Yes, you're right. That what i have been testing and what provide the
>>> best performance ! I saw that postgresql frontend was using a lot of CPU
>>> and not both of them (i'm using a pentium D, dual core). To the opposit,
>>> the postmaster process use not much resources. Using several client,
>>> both CPU are used and i saw an increase of performance (about 18000
>>> inserts/sec).
>>>
>>> So i think my bottle neck is more the CPU speed than the disk speed,
>>> what do you think ?

>> Well, I think it's fair to say it's not disk. Let's see - the original
>> figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds
>> plausible to me for a round-trip to process a simple command - are you
>> running the client app on the same machine, or is it over the network?

>
> I did both test. On the local machine (using UNIX sockets) i can reach
> 18000 insert/sec with 10 clients and prepared statements. The same test
> using clients on the remote machine provide me 13000 inserts/sec.


OK, so we know what the overhead for network connections is.

> Now, with multiple client (multi-threaded inserts) my both CPU are quite
> well used (both arround 90%) so i maybe think that disk speeds are now
> my bottleneck. What do you think ? or maybe i will need a better CPU ?
>
>> Two other things to bear in mind:
>> 1. If you're running 8.2 you can have multiple sets of values in an INSERT
>> http://www.postgresql.org/docs/8.2/s...ql-insert.html

>
> Yeah, i'm running the 8.2.3 version ! i didn't know about multiple
> inserts sets ! Thanks for the tip ;-)


Ah-ha! Give it a go, it's designed for this sort of situation. Not sure
it'll manage thousands of value clauses, but working up from 10 perhaps.
I've not tested it for performance, so I'd be interesting in knowing how
it compares to your other results.

>> 2. You can do a COPY from libpq - is it really not possible?
>>

>
> Not really but i have been testing it and inserts are flying (about
> 100000 inserts/sec) !!


What's the problem with the COPY? Could you COPY into one table then
insert from that to your target table?

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 10:20 AM
Andreas Kostyrka
 
Posts: n/a
Default Re: Insert performance

* Richard Huxton <dev@archonet.com> [070306 12:22]:
> >>2. You can do a COPY from libpq - is it really not possible?
> >>

> >Not really but i have been testing it and inserts are flying (about
> >100000 inserts/sec) !!

>
> What's the problem with the COPY? Could you COPY into one table then insert from that to your target table?

Well, there are some issues. First your client needs to support it.
E.g. psycopg2 supports only some specific CSV formatting in it's
methods. (plus I had sometimes random psycopg2 crashes, but guarding against
these is cheap compared to the speedup from COPY versus INSERT)
Plus you need to be sure that your data will apply cleanly (which in
my app was not the case), or you need to code a fallback that
localizes the row that doesn't work.

And the worst thing is, that it ignores RULES on the tables, which
sucks if you use them (e.g. table partitioning).

Andreas

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 10:20 AM
Richard Huxton
 
Posts: n/a
Default Re: Insert performance

Andreas Kostyrka wrote:
> * Richard Huxton <dev@archonet.com> [070306 12:22]:
>>>> 2. You can do a COPY from libpq - is it really not possible?
>>>>
>>> Not really but i have been testing it and inserts are flying (about
>>> 100000 inserts/sec) !!

>> What's the problem with the COPY? Could you COPY into one table then insert from that to your target table?

> Well, there are some issues. First your client needs to support it.
> E.g. psycopg2 supports only some specific CSV formatting in it's
> methods. (plus I had sometimes random psycopg2 crashes, but guarding against
> these is cheap compared to the speedup from COPY versus INSERT)
> Plus you need to be sure that your data will apply cleanly (which in
> my app was not the case), or you need to code a fallback that
> localizes the row that doesn't work.
>
> And the worst thing is, that it ignores RULES on the tables, which
> sucks if you use them (e.g. table partitioning).


Ah, but two things deal with these issues:
1. Joel is using libpq
2. COPY into a holding table, tidy data and INSERT ... SELECT

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 10:20 AM
Andreas Kostyrka
 
Posts: n/a
Default Re: Insert performance

* Richard Huxton <dev@archonet.com> [070306 13:47]:
> Andreas Kostyrka wrote:
> >* Richard Huxton <dev@archonet.com> [070306 12:22]:
> >>>>2. You can do a COPY from libpq - is it really not possible?
> >>>>
> >>>Not really but i have been testing it and inserts are flying (about
> >>>100000 inserts/sec) !!
> >>What's the problem with the COPY? Could you COPY into one table then insert from that to your target table?

> >Well, there are some issues. First your client needs to support it.
> >E.g. psycopg2 supports only some specific CSV formatting in it's
> >methods. (plus I had sometimes random psycopg2 crashes, but guarding against
> >these is cheap compared to the speedup from COPY versus INSERT)
> >Plus you need to be sure that your data will apply cleanly (which in
> >my app was not the case), or you need to code a fallback that
> >localizes the row that doesn't work.
> >And the worst thing is, that it ignores RULES on the tables, which
> >sucks if you use them (e.g. table partitioning).

>
> Ah, but two things deal with these issues:
> 1. Joel is using libpq
> 2. COPY into a holding table, tidy data and INSERT ... SELECT


Clearly COPY is the way for bulk loading data, BUT you asked, so I
wanted to point out some problems and brittle points with COPY.

(and the copy into the holding table doesn't solve completly the
problem with the dirty inconsistent data)

Andreas

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 10:20 AM
hatman
 
Posts: n/a
Default Re: Insert performance

Hi Andreas,

Thanks for the info about COPY !!

On Mar 6, 1:23 pm, andr...@kostyrka.org (Andreas Kostyrka) wrote:
> * Richard Huxton <d...@archonet.com> [070306 12:22]:> >>2. You can do a COPY from libpq - is it really not possible?
>
> > >Not really but i have been testing it and inserts are flying (about
> > >100000 inserts/sec) !!

>
> > What's the problem with the COPY? Could you COPY into one table then insert from that to your target table?

>
> Well, there are some issues. First your client needs to support it.
> E.g. psycopg2 supports only some specific CSV formatting in it's
> methods. (plus I had sometimes random psycopg2 crashes, but guarding against
> these is cheap compared to the speedup from COPY versus INSERT)
> Plus you need to be sure that your data will apply cleanly (which in
> my app was not the case), or you need to code a fallback that
> localizes the row that doesn't work.
>
> And the worst thing is, that it ignores RULES on the tables, which
> sucks if you use them (e.g. table partitioning).


Ok, but what about constraints (foreign keys and SERIAL id) using a
copy statement ? do we need to handle auto-generated id (SERIAL)
manually ?

Thanks for your feedback.

Regards,

Joël

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-19-2008, 10:21 AM
hatman
 
Posts: n/a
Default Re: Insert performance

Hi Richard,

>
> >>> No, as said above transactions are made of 100000 inserts...
> >> Hmm - I read that as just meaning "inserted 100000 rows". You might find
> >> that smaller batches provide peak performance.

>
> > Ahh ok ;-) sorry for my bad english... (yeah, i have been testing
> > several transaction size 10000, 20000 and 100000)

>
> Not your bad English, my poor reading :-)
>
>
>
> >>>> If so, you'll be limited by the speed of the disk the WAL is runningon.

>
> >>>> That means you have two main options:
> >>>> 1. Have multiple connections inserting simultaneously.
> >>> Yes, you're right. That what i have been testing and what provide the
> >>> best performance ! I saw that postgresql frontend was using a lot of CPU
> >>> and not both of them (i'm using a pentium D, dual core). To the opposit,
> >>> the postmaster process use not much resources. Using several client,
> >>> both CPU are used and i saw an increase of performance (about 18000
> >>> inserts/sec).

>
> >>> So i think my bottle neck is more the CPU speed than the disk speed,
> >>> what do you think ?
> >> Well, I think it's fair to say it's not disk. Let's see - the original
> >> figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds
> >> plausible to me for a round-trip to process a simple command - are you
> >> running the client app on the same machine, or is it over the network?

>
> > I did both test. On the local machine (using UNIX sockets) i can reach
> > 18000 insert/sec with 10 clients and prepared statements. The same test
> > using clients on the remote machine provide me 13000 inserts/sec.

>
> OK, so we know what the overhead for network connections is.
>
> > Now, with multiple client (multi-threaded inserts) my both CPU are quite
> > well used (both arround 90%) so i maybe think that disk speeds are now
> > my bottleneck. What do you think ? or maybe i will need a better CPU ?

>
> >> Two other things to bear in mind:
> >> 1. If you're running 8.2 you can have multiple sets of values in an INSERT
> >>http://www.postgresql.org/docs/8.2/s...ql-insert.html

>
> > Yeah, i'm running the 8.2.3 version ! i didn't know about multiple
> > inserts sets ! Thanks for the tip ;-)

>



> Ah-ha! Give it a go, it's designed for this sort of situation. Not sure
> it'll manage thousands of value clauses, but working up from 10 perhaps.
> I've not tested it for performance, so I'd be interesting in knowing how
> it compares to your other results.


Yeah, as soon as possible i will give it a try ! Thanks for the
feedback ;-)

>
> >> 2. You can do a COPY from libpq - is it really not possible?

>
> > Not really but i have been testing it and inserts are flying (about
> > 100000 inserts/sec) !!

>
> What's the problem with the COPY? Could you COPY into one table then
> insert from that to your target table?


The main problem comes from our "real time" needs. We are getting
information as a data flow from several application and we need to
store them in the DB without buffering them too much...
I have been testing the COPY using several statement (i mean using
copy to add only a few rows to a specific table and then using it on
an other table to add a few rows, etc...) and the perf are as bad as
an insert !
COPY seems to be designed to add many many rows to the same table and
not a few rows to several tables... So that's my main problem.

Regards,

Joël

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-19-2008, 10:21 AM
Carlos Moreno
 
Posts: n/a
Default Re: Insert performance


>>> 1. If you're running 8.2 you can have multiple sets of values in an
>>> INSERT
>>> http://www.postgresql.org/docs/8.2/s...ql-insert.html

>>
>>
>> Yeah, i'm running the 8.2.3 version ! i didn't know about multiple
>> inserts sets ! Thanks for the tip ;-)

>


No kidding --- thanks for the tip from me as well !!!

I didn't know this was possible (though I read in the docs that it is ANSI
SQL standard), and I'm also having a similar situation.

Two related questions:

1) What about atomicity? Is it strictly equivalent to having multiple
insert
statements inside a transaction? (I assume it should be)

2) What about the issue with excessive locking for foreign keys when
inside a transaction? Has that issue disappeared in 8.2? And if not,
would it affect similarly in the case of multiple-row inserts?

In case you have no clue what I'm referring to:

Say that we have a table A, with one foreign key constraint to table
B --- last time I checked, there was an issue that whenever inserting
or updating table A (inside a transacion), postgres sets an exclusive
access lock on the referenced row on table B --- this is overkill, and
the correct thing to do would be to set a read-only lock (so that
no-one else can *modify or remove* the referenced row while the
transaction has not been finished).

This caused unnecessary deadlock situations --- even though no-one
is modifying table B (which is enough to guarantee that concurrent
transactions would be ok), a second transacion would fail to set the
exclusive access lock, since someone already locked it.

My solution was to sort the insert statements by the referenced value
on table B.

(I hope the above explanation clarifies what I'm trying to say)

I wonder if I should still do the same if I go with a multiple-row
insert instead of multiple insert statements inside a transaction.

Thanks,

Carlos
--


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:43 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com