This is a discussion on general PG network slowness (possible cure) (repost) within the Pgsql Performance forums, part of the PostgreSQL category; --> I set up pg to replace a plain gdbm database for my application. But even running to the same ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I set up pg to replace a plain gdbm database for my application. But even running to the same machine, via a unix socket * the pg database ran 100 times slower Across the net it was * about 500 to 1000 times slower than local gdbm with no cpu use to speak of. I'd heard that networked databases are slow. I might have left it at that if curiosity hadn't led me to write a network server for gdbm databases, and talk to _that_ just to get a comparison. Lo and behold and smack me with a corncob if it wasn't _slower_ than pg. On a whim I mapped the network bandwidth per packet size with the NPtcp suite, and got surprising answers .. at 1500B, naturally, the bandwidth was the full 10Mb/s (minus overheads, say 8.5Mb/s) of my pathetic little local net. At 100B the bandwidth available was only 25Kb/s. At 10B, you might as well use tin cans and taut string instead. I also mapped the network flows using ntop, and yes, the average packet size for both gdbm and pg in one direction was only about 100B or so. That's it! Clearly there are a lot of short queries going out and the answers were none too big either ( I had a LIMIT 1 in all my PG queries). About 75% of traffic was in the 64-128B range while my application was running, with the peak bandwidth in that range being about 75-125Kb/s (and I do mean bits, not bytes). Soooo ... I took a look at my implementation of remote gdbm, and did a very little work to aggregate outgoing transmissions together into lumps. Three lines added in two places. At the level of the protocol where I could tell how long the immediate conversation segment would be, I "corked" the tcp socket before starting the segment and "uncorked" it after the segment (for "cork", see tcp(7), setsockopt(2) and TCP_CORK in linux). Surprise, ... I got a speed up of hundreds of times. The same application that crawled under my original rgdbm implementation and under PG now maxed out the network bandwidth at close to a full 10Mb/s and 1200 pkts/s, at 10% CPU on my 700MHz client, and a bit less on the 1GHz server. So * Is that what is holding up postgres over the net too? Lots of tiny packets? And if so * can one fix it the way I fixed it for remote gdbm? The speedup was hundreds of times. Can someone point me at the relevant bits of pg code? A quick look seems to say that fe-*.c is interesting. I need to find where the actual read and write on the conn->sock is done. Very illuminating gnuplot outputs available on request. Peter ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Peter T. Breuer wrote: > I set up pg to replace a plain gdbm database for my application. But > even running to the same machine, via a unix socket > > * the pg database ran 100 times slower For what operations? Bulk reads? 19-way joins? > Across the net it was > > * about 500 to 1000 times slower than local gdbm > > with no cpu use to speak of. Disk-intensive or memory intensive? > I'd heard that networked databases are slow. I might have left it at > that if curiosity hadn't led me to write a network server for gdbm > databases, and talk to _that_ just to get a comparison. > > Lo and behold and smack me with a corncob if it wasn't _slower_ than pg. > > On a whim I mapped the network bandwidth per packet size with the NPtcp > suite, and got surprising answers .. at 1500B, naturally, the bandwidth > was the full 10Mb/s (minus overheads, say 8.5Mb/s) of my pathetic little > local net. At 100B the bandwidth available was only 25Kb/s. At 10B, > you might as well use tin cans and taut string instead. This sounds like you're testing a single connection. You would expect "dead time" to dominate in that scenario. What happens when you have 50 simultaneous connections? Or do you think it's just packet overhead? > I also mapped the network flows using ntop, and yes, the average packet > size for both gdbm and pg in one direction was only about 100B or > so. That's it! Clearly there are a lot of short queries going out and > the answers were none too big either ( I had a LIMIT 1 in all my PG > queries). I'm not sure that 100B query-results are usually the bottleneck. Why would you have LIMIT 1 on all your queries? > About 75% of traffic was in the 64-128B range while my application was > running, with the peak bandwidth in that range being about 75-125Kb/s > (and I do mean bits, not bytes). None of this sounds like typical database traffic to me. Yes, there are lots of small result-sets, but there are also typically larger (several kilobytes) to much larger (10s-100s KB). > Soooo ... I took a look at my implementation of remote gdbm, and did > a very little work to aggregate outgoing transmissions together into > lumps. Three lines added in two places. At the level of the protocol > where I could tell how long the immediate conversation segment would be, > I "corked" the tcp socket before starting the segment and "uncorked" it > after the segment (for "cork", see tcp(7), setsockopt(2) and TCP_CORK in > linux). I'm a bit puzzled, because I'd have thought the standard Nagle algorithm would manage this gracefully enough for short-query cases. There's no way (that I know of) for a backend to handle more than one query at a time. > Surprise, ... I got a speed up of hundreds of times. The same application > that crawled under my original rgdbm implementation and under PG now > maxed out the network bandwidth at close to a full 10Mb/s and 1200 > pkts/s, at 10% CPU on my 700MHz client, and a bit less on the 1GHz > server. > > So > > * Is that what is holding up postgres over the net too? Lots of tiny > packets? I'm not sure your setup is typical, interesting though the figures are. Google a bit for pg_bench perhaps and see if you can reproduce the effect with a more typical load. I'd be interested in being proved wrong. > And if so > > * can one fix it the way I fixed it for remote gdbm? > > The speedup was hundreds of times. Can someone point me at the relevant > bits of pg code? A quick look seems to say that fe-*.c is > interesting. I need to find where the actual read and write on the > conn->sock is done. You'll want to look in backend/libpq and interfaces/libpq I think (although I'm not a developer). -- Richard Huxton Archonet Ltd ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Fri, May 25, 2007 at 10:50:58AM +0200, Peter T. Breuer wrote: > I set up pg to replace a plain gdbm database for my application. Postgres and gdbm are completely different. You want to rethink your queries so each does more work, instead of running a zillion of them over the network. /* Steinar */ -- Homepage: http://www.sesse.net/ ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| "Also sprach Richard Huxton:" [Charset ISO-8859-1 unsupported, filtering to ASCII...] > Peter T. Breuer wrote: > > I set up pg to replace a plain gdbm database for my application. But > > even running to the same machine, via a unix socket > > > > * the pg database ran 100 times slower > > For what operations? Bulk reads? 19-way joins? The only operations being done are simple "find the row with this key", or "update the row with this key". That's all. The queries are not an issue (though why the PG thread choose to max out cpu when it gets the chance to do so through a unix socket, I don't know). > > Across the net it was > > > > * about 500 to 1000 times slower than local gdbm > > > > with no cpu use to speak of. > > Disk-intensive or memory intensive? There is no disk as such... it's running on a ramdisk at the server end. But assuming you mean i/o, i/o was completely stalled. Everything was idle, all waiting on the net. > > On a whim I mapped the network bandwidth per packet size with the NPtcp > > suite, and got surprising answers .. at 1500B, naturally, the bandwidth > > was the full 10Mb/s (minus overheads, say 8.5Mb/s) of my pathetic little > > local net. At 100B the bandwidth available was only 25Kb/s. At 10B, > > you might as well use tin cans and taut string instead. > > This sounds like you're testing a single connection. You would expect > "dead time" to dominate in that scenario. What happens when you have 50 Indeed, it is single, because that's my application. I don't have 50 simultaneous connections. The use of the database is as a permanent storage area for the results of previous analyses (static analysis of the linux kernel codes) from a single client. Multiple threads accessing at the same time might help keep the network drivers busier, which would help. They would always see their buffers filling at an even rate and be able to send out groups of packets at once. > simultaneous connections? Or do you think it's just packet overhead? It's not quite overhead in the sense of the logical layer. It's a physical layer thing. I replied in another mail on this thread, but in summary, tcp behaves badly with small packets on ethernet, even on a dedicated line (as this was). One needs to keep it on a tight rein. > > I also mapped the network flows using ntop, and yes, the average packet > > size for both gdbm and pg in one direction was only about 100B or > > so. That's it! Clearly there are a lot of short queries going out and > > the answers were none too big either ( I had a LIMIT 1 in all my PG > > queries). > > I'm not sure that 100B query-results are usually the bottleneck. > Why would you have LIMIT 1 on all your queries? Because there is always only one answer to the query, according to the logic. So I can always tell the database manager to stop looking after one, which will always help it. > > About 75% of traffic was in the 64-128B range while my application was > > running, with the peak bandwidth in that range being about 75-125Kb/s > > (and I do mean bits, not bytes). > > None of this sounds like typical database traffic to me. Yes, there are > lots of small result-sets, but there are also typically larger (several > kilobytes) to much larger (10s-100s KB). There's none here. > > Soooo ... I took a look at my implementation of remote gdbm, and did > > a very little work to aggregate outgoing transmissions together into > > lumps. Three lines added in two places. At the level of the protocol > > where I could tell how long the immediate conversation segment would be, > > I "corked" the tcp socket before starting the segment and "uncorked" it > > after the segment (for "cork", see tcp(7), setsockopt(2) and TCP_CORK in > > linux). > > I'm a bit puzzled, because I'd have thought the standard Nagle algorithm > would manage this gracefully enough for short-query cases. There's no On the contrary, Nagle is also often wrong here because it will delay sending in order to accumulate more data into buffers when only a little has arrived, then give up when no more data arrives to be sent out, then send out the (short) packet anyway, late. There's no other traffic apart from my (single thread) application. What we want is to direct the sending exactly,n this situation saying when to not send, and when to send. Disable Nagle for a start, use async read (noblock), and sync write, with sends from the socket blocked from initiation of a message until the whole message is ready to be sent out. Sending the message piecemeal just hurts too. > way (that I know of) for a backend to handle more than one query at a time. That's not the scenario. > > Surprise, ... I got a speed up of hundreds of times. The same application > > that crawled under my original rgdbm implementation and under PG now > > maxed out the network bandwidth at close to a full 10Mb/s and 1200 > > pkts/s, at 10% CPU on my 700MHz client, and a bit less on the 1GHz > > server. > > > > So > > > > * Is that what is holding up postgres over the net too? Lots of tiny > > packets? > > I'm not sure your setup is typical, interesting though the figures are. > Google a bit for pg_bench perhaps and see if you can reproduce the > effect with a more typical load. I'd be interested in being proved wrong. But the load is typical HERE. The application works well against gdbm and I was hoping to see speedup from using a _real_ full-fledged DB instead. Well, at least it's very helpful for debugging. > > And if so > > > > * can one fix it the way I fixed it for remote gdbm? > > > > The speedup was hundreds of times. Can someone point me at the relevant > > bits of pg code? A quick look seems to say that fe-*.c is > > interesting. I need to find where the actual read and write on the > > conn->sock is done. > > You'll want to look in backend/libpq and interfaces/libpq I think > (although I'm not a developer). I'll look around there. Specific directions are greatly appreciated. Thanks. Peter ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Peter T. Breuer wrote: > > The only operations being done are simple "find the row with this key", > or "update the row with this key". That's all. The queries are not an > issue (though why the PG thread choose to max out cpu when it gets the > chance to do so through a unix socket, I don't know). > There is no disk as such... it's running on a ramdisk at the server > end. But assuming you mean i/o, i/o was completely stalled. Everything > was idle, all waiting on the net. > Indeed, it is single, because that's my application. I don't have > 50 simultaneous connections. The use of the database is as a permanent > storage area for the results of previous analyses (static analysis of > the linux kernel codes) from a single client. >> I'm not sure your setup is typical, interesting though the figures are. >> Google a bit for pg_bench perhaps and see if you can reproduce the >> effect with a more typical load. I'd be interested in being proved wrong. > > But the load is typical HERE. The application works well against gdbm > and I was hoping to see speedup from using a _real_ full-fledged DB > instead. I'm not sure you really want a full RDBMS. If you only have a single connection and are making basic key-lookup queries then 90% of PostgreSQL's code is just getting in your way. Sounds to me like gdbm (or one of its alternatives) is a good match for you. Failing that, sqlite is probably the next lowest-overhead solution. Of course, if you want to have multiple clients interacting and performing complex 19-way joins on gigabyte-sized tables with full-text indexing and full transaction control then you *do* want a RDBMS. -- Richard Huxton Archonet Ltd ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| "Also sprach Richard Huxton:" > I'm not sure you really want a full RDBMS. If you only have a single > connection and are making basic key-lookup queries then 90% of > PostgreSQL's code is just getting in your way. Sounds to me like gdbm Yep - I could happily tell it not to try and compile a special lookup scheme each time, for example! (how that?). I could presumably also help it by preloading the commands I will run and sending over the params only with a "do a no. 17 now!". > (or one of its alternatives) is a good match for you. Failing that, > sqlite is probably the next lowest-overhead solution. Not a bad idea. but PG _will_ be useful when folk come to analyse the result of the analyses being done. What is slow is getting the data into the database now via simple store, fetch and update. > Of course, if you want to have multiple clients interacting and > performing complex 19-way joins on gigabyte-sized tables with full-text Well, the dbs are in the tens of MB from a single run over a single file (i.e analysis of a single 30KLOC source). The complete analysis space is something like 4000 times that, for 4300 C files in the linux kernel source. And then there is all the linux kernel versions. Then there is godzilla and apache source .. > indexing and full transaction control then you *do* want a RDBMS. We want one anyway. The problem is filling the data and the simple fetch and update queries on it. I really think it would be worthwhile getting some developer to tell me where the network send is done in PG. Peter ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| "Peter T. Breuer" <ptb@inv.it.uc3m.es> writes: > Soooo ... I took a look at my implementation of remote gdbm, and did > a very little work to aggregate outgoing transmissions together into > lumps. We do that already --- for a simple query/response such as you are describing, each query cycle will involve one physical client->server message followed by one physical server->client message. The only way to aggregate more is for the application code to merge queries together. Migrating a dbm-style application to a SQL database is often a real pain, precisely because the application is designed to a mindset of "fetch one record, manipulate it, update it", where "fetch" and "update" are assumed to be too stupid to do any of the work for you. The way to get high performance with a SQL engine is to push as much of the work as you can to the database side, and let the engine process multiple records per query; and that can easily mean rewriting the app from the ground up :-( regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Peter T. Breuer wrote: > "Also sprach Richard Huxton:" >> I'm not sure you really want a full RDBMS. If you only have a single >> connection and are making basic key-lookup queries then 90% of >> PostgreSQL's code is just getting in your way. Sounds to me like gdbm > > Yep - I could happily tell it not to try and compile a special lookup > scheme each time, for example! (how that?). I could presumably also > help it by preloading the commands I will run and sending over the > params only with a "do a no. 17 now!". PREPARE/EXECUTE (or the equivalent libpq functions). Also - if you can have multiple connections to the DB you should be able to have several queries running at once. >> (or one of its alternatives) is a good match for you. Failing that, >> sqlite is probably the next lowest-overhead solution. > > Not a bad idea. but PG _will_ be useful when folk come to analyse the > result of the analyses being done. What is slow is getting the data > into the database now via simple store, fetch and update. I'd have an hourly/daily bulk-load running from the simple system into PG. If you have to search all the data from your app that's not practical of course. >> Of course, if you want to have multiple clients interacting and >> performing complex 19-way joins on gigabyte-sized tables with full-text > > Well, the dbs are in the tens of MB from a single run over a single > file (i.e analysis of a single 30KLOC source). The complete analysis > space is something like 4000 times that, for 4300 C files in the linux > kernel source. And then there is all the linux kernel versions. Then > there is godzilla and apache source .. If you're doing some sort of token analysis on source-code you probably want to look into how tsearch2 / trigram / Gist+GIN indexes work. It might be that you're doing work in your app that the DB can handle for you. >> indexing and full transaction control then you *do* want a RDBMS. > > We want one anyway. The problem is filling the data and the simple > fetch and update queries on it. OK > I really think it would be worthwhile getting some developer to tell me > where the network send is done in PG. -- Richard Huxton Archonet Ltd ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Peter T. Breuer escribió: > I really think it would be worthwhile getting some developer to tell me > where the network send is done in PG. See src/backend/libpq/pqcomm.c (particularly internal_flush()). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| ||||
| "Also sprach Alvaro Herrera:" > > I really think it would be worthwhile getting some developer to tell me > > where the network send is done in PG. > > See src/backend/libpq/pqcomm.c (particularly internal_flush()). Yes. Thanks. That looks like it. It calls secure_write continually until the buffer is empty. Secure_write is located ibe-secure.c, but I'm not using ssl, so the call reduces to just n = send(port->sock, ptr, len, 0); And definitely all those could be grouped if there are several to do. But under normal circumstances the send will be pushing against a lttle resistance (the copy to the driver/protocol stack buffer is faster than the physical network send, by a ratio of GB/s to MB/s, or 1000 to 1), and thus all these sends will probably complete as a single unit once they have been started. It's worth a try. I thought first this may be too low level, but it looks as though internal_flush is only triggered when some other buffer is full, or deliberately, so it may be useful to block until it fires. I'll try it. Peter ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |