This is a discussion on Linux/PostgreSQL scalability issue - problem with 8 cores within the Pgsql Performance forums, part of the PostgreSQL category; --> Jakub Ouhrabka wrote: > We've tried several times to get stacktrace from some of the running > backends during ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Jakub Ouhrabka wrote: > We've tried several times to get stacktrace from some of the running > backends during spikes, we got always this: > > 0x00002b005d00a9a9 in semop () from /lib/libc.so.6 > #0 0x00002b005d00a9a9 in semop () from /lib/libc.so.6 > #1 0x000000000054fe53 in PGSemaphoreLock (sema=0x2b00a04e5090, > interruptOK=0 '\0') at pg_sema.c:411 > #2 0x0000000000575d95 in LWLockAcquire (lockid=SInvalLock, > mode=LW_EXCLUSIVE) at lwlock.c:455 > #3 0x000000000056fbfe in ReceiveSharedInvalidMessages > (invalFunction=0x5e9a30 <LocalExecuteInvalidationMessage>, > resetFunction=0x5e9df0 <InvalidateSystemCaches>) at sinval.c:159 > #4 0x0000000000463505 in StartTransactionCommand () at xact.c:1439 > #5 0x000000000056fa4b in ProcessCatchupEvent () at sinval.c:347 > #6 0x000000000056fb20 in CatchupInterruptHandler > (postgres_signal_arg=<value optimized out>) at sinval.c:221 > #7 0x00002b005cf6f110 in killpg () from /lib/libc.so.6 > #8 0x0000000000000000 in ?? () Perhaps it would make sense to try to take the "fast path" in SIDelExpiredDataEntries with only a shared lock rather than exclusive. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Jakub Ouhrabka <kuba@comgate.cz> writes: > We've tried several times to get stacktrace from some of the running > backends during spikes, we got always this: > 0x00002b005d00a9a9 in semop () from /lib/libc.so.6 > #0 0x00002b005d00a9a9 in semop () from /lib/libc.so.6 > #1 0x000000000054fe53 in PGSemaphoreLock (sema=0x2b00a04e5090, > interruptOK=0 '\0') at pg_sema.c:411 > #2 0x0000000000575d95 in LWLockAcquire (lockid=SInvalLock, > mode=LW_EXCLUSIVE) at lwlock.c:455 > #3 0x000000000056fbfe in ReceiveSharedInvalidMessages > (invalFunction=0x5e9a30 <LocalExecuteInvalidationMessage>, > resetFunction=0x5e9df0 <InvalidateSystemCaches>) at sinval.c:159 > #4 0x0000000000463505 in StartTransactionCommand () at xact.c:1439 > #5 0x000000000056fa4b in ProcessCatchupEvent () at sinval.c:347 > #6 0x000000000056fb20 in CatchupInterruptHandler > (postgres_signal_arg=<value optimized out>) at sinval.c:221 CatchupInterruptHandler, eh? That seems to let NOTIFY off the hook, and instead points in the direction of sinval processing; which is to say, propagation of changes to system catalogs. Does your app create and destroy a tremendous number of temp tables, or anything else in the way of frequent DDL commands? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Jakub Ouhrabka <jakub.ouhrabka@comgate.cz> writes: >>> Does your app create and destroy a tremendous number of temp tables, >>> or anything else in the way of frequent DDL commands? > Hmm. I can't think of anything like this. Maybe there are few backends > which create temp tables but not tremendous number. I don't think our > applications issue DDL statements either. Can LOCK TABLE IN EXCLUSIVE > MODE cause this? No. I did some experimenting to see exactly how large the sinval message buffer is in today's terms, and what I find is that about 22 cycles of create temp table foo (f1 int, f2 text); drop table foo; is enough to force a CatchupInterrupt to a sleeping backend. This case is a bit more complex than it appears since the text column forces the temp table to have a toast table; but even with only fixed-width columns, if you were creating one temp table a second that would be plenty to explain once-a-minute-or-so CatchupInterrupt processing. And if you've got a lot of backends that normally *don't* wake up that often, then they'd be sitting and failing to absorb the sinval traffic in any more timely fashion. So I'm betting that we've got the source of the spike identified. You could check this theory out by strace'ing some of the idle backends and seeing if their activity spikes are triggered by receipt of SIGUSR1 signals. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Alvaro Herrera <alvherre@commandprompt.com> writes: > Perhaps it would make sense to try to take the "fast path" in > SIDelExpiredDataEntries with only a shared lock rather than exclusive. I think the real problem here is that sinval catchup processing is well designed to create contention :-(. Once we've decided that the message queue is getting too full, we SIGUSR1 all the backends at once (or as fast as the postmaster can do it anyway), then they all go off and try to touch the sinval queue. Backends that haven't awoken even once since the last time will have to process the entire queue contents, and they're all trying to do that at the same time. What's worse, they take and release the SInvalLock once for each message they take off the queue. This isn't so horrid for one-core machines (since each process will monopolize the CPU for probably less than one timeslice while it's catching up) but it's pretty obvious where all the contention is coming from on an 8-core. Some ideas for improving matters: 1. Try to avoid having all the backends hit the queue at once. Instead of SIGUSR1'ing everybody at the same time, maybe hit only the process with the oldest message pointer, and have him hit the next oldest after he's done reading the queue. 2. Try to take more than one message off the queue per SInvalLock cycle. (There is a tuning tradeoff here, since it would mean holding the lock for longer at a time.) 3. Try to avoid having every backend run SIDelExpiredDataEntries every time through ReceiveSharedInvalidMessages. It's not critical to delete entries until the queue starts getting full --- maybe we could rejigger the logic so it only happens once when somebody notices the queue is getting full, or so that only the guy(s) who had nextMsgNum == minMsgNum do it, or something like that? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| > You could check this theory > out by strace'ing some of the idle backends and seeing if their > activity spikes are triggered by receipt of SIGUSR1 signals. Yes, I can confirm that it's triggered by SIGUSR1 signals. If I understand it correctly we have following choices now: 1) Use only 2 cores (out of 8 cores) 2) Lower the number of idle backends - at least force backends to do something at different times to eliminate spikes - is "select 1" enough to force processing the queue? 3) Is there any chance of this being fixed/improved in 8.3 or even 8.2? It's a (performance) bug from our point of view. I realize we're first who noticed it and it's not typical use case to have so many idle backends. But large installation with connection pooling is something similar and that's not that uncommon - one active backend doing DDL can then cause unexpected spikes during otherwise quiet hours... 4) Sure we'll try to reduce the number of DDL statements (which in fact we're not sure where exactly they are comming from) but I guess it would only longer the time between spikes but not make them any smoother. Any other suggestions? Thanks, Kuba ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Jakub Ouhrabka <kuba@comgate.cz> writes: > Yes, I can confirm that it's triggered by SIGUSR1 signals. OK, that confirms the theory that it's sinval-queue contention. > If I understand it correctly we have following choices now: > 1) Use only 2 cores (out of 8 cores) > 2) Lower the number of idle backends - at least force backends to do > something at different times to eliminate spikes - is "select 1" enough > to force processing the queue? Yeah, if you could get your clients to issue trivial queries every few seconds (not all at the same time) the spikes should go away. If you don't want to change your clients, one possible amelioration is to reduce the signaling threshold in SIInsertDataEntry --- instead of 70% of MAXNUMMESSAGES, maybe signal at 10% or 20%. That would make the spikes more frequent but smaller, which might help ... or not. > 3) Is there any chance of this being fixed/improved in 8.3 or even 8.2? I doubt we'd risk destabilizing 8.3 at this point, for a problem that affects so few people; let alone back-patching into 8.2. There are some other known performance problems in the sinval signaling (for instance, that a queue overflow results in cache resets for all backends, not only the slowest), so I think addressing all of them at once would be the thing to do. That would be a large enough patch that it would certainly need to go through beta testing before I'd want to inflict it on the world... This discussion has raised the priority of the problem in my mind, so I'm thinking it should be worked on in 8.4; but it's too late for 8.3. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Hi Tom, > I doubt we'd risk destabilizing 8.3 at this point, for a problem that > affects so few people; let alone back-patching into 8.2. understand. > OK, that confirms the theory that it's sinval-queue contention. We'we tried hard to identify what's the cause of filling sinval-queue. We went through query logs as well as function bodies stored in the database. We were not able to find any DDL, temp table creations etc. We did following experiment: stop one of our clients, so there started to be queue of events (aka rows in db) for it to process. Then we started the client again, it started processing the queue - that means calling simple selects, updates and complex plpgsql function(s). And at this moment, the spike started even it shouldn't start to meet usual periodicity. It's consistent. We are pretty sure that this client is not doing any DDL... What should we look for to find the cause? Thanks for any hints, Kuba ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Jakub Ouhrabka <kuba@comgate.cz> writes: > We'we tried hard to identify what's the cause of filling sinval-queue. > We went through query logs as well as function bodies stored in the > database. We were not able to find any DDL, temp table creations etc. Strange. The best idea that comes to mind is to add some debugging code to SendSharedInvalidMessage to log the content of each message that's sent out. That would at least tell us *what* is going into the queue, even if not directly *why*. Try something like (untested) void SendSharedInvalidMessage(SharedInvalidationMessage *msg) { bool insertOK; + elog(LOG, "sending inval msg %d %u %u %u %u %u", + msg->cc.id, + msg->cc.tuplePtr.ip_blkid.bi_hi, + msg->cc.tuplePtr.ip_blkid.bi_lo, + msg->cc.tuplePtr.ip_posid, + msg->cc.dbId, + msg->cc.hashValue); + LWLockAcquire(SInvalLock, LW_EXCLUSIVE); insertOK = SIInsertDataEntry(shmInvalBuffer, msg); LWLockRelease(SInvalLock); regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Hi Tom, > Strange. The best idea that comes to mind is to add some debugging > code to SendSharedInvalidMessage to log the content of each message > that's sent out. That would at least tell us *what* is going into > the queue, even if not directly *why*. we've patched postgresql and run one of our plpgsql complex procedures. There are many of sinval messages - log output is below. What does it mean? Thanks, Kuba LOG: sending inval msg 30 0 26 13 30036 4294936593 LOG: sending inval msg 29 0 26 13 30036 337030170 LOG: sending inval msg 30 0 25 46 30036 4294936593 LOG: sending inval msg 29 0 25 46 30036 337030170 LOG: sending inval msg 30 0 26 13 30036 4294936593 LOG: sending inval msg 29 0 26 13 30036 337030170 LOG: sending inval msg 30 0 25 45 30036 4294936595 LOG: sending inval msg 29 0 25 45 30036 2019111801 LOG: sending inval msg 30 0 26 11 30036 4294936595 LOG: sending inval msg 29 0 26 11 30036 2019111801 LOG: sending inval msg 30 0 25 44 30036 4294936597 LOG: sending inval msg 29 0 25 44 30036 3703878920 LOG: sending inval msg 30 0 26 10 30036 4294936597 LOG: sending inval msg 29 0 26 10 30036 3703878920 LOG: sending inval msg 30 0 26 9 30036 4294936616 LOG: sending inval msg 29 0 26 9 30036 3527122063 LOG: sending inval msg 30 0 25 43 30036 4294936616 LOG: sending inval msg 29 0 25 43 30036 3527122063 LOG: sending inval msg 30 0 26 9 30036 4294936616 LOG: sending inval msg 29 0 26 9 30036 3527122063 LOG: sending inval msg 30 0 25 41 30036 4294936618 LOG: sending inval msg 29 0 25 41 30036 2126866956 LOG: sending inval msg 30 0 26 7 30036 4294936618 LOG: sending inval msg 29 0 26 7 30036 2126866956 LOG: sending inval msg 30 0 25 40 30036 4294936620 LOG: sending inval msg 29 0 25 40 30036 1941919314 LOG: sending inval msg 30 0 26 5 30036 4294936620 LOG: sending inval msg 29 0 26 5 30036 1941919314 LOG: sending inval msg 30 0 26 4 30036 4294936633 LOG: sending inval msg 29 0 26 4 30036 544523647 LOG: sending inval msg 30 0 25 39 30036 4294936633 LOG: sending inval msg 29 0 25 39 30036 544523647 LOG: sending inval msg 30 0 26 4 30036 4294936633 LOG: sending inval msg 29 0 26 4 30036 544523647 LOG: sending inval msg 30 0 25 38 30036 4294936635 LOG: sending inval msg 29 0 25 38 30036 2557582018 LOG: sending inval msg 30 0 26 3 30036 4294936635 LOG: sending inval msg 29 0 26 3 30036 2557582018 LOG: sending inval msg 30 0 25 37 30036 4294936637 LOG: sending inval msg 29 0 25 37 30036 2207280630 LOG: sending inval msg 30 0 26 2 30036 4294936637 LOG: sending inval msg 29 0 26 2 30036 2207280630 LOG: sending inval msg 30 0 26 1 30036 4294936669 LOG: sending inval msg 29 0 26 1 30036 1310188568 LOG: sending inval msg 30 0 25 36 30036 4294936669 LOG: sending inval msg 29 0 25 36 30036 1310188568 LOG: sending inval msg 30 0 26 1 30036 4294936669 LOG: sending inval msg 29 0 26 1 30036 1310188568 LOG: sending inval msg 30 0 25 35 30036 4294936671 LOG: sending inval msg 29 0 25 35 30036 2633053415 LOG: sending inval msg 30 0 25 48 30036 4294936671 LOG: sending inval msg 29 0 25 48 30036 2633053415 LOG: sending inval msg 30 0 25 33 30036 4294936673 LOG: sending inval msg 29 0 25 33 30036 2049964857 LOG: sending inval msg 30 0 25 47 30036 4294936673 LOG: sending inval msg 29 0 25 47 30036 2049964857 LOG: sending inval msg -1 0 30036 0 30700 3218341912 LOG: sending inval msg -2 2084 1663 0 30036 50335 LOG: sending inval msg -2 0 1663 0 30036 50336 LOG: sending inval msg -1 2075 30036 0 30702 30036 LOG: sending inval msg -2 0 1663 0 30036 50324 LOG: sending inval msg -1 0 30036 0 30702 30036 LOG: sending inval msg -2 0 1663 0 30036 50336 LOG: sending inval msg -2 0 1663 0 30036 50323 LOG: sending inval msg -1 0 30036 0 30700 30036 LOG: sending inval msg -2 0 1663 0 30036 50335 LOG: sending inval msg -2 0 1663 0 30036 50322 LOG: sending inval msg -1 0 30036 0 30698 30036 LOG: sending inval msg -2 0 1663 0 30036 50334 LOG: sending inval msg -1 0 30036 0 30677 3218341912 LOG: sending inval msg -2 2084 1663 0 30036 50332 LOG: sending inval msg -2 0 1663 0 30036 50333 LOG: sending inval msg -1 2075 30036 0 30679 30036 LOG: sending inval msg -2 0 1663 0 30036 50321 LOG: sending inval msg -1 0 30036 0 30679 30036 LOG: sending inval msg -2 0 1663 0 30036 50333 LOG: sending inval msg -2 0 1663 0 30036 50320 LOG: sending inval msg -1 0 30036 0 30677 30036 LOG: sending inval msg -2 0 1663 0 30036 50332 LOG: sending inval msg -2 0 1663 0 30036 50319 LOG: sending inval msg -1 0 30036 0 30675 30036 LOG: sending inval msg -2 0 1663 0 30036 50331 LOG: sending inval msg -1 0 30036 0 30660 3218341912 LOG: sending inval msg -2 2084 1663 0 30036 50329 LOG: sending inval msg -2 0 1663 0 30036 50330 LOG: sending inval msg -1 2075 30036 0 30662 30036 LOG: sending inval msg -2 0 1663 0 30036 50318 LOG: sending inval msg -1 0 30036 0 30662 30036 LOG: sending inval msg -2 0 1663 0 30036 50330 LOG: sending inval msg -2 0 1663 0 30036 50317 LOG: sending inval msg -1 0 30036 0 30660 30036 LOG: sending inval msg -2 0 1663 0 30036 50329 LOG: sending inval msg -2 0 1663 0 30036 50316 LOG: sending inval msg -1 0 30036 0 30658 30036 LOG: sending inval msg -2 0 1663 0 30036 50328 LOG: sending inval msg -1 0 30036 0 30624 3218341912 LOG: sending inval msg -2 2084 1663 0 30036 50326 LOG: sending inval msg -2 0 1663 0 30036 50327 LOG: sending inval msg -1 2075 30036 0 30626 30036 LOG: sending inval msg -2 0 1663 0 30036 50315 LOG: sending inval msg -1 0 30036 0 30626 30036 LOG: sending inval msg -2 0 1663 0 30036 50327 LOG: sending inval msg -2 0 1663 0 30036 50314 LOG: sending inval msg -1 0 30036 0 30624 30036 LOG: sending inval msg -2 0 1663 0 30036 50326 LOG: sending inval msg -2 0 1663 0 30036 50313 LOG: sending inval msg -1 0 30036 0 30622 30036 LOG: sending inval msg -2 0 1663 0 30036 50325 Tom Lane napsal(a): > Jakub Ouhrabka <kuba@comgate.cz> writes: >> We'we tried hard to identify what's the cause of filling sinval-queue. >> We went through query logs as well as function bodies stored in the >> database. We were not able to find any DDL, temp table creations etc. > > Strange. The best idea that comes to mind is to add some debugging > code to SendSharedInvalidMessage to log the content of each message > that's sent out. That would at least tell us *what* is going into > the queue, even if not directly *why*. Try something like (untested) > > void > SendSharedInvalidMessage(SharedInvalidationMessage *msg) > { > bool insertOK; > > + elog(LOG, "sending inval msg %d %u %u %u %u %u", > + msg->cc.id, > + msg->cc.tuplePtr.ip_blkid.bi_hi, > + msg->cc.tuplePtr.ip_blkid.bi_lo, > + msg->cc.tuplePtr.ip_posid, > + msg->cc.dbId, > + msg->cc.hashValue); > + > LWLockAcquire(SInvalLock, LW_EXCLUSIVE); > insertOK = SIInsertDataEntry(shmInvalBuffer, msg); > LWLockRelease(SInvalLock); > > regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| Jakub Ouhrabka <kuba@comgate.cz> writes: >>> Huh. One transaction truncating a dozen tables? That would match the >>> sinval trace all right ... > It should be 4 tables - the shown log looks like there were more truncates? Actually, counting up the entries, there are close to 2 dozen relations apparently being truncated in the trace you showed. But that might be only four tables at the user level, since each index on these tables would appear separately, and you might have a toast table plus index for each one too. If you want to dig down, the table OIDs are visible in the trace, in the messages with type -1: >> LOG: sending inval msg -1 0 30036 0 30700 3218341912 ^^^^^ ^^^^^ DBOID RELOID so you could look into pg_class to confirm what's what. > Yes, performance was the initial reason to use truncate instead of > delete many years ago. But today the truncated tables usualy contain > exactly one row - quick measurements now show that it's faster to issue > delete instead of truncate in this case. Okay, for a table of just a few entries I agree that DELETE is probably better. But don't forget you're going to need to have those tables vacuumed fairly regularly now, else they'll start to bloat. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |