Re: pg_terminate_backend idea (pgsql Hackers, PostgreSQL)
> >> In any case the correct way to solve the problem is to find out
> >> what's being left corrupt by SIGTERM, rather than install more
> >> messiness in order to avoid facing the real issue ...
>
> > That is unfortunately way over my head. And it doesn't seem like
> > anybody who actually has what it takes to do the "proper solution" is
> > interested in doing it.
>
> A test case --- even one that fails only a small percentage of the time
> --- would make things far easier. So far all I've seen are very vague
> reports, and it's impossible to do anything about it without more info.

Very well. Let me try putting it like this, then: Assuming we don't get
such a case, and a chance to fix it, before 8.1 (while still hoping we
will get it fixed properly, we can't be sure, can we? If we were, it'd be
fixed already). In this case, will you consider such a kludgy solution as
a temporary fix to resolve a problem that a lot of users are having? And
then plan to have it removed once sending SIGTERM directly to a backend
can be considered safe?

//Magnus

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org
"Magnus Hagander" <mha@sollentuna.net> writes:
> Assuming we don't get such a case, and a chance to fix it, before 8.1
> (while still hoping we will get it fixed properly, we can't be sure, can
> we? If we were, it'd be fixed already). In this case, will you consider
> such a kludgy solution as a temporary fix to resolve a problem that a
> lot of users are having? And then plan to have it removed once sending
> SIGTERM directly to a backend can be considered safe?

Kluges tend to become institutionalized, so my reaction is "no". It's
also worth pointing out that with so little understanding of the problem
Rod is reporting, it's tough to make a convincing case that this kluge
will avoid it. SIGTERM exit *shouldn't* be leaving any corrupted
locktable entries behind; it's not that much different from the normal
case. Until we find out what's going on, introducing still another exit
path isn't really going to make me feel more comfortable, no matter how
close it's alleged to be to the normal path.

                        regards, tom lane
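Tom's point that a SIGTERM exit should not differ much from a normal exit can be illustrated with a small analogy. The following is a minimal POSIX shell sketch under stated assumptions, not backend code: the function name `worker` is hypothetical, a directory created with `mkdir` stands in for a lock-table entry, and the cleanup routine is registered once on the EXIT trap, so leaving via SIGTERM releases the "lock" through the same code as a normal exit.

```sh
#!/bin/sh
# Hypothetical analogy, not PostgreSQL code: the directory "$1" stands in
# for a lock-table entry held by the process.
# Usage: worker LOCKDIR [STEPS] &   (run it in a subshell; it calls exit)
worker() {
    lockdir=$1
    steps=${2:-30}

    # One cleanup routine, shared by every way of leaving the process.
    trap 'rmdir "$lockdir" 2>/dev/null' EXIT
    # SIGTERM gets no cleanup code of its own; it just takes the normal
    # exit path, so the EXIT trap above still releases the lock.
    trap 'exit 143' TERM

    if ! mkdir "$lockdir"; then
        trap - EXIT    # the lock belongs to someone else; leave it alone
        exit 1
    fi

    i=0
    while [ "$i" -lt "$steps" ]; do
        sleep 1        # simulated work while holding the lock
        i=$((i + 1))
    done
    exit 0
}
```

Starting `worker /tmp/demo.lock &` and then sending it `kill -TERM` should leave no `/tmp/demo.lock` behind. The design choice being illustrated is that termination reuses the normal exit path rather than adding a separate one.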
Tom Lane wrote:
> "Magnus Hagander" <mha@sollentuna.net> writes:
> > Assuming we don't get such a case, and a chance to fix it, before 8.1
> > (while still hoping we will get it fixed properly, we can't be sure, can
> > we? If we were, it'd be fixed already). In this case, will you consider
> > such a kludgy solution as a temporary fix to resolve a problem that a
> > lot of users are having? And then plan to have it removed once sending
> > SIGTERM directly to a backend can be considered safe?
>
> Kluges tend to become institutionalized, so my reaction is "no". It's
> also worth pointing out that with so little understanding of the problem
> Rod is reporting, it's tough to make a convincing case that this kluge
> will avoid it. SIGTERM exit *shouldn't* be leaving any corrupted
> locktable entries behind; it's not that much different from the normal
> case. Until we find out what's going on, introducing still another exit
> path isn't really going to make me feel more comfortable, no matter how
> close it's alleged to be to the normal path.

I have been running some tests to try to see the lock table corruption,
but I have been unable to reproduce the problem. I have attached my crude
test scripts. I would run the scripts and then open another session as a
different user and do UPDATE and LOCK to cause the psql session to block.

The only functional difference I can see between a SIGTERM exit and a
cancel followed by a normal exit is the call to AbortCurrentTransaction().
Could that be significant? Because I can't reproduce the failure, I can't
know for sure.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#!/usr/contrib/bin/expect --
# Open a session on database "test", lock pg_class inside a transaction,
# show pg_locks, then run an UPDATE that blocks while another session
# holds a conflicting lock on table test.
set timeout -1
eval spawn sql test
expect -nocase "test=>"
send "begin;\r"
expect -nocase "test=>"
send "lock pg_class;\r"
expect -nocase "test=>"
send "select * from pg_locks;\r"
expect -nocase "test=>"
send "update test set x=3;\r"
expect -nocase "test=>"
expect eof
exit

# Poll once a second for a backend matching 'postgres test'; when one is
# found, SIGTERM it, print its PID, then kill the psql client as well.
while :
do
	XPID=`/letc/ps_sysv -ef | grep 'postgres test'|grep -v grep|awk '{print $2}'`
	if [ "$XPID" != "" ]
	then
		kill $XPID
		echo $XPID
		XPID=`/letc/ps_sysv -ef | grep 'psql test'|grep -v execargs|awk '{print $2}'`
		kill $XPID
	fi
	sleep 1
done
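Bruce's comparison of a SIGTERM exit with "a cancel followed by a normal exit" can be sketched the same way. In this hypothetical POSIX shell analogy (not backend code), USR1 stands in for the cancel request (PostgreSQL itself uses SIGINT for query cancel), the function names `deferred_worker` and `abort_transaction` are invented for illustration, and a directory again stands in for a held lock. The signal handler only records the request; the work loop notices it at the next safe point, runs an explicit abort step, and then leaves through the ordinary exit.

```sh
#!/bin/sh
# Hypothetical analogy, not PostgreSQL code: USR1 plays the role of a
# cancel request and the directory "$1" plays the role of a held lock.
# Usage: deferred_worker LOCKDIR &   (run it in a subshell; it calls exit)

# Stand-in for the transaction-abort step: release what the
# "transaction" holds.
abort_transaction() {
    rmdir "$1"
}

deferred_worker() {
    lockdir=$1
    cancel_requested=0

    # The handler only records the request; it does no cleanup itself.
    trap 'cancel_requested=1' USR1

    mkdir "$lockdir" || exit 1

    while :; do
        sleep 1                  # one unit of work
        # Safe point between units of work: act on a pending cancel.
        if [ "$cancel_requested" -eq 1 ]; then
            abort_transaction "$lockdir"
            exit 0               # then leave through the normal exit path
        fi
    done
}
```

Here the abort step is an explicit action taken before the normal exit, which mirrors the AbortCurrentTransaction() difference Bruce points out; whether that step matters for the real lock table is the open question in the thread, and this sketch does not answer it.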
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have been running some tests to try to see the lock table corruption
> but I have been unable to reproduce the problem.

It's possible that what Rod was complaining of is fixed in CVS tip ---
see discussion about RemoveFromWaitQueue() bug. If so, it would have
been a bug that could be seen in other circumstances too, but maybe
SIGTERM made it more probable for some reason.

                        regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I have been running some tests to try to see the lock table corruption
> > but I have been unable to reproduce the problem.
>
> It's possible that what Rod was complaining of is fixed in CVS tip ---
> see discussion about RemoveFromWaitQueue() bug. If so, it would have
> been a bug that could be seen in other circumstances too, but maybe
> SIGTERM made it more probable for some reason.

Was that backpatched to 8.0.X? If not, I can test that branch of CVS for
the problem.

-- 
  Bruce Momjian                        |  pgman@candle.pha.pa.us