This is a discussion on locks in CREATE TRIGGER, ADD FK within the pgsql Hackers forums, part of the PostgreSQL category; --> Neil Conway <neilc@samurai.com> writes: > Tom Lane wrote: >> It isn't 100% MVCC, I agree. But it works because ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Neil Conway <neilc@samurai.com> writes: > Tom Lane wrote: >> It isn't 100% MVCC, I agree. But it works because system catalog >> lookups are SnapshotNow, and so when another session comes and wants to >> look at the table it will see the committed new version of the pg_class >> row pointing at the new relfilenode file. > If by "works", you mean "provides correct transactional semantics", then > that simply isn't true. Not making CLUSTER and similar DDL commands MVCC > compliant isn't the end of the world, I agree, but that doesn't make it > correct, either. I agree that we aren't MVCC with respect to DDL operations (and for this purpose CLUSTER is DDL). Trying to become so would open a can of worms far larger than it's worth, though, IMHO. Let me give you a couple of illustrations of why things are the way they are. Consider Transaction 1 Transaction 2 BEGIN; ... BEGIN; ... INSERT INTO a; CLUSTER a; ... Currently, because T2 continues to hold a write lock on A until it commits, T1's CLUSTER will block until T2 commits; therefore CLUSTER's use of SnapshotNow is sufficient to not lose any live tuples. Were T1 to use a transaction-aware snapshot to scan A, this would not be so. (SnapshotAny would work, but CLUSTER would have to become an order of magnitude harder than it is now, because it would have to preserve UPDATE chain link relationships ... compare VACUUM FULL's treatment of tuple chains.) Transaction 1 Transaction 2 BEGIN; ... CLUSTER a; INSERT INTO a; Were T1 using a transaction-aware snapshot to read pg_class, it would insert its new tuple into the wrong relfilenode for A, causing either immediate failure or eventual loss of a live tuple. Transaction 1 Transaction 2 BEGIN; ... CREATE TRIGGER ... ON INSERT TO a ... INSERT INTO a; Were T1 using a transaction-aware snapshot to read pg_trigger, it would fail to fire the already-committed insert trigger. (Variants of this include failing to insert index entries into a new index, failing to apply a new constraint, etc.) You don't have to assume that T1 is in Serializable mode, either. It might be using Read Committed, but the INSERT starts and sets its snapshot while T2 is still in progress; then of course blocks until T2 commits; then does the wrong thing because it is still paying attention to pre-T2 catalog entries. This is why LockRelation accepts SI inval messages immediately after each lock acquisition; it's to ensure that we do see the effects of T2. Transaction 1 Transaction 2 BEGIN; SELECT FROM a; ... CREATE TRIGGER ... ON INSERT TO a ... INSERT INTO a; Ordinarily, once T1 has touched relation A, it can be sure that A's schema will not change until end of transaction. The change you committed last night removes that guarantee, at least for the limited case of triggers, and makes the above interleaving possible. While I haven't come up with a clear failure case after a few minutes' thought, I'm not convinced that there isn't one. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Greg Stark <gsstark@mit.edu> writes: > Though pg_dump works in READ COMMITTED mode doesn't it? Certainly not. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
| |||
| Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > Though pg_dump works in READ COMMITTED mode doesn't it? > > Certainly not. It works in serializable mode? I guess the only reason we don't see "transaction failed" ever is because it's not doing any updates? -- greg ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Greg Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Greg Stark <gsstark@mit.edu> writes: >>> Though pg_dump works in READ COMMITTED mode doesn't it? >> >> Certainly not. > It works in serializable mode? I guess the only reason we don't see > "transaction failed" ever is because it's not doing any updates? Right. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Tom Lane wrote: > I think last night's discussion makes it crystal-clear why I felt that > this hasn't been sufficiently thought through. Please revert until the > discussion comes to a conclusion. I applied the patch because I don't think it is very closely related to the discussion. But if you'd prefer, I'm happy to wait until we're done, so I've reverted the patch. -Neil ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) |
| |||
| Tom Lane wrote: > I agree that we aren't MVCC with respect to DDL operations (and for this > purpose CLUSTER is DDL). Trying to become so would open a can of worms > far larger than it's worth, though, IMHO. I think if we can come up with a reasonable way to handle all the consequences, it's worth doing. And yes, I realize there are a lot of consequences, so it may well not be possible. > Transaction 1 Transaction 2 > > BEGIN; > > ... BEGIN; > > ... INSERT INTO a; > > CLUSTER a; ... > > Currently, because T2 continues to hold a write lock on A until it > commits, T1's CLUSTER will block until T2 commits; therefore CLUSTER's > use of SnapshotNow is sufficient to not lose any live tuples. Were T1 > to use a transaction-aware snapshot to scan A, this would not be so. I think this is somewhat tangential: we're discussing changing the snapshot used to scan system catalogs, not user relations like A. The only reason that CLUSTER's use of SnapshotNow is a problem at the moment is the same reason that TRUNCATE is a problem -- a concurrent serializable transaction will use the new relfilenode, not the old one. > Transaction 1 Transaction 2 > > BEGIN; > > ... CLUSTER a; > > INSERT INTO a; > > Were T1 using a transaction-aware snapshot to read pg_class, it would > insert its new tuple into the wrong relfilenode for A, causing either > immediate failure or eventual loss of a live tuple. Yes, definitely a problem The only somewhat reasonable behavior I can think of is to cause modifications to the oldrelfilenode to fail in a concurrent serializable transaction. The behavior would be somewhat analogous to an UPDATE in a serializable transaction failing because of a concurrent data modification, although in this case we would error out on any modification (e.g. INSERT). -Neil ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) |
| |||
| Alvaro Herrera wrote: > On Wed, Mar 23, 2005 at 10:42:01AM +0800, Christopher Kings-Lynne wrote: > > >>If you want to be my friend forever, then fix CLUSTER so that it uses > > >>sharerowexclusive as well > > > > > >I don't think it's as easy as that, because you have to move tuples > > >around in the cluster operation. Same sort of issue as vacuum full I > > >would suggest. > > > > Cluster doesn't move rows... > > > > I didn't say it was easy. It would involve changing how cluster works. > > It would keep the old table around while building the new, then grab > > an exclusive lock to swap the two. > > Huh, cluster already does that. > > I don't remember what the rationale was for locking the table, leaving > even simple SELECTs out. (In fact, IIRC the decision wasn't made by me, > and it wasn't discussed at all.) > The issue is that we would have to escalate the shared lock to exclusive to swap out the files, and lock escallation could lead to deadlock/starvation. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
| |||
| On Wed, 2005-03-23 at 10:04 -0500, Tom Lane wrote: > I think last night's discussion makes it crystal-clear why I felt that > this hasn't been sufficiently thought through. Please revert until the > discussion comes to a conclusion. Are there any remaining objections to reapplying this patch? The original commit message is here: http://archives.postgresql.org/pgsql...3/msg00316.php The archives of the -hackers thread are here: http://archives.postgresql.org/pgsql...3/msg00764.php -Neil ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Neil Conway <neilc@samurai.com> writes: > Are there any remaining objections to reapplying this patch? > The original commit message is here: > http://archives.postgresql.org/pgsql...3/msg00316.php > The archives of the -hackers thread are here: > http://archives.postgresql.org/pgsql...3/msg00764.php I'm still concerned about the last example I raised in http://archives.postgresql.org/pgsql...3/msg00840.php which was: >> Transaction 1 Transaction 2 >> >> BEGIN; >> >> SELECT FROM a; >> >> ... CREATE TRIGGER ... ON INSERT TO a ... >> >> INSERT INTO a; >> >> Ordinarily, once T1 has touched relation A, it can be sure that A's >> schema will not change until end of transaction. The change you >> committed last night removes that guarantee, at least for the >> limited case of triggers, and makes the above interleaving possible. >> While I haven't come up with a clear failure case after a few minutes' >> thought, I'm not convinced that there isn't one. It's possible that this is safe for triggers only, but I'm not 100% convinced of that, and I'm not sure I see the point of relaxing the general rule "schema changes require AccessExclusiveLock" for just this one operation. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| On 2005-05-30, Neil Conway <neilc@samurai.com> wrote: > On Wed, 2005-03-23 at 10:04 -0500, Tom Lane wrote: >> I think last night's discussion makes it crystal-clear why I felt that >> this hasn't been sufficiently thought through. Please revert until the >> discussion comes to a conclusion. > > Are there any remaining objections to reapplying this patch? I've run into a few questions recently that might be relevent to the issue of DDL locking in general and therefore possibly this change in particular. The most significant one is to do with the pg_get_*def functions and their consistency, or otherwise, with explicit scans of the system catalogs. The problem here of course is that the pg_get_*def functions mostly (but not exclusively) use the syscache and therefore see data relative to SnapshotNow, whereas the queries that are invoking them are likely to be doing explicit scans of the catalog tables within the transaction's active snapshot (and for a long-running serializable transaction such as pg_dump, these may be some way apart). The obvious place to look for failure modes is to see whether pg_dump can be made to fail by deleting something (index, perhaps?) that it is expecting to find, and see whether it chokes (pg_get_indexdef will elog if the index doesn't exist in SnapshotNow). Dropping a view might be another case where this can be made to fail. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services |
| Thread Tools | |
| Display Modes | |
|
|