This is a discussion on locks in CREATE TRIGGER, ADD FK within the pgsql Hackers forums, part of the PostgreSQL category; --> Neil Conway <neilc@samurai.com> writes: > How can we drop the file at commit, given that a serializable > transaction's ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Neil Conway <neilc@samurai.com> writes: > How can we drop the file at commit, given that a serializable > transaction's snapshot should still be able to see old relfilenode's > content? It isn't 100% MVCC, I agree. But it works because system catalog lookups are SnapshotNow, and so when another session comes and wants to look at the table it will see the committed new version of the pg_class row pointing at the new relfilenode file. What you have to prevent is somebody accessing the table *while* the changeover happens ... and that's why your lock has to be AccessExclusive. If you want to complain about MVCC violations in CLUSTER, think about the fact that it scans the table with SnapshotNow, and therefore loses rows that are committed-dead but might still be visible to somebody. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Bruce Momjian <pgman@candle.pha.pa.us> writes: > In fact, would a truncate during a backup cause the backup to be > inconsistent because it wouldn't be a true snapshot of the database at > backup start time? Seems so. No, because pg_dump holds AccessShareLock on every table that it intends to dump, thereby ensuring that TRUNCATE/CLUSTER/etc are held off. The proposal to weaken the locks that those operations take would in fact break pg_dump. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > In fact, would a truncate during a backup cause the backup to be > > inconsistent because it wouldn't be a true snapshot of the database at > > backup start time? Seems so. > > No, because pg_dump holds AccessShareLock on every table that it intends > to dump, thereby ensuring that TRUNCATE/CLUSTER/etc are held off. The > proposal to weaken the locks that those operations take would in fact > break pg_dump. Oh, it pre-locks. I didn't know that. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Neil Conway <neilc@samurai.com> writes: > - if we make these changes, we will need some way to delete a > no-longer-visible relfilenode. This is presuming that we abandon the notion that system catalog access use SnapshotNow. Which opens the question of what they should use instead ... to which "transaction snapshot" isn't the answer, because we have to be able to do system catalog accesses before we've set the snapshot. (Else forget issuing LOCK TABLE before the snapshot is set.) I really think that you haven't the faintest idea of the size of the can of worms you are opening here :-( regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| |||
| Tom Lane wrote: > It isn't 100% MVCC, I agree. But it works because system catalog > lookups are SnapshotNow, and so when another session comes and wants to > look at the table it will see the committed new version of the pg_class > row pointing at the new relfilenode file. If by "works", you mean "provides correct transactional semantics", then that simply isn't true. Not making CLUSTER and similar DDL commands MVCC compliant isn't the end of the world, I agree, but that doesn't make it correct, either. > If you want to complain about MVCC violations in CLUSTER, think about > the fact that it scans the table with SnapshotNow, and therefore loses > rows that are committed-dead but might still be visible to somebody. This seems like another facet of the same problem (a serializable transaction's snapshot effectively includes the relfilenodes that were visible when the snapshot was taken, and swapping in another relfilenode under its nose is asking for trouble). We could fix the CLUSTER bug, although not the TRUNCATE bug, by scanning the old relation with SnapshotAny (or ideally, "the snapshot such that we can see all tuples visible to any currently running transaction", if we can produce such a snapshot easily). Not sure if that's worth doing; it would be nice to solve the root problem (scanning system catalogs with SnapshotNow, per discussion elsewhere in thread). -Neil ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| |||
| Tom Lane wrote: > This is presuming that we abandon the notion that system catalog > access use SnapshotNow. Which opens the question of what they should > use instead ... to which "transaction snapshot" isn't the answer, > because we have to be able to do system catalog accesses before > we've set the snapshot. I wonder if it would be possible to use SnapshotNow before the transaction's snapshot has been established, and the transaction's snapshot subsequently. Although it definitely makes me nervous to use multiple snapshots over the life of a single transaction... -Neil ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| |||
| Tom Lane wrote: > I don't think this has been adequately thought through at all ... but > at least make it ExclusiveLock. What is the use-case for allowing > SELECT FOR UPDATE in parallel with this? Ok, patch applied -- I adjusted it to use ExclusiveLock, and fleshed out some of the comments. -Neil ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) |
| |||
| Neil Conway <neilc@samurai.com> writes: > Tom Lane wrote: > > It isn't 100% MVCC, I agree. But it works because system catalog > > lookups are SnapshotNow, and so when another session comes and wants to > > look at the table it will see the committed new version of the pg_class > > row pointing at the new relfilenode file. > > If by "works", you mean "provides correct transactional semantics", then that > simply isn't true. Not making CLUSTER and similar DDL commands MVCC compliant > isn't the end of the world, I agree, but that doesn't make it correct, either. I think he means it works because it doesn't matter whether the serializable transaction sees the old table or the new one. As soon as the CLUSTER commits the serializable transaction can start using the new one since it's functionally identical to the old one (at least it's supposed to be, Tom points out it isn't). > > If you want to complain about MVCC violations in CLUSTER, think about > > the fact that it scans the table with SnapshotNow, and therefore loses > > rows that are committed-dead but might still be visible to somebody. Ouch. That's, er, a problem. I guess currently it's fine for any transaction using READ COMMITTED but it's already wrong for serializable transactions. And it'll be wrong for READ COMMITTED if CLUSTER is changed not to take an exclusive lock. -- greg ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
| |||
| Tom Lane <tgl@sss.pgh.pa.us> writes: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > In fact, would a truncate during a backup cause the backup to be > > inconsistent because it wouldn't be a true snapshot of the database at > > backup start time? Seems so. > > No, because pg_dump holds AccessShareLock on every table that it intends > to dump, thereby ensuring that TRUNCATE/CLUSTER/etc are held off. The > proposal to weaken the locks that those operations take would in fact > break pg_dump. It seems like that would be true for TRUNCATE but not CLUSTER. Though pg_dump works in READ COMMITTED mode doesn't it? So it doesn't really get a consistent view of the database exactly anyways. If it tried to go in SERIALIZABLE mode I suspect it would rarely complete though. -- greg ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| ||||
| Neil Conway <neilc@samurai.com> writes: > Tom Lane wrote: >> I don't think this has been adequately thought through at all ... but >> at least make it ExclusiveLock. What is the use-case for allowing >> SELECT FOR UPDATE in parallel with this? > Ok, patch applied -- I adjusted it to use ExclusiveLock, and fleshed out > some of the comments. I think last night's discussion makes it crystal-clear why I felt that this hasn't been sufficiently thought through. Please revert until the discussion comes to a conclusion. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| Thread Tools | |
| Display Modes | |
|
|