This is a discussion on NOLOGGING option, or ? within the pgsql Hackers forums, part of the PostgreSQL category; --> On Wed, 2005-06-01 at 11:55 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > 4. Optionally, we ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| On Wed, 2005-06-01 at 11:55 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > 4. Optionally, we set a flag on the table showing the whole table is > > frozen. Anybody writing to this table subsequently will spoil this flag. > > If the flag is set, all forms of VACUUM will return success immediately > > without performing a scan (since it is already in a perfect VACUUM FULL > > and VACUUM FREEZE state). > > This bit strikes me as dangerous and not related to the original > proposal. I don't care for the load-already-frozen-data part at all, > either. That's not just giving up WAL protection, that's an outright > MVCC semantics violation, in return for which we get ... not much. > Certainly not any speedup in the LOAD itself. I agree it *sounds* dangerous, but is it? If so, how? I will immediately withdraw any idea that proves dangerous. We're holding the table lock and will continue to do so until end of transaction. No transaction with an earlier id will ever see the data we load because of the lock. Later transactions will see the data only when we commit. At that stage, all they care about is that the data is visible...it doesn't matter that we have frigged it to look like its been there a real long time. We're not giving up WAL protection. The PITR case is covered. Non-PITR cases have exactly the same transactional consistency across as crash as they do now. Nothing is lost through this proposal. Doing those two additional actions gives us the ability to avoid: 1. avoid writing the whole table twice because of SetBufferCommitInfoNeedsSave 2. avoid pointless VACUUMs of very large tables 3. load the table direct to read-only media, or copy it to read-only media at some point in the future without needing to do a VACUUM FREEZE All of that means we write the table once, rather than 3 times over the lifetime of the table. And we only need scan it for user SQL, not for wrap-around avoiding VACUUMs. I know you have a solution to the second one in mind. I would not argue against the needs-vacuuming bitmap idea when that comes back around. Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| |||
| On Wed, Jun 01, 2005 at 06:55:46PM +0100, Simon Riggs wrote: > On Wed, 2005-06-01 at 11:55 -0400, Tom Lane wrote: > > Simon Riggs <simon@2ndquadrant.com> writes: > > > 4. Optionally, we set a flag on the table showing the whole table is > > > frozen. Anybody writing to this table subsequently will spoil this flag. > > > If the flag is set, all forms of VACUUM will return success immediately > > > without performing a scan (since it is already in a perfect VACUUM FULL > > > and VACUUM FREEZE state). > > > > This bit strikes me as dangerous and not related to the original > > proposal. I don't care for the load-already-frozen-data part at all, > > either. That's not just giving up WAL protection, that's an outright > > MVCC semantics violation, in return for which we get ... not much. > > Certainly not any speedup in the LOAD itself. > > I agree it *sounds* dangerous, but is it? If so, how? I will immediately > withdraw any idea that proves dangerous. > > We're holding the table lock and will continue to do so until end of > transaction. No transaction with an earlier id will ever see the data we > load because of the lock. Suppose you load half the tuples and the plug is pulled. After recovery, you have half-load of tuples that are visible to everyone. This is a no-no. Plus, what is the benefit of having the tuples in frozen state to start with? If this is a data warehouse application, I'd expect the table to be dropped or truncated rather before the billion-transactions barrier comes to pass. -- Alvaro Herrera (<alvherre[a]surnet.cl>) "I dream about dreams about dreams", sang the nightingale under the pale moon (Sandman) ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Simon Riggs <simon@2ndquadrant.com> writes: > If the server crashes, we replay WAL. If we see a load start message, we > truncate the relation and note that a load has started. If there is WAL > data for the tuples, we replay it. If WAL replay ends without the load > transaction having successfully committed, then we truncate the table. On further thought, this seems both risky and unnecessary. The reason it's risky is this scenario: * Backend 1 makes a LOAD-start WAL entry. * Backend 1 loads some data, extending the table beyond its former end. * Backend 1 errors out without committing its transaction. * Backend 2 inserts some data into the no-longer-locked table. It uses free space in one of the added pages, or maybe even adds new pages of its own. * Backend 2 commits. * System crashes, and we have to replay the above actions. In this scenario you cannot truncate at the end of replay without losing backend 2's committed data. You can think of various ways to avoid this risk (for instance, maybe *any* WAL-logged operation on the table should cause the pending TRUNCATE to be discarded) but they all seem expensive and/or still somewhat unsafe. The reason it's unnecessary is what's the point? All you're doing by not truncating is leaving some uncommitted tuples in the table. It's not the job of WAL recovery to get rid of such things; that's VACUUM's job. So what I'm thinking is we need no special WAL entries for this. What we need is just an operating mode of COPY in which it doesn't WAL-log its inserts, but instead fsyncs before completion, much like index build does. For safety it must do all its inserts into freshly-added pages; this is not to ensure truncatability, because we aren't going to do that anyway, but to ensure that we don't have unlogged operations changing pages that might contain committed tuples. (That would pose a risk of losing committed data to incomplete writes in case of system crash partway through. The same reason is why we need exclusive lock: else we might end up with pages containing a mix of logged and unlogged tuples.) Also there can be no indexes, since we don't want index entries pointing to unlogged tuples. And PITR can't be enabled. Otherwise no problem. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| |||
| Alvaro Herrera <alvherre@surnet.cl> writes: > On Wed, Jun 01, 2005 at 06:55:46PM +0100, Simon Riggs wrote: >> We're holding the table lock and will continue to do so until end of >> transaction. No transaction with an earlier id will ever see the data we >> load because of the lock. > Suppose you load half the tuples and the plug is pulled. After > recovery, you have half-load of tuples that are visible to everyone. > This is a no-no. Simon is expecting that the loaded tuples are guaranteed to be erased (by table truncation) during recovery. As I just noted I'm unconvinced of the safety of doing truncations during recovery, so I'd prefer not to depend on that. The scenario I was thinking of was different: you load pre-frozen tuples, commit, and thereby release the table lock. Now the tuples are visible to transactions that started before you did; that's what violates MVCC. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Wed, 2005-06-01 at 14:24 -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@surnet.cl> writes: > > On Wed, Jun 01, 2005 at 06:55:46PM +0100, Simon Riggs wrote: > >> We're holding the table lock and will continue to do so until end of > >> transaction. No transaction with an earlier id will ever see the data we > >> load because of the lock. > > > Suppose you load half the tuples and the plug is pulled. After > > recovery, you have half-load of tuples that are visible to everyone. > > This is a no-no. > > Simon is expecting that the loaded tuples are guaranteed to be erased > (by table truncation) during recovery. As I just noted I'm unconvinced > of the safety of doing truncations during recovery, so I'd prefer not > to depend on that. > > The scenario I was thinking of was different: you load pre-frozen > tuples, commit, and thereby release the table lock. Now the tuples > are visible to transactions that started before you did; that's what > violates MVCC. Agreed. MVCC violation. OK, back to the drawing board. Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| |||
| On Wed, 2005-06-01 at 14:14 -0400, Tom Lane wrote: > So what I'm thinking is we need no special WAL entries for this. What > we need is just an operating mode of COPY in which it doesn't WAL-log > its inserts, but instead fsyncs before completion, much like index build > does. For safety it must do all its inserts into freshly-added pages; > this is not to ensure truncatability, because we aren't going to do that > anyway, but to ensure that we don't have unlogged operations changing > pages that might contain committed tuples. (That would pose a risk of > losing committed data to incomplete writes in case of system crash > partway through. The same reason is why we need exclusive lock: else > we might end up with pages containing a mix of logged and unlogged > tuples.) Also there can be no indexes, since we don't want index > entries pointing to unlogged tuples. And PITR can't be enabled. > Otherwise no problem. What you describe above is a coherent set of features that provide most of the benefits I sought, plus some others. We also don't mess with WAL, which is grand thing. We gain the ability to load into tables with rows already in them. I don't agree with all of your other points, but given time schedules, I think that we win with the above, so forget the rest. The main COPY/LOAD DATA discussion is on another thread of this from Alon, who has some interesting ideas and some really cool performance results to share. Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| |||
| Jochem van Dieten wrote: > On 6/1/05, Bruce Momjian wrote: > > Jochem van Dieten wrote: > >> > >> Why only on an empty table? What is the problem with bypassing WAL on > >> any table as long as all files of that table are fsync'ed before > >> commit? > > > > Because adding rows to a table might modify existing pages, and if the > > COPY fails, you have to restore those pages to a consistent state, and > > make sure they are recovered for partial page writes, which we can't do > > without WAL. With an initially empty table, you can just throw away the > > file system file. > > Thank you for the explanation, but I am afraid I still don't get it. > > COPY can either fail and do a normal rollback, in which case there is > no problem because the xid never made it to the xlog. So I take it you > are talking about a hard crash (pull the plug) somewhere during the > actual writing to disk. In that case you have updated several pages > and overwritten the free space with new tuples. But you have not > overwritten live tuples, so why would you need to restore them? I > mean, didn't PostgreSQL < 7.1 work without a WAL at all? What if you are adding rows to an existing page --- in that case you are writing a page that also contained valid tuples before the COPY. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Alon Goldshuv wrote: > 9) allow for Simon's WAL bypass. > > I have surely missed some problems that hide behind the idea, but these > points make me believe that LOAD DATA is a good idea. The community is unlikely to add a new LOAD DATA command that does _almost_ everything COPY does. We are much more likely to incrementally improve COPY. The problem with a new command is that it becomes unclear when you should use COPY and when LOAD DATA, and it confuses users, and has maintenance overhead. If Bizgres wants a new command name, go for it, but it is unlikely that the community release is going to go in that direction, unless there is a fundamental agreement that COPY is broken and needs a major revamp, and I have heard no talk of that. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) |
| |||
| Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> Not unless you are proposing to change COPY to acquire a lock strong > >> enough to lock out other writers to the table for the duration ... > > > Well, if the table is initally empty, what harm is there in locking the > > table? > > You cannot *know* whether it is empty unless you lock the table before > you look. So your argument is circular. > > I think this only makes sense as an explicit option to COPY, one of the > effects of which would be to take a stronger lock than COPY normally does. One idea would be to look at the table file size first. If it has zero blocks, lock the table and if it still has zero blocks, do the no-WAL copy. I hate to add a flag to a command when we can automatically handle it ourselves. Now, you mentioned the idea of doing the optimization in tables that already have data, and if we do that, we would need a flag because the lock is stronger than what we have now. What we could do is to do no-WAL automatically for empty tables (like when a database is first loaded), and use the flag for cases where the tables is not zero pages. The fact is that database loads are a prefect case for this optimization and old dumps are not going to have that flag anyway, and automatic is better if we can do it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| Bruce, > The problem with a new command is that it becomes unclear when you > should use COPY and when LOAD DATA, and it confuses users, and has > maintenance overhead. If Bizgres wants a new command name, go for it, > but it is unlikely that the community release is going to go in that > direction, unless there is a fundamental agreement that COPY is broken > and needs a major revamp, and I have heard no talk of that. The question of whether COPY should be improved or whether the changes should take the form of a new command is separate from the question of whether the performance of the load path in PostgreSQL needs improvement. The 90% performance increase (from 12 MB/s to 21 MB/s) that Alon reported comes from replacing the parsing logic within COPY. I believe that the parsing logic in COPY is fundamentally broken from a performance perspective, and may be broken from a functionality perspective WRT embedded backslashes. One of the reasons to consider a LOAD DATA command is that we can isolate the need for performance improvements and special syntax from the concerns of preserving the legacy behavior of COPY for use as the primary mechanism for DUMP and RESTORE. - Luke ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |