This is a discussion on Re: Remove xmin and cmin from frozen tuples within the pgsql Hackers forums, part of the PostgreSQL category; --> On Thu, Sep 01, 2005 at 10:45:44AM +0900, ITAGAKI Takahiro wrote: Hi, > I think it would be a ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| On Thu, Sep 01, 2005 at 10:45:44AM +0900, ITAGAKI Takahiro wrote: Hi, > I think it would be a waste to retain xmin and cmin for frozen tuples > because their values represent only 'visible for all transactions'. > Additionally, most tuples in database can be frozen potentially. I think this is an interesting idea. I was thinking that when the tuple needs to be obsoleted it would need to grow to accomodate the Xmax, but you are not actually proposing to remove that, so it seems sensible. In fact, it is perfectly reasonable to remove Xmin and Cmin, because after the tuple is frozen, the Xmin never changes again. Now, one thing of note is that you need to "compress" the page in order to actually be able to use the just-freed space. VACUUM could do that, but maybe it would be better to do it on-line -- the freezing process is going to have to write the page regardless. I wonder if with your patch the page is compressed on the same VACUUM execution that freezes the tuple? One thing that comes to mind is that this makes somewhat easier to build a tool to write pre-built tables, for bulk-loading purposes. You just construct the binary file with the HEAP_FROZEN bit set, and then attach the file to a dummy table. (Then again, you can do it today, using a Xmin of FrozenTransactionId. I wonder why the Bizgres people isn't advocating a tool to do that. It is very hard to do with user-defined types, but for BI/DW you mostly don't need those, do you?) -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "Cuando no hay humildad las personas se degradan" (A. Christie) ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Alvaro, > One thing that comes to mind is that this makes somewhat easier to build > a tool to write pre-built tables, for bulk-loading purposes. You just > construct the binary file with the HEAP_FROZEN bit set, and then attach > the file to a dummy table. (Then again, you can do it today, using a > Xmin of FrozenTransactionId. I wonder why the Bizgres people isn't > advocating a tool to do that. It is very hard to do with user-defined > types, but for BI/DW you mostly don't need those, do you?) Hmmm ... can you expand on this a little? We'd discussed "frozen partitions" but hadn't thought to get around to them for a while, expecting the kind of issues which Tom just raised. -- Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Now, one thing of note is that you need to "compress" the page in order > to actually be able to use the just-freed space. VACUUM could do that, > but maybe it would be better to do it on-line -- the freezing process is > going to have to write the page regardless. I agree. I think an good position of freezer is on bgwriter. My idea is: 1. Just before bgwriter writes an dirty page in LRU order, 2. Freeze tuples in the page and repair fragmentation. 3. (Replace the fsm page that has least freespace.) 4. Flush the page. > I wonder if with your patch > the page is compressed on the same VACUUM execution that freezes the tuple? Yes, defragmentation is performed after freezing, but the page has at least one dead tuple. In current VACUUM implementation, pages that have no dead tuples will not be defraged. So you cannot "compress" just after bulk-load. --- ITAGAKI Takahiro NTT Cyber Space Laboratories ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| ITAGAKI Takahiro <itagaki.takahiro@lab.ntt.co.jp> writes: > I agree. I think an good position of freezer is on bgwriter. > My idea is: > 1. Just before bgwriter writes an dirty page in LRU order, > 2. Freeze tuples in the page and repair fragmentation. > 3. (Replace the fsm page that has least freespace.) > 4. Flush the page. This is a bad idea. The bgwriter isn't the place to be doing freezing, because there is no reasonable way for it to guarantee that all old tuples in a table (or any larger unit) have been frozen. So you'd still need VACUUM to ensure no wraparound. Plus, you can't do such changes without emitting an XLOG record, which is something we don't want happening in the bgwriter's inner loop. Even more to the point, you can't do such changes without getting a superexclusive lock on the page (not only locked, but no one else has it pinned), which is a real nonstarter for the bgwriter, both for performance and possible deadlock issues. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Wed, Aug 31, 2005 at 09:14:42PM -0700, Josh Berkus wrote: > > One thing that comes to mind is that this makes somewhat easier to build > > a tool to write pre-built tables, for bulk-loading purposes. You just > > construct the binary file with the HEAP_FROZEN bit set, and then attach > > the file to a dummy table. (Then again, you can do it today, using a > > Xmin of FrozenTransactionId. I wonder why the Bizgres people isn't > > advocating a tool to do that. It is very hard to do with user-defined > > types, but for BI/DW you mostly don't need those, do you?) > > Hmmm ... can you expand on this a little? We'd discussed "frozen partitions" > but hadn't thought to get around to them for a while, expecting the kind of > issues which Tom just raised. What issues did he raise on this? What I'm saying is that you can write a heap file, on which the tuples would all have xmin=FrozenTransactionId, xmax=Invalid, and the corresponding bits set in the infomask. This ensures that no matter the state of the server, you can plug the file in and all tuples will be valid. The "only" problem is figuring out how to lay the data in the tuples themselves, w.r.t endianness and such. This is platform-dependent, so you have to write code to do it correctly. In absence of user-defined types, this should not be _too_ hard to do. Of course, such a program would in general also be Postgres-version-dependent. Note that this is a very different business from skipping the Xmin and Cmin from the tuple header -- in fact, there's no relation to that at all. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com FOO MANE PADME HUM ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > What I'm saying is that you can write a heap file, on which the tuples > would all have xmin=FrozenTransactionId, xmax=Invalid, and the > corresponding bits set in the infomask. This ensures that no matter the > state of the server, you can plug the file in and all tuples will be > valid. > The "only" problem is figuring out how to lay the data in the tuples > themselves, w.r.t endianness and such. This is platform-dependent, so > you have to write code to do it correctly. In absence of user-defined > types, this should not be _too_ hard to do. Of course, such a program > would in general also be Postgres-version-dependent. Of course, it's fair to ask whether such a program would be any faster than binary-mode COPY by the time you got done ... or enough faster to justify your effort, anyway. THe only fundamental disadvantage that COPY labors under is having to write WAL records. It might be interesting to do something similar to the recent hacks for CREATE TABLE AS, so that a COPY into a table just created in the current transaction would skip writing WAL and instead fsync the table at the end. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Wed, 2005-08-31 at 22:25 -0400, Alvaro Herrera wrote: > On Thu, Sep 01, 2005 at 10:45:44AM +0900, ITAGAKI Takahiro wrote: > > Hi, > > > I think it would be a waste to retain xmin and cmin for frozen tuples > > because their values represent only 'visible for all transactions'. > > Additionally, most tuples in database can be frozen potentially. > > I think this is an interesting idea. Agreed, especially since it would avoid the need to vacuum altogether. > I was thinking that when the tuple > needs to be obsoleted it would need to grow to accomodate the Xmax, but > you are not actually proposing to remove that, so it seems sensible. In > fact, it is perfectly reasonable to remove Xmin and Cmin, because after > the tuple is frozen, the Xmin never changes again. It's a good idea, but the Xmin is set to FrozenTransactionId, which is how we know it is frozen, so how can we remove Xmin? The way to do this is surely by using a row version id that is different for this format. Getting 8 or 16 bytes per row back would be a very useful gain. > Now, one thing of note is that you need to "compress" the page in order > to actually be able to use the just-freed space. VACUUM could do that, > but maybe it would be better to do it on-line -- the freezing process is > going to have to write the page regardless. I wonder if with your patch > the page is compressed on the same VACUUM execution that freezes the > tuple? Only if you do a FULL, which is currently incompatible with a FREEZE. There's no point in compressing a block if you can't also redistribute rows between blocks to fill up the spaces, so another reason why it has to be a FULL. Unless you do this at load time, which is why I guess you mention.... > One thing that comes to mind is that this makes somewhat easier to build > a tool to write pre-built tables, for bulk-loading purposes. You just > construct the binary file with the HEAP_FROZEN bit set, and then attach > the file to a dummy table. (Then again, you can do it today, using a > Xmin of FrozenTransactionId. I wonder why the Bizgres people isn't > advocating a tool to do that. It is very hard to do with user-defined > types, but for BI/DW you mostly don't need those, do you?) Loading a table using COPY with frozen bits set was suggested in May, so yeh... it was suggested. At that time it was rejected, since earlier transactions would then be able to see rows they ought not be able to see. Thinking some more about this, this is only the inverse situation of a TRUNCATE. With truncate we remove tuples that ought to still be visible to pre-existing transactions. So there shouldn't really be an issue with loading pre-frozen tuples - as long as you accept the consequences for row visibility. Externally writing blocks is possible, but it bypasses a lot of other features. My current preference would be to have bulk_heap_insert() function to add a whole page at a time rather than inserting rows one at at a time. The main objective for a load is to make it disk bound; once we've achieved that by some further tuning, writing an external file would cost around the same as writing it internally from the DBMS. Oracle (direct path loader) and Teradata (Fastload) load data in complete blocks using a reduced code pathway, so I guess I was just following on, but I'm genuinely open to further persuasion if there is a better way. Having a table marked as INSERT ONLY would allow us to save 8 bytes/row, loading it pre-frozen (in some way) would save another 8 bytes/row and allow us to permanently avoid VACUUMing the table. That would be even better when we have per-table XID wrap avoidance. Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Thu, Sep 01, 2005 at 11:08:36AM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > What I'm saying is that you can write a heap file, on which the tuples > > would all have xmin=FrozenTransactionId, xmax=Invalid, and the > > corresponding bits set in the infomask. This ensures that no matter the > > state of the server, you can plug the file in and all tuples will be > > valid. > > > The "only" problem is figuring out how to lay the data in the tuples > > themselves, w.r.t endianness and such. This is platform-dependent, so > > you have to write code to do it correctly. In absence of user-defined > > types, this should not be _too_ hard to do. Of course, such a program > > would in general also be Postgres-version-dependent. > > Of course, it's fair to ask whether such a program would be any faster > than binary-mode COPY by the time you got done ... or enough faster to > justify your effort, anyway. It may not be faster generating the data in the first place, but you don't have to vacuum the table, nor you are subject to hint bits changing, resulting in more unnecessary I/O. This can't be avoided with COPY, because there's always the chance that it will fail partway through, so you can't write frozen tuples. With an external program, you can just dump the invalid line somewhere else and continue with the rest. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "Just treat us the way you want to be treated + some extra allowance for ignorance." (Michael Brusser) ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Alvaro, > What issues did he raise on this? On having no Xmin. > What I'm saying is that you can write a heap file, on which the tuples > would all have xmin=FrozenTransactionId, xmax=Invalid, and the > corresponding bits set in the infomask. This ensures that no matter the > state of the server, you can plug the file in and all tuples will be > valid. > > The "only" problem is figuring out how to lay the data in the tuples > themselves, w.r.t endianness and such. This is platform-dependent, so > you have to write code to do it correctly. In absence of user-defined > types, this should not be _too_ hard to do. Of course, such a program > would in general also be Postgres-version-dependent. So, bulk loading by file generation? So the idea is that you would generate a properly formatted PostgreSQL table file, and then in one transaction create the table and attach it? Seems like this would have the additional limitation of being useful only for loading new partitions/new tables. However, it would have some significant advantages for bulk loading ... chiefly that the data page generation and associated computations could be done *off* the database server. This might help considerably in getting around the 100mb/s data computation ceiling we're hitting ... -- Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| ||||
| Tom, > THe only fundamental disadvantage that COPY labors under is having to > write WAL records. It might be interesting to do something similar to > the recent hacks for CREATE TABLE AS, so that a COPY into a table just > created in the current transaction would skip writing WAL and instead > fsync the table at the end. Yes, I thought we discussed doing this for empty tables -- it would be, per our tests, a +10% to +30% boost to COPY. But there was some problem the patch? -- Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |