This is a discussion on NOLOGGING option, or ? within the pgsql Hackers forums, part of the PostgreSQL category; --> On Wed, Jun 01, 2005 at 10:35:30AM -0700, Alon Goldshuv wrote: > 2) A modified command syntax for introducing ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| On Wed, Jun 01, 2005 at 10:35:30AM -0700, Alon Goldshuv wrote: > 2) A modified command syntax for introducing a direct single row error > handling. By direct I mean - a row that if rejected from within the COPY > command context does not throw an error and rollsback the whole transaction. > Instead the error is caught and recorded elsewhere, maybe in some error > table, with some more information that can later on be retrieved. The > following rows continue to be processed. This way there is barely any error > handling overhead. Is there any idea on exactly how would this be done? Do you plan on using savepoints to implement it? I fail to see how is this "barely any overhead". Savepoints are not that expensive but they are not free either. (No, I haven't measured it.) -- Alvaro Herrera (<alvherre[a]surnet.cl>) "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV) ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| |||
| Steve, > I've been following this thread, and I'm a little confused. Could you > possibly clarify what you mean, by providing a couple of lines of > input as it would be formatted with escape processing turned off - > containing a text field with an embedded newline and tab and a null field. Using an extended command syntax for a "copy-like" command named LOAD: LOAD [schema.]tablename [(column1,column2,...)] FROM {'filename' | STDIN} [ [WITH] [DELIMITER [AS] 'delimiter'] [NULL [AS] 'null string'] [ESCAPE [AS] Œescape¹] ] ; If you intend to support embedded newlines (0x0a) in your character data without escapes, you will need to choose an alternative newline character for formatting the data. An example that uses '0xaa' as the newline might be: Control statement: LOAD webform (formdata) FROM /home/sample/sample.txt WITH DELIMITER Œ|¹ NULL Œ¹ NEWLINE '0xaa'; Sample with 2 identical rows (with binary representations depicted between <>): Blahblah<0xaa>blahblah<0x09>blahblah<0x00>blahblah <0xaa>Blahblah<0xaa>blahbl ah<0x09>blahblah<0x00>blahblah<0xaa> - Luke ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Steve, Oops. Example below should have read differently: > Sample with 2 identical rows (with binary representations depicted between > <>): > Blahblah<0xaa>blahblah<0x09>blahblah<0x00>blahblah <0xaa>Blahblah<0xaa>blahbl > ah<0x09>blahblah<0x00>blahblah<0xaa> Blahblah<0x0a>blahblah<0x09>blahblah<0x00>blahblah <0xaa>Blahblah<0x0a>blahbl ah<0x09>blahblah<0x00>blahblah<0xaa> This would result in the load of two records each of which would look like this (when printed on a typical terminal): Blahblah blahblah blahblah Luke ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On K, 2005-06-01 at 11:31 -0400, Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: > > I think this should be a decision done when creating a table, just like > > TEMP tables. So you always know if a certain table is or is not > > safe/replicated/recoverable. > > This has also the advantage of requiring no changes to actual COPY and > > INSERT commands. > > That doesn't seem right to me; the scenario I envision is that you are > willing to do the initial data loading over again (since you presumably > still have the source data available). But once you've got it loaded > you want full protection. What I mean, was that as it can't be safely replicated using log- shipping, It should be visible as such. > Perhaps it could work to use an ALTER TABLE command to flip the state. No. It would be the same as flipping a TEMP table to an ordinary table, which we don't support, and IMHO for a good reason > But I'm not really seeing the point compared to treating it as a COPY > option. The point is having a separate (sub)type of storage - non-WAL/non- replicated table and its indexes. > I do not believe that anyone needs this to work on individual > INSERT commands --- if you are after max speed, why aren't you using > COPY? And treating it as an ALTER property opens the possibility of > forgetting to ALTER the table back to normal behavior, which would be > a foot-gun of large caliber indeed :-( That's what I'm trying to avoid - If it is obvious, that the whole table is quasi-stable (in PITR/log-shipping sense) it is more clearly a user choice what kinds of data can be stored there. Same as TEMP tables again. -- Hannu Krosing <hannu@tm.ee> ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On K, 2005-06-01 at 18:05 -0400, Bruce Momjian wrote: > What we could do is to do no-WAL automatically for empty tables (like > when a database is first loaded), You forget that some databases use WAL for PITR / replication and doing it automatically there would surely mess up their replica. How is index creation handeled if it is not logged in WAL ? - is it not automatically WAL'ed ? - Must one recreate indexes after PITR or failover ? > and use the flag for cases where the > tables is not zero pages. The fact is that database loads are a prefect > case for this optimization and old dumps are not going to have that flag > anyway, and automatic is better if we can do it. -- Hannu Krosing <hannu@tm.ee> ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
| ||||
| >> 2) A modified command syntax for introducing a direct single row error >> handling. By direct I mean - a row that if rejected from within the COPY >> command context does not throw an error and rollsback the whole transaction. >> Instead the error is caught and recorded elsewhere, maybe in some error >> table, with some more information that can later on be retrieved. The >> following rows continue to be processed. This way there is barely any error >> handling overhead. > > Is there any idea on exactly how would this be done? Do you plan on > using savepoints to implement it? I fail to see how is this "barely any > overhead". Savepoints are not that expensive but they are not free either. > (No, I haven't measured it.) Good question, I am not entirely sure if this is possible yet, as I didn't think it through entirely yet. I guess data errors could be divided into 2 main categories: mal-formed data where error is detected even before forming a tuple, and the other is errors that are caused by some constraint violation, that is, after the tuple is formed and inserted. From what I hear and experience the big majority of errors are of the first type. In that case the error could be caught, the data line + line number + error description could be inserted into an ERROR table (all TEXT fields), and then COPY can skip forming a tuple, and move to parsing the next. In this process there is barely any overhead. The more difficult part obviously is handling the second error type, which I haven't looked at yet deeply. Hopefully it is not impossible to do while keeping transaction integrity (Any ideas anyone?). The overhead for this one will probably be larger, but again, we expect those to happen less (in most cases at least). Nevertheless, it is surely much faster than recursively narrowing down batch sizes. Alon. ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |