Unix Technical Forum

NOLOGGING option, or ?

This is a discussion on NOLOGGING option, or ? within the pgsql Hackers forums, part of the PostgreSQL category; --> Bruce Momjian <pgman@candle.pha.pa.us> writes: > One idea would be to look at the table file size first. If it ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #41 (permalink)  
Old 04-11-2008, 05:11 AM
Tom Lane
 
Posts: n/a
Default Re: NOLOGGING option, or ?

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> One idea would be to look at the table file size first. If it has zero
> blocks, lock the table and if it still has zero blocks, do the no-WAL
> copy.


I think that's a bad idea. It would make the behavior unpredictable
--- sometimes a COPY will take an exclusive lock, and other times not;
and the reason why is at a lower semantic level than the user is
supposed to know about.

Before you say "this is not important", consider the nontrivial risk
that the stronger lock will cause a deadlock failure. I don't think
that it's acceptable for lock strength to be unpredictable.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #42 (permalink)  
Old 04-11-2008, 05:11 AM
Tom Lane
 
Posts: n/a
Default Re: NOLOGGING option, or ?

"Luke Lonergan" <llonergan@greenplum.com> writes:
> One of the reasons to consider a LOAD DATA command is that we can isolate
> the need for performance improvements and special syntax from the concerns
> of preserving the legacy behavior of COPY for use as the primary mechanism
> for DUMP and RESTORE.


.... and instead, define some new behavior that will soon be considered
broken legacy code itself?

There isn't any demand for changing the semantics of COPY, as far as
I've noticed. If we can make it faster with the same semantics that's
great, but I'm not in favor of inventing an alternate that does almost
the same thing but (eg) breaks backslash handling in the name of speed.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #43 (permalink)  
Old 04-11-2008, 05:11 AM
Simon Riggs
 
Posts: n/a
Default Re: NOLOGGING option, or ?

On Wed, 2005-06-01 at 10:35 -0700, Alon Goldshuv wrote:
> I have been working on improving the COPY command performance


> Around 40% for 15 column (mixed types) table.
> Around 90% for 1 column table.


Thats very cool.

> 2) A modified command syntax for introducing a direct single row error
> handling. By direct I mean - a row that if rejected from within the COPY
> command context does not throw an error and rollsback the whole transaction.
> Instead the error is caught and recorded elsewhere, maybe in some error
> table, with some more information that can later on be retrieved. The
> following rows continue to be processed. This way there is barely any error
> handling overhead. Having a recursive row isolation into smaller batches is
> extremely expensive for non-small data sets. It's not an option for serious
> users.


Can we call this the ERRORTABLE clause?

> 5) allow an ERRORLIMIT to allow control of aborting a load after a certain
> number of errors (and a pre-requisite for this is point number 2 above).


The default for which would be ERRORLIMIT 0 to give backwards
compatibility.

2) and 5) seem critical for combined usability & performance with real
world data.

I'm not clear from all of those options whether we still need a LOAD
command, based upon other issues/comments raised on this thread.

However, there are some other arguments for why it might be a good idea
to have a LOAD DATA command separate from COPY. Certainly long term
features would be easier to add with two commands. Trying to maintain
backwards compatibility just because we use COPY seems like an uphill
struggle and is going to mean we have to handle sensible new additions
as options so we don't break existing applications. The most important
one is the lock type held.

[Oracle compatibility isn't one of them, even if it did provide the
command name.]

But things will be clearer when we see the patch.

Best Regards, Simon Riggs



---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #44 (permalink)  
Old 04-11-2008, 05:11 AM
Simon Riggs
 
Posts: n/a
Default Re: NOLOGGING option, or ?

On Wed, 2005-06-01 at 18:32 -0400, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > One idea would be to look at the table file size first. If it has zero
> > blocks, lock the table and if it still has zero blocks, do the no-WAL
> > copy.

>
> I think that's a bad idea. It would make the behavior unpredictable
> --- sometimes a COPY will take an exclusive lock, and other times not;
> and the reason why is at a lower semantic level than the user is
> supposed to know about.
>
> Before you say "this is not important", consider the nontrivial risk
> that the stronger lock will cause a deadlock failure. I don't think
> that it's acceptable for lock strength to be unpredictable.


While I agree with Bruce's sentiment, the locking is an issue with
COPY.

It is particularly important to be able to plan the concurrency within
an application. That is why DB2's interesting ability to perform lock
upgrades in a relatively unpredictable manner is not good.

For CREATE TABLE AS SELECT, I think we could do this Bruce's way and
win, since we would take the same grade of lock in both cases.

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #45 (permalink)  
Old 04-11-2008, 05:11 AM
Bruce Momjian
 
Posts: n/a
Default Re: NOLOGGING option, or ?

Luke Lonergan wrote:
> Bruce,
>
> > The problem with a new command is that it becomes unclear when you
> > should use COPY and when LOAD DATA, and it confuses users, and has
> > maintenance overhead. If Bizgres wants a new command name, go for it,
> > but it is unlikely that the community release is going to go in that
> > direction, unless there is a fundamental agreement that COPY is broken
> > and needs a major revamp, and I have heard no talk of that.

>
> The question of whether COPY should be improved or whether the changes
> should take the form of a new command is separate from the question of
> whether the performance of the load path in PostgreSQL needs improvement.
>
> The 90% performance increase (from 12 MB/s to 21 MB/s) that Alon reported
> comes from replacing the parsing logic within COPY. I believe that the
> parsing logic in COPY is fundamentally broken from a performance
> perspective, and may be broken from a functionality perspective WRT embedded
> backslashes.


COPY works as designed. The idea that some guy we have never heard of
is going to appear and rewrite COPY's processing and tell us that the
existing code is actually broken seems pretty arrogant to me. If it is
broken (meaning doesn't work as designed), please show us facts rather
than conjecture.

Oh, and the "Our COPY improvements are so fundamental that they deserve
a new command name" also has a similar flavor.

(Please explain how you handle literal delimiters and nulls with no
escape processing.)

> One of the reasons to consider a LOAD DATA command is that we can isolate
> the need for performance improvements and special syntax from the concerns
> of preserving the legacy behavior of COPY for use as the primary mechanism
> for DUMP and RESTORE.


This seems like a case where GreenPlum's priorities and the community's
priorities might not match. There is much more work required on your
part if you are going to convince the community it needs a new data
loading command, and starting out with the assumption in emails that it
is going to be a newly named command isn't the best approach. That is
my fundamental point.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #46 (permalink)  
Old 04-11-2008, 05:11 AM
Bruce Momjian
 
Posts: n/a
Default Re: NOLOGGING option, or ?

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > One idea would be to look at the table file size first. If it has zero
> > blocks, lock the table and if it still has zero blocks, do the no-WAL
> > copy.

>
> I think that's a bad idea. It would make the behavior unpredictable
> --- sometimes a COPY will take an exclusive lock, and other times not;
> and the reason why is at a lower semantic level than the user is
> supposed to know about.
>
> Before you say "this is not important", consider the nontrivial risk
> that the stronger lock will cause a deadlock failure. I don't think
> that it's acceptable for lock strength to be unpredictable.


Yea, but you are only doing the lock if the table is zero pages.
Doesn't that help? Maybe not.

I do like the LOCK keyword if we have to use one to enable this
functionality, but I am suspecting people will want this functionality
in pg_dump output. How do we do that? Just make it the default for
pg_dump output?

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #47 (permalink)  
Old 04-11-2008, 05:11 AM
Bruce Momjian
 
Posts: n/a
Default Re: NOLOGGING option, or ?

Simon Riggs wrote:
> I'm not clear from all of those options whether we still need a LOAD
> command, based upon other issues/comments raised on this thread.
>
> However, there are some other arguments for why it might be a good idea
> to have a LOAD DATA command separate from COPY. Certainly long term
> features would be easier to add with two commands. Trying to maintain
> backwards compatibility just because we use COPY seems like an uphill
> struggle and is going to mean we have to handle sensible new additions
> as options so we don't break existing applications. The most important
> one is the lock type held.


Well, we have had a pretty much unmodified COPY format since like the
Berkeley days (I added \N and \.). Please tell us exactly what you want
do to that requires a format change, and we can talk about it, but
showing up with no proof and expecting a new command is the _wrong_
approach. It actually reminds me of the "our company developed it so it
must be great" approach, which doesn't work well in the community.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #48 (permalink)  
Old 04-11-2008, 05:11 AM
Joe Conway
 
Posts: n/a
Default Re: NOLOGGING option, or ?

Simon Riggs wrote:
> On Wed, 2005-06-01 at 10:35 -0700, Alon Goldshuv wrote:
>>2) A modified command syntax for introducing a direct single row error
>>handling. By direct I mean - a row that if rejected from within the COPY


>>5) allow an ERRORLIMIT to allow control of aborting a load after a certain
>>number of errors (and a pre-requisite for this is point number 2 above).


> 2) and 5) seem critical for combined usability & performance with real
> world data.


I'll second that! This would be a huge win for one of my real world
applications, whether implemented as a new command, or as added
capability on top of COPY. The other performance enhancements would
certainly be nice to have also, but in my experience not nearly as
important as these two.

Joe

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #49 (permalink)  
Old 04-11-2008, 05:11 AM
Alon Goldshuv
 
Posts: n/a
Default Re: NOLOGGING option, or ?

Bruce,

The patch is not there to show that something is "broken" is it there to
show how things could be done in another way, which may or may not be more
desireable.

> COPY works as designed. The idea that some guy we have never heard of
> is going to appear and rewrite COPY's processing and tell us that the
> existing code is actually broken seems pretty arrogant to me. If it is
> broken (meaning doesn't work as designed), please show us facts rather
> than conjecture.


I am sure that the code works as designed. In my previous email I was
referring to the fact that using COPY with a data field that happens to have
backslashes in it, and then querying this data field you will get different
results. For example do COPY of a field "c:\\one\ten\nine" and query for it.
The result will be much different. This is a problem with clickstream data
for example. That's all. It's very possible that there are reasons that I
missed for why things are the way they are.

> (Please explain how you handle literal delimiters and nulls with no
> escape processing.)

Escape processing is done only for these 2 cases.

Thx,
Alon.




---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #50 (permalink)  
Old 04-11-2008, 05:11 AM
Luke Lonergan
 
Posts: n/a
Default Re: NOLOGGING option, or ?

Tom,
>
> ... and instead, define some new behavior that will soon be considered
> broken legacy code itself?


I'll not argue further about whether to have a separate LOAD command.
That's not as important as fixing the performance issues in the data load
path in PostgreSQL to me.

However, I find it compelling that the underlying problem confronting high
performance data loading in the engine is the slow parse code in psql/copy,
etc., and not the WAL or other issues. Next on the list are huge gains from
removing things like repetitive calls to strlen(TZ) in the attribute
conversion code.

> There isn't any demand for changing the semantics of COPY, as far as
> I've noticed. If we can make it faster with the same semantics that's
> great, but I'm not in favor of inventing an alternate that does almost
> the same thing but (eg) breaks backslash handling in the name of speed.


A 540% increase in parsing speed while performing substantially the same
logic is what the patch accomplishes. The patch is now available on
pgsql-patches. In this case the speed did not come from breaking backslash
handling, but rather from using faster processing for escape processing and
delimiter parsing. We *could* recreate identical escape semantics to COPY
at nearly the same speed, but frankly we're puzzled as to why character
sequences with "\" automatically imply escape processing.

We've found that there are many cases where more sophisticated escape
processing options are necessary, including multi-byte delimiters and escape
sequences, and the default use of "\" for escaping breaks many data load
cases.

- Luke



---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:34 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com