This is a discussion on Escape handling in COPY, strings, psql within the pgsql Hackers forums, part of the PostgreSQL category; --> Christopher Kings-Lynne wrote: > > I think we can tell people in 8.1 that they should modify their > ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Christopher Kings-Lynne wrote: > > I think we can tell people in 8.1 that they should modify their > > applications to only use '', and that \' might be a security problem in > > the future. If we get to that then using ESC or not only affects input > > of values and literal backslashes being entered, and my guess is that > > 90% of the backslash entries that want escaping are literal in the > > application and not supplied by program variables. In fact, if we > > disable backslash by default then strings coming in only have to deal > > with single quotes (like other databases) and the system is more secure > > because there is no special backslash handling by default. > > I can tell you right now this will be a problem > PHP ppl who use addslashes() instead of pg_escape_string() to escape data. I read the PHP addslashes() manual page: http://us3.php.net/addslashes First, I see what people mean about PHP having most of the complex content in comments, rather than in the actual manual text, and this tendency is certainly something we want to avoid --- you end up having to digest all the comments to find the details that should be in the manual already. On to the case at hand, the comments mention that addslashes() isn't safe for all databases, and in fact isn't the prefered method. I do think it could be a problem we have to have people avoid. One idea for 8.1 is to throw a warning if \' appears in a string, thereby helping people find the places they are using the incorrect non-standard escaping. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) |
| |||
| > I read the PHP addslashes() manual page: > > http://us3.php.net/addslashes > > First, I see what people mean about PHP having most of the complex > content in comments, rather than in the actual manual text, and this > tendency is certainly something we want to avoid --- you end up having > to digest all the comments to find the details that should be in the > manual already. Actually, all the comments are posted on the php-doc list, with automatic urls in them for 'fixed in cvs', 'rejected', etc. Each comment is supposed to be acted upon (ie. fixed in source), then deleted. There's still a lot of old comment around that hasn't had that treatment though... Chris ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Christopher Kings-Lynne wrote: > > I read the PHP addslashes() manual page: > > > > http://us3.php.net/addslashes > > > > First, I see what people mean about PHP having most of the complex > > content in comments, rather than in the actual manual text, and this > > tendency is certainly something we want to avoid --- you end up having > > to digest all the comments to find the details that should be in the > > manual already. > > Actually, all the comments are posted on the php-doc list, with > automatic urls in them for 'fixed in cvs', 'rejected', etc. > > Each comment is supposed to be acted upon (ie. fixed in source), then > deleted. > > There's still a lot of old comment around that hasn't had that treatment > though... Right, they are more _usage_ comments, but still I think they could be consolidated into manual text. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| |||
| Peter Eisentraut wrote: > Bruce Momjian wrote: > > I was suggesting ESCAPE 'string' or ESC 'string'. The marker has to > > be before the string so scan.l can alter its processing of the string > > --- after the string is too late --- there is no way to undo any > > escaping that has happened, and it might already be used by gram.y. > > That pretty much corresponds to my E'string' proposal. Both are > probably equally trivial to implement. Right. I think your E'' idea has the benefit of fitting with our existing X'' and B'' modifiers. It is also simpler and cleaner to do in scan.l, so I think your idea is best. > > I agree with Tom that COPY has to be left unchanged. The fundamental > > problem is the representation of NULL values, that I don't think we > > can do without some escape mechanism. Single-quote escapes works by > > doubling them, but once you need to represent something more like > > null's, I can't think of a solution without escapes. > > Yes, I now realize that COPY has a whole set of different rules anyway, > so we can leave that out of this discussion. Cool. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Bruce Momjian <pgman@candle.pha.pa.us> writes: > Christopher Kings-Lynne wrote: > > > > Each comment is supposed to be acted upon (ie. fixed in source), then > > deleted. > > Right, they are more _usage_ comments, but still I think they could be > consolidated into manual text. If that's "supposed" to happen it certainly hasn't been the de facto procedure. I think they have things partly right here though. A lot of those comments aren't actually the kinds of things that belong in the canonical reference. They include things like "watch out for this common error" or "here's a handy use for this function". Often the "common error" or "handy use" are pretty bogus but every now and then there's a genuinely useful one. These kinds of things would just clutter up a reference. A reference should just state unambiguously this function does XYZ and give examples that help explain XYZ. The PHP Docs do have a bit of a problem in that often the comments include things like "In case X, what happens is Y" which really ought to be covered by the canonical reference. That's a problem. -- greg ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Bruce Momjian wrote: > Peter Eisentraut wrote: > > Bruce Momjian wrote: > > > I was suggesting ESCAPE 'string' or ESC 'string'. The marker has to > > > be before the string so scan.l can alter its processing of the string > > > --- after the string is too late --- there is no way to undo any > > > escaping that has happened, and it might already be used by gram.y. > > > > That pretty much corresponds to my E'string' proposal. Both are > > probably equally trivial to implement. > > Right. I think your E'' idea has the benefit of fitting with our > existing X'' and B'' modifiers. It is also simpler and cleaner to do in > scan.l, so I think your idea is best. [ CC list trimmed.] OK, I talked to Tom and Peter and I have come up with a tentative plan. The goal, at some point, is that we would have two types of strings, '' strings and E'' strings. '' strings don't have any special backslash handling for compatibility with with the ANSI spec and all other databases except MySQL (and in MySQL it is now optional). E'' strings behave just like our strings do now, with backslash handling. In 8.0.X, we add support for E'' strings, but it is a noop. This is done just for portability with future releases. We also state that users should stop using \' to escape quotes in strings, and instead use '', and that we will throw a warning in 8.1 if we see a \' in a non-E string. (We could probably throw a warning for E'' use of \' too, but I want to give users the ability to avoid the warning if they can't change from using \' to ''.) In 8.1, we start issuing the warning for \' in non-E strings, plus we tell users who want escape processing that they will have to use E'' strings for this starting in release 8.2, and they should start migrating their escaped strings over to E''. Tom also suggested a readonly GUC variable that is sent to clients that indicates if simple strings have backslash handling, for use by applications that are doing escapes themselves, perhaps 'escape_all_strings'. PQescapeString() and PQescapeBytea() can still be used, but only with E'' strings in 8.2. We could create PQquoteString() for 8.1 and later to allow for just single-quote doubling for non-E strings. Tom asked about how to handle pg_dump contents that have strings, like function bodies. We could start using E'' for those in 8.0 but it does break backward movement of dumps, and someone upgrading from 7.1 to 8.2 would be in trouble. :-( Perhaps we will have another round of subrelease fixes and we can bundle this into that and tell people they have to upgrade to the newest subrelease before going to 8.2. I think we have had that requirement in the past when we had broken pg_dump processing. The good news is that once everyone uses only '' to quote string, we will not have any data security issues with this change. The only potential problem is the mishandling of backslash characters if there is a mismatch between what the client expects and the server uses. By backpatching E'' perhaps even to 7.4 and earlier (as a noop), we could minimize this problem. Is this whole thing ugly? Yes. Can we just close our eyes and hope we can continue with our current behavior while growing a larger userbase --- probabably not. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Bruce Momjian <pgman@candle.pha.pa.us> writes: > The goal, at some point, is that we would have two types of strings, '' > strings and E'' strings. '' strings don't have any special backslash > handling for compatibility with with the ANSI spec and all other > databases except MySQL (and in MySQL it is now optional). E'' strings > behave just like our strings do now, with backslash handling. The only thing I'm not clear on is what exactly is the use case for E'' strings. That is, who do you expect to actually use them? Any new applications are recommended to be using '' strings. And any existing applications obviously won't be using them since they don't currently exist. The only potential candidates are existing applications being ported forward. And that only makes sense if they're currently using some function like addslash. Is it really easier to change all the SQL queries to use E'' (and still have a bug) than it is to replace addslash with PQquoteString() ? Also, I'm really confused why you would make PQescapeString require E'' strings and introduce a new function. That means existing non-buggy applications would suddenly be buggy? And it would be impossible to write a properly functioning application that interpolates a constant into a query that would be portable to 8.2 and 8.0? -- greg ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
| |||
| Greg Stark wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > The goal, at some point, is that we would have two types of strings, '' > > strings and E'' strings. '' strings don't have any special backslash > > handling for compatibility with with the ANSI spec and all other > > databases except MySQL (and in MySQL it is now optional). E'' strings > > behave just like our strings do now, with backslash handling. > > > The only thing I'm not clear on is what exactly is the use case for E'' > strings. That is, who do you expect to actually use them? > > Any new applications are recommended to be using '' strings. And any existing > applications obviously won't be using them since they don't currently exist. We are saying to use '' to escape single quotes in all strings. E'' is still useful if you want to use backslash escapes in your strings. Does that answer your questions? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| |||
| Greg Stark <gsstark@mit.edu> writes: > The only thing I'm not clear on is what exactly is the use case for E'' > strings. That is, who do you expect to actually use them? The case that convinced me we need to keep some sort of backslash capability is this: suppose you want to put a string including a tab into your database. Try to do it with psql: t=> insert into foo values ('<TAB> Guess what: you won't get anywhere, at least not unless you disable readline. So it's nice to be able to use \t. There are related issues involving \r and \n depending on your platform. And this doesn't even scratch the surface of encoding-related funnies. So there's definitely a use-case for keeping the existing backslash behavior, and E'string' seems like a reasonable proposal for doing that without conflicting with the SQL spec. What I do not see at the moment is how we get there from here (ie, dropping backslashing in regular literals) without incurring tremendous pain --- including breaking all existing pg_dump files, introducing security holes and/or data corruption into many existing apps that are not presently broken, and probably some other ways of ruining your day. I'm quite unconvinced that this particular letter of the SQL spec is worth complying with ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| ||||
| On Tue, 31 May 2005, Tom Lane wrote: > The case that convinced me we need to keep some sort of backslash > capability is this: suppose you want to put a string including a tab > into your database. Try to do it with psql: > t=> insert into foo values ('<TAB> > Guess what: you won't get anywhere, at least not unless you disable > readline. So it's nice to be able to use \t. To insert a tab using readline you can press ESC followed by TAB. This works as least in readline as it is setup in redhat/fedora (and readline can be setup in 1000 different ways so who knows how portable this is). -- /Dennis Björklund ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| Thread Tools | |
| Display Modes | |
|
|