This is a discussion on Unicode problems on IRC within the pgsql Hackers forums, part of the PostgreSQL category.
Hey guys,

The 'Unicode characters above 0x10000' issue keeps rearing its ugly head in the IRC channel. I propose that it be fixed, even backported...

This is John Hansen's most recent patch to fix it:

http://archives.postgresql.org/pgsql...1/msg00259.php

And from what I can tell it was committed, then reverted because it wasn't a "bug". It was going to go in for 8.1.

We on the channel are starting to think that it is in fact a bug. There are people with legitimately UTF-8 encoded XML documents that they cannot store in PostgreSQL. Apparently in the distant past, Unicode was limited to 0x10000, but then was extended.

Perhaps we can reopen this case...

Chris

Christopher Kings-Lynne wrote:
> Hey guys,
>
> The 'Unicode characters above 0x10000' issue keeps rearing its ugly head
> in the IRC channel. I propose that it be fixed, even backported...
>
> This is John Hansen's most recent patch to fix it:
>
> http://archives.postgresql.org/pgsql...1/msg00259.php
>
> And from what I can tell it was committed, then reverted because it
> wasn't a "bug". It was going to go in for 8.1.
>
> We on the channel are starting to think that it is in fact a bug. There
> are people with legitimately UTF-8 encoded XML documents that they
> cannot store in PostgreSQL. Apparently in the distant past, Unicode was
> limited to 0x10000, but then was extended.
>
> Perhaps we can reopen this case...

Uh, I thought we fixed this another way, by not using Unicode-aware functions for upper/lower/initcap when the locale is "C" or "POSIX". That is backpatched to 8.0.X. Does that not fix the problem reported?

--
Bruce Momjian
http://candle.pha.pa.us

On 2005-04-09, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> Uh, I thought we fixed this another way, by not using Unicode-aware
> functions for upper/lower/initcap when the locale is "C" or "POSIX".
> That is backpatched to 8.0.X. Does that not fix the problem reported?

Unicode values over 0xFFFF are simply not accepted on input, so no, it doesn't fix the problem. What do upper/lower/initcap have to do with it?

textin() unconditionally calls pg_verifymbstr, which in turn explicitly checks for such values (if the encoding is UTF8) and throws ERROR if it finds them.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
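
For readers who want to see what such an input check amounts to, here is a rough sketch in C. It is not the PostgreSQL source for pg_verifymbstr; the function names utf8_seq_len and utf8_bmp_only are made up for illustration, and the sketch assumes the restriction boils down to refusing four-byte UTF-8 sequences, which are the ones that encode values over 0xFFFF. It does not try to catch every malformed input (overlong encodings, for instance, are not checked).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not PostgreSQL code): length in bytes of the UTF-8
 * sequence that starts with the given lead byte, or 0 if the byte cannot
 * start a sequence.
 */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* U+0000..U+007F */
    if ((lead & 0xE0) == 0xC0)
        return 2;               /* up to U+07FF */
    if ((lead & 0xF0) == 0xE0)
        return 3;               /* up to U+FFFF */
    if ((lead & 0xF8) == 0xF0)
        return 4;               /* U+10000 and above */
    return 0;                   /* continuation byte or invalid lead */
}

/*
 * Hypothetical check: returns true only if buf is made of complete UTF-8
 * sequences of at most three bytes, i.e. contains no value over 0xFFFF.
 * A real server would raise an error instead of returning false.
 */
bool utf8_bmp_only(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        size_t n = utf8_seq_len(buf[i]);

        /* Reject bad lead bytes, four-byte sequences, and truncation. */
        if (n == 0 || n > 3 || n > len - i)
            return false;

        /* Every trailing byte must look like 10xxxxxx. */
        for (size_t k = 1; k < n; k++)
        {
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;
        }
        i += n;
    }
    return true;
}
```

With a check of this shape, the three-byte sequence E2 82 AC (U+20AC) passes, while the four-byte sequence F0 9D 84 9E (U+1D11E) is refused even though it is well-formed UTF-8, which is the behaviour being complained about in this thread.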