Unix Technical Forum

Unicode problems on IRC

This is a discussion on Unicode problems on IRC within the pgsql Hackers forums, part of the PostgreSQL category.


  #1 (permalink)  
Old 04-11-2008, 04:19 AM
Christopher Kings-Lynne
 
Posts: n/a
Default Unicode problems on IRC

Hey guys,

The 'Unicode characters above 0x10000' issue keeps rearing its ugly head
in the IRC channel. I propose that it be fixed, even backported...

This is John Hansen's most recent patch to fix it:

http://archives.postgresql.org/pgsql...1/msg00259.php

And from what I can tell it was committed, then reverted because it
wasn't a "bug". It was going to go in for 8.1.

We on the channel are starting to think that it is in fact a bug. There
are people with legitimately UTF-8 encoded XML documents that they
cannot store in PostgreSQL. Apparently in the distant past, Unicode was
limited to 0x10000, but then was extended.
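
To make the extended range concrete: a code point above 0xFFFF takes four bytes in UTF-8, while anything in the older 16-bit range fits in at most three. A quick Python sketch follows; the helper name `utf8_length` and the sample code points are chosen purely for illustration and are not taken from any reported document.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses to encode one code point."""
    return len(chr(code_point).encode("utf-8"))


# Sample code points on both sides of the 0xFFFF boundary (illustrative only).
for cp in (0x41, 0xE9, 0x20AC, 0xFFFD, 0x10000, 0x1D11E):
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

Any document containing a character from the last two rows is still well-formed UTF-8; it simply needs 4-byte sequences.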

Perhaps we can reopen this case...

Chris


  #2 (permalink)  
Old 04-11-2008, 04:22 AM
Bruce Momjian
 
Posts: n/a
Default Re: Unicode problems on IRC

Christopher Kings-Lynne wrote:
> Hey guys,
>
> The 'Unicode characters above 0x10000' issue keeps rearing its ugly head
> in the IRC channel. I propose that it be fixed, even backported...
>
> This is John Hansen's most recent patch to fix it:
>
> http://archives.postgresql.org/pgsql...1/msg00259.php
>
> And from what I can tell it was committed, then reverted because it
> wasn't a "bug". It was going to go in for 8.1.
>
> We on the channel are starting to think that it is in fact a bug. There
> are people with legitimately UTF-8 encoded XML documents that they
> cannot store in PostgreSQL. Apparently in the distant past, Unicode was
> limited to 0x10000, but then was extended.
>
> Perhaps we can reopen this case...


Uh, I thought we fixed this another way, by not using Unicode-aware
functions for upper/lower/initcap when the locale is "C" or "POSIX".
That is backpatched to 8.0.X. Does that not fix the problem reported?

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


  #3 (permalink)  
Old 04-11-2008, 04:22 AM
Andrew - Supernews
 
Posts: n/a
Default Re: Unicode problems on IRC

On 2005-04-09, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> Uh, I thought we fixed this another way, by not using Unicode-aware
> functions for upper/lower/initcap when the locale is "C" or "POSIX".
> That is backpatched to 8.0.X. Does that not fix the problem reported?


Unicode values over 0xFFFF are simply not accepted on input, so no, it
doesn't fix the problem. What do upper/lower/initcap have to do with it?

textin() unconditionally calls pg_verifymbstr, which in turn explicitly
checks for such values (if the encoding is UTF8) and throws ERROR if it
finds them.
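
A rough sketch of the behaviour described above, written in Python rather than the server's C and not PostgreSQL's actual code: a verifier that accepts only 1- to 3-byte UTF-8 sequences will reject every code point over 0xFFFF, because those need a 4-byte sequence. The function name `verify_utf8_bmp_only` is hypothetical, and the check looks only at lead and continuation byte structure (it does not reject overlong forms or surrogates).

```python
def verify_utf8_bmp_only(data: bytes) -> None:
    """Raise ValueError unless data consists of 1- to 3-byte UTF-8 sequences.

    Hypothetical illustration of an input check that refuses code points
    above 0xFFFF; it validates byte structure only.
    """
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            length = 1          # ASCII
        elif 0xC0 <= lead < 0xE0:
            length = 2          # U+0080 .. U+07FF
        elif 0xE0 <= lead < 0xF0:
            length = 3          # U+0800 .. U+FFFF
        else:
            # 0xF0 and above would start a 4-byte sequence (code point
            # over 0xFFFF); 0x80-0xBF is a stray continuation byte.
            raise ValueError(
                f"invalid or unsupported lead byte 0x{lead:02x} at offset {i}"
            )
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for j in range(i + 1, i + length):
            if data[j] & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {j}")
        i += length


# Text within the 16-bit range passes silently.
verify_utf8_bmp_only("caf\u00e9 \u20ac".encode("utf-8"))

# A valid character above 0xFFFF (U+1D11E) is refused.
try:
    verify_utf8_bmp_only("\U0001D11E".encode("utf-8"))
except ValueError as exc:
    print("rejected:", exc)
```

Under a check of this shape, changing how upper/lower/initcap behave makes no difference, since the value is refused before it is ever stored.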

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
All times are GMT.

