This is a discussion on Three-byte Unicode characters within the pgsql Hackers forums, part of the PostgreSQL category.
[ This email to hackers from last night got lost so I am remailing.]

Tom Lane wrote:
> "John Hansen" <john@geeknet.com.au> writes:
> >> That is backpatched to 8.0.X.  Does that not fix the problem reported?
>
> > No, as andrew said, what this patch does, is allow values > 0xffff and
> > at the same time validates the input to make sure it's valid utf8.
>
> The impression I get is that most of the 'Unicode characters above
> 0x10000' reports we've seen did not come from people who actually needed
> more-than-16-bit Unicode codepoints, but from people who had screwed up
> their encoding settings and were trying to tell the backend that Latin1
> was Unicode or some such.  So I'm a bit worried that extending the
> backend support to full 32-bit Unicode will do more to mask encoding
> mistakes than it will do to create needed functionality.
>
> Not that I'm against adding the functionality.  I'm just doubtful that
> the reports we've seen really indicate that we need it, or that adding
> it will cut down on the incidence of complaints :-(

OK, I got on the IRC server and talked to folks who actually understand
this.  They say there are Chinese who are reporting this problem, so I
Googled and found this:

	http://www.yale.edu/chinesemac/pages...g.html#Unicode

See the paragraph with "Supplementary Ideographic Plane".  You will see
that paragraph says:

	The Supplementary Ideographic Plane (SIP) currently contains 42,711
	additional characters in "CJK Unified Ideographs Extension B"
	(U+20000-2A6D6).

The PDF chart for this is available at:

	http://www.unicode.org/charts/PDF/U20000.pdf

I assume it is that U+20000-2A6D6 range that people are complaining
about.

So, we do have a bug, and we are probably going to need to fix it in
8.0.X.  I apologize to the people who reported this problem; I wasn't
attentive to the seriousness of it.
--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
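For readers following the thread, here is a minimal sketch in Python of the two cases being argued about. It is not PostgreSQL code and not John's patch; the helper names `utf8_len` and `is_valid_utf8` are made up for illustration. It shows that a code point from the U+20000-2A6D6 range needs four bytes in UTF-8 (one more than anything up to 0xffff), and that Latin1 bytes mislabeled as UTF-8 typically fail validation rather than turning into large code points.

```python
def utf8_len(codepoint: int) -> int:
    """Number of bytes UTF-8 uses to encode a single code point."""
    return len(chr(codepoint).encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """True if the byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# First code point of "CJK Unified Ideographs Extension B".
sip_bytes = chr(0x20000).encode("utf-8")

# The word "cafe" with an accented final e, stored as Latin1. The last
# byte 0xE9 looks like a UTF-8 lead byte with no continuation bytes.
latin1_bytes = "caf\u00e9".encode("latin-1")

print(utf8_len(0xFFFF))             # 3
print(utf8_len(0x20000))            # 4
print(sip_bytes.hex())              # f0a08080
print(is_valid_utf8(sip_bytes))     # True
print(is_valid_utf8(latin1_bytes))  # False
```

The point of checking both is that accepting values above 0xffff and validating the input are separate concerns: the first case is well-formed data that needs wider support, the second is an encoding-settings mistake that strict validation can still catch.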
Bruce Momjian wrote:
> So, we do have a bug, and we are probably going to need to fix it in
> 8.0.X.

This has never worked in all the years we have had Unicode
functionality, so I don't understand why we have to rush to fix it now.
Certainly, it ought to be fixed, but not in a minor release.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes:
> Bruce Momjian wrote:
>> So, we do have a bug, and we are probably going to need to fix it in
>> 8.0.X.

> This has never worked in all the years we have had Unicode
> functionality, so I don't understand why we have to rush to fix it now.
> Certainly, it ought to be fixed, but not in a minor release.

The reasons why we rejected applying John's patch at the tail end of the
8.0 cycle are still valid: it is a new feature and there is nontrivial
risk of introducing new bugs (more specifically, exposing bits of the
system that aren't prepared for more-than-16-bit characters).  I'm fine
with changing it in the 8.1 cycle, but I think a back-patch would be
folly.

			regards, tom lane
On Sun, 10 Apr 2005, Peter Eisentraut wrote:

> Bruce Momjian wrote:
>> So, we do have a bug, and we are probably going to need to fix it in
>> 8.0.X.
>
> This has never worked in all the years we have had Unicode
> functionality, so I don't understand why we have to rush to fix it now.
> Certainly, it ought to be fixed, but not in a minor release.

Agreed ... this is extending an existing feature to include a broader
charset, not fixing a bug ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664