Patch for collation using ICU

04-11-2008, 04:50 AM
Bruce Momjian
 
Re: Patch for collation using ICU

Palle Girgensohn wrote:
> >> This is because in the standard postgres implementation, upper/lower is
> >> done one character at a time. A proper upper/lower cannot do it that
> >> way. Another known example is in Turkish, where an ? (?) should look
> >> different depending on whether it is an initial letter or not. This
> >> fails in standard postgresql on all platforms.

> >
> > Uh, where do you see that? Our code has:
> >
> >     workspace = texttowcs(string);
> >
> >     for (i = 0; workspace[i] != 0; i++)
> >         workspace[i] = towupper(workspace[i]);

>
> As you see, the loop runs towupper on one character at a time. It cannot
> consider whether the letter is the initial, as required in Turkish, and it
> cannot really convert one character into two ('ß' -> 'SS').


Oh, OK. I thought texttowcs() would expand the string to allow such
conversions.
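
To make the point about the loop concrete: towupper() maps one wide character to one wide character, so the in-place loop can never produce a longer string. Below is a minimal sketch of a string-level upper() that can grow its output. The function name upper_expand and the hard-coded sharp-s rule are assumptions for illustration only; this is neither PostgreSQL code nor the ICU patch, and a real implementation would take its rules from the locale data instead of special-casing one character.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* U+00DF LATIN SMALL LETTER SHARP S; its full uppercase form is "SS". */
#define SHARP_S ((wchar_t) 0x00DF)

/*
 * upper_expand (hypothetical helper, for illustration only):
 * return a newly malloc'ed uppercase copy of src, or NULL if the
 * allocation fails. The result may be longer than src, because a
 * sharp s expands to two characters. The caller frees the result.
 */
wchar_t *
upper_expand(const wchar_t *src)
{
    size_t      len = wcslen(src);
    size_t      j = 0;
    /* Worst case in this sketch: every input character doubles. */
    wchar_t    *dst = malloc((2 * len + 1) * sizeof(wchar_t));

    if (dst == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++)
    {
        if (src[i] == SHARP_S)
        {
            /* One input character becomes two output characters. */
            dst[j++] = L'S';
            dst[j++] = L'S';
        }
        else
        {
            /* Everything else still uses the one-to-one mapping. */
            dst[j++] = (wchar_t) towupper((wint_t) src[i]);
        }
    }
    dst[j] = L'\0';
    return dst;
}
```

The important difference from the quoted loop is that the output buffer is separate from the input and sized for expansion. Position-dependent rules would additionally need the loop to look at neighbouring characters, which this sketch does not attempt.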

> >> > We have deprecated UNICODE in 8.1 in favor of UTF8 (no dash). Does
> >> > that help?
> >>
> >> I'm aware of that. It might help for unicode, but there are a bunch of
> >> other encodings. IANA has decided that utf-8 has *no* aliases, hence
> >> only utf-8 (with dash, but case insensitive) is accepted. Perhaps ICU is
> >> forgiving, I don't remember/know, but I think we need the mappings,
> >> unfortunately.

> >
> > OK. I guess I am just confused why the native implementations are OK.

>
> They're OK since they understand that UNICODE (or UTF8) is really utf-8.
> Problem is the strings used to describe them are not understood by ICU.
>
> BTW, the pg_enc2iananame_tbl is only used *from* internal representation
> *to* IANA, not the other way around. Maybe that fact lowers the rate of
> confusion? ;-)


OK, got it. I am still a little confused why every native
implementation understands our existing names but ICU does not.
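
For readers following along, the one-way mapping described above (internal name to IANA name, never the reverse) can be pictured roughly like this. The thread does not show the contents of pg_enc2iananame_tbl, so the struct, the table entries, and the function pg_name_to_iana below are all assumptions made up for illustration. Only the encoding names themselves are real.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical table entry. The layout of the patch's real
 * pg_enc2iananame_tbl is not shown in the thread.
 */
typedef struct
{
    const char *pg_name;        /* PostgreSQL-style encoding name */
    const char *iana_name;      /* name registered with IANA */
} enc_name_map;

/* A few illustrative entries, not the patch's actual list. */
static const enc_name_map enc_to_iana[] = {
    {"UTF8", "UTF-8"},
    {"UNICODE", "UTF-8"},       /* deprecated spelling, same encoding */
    {"LATIN1", "ISO-8859-1"},
    {"EUC_JP", "EUC-JP"},
};

/*
 * Translate a PostgreSQL encoding name to its IANA name.
 * Returns NULL when the name is not in the table. The lookup is
 * one-way only, and this sketch matches names case-sensitively.
 */
const char *
pg_name_to_iana(const char *pg_name)
{
    size_t      n = sizeof(enc_to_iana) / sizeof(enc_to_iana[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(enc_to_iana[i].pg_name, pg_name) == 0)
            return enc_to_iana[i].iana_name;
    }
    return NULL;
}
```

With a table like this, the IANA spelling is what would be handed to a library that insists on registered names, while the rest of the backend keeps using its own names.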

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

