Unix Technical Forum

Re: UTF8 or Unicode

#1
04-11-2008, 03:43 AM
Bruce Momjian
Re: UTF8 or Unicode

Abhijit Menon-Sen wrote:
> At 2005-02-14 21:14:54 -0500, pgman@candle.pha.pa.us wrote:
> >
> > Should our multi-byte encoding be referred to as UTF8 or Unicode?

>
> The *encoding* should certainly be referred to as UTF-8. Unicode is a
> character set, not an encoding; Unicode characters may be encoded with
> UTF-8, among other things.
>
> (One might think of a charset as being a set of integers representing
> characters, and an encoding as specifying how those integers may be
> converted to bytes.)
>
> > I know UTF8 is a type of unicode but do we need to rename anything
> > from Unicode to UTF8?

>
> I don't know. I'll go through the documentation to see if I can find
> anything that needs changing.


I looked at encoding.sgml; it mentions Unicode first, and then UTF8 as an
acronym. I am wondering if we need to put UTF8 first and Unicode
second. Does initdb accept UTF8 as an encoding?

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
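Abhijit's distinction between a character set (characters mapped to integers) and an encoding (integers turned into bytes) can be illustrated with a short Python sketch. Python is used here only for illustration; the helper `describe` is hypothetical. It reports one character's Unicode code point and then two different byte encodings of that same integer: UTF-8 and big-endian UTF-16.

```python
from typing import Tuple


def describe(ch: str) -> Tuple[str, str, str]:
    """Return the code point of one character and two byte encodings of it."""
    code_point = f"U+{ord(ch):04X}"          # the character's integer in the Unicode set
    utf8 = ch.encode("utf-8").hex(" ")       # UTF-8: variable width, 1 to 4 bytes
    utf16 = ch.encode("utf-16-be").hex(" ")  # UTF-16, big-endian, no byte-order mark
    return code_point, utf8, utf16


if __name__ == "__main__":
    for ch in ["A", "é", "€"]:
        print(ch, *describe(ch))
```

The same code point yields different bytes depending on the encoding: for example, é (U+00E9) is `c3 a9` in UTF-8 but `00 e9` in UTF-16.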

#2
04-11-2008, 03:46 AM
Karel Zak
Re: UTF8 or Unicode

On Tue, 2005-02-15 at 14:33 +0100, Peter Eisentraut wrote:
> On Tuesday, 15 February 2005 10:22, Karel Zak wrote:
> > in PG: unicode = utf8 = utf-8
> >
> > Our internal routines in src/backend/utils/mb/encnames.c accept all
> > synonyms. The "official" internal PG name for UTF-8 is "UNICODE" :-(

>
> I think in the SQL standard the official name is UTF8. If someone wants to
> verify that this is the case and is exactly the encoding we offer (perhaps
> modulo the 0x10000 issue), then it might make sense to change the canonical
> form to UTF8.


Yes, I think we should fix it and remove UNICODE and WIN encoding names
from PG code.

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
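Karel's point that the routines in encnames.c accept `unicode`, `utf8` and `utf-8` as synonyms comes down to normalizing a name before looking it up. The following is a simplified, hypothetical Python model of that idea, not PostgreSQL's C code: `clean_name` lowercases the name and drops non-alphanumeric characters (assumed here to approximate what the backend does), and the tiny `ALIASES` table is illustrative only, mapping to the then-canonical `UNICODE` name Karel mentions.

```python
from typing import Optional

# Hypothetical, heavily trimmed alias table for illustration; the real
# table lives in src/backend/utils/mb/encnames.c. Keys are already "cleaned".
ALIASES = {
    "unicode": "UNICODE",
    "utf8": "UNICODE",
    "sqlascii": "SQL_ASCII",
}


def clean_name(name: str) -> str:
    """Lowercase and drop non-alphanumerics, so 'UTF-8' and 'Utf_8' become 'utf8'."""
    return "".join(c.lower() for c in name if c.isalnum())


def canonical_encoding(name: str) -> Optional[str]:
    """Map any accepted spelling to its canonical name, or None if unknown."""
    return ALIASES.get(clean_name(name))


if __name__ == "__main__":
    for spelling in ["unicode", "UTF8", "utf-8"]:
        print(spelling, "->", canonical_encoding(spelling))
```

Because every spelling is cleaned before the lookup, only one table entry per distinct cleaned form is needed.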


#3
04-11-2008, 03:46 AM
Oliver Jowett
Re: UTF8 or Unicode

Karel Zak wrote:

> Yes, I think we should fix it and remove UNICODE and WIN encoding names
> from PG code.


The JDBC driver asks for a UNICODE client encoding before it knows the
server version it is talking to. How do you avoid breaking this?

-O

#4
04-11-2008, 03:46 AM
Karel Zak
Re: UTF8 or Unicode

On Sat, 2005-02-19 at 00:27 +1300, Oliver Jowett wrote:
> Karel Zak wrote:
>
> > Yes, I think we should fix it and remove UNICODE and WIN encoding names
> > from PG code.

>
> The JDBC driver asks for a UNICODE client encoding before it knows the
> server version it is talking to. How do you avoid breaking this?


Fix JDBC driver as soon as possible.

Add to 8.1 release notes: encoding names 'UNICODE' and 'WIN' are
deprecated and will be removed in the next release. Please use the correct
names "UTF-8" and "WIN1251".

8.2: remove it.

OK?

Karel

--
Karel Zak <zakkr@zf.jcu.cz>


#5
04-11-2008, 03:46 AM
Christopher Kings-Lynne
Re: UTF8 or Unicode

> Add to 8.1 release notes: encoding names 'UNICODE' and 'WIN' are
> deprecated and will be removed in the next release. Please use the correct
> names "UTF-8" and "WIN1251".
>
> 8.2: remove it.
>
> OK?


Why on earth remove it? Just leave it in as an alias for UTF8.

Chris
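The two proposals in this exchange, deprecating the old names and removing them later versus keeping them as permanent aliases, differ only in whether the old spellings eventually disappear. A hypothetical Python sketch of a resolver that still accepts the old names but warns about them might look like this. The tables, the `resolve_encoding` function, and the choice of `UTF8` as the canonical spelling (following the note about the SQL standard earlier in the thread) are all assumptions for illustration, not PostgreSQL's implementation.

```python
import warnings
from typing import Optional

# Hypothetical tables for illustration only; keys are already "cleaned".
CANONICAL = {"utf8": "UTF8", "win1251": "WIN1251"}
# Old spellings that are still accepted but flagged so users can migrate.
DEPRECATED = {"unicode": "UTF8", "win": "WIN1251"}


def clean_name(name: str) -> str:
    """Lowercase the name and drop anything that is not a letter or digit."""
    return "".join(c.lower() for c in name if c.isalnum())


def resolve_encoding(name: str) -> Optional[str]:
    """Return the canonical encoding name, or None if the name is unknown."""
    key = clean_name(name)
    if key in CANONICAL:
        return CANONICAL[key]
    if key in DEPRECATED:
        target = DEPRECATED[key]
        # Keep working, but tell the caller which name to use instead.
        warnings.warn(
            f"encoding name {name!r} is deprecated; use {target!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return target
    return None


if __name__ == "__main__":
    print(resolve_encoding("UTF-8"))    # canonical spelling, no warning
    print(resolve_encoding("UNICODE"))  # still resolves, but emits a DeprecationWarning
```

Under the deprecate-then-remove plan, the `DEPRECATED` table would be emptied in a later release; under the alias suggestion, it would simply stay. Either way, a client that still sends `UNICODE` keeps working for as long as that entry exists.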

#6
04-11-2008, 03:46 AM
Oliver Jowett
Re: UTF8 or Unicode

Karel Zak wrote:
> On Sat, 2005-02-19 at 00:27 +1300, Oliver Jowett wrote:
>
>>Karel Zak wrote:
>>
>>
>>>Yes, I think we should fix it and remove UNICODE and WIN encoding names
>>>from PG code.

>>
>>The JDBC driver asks for a UNICODE client encoding before it knows the
>>server version it is talking to. How do you avoid breaking this?

>
> Fix JDBC driver as soon as possible.


How, exactly? Ask for a 'utf8' client encoding instead of 'UNICODE'?
Will this work if the driver is connecting to an older server?

> Add to 8.1 release notes: encoding names 'UNICODE' and 'WIN' are
> deprecated and will be removed in the next release. Please use the correct
> names "UTF-8" and "WIN1251".


8.0 appears to spell it 'utf8'.

Removing the existing aliases seems like a fairly gratuitous
incompatibility to introduce to me.

-O
