This is a discussion on Re: UTF8 or Unicode within the pgsql Hackers forums, part of the PostgreSQL category.
Abhijit Menon-Sen wrote:
> At 2005-02-14 21:14:54 -0500, pgman@candle.pha.pa.us wrote:
> >
> > Should our multi-byte encoding be referred to as UTF8 or Unicode?
>
> The *encoding* should certainly be referred to as UTF-8. Unicode is a
> character set, not an encoding; Unicode characters may be encoded with
> UTF-8, among other things.
>
> (One might think of a charset as being a set of integers representing
> characters, and an encoding as specifying how those integers may be
> converted to bytes.)
>
> > I know UTF8 is a type of unicode but do we need to rename anything
> > from Unicode to UTF8?
>
> I don't know. I'll go through the documentation to see if I can find
> anything that needs changing.

I looked at encoding.sgml and that mentions Unicode, and then UTF8 as an
acronym. I am wondering if we need to make UTF8 first and Unicode
second. Does initdb accept UTF8 as an encoding?

--
  Bruce Momjian
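
The charset/encoding distinction quoted above can be made concrete with a small sketch. This is only an illustration in Python, not anything from the PostgreSQL sources; the helper name `describe` is made up for this example. It shows one code point (the integer from the character set) and two different byte encodings of it.

```python
def describe(ch):
    """Return the code point of ch and its bytes under two encodings.

    The code point is the "integer representing the character";
    UTF-8 and UTF-16 are two ways of converting that integer to bytes.
    """
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16-be": ch.encode("utf-16-be").hex(),
    }


if __name__ == "__main__":
    for ch in ("A", "é", "€"):
        print(ch, describe(ch))
```

The same Unicode character always has the same code point, but its byte representation depends on which encoding is chosen, which is why "UTF-8" names the encoding more precisely than "Unicode" does.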

On Tue, 2005-02-15 at 14:33 +0100, Peter Eisentraut wrote:
> On Tuesday, 15 February 2005 at 10:22, Karel Zak wrote:
> > in PG: unicode = utf8 = utf-8
> >
> > Our internal routines in src/backend/utils/mb/encnames.c accept all
> > synonyms. The "official" internal PG name for UTF-8 is "UNICODE" :-(
>
> I think in the SQL standard the official name is UTF8. If someone wants to
> verify that this is the case and is exactly the encoding we offer (perhaps
> modulo the 0x10000 issue), then it might make sense to change the canonical
> form to UTF8.

Yes, I think we should fix it and remove UNICODE and WIN encoding names
from PG code.

	Karel

--
Karel Zak <zakkr@zf.jcu.cz>
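
To illustrate what "accept all synonyms" could look like, here is a minimal sketch of alias normalization. It is an assumption-laden illustration, not the code in encnames.c: the table `ALIASES`, the function `canonical_encoding`, and the normalization rule (lowercase, drop non-alphanumerics) are all hypothetical and cover only the three spellings mentioned in the thread.

```python
import re

# Hypothetical alias table for illustration; NOT PostgreSQL's real table.
# Keys are normalized spellings, values are the canonical name.
ALIASES = {
    "unicode": "UTF8",
    "utf8": "UTF8",
}


def canonical_encoding(name):
    """Map a user-supplied encoding name to its canonical form.

    Normalization (an assumption of this sketch): lowercase the name and
    drop anything that is not a letter or digit, so "UTF-8" and "utf8"
    end up as the same lookup key.
    """
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    if key not in ALIASES:
        raise ValueError(f"unknown encoding name: {name!r}")
    return ALIASES[key]


if __name__ == "__main__":
    for spelling in ("unicode", "utf8", "utf-8", "UNICODE"):
        print(spelling, "->", canonical_encoding(spelling))
```

With a table like this, changing the canonical form only changes the values on the right-hand side; whether the old spellings stay as keys is the compatibility question the rest of the thread argues about.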

Karel Zak wrote:
> Yes, I think we should fix it and remove UNICODE and WIN encoding names
> from PG code.

The JDBC driver asks for a UNICODE client encoding before it knows the
server version it is talking to. How do you avoid breaking this?

-O

On Sat, 2005-02-19 at 00:27 +1300, Oliver Jowett wrote:
> Karel Zak wrote:
>
> > Yes, I think we should fix it and remove UNICODE and WIN encoding names
> > from PG code.
>
> The JDBC driver asks for a UNICODE client encoding before it knows the
> server version it is talking to. How do you avoid breaking this?

Fix the JDBC driver as soon as possible.

Add to the 8.1 release notes: encoding names 'UNICODE' and 'WIN' are
deprecated and will be removed in the next release. Please use the correct
names "UTF-8" and "WIN1251".

8.2: remove it.

OK?

	Karel

--
Karel Zak <zakkr@zf.jcu.cz>

> Add to the 8.1 release notes: encoding names 'UNICODE' and 'WIN' are
> deprecated and will be removed in the next release. Please use the correct
> names "UTF-8" and "WIN1251".
>
> 8.2: remove it.
>
> OK?

Why on earth remove it? Just leave it in as an alias to UTF8.

Chris

Karel Zak wrote:
> On Sat, 2005-02-19 at 00:27 +1300, Oliver Jowett wrote:
>
>> Karel Zak wrote:
>>
>>> Yes, I think we should fix it and remove UNICODE and WIN encoding names
>>> from PG code.
>>
>> The JDBC driver asks for a UNICODE client encoding before it knows the
>> server version it is talking to. How do you avoid breaking this?
>
> Fix the JDBC driver as soon as possible.

How, exactly? Ask for a 'utf8' client encoding instead of 'UNICODE'?
Will this work if the driver is connecting to an older server?

> Add to the 8.1 release notes: encoding names 'UNICODE' and 'WIN' are
> deprecated and will be removed in the next release. Please use the correct
> names "UTF-8" and "WIN1251".

8.0 appears to spell it 'utf8'.

Removing the existing aliases seems like a fairly gratuitous
incompatibility to introduce to me.

-O
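
The compatibility concern raised here can be sketched as a client that does not know which spellings a given server accepts. Everything below is hypothetical and is not how the JDBC driver works: `negotiate_client_encoding`, the `try_set` callback, and the two stand-in "servers" exist only to show why removing an alias forces clients into trial and error.

```python
def negotiate_client_encoding(try_set, candidates=("UTF8", "UNICODE")):
    """Return the first encoding name that the server accepts.

    try_set is a hypothetical callable that returns True when the server
    accepted the given name and False when it rejected it.
    """
    for name in candidates:
        if try_set(name):
            return name
    raise RuntimeError("server accepted none of: " + ", ".join(candidates))


# Stand-in servers, invented for this sketch.
def server_with_only_old_name(name):
    return name.upper() == "UNICODE"


def server_with_alias_removed(name):
    return name.upper() == "UTF8"


if __name__ == "__main__":
    print(negotiate_client_encoding(server_with_only_old_name))
    print(negotiate_client_encoding(server_with_alias_removed))
```

Because the driver described in the thread requests its encoding before it knows the server version, a fallback like this would cost an extra attempt on some servers. Keeping the old name as an alias, as suggested earlier in the thread, avoids the need for any such logic on the client side.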