This is a discussion on Re: UTF8 or Unicode within the pgsql Hackers forums, part of the PostgreSQL category.
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org on behalf of Oliver Jowett
Sent: Fri 2/18/2005 11:27 AM
To: Karel Zak
Cc: List pgsql-hackers
Subject: Re: [HACKERS] UTF8 or Unicode

Karel Zak wrote:
>> Yes, I think we should fix it and remove UNICODE and WIN encoding names
>> from PG code.
>
> The JDBC driver asks for a UNICODE client encoding before it knows the
> server version it is talking to. How do you avoid breaking this?

So does pgAdmin.

Regards, Dave

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
| |||
Dave Page wrote:
> Karel Zak wrote:
> >> Yes, I think we should fix it and remove UNICODE and WIN encoding names
> >> from PG code.
> >
> > The JDBC driver asks for a UNICODE client encoding before it knows the
> > server version it is talking to. How do you avoid breaking this?
>
> So does pgAdmin.

I think we just need to _favor_ UTF8. The question is where are we
favoring Unicode rather than UTF8?

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
| |||
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I think we just need to _favor_ UTF8.

I agree.

> The question is where are we
> favoring Unicode rather than UTF8?

It's the canonical name of the encoding, both in the code and the docs.

regression=# create database e encoding 'utf-8';
CREATE DATABASE
regression=# \l
        List of databases
    Name    |  Owner   | Encoding
------------+----------+-----------
 e          | postgres | UNICODE
 regression | postgres | SQL_ASCII
 template0  | postgres | SQL_ASCII
 template1  | postgres | SQL_ASCII
(5 rows)

As soon as we decide whether the canonical name is "UTF8" or "UTF-8" ;-)
we can fix it.

regards, tom lane
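The session above accepts 'utf-8' as input but lists the database as UNICODE, which suggests a lookup that ignores case and punctuation and then maps every accepted spelling to one canonical display name. A minimal sketch of that idea follows. The table contents and the helper names `normalize` and `canonical_name` are assumptions made for illustration; this is not the code in encnames.c.

```python
import re

# Hypothetical alias table, for illustration only; it is NOT PostgreSQL's
# actual table. Keys are normalized spellings, values are the canonical
# names that would be displayed (here the pre-rename "UNICODE").
ALIASES = {
    "unicode": "UNICODE",
    "utf8": "UNICODE",
    "sqlascii": "SQL_ASCII",
    "latin1": "LATIN1",
    "iso88591": "LATIN1",
}


def normalize(name: str) -> str:
    """Lowercase the name and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def canonical_name(name: str) -> str:
    """Return the canonical display name, or raise ValueError if unknown."""
    key = normalize(name)
    try:
        return ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown encoding name: {name!r}") from None
```

Under this model, choosing between "UTF8" and "UTF-8" only changes the values in the table. Both spellings normalize to the same key, so both would still be accepted as input.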
| |||
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I think we just need to _favor_ UTF8.
>
> I agree.
>
> > The question is where are we
> > favoring Unicode rather than UTF8?
>
> It's the canonical name of the encoding, both in the code and the docs.
>
> regression=# create database e encoding 'utf-8';
> CREATE DATABASE
> regression=# \l
>         List of databases
>     Name    |  Owner   | Encoding
> ------------+----------+-----------
>  e          | postgres | UNICODE
>  regression | postgres | SQL_ASCII
>  template0  | postgres | SQL_ASCII
>  template1  | postgres | SQL_ASCII
> (5 rows)
>
> As soon as we decide whether the canonical name is "UTF8" or "UTF-8"
> ;-) we can fix it.

I checked and it looks like "UTF-8" is the correct usage:

http://www.unicode.org/glossary/

--
Bruce Momjian
| |||
I do not object to changing UNICODE->UTF-8, but all these discussions
sound a little bit funny to me.

If you want to blame UNICODE, you should blame LATIN1 etc. as
well. LATIN1 (ISO-8859-1) is actually a character set name, not an
encoding name. ISO-8859-1 can be encoded in an 8-bit single-byte
stream. But it can be encoded in 7 bits too. So when we refer to
LATIN1 (ISO-8859-1), it's not clear whether it's encoded in 7 or 8 bits.
--
Tatsuo Ishii

From: Bruce Momjian <pgman@candle.pha.pa.us>
Subject: Re: [HACKERS] UTF8 or Unicode
Date: Mon, 21 Feb 2005 22:08:25 -0500 (EST)
Message-ID: <200502220308.j1M38PV03238@candle.pha.pa.us>

> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > I think we just need to _favor_ UTF8.
> >
> > I agree.
> >
> > > The question is where are we
> > > favoring Unicode rather than UTF8?
> >
> > It's the canonical name of the encoding, both in the code and the docs.
> >
> > regression=# create database e encoding 'utf-8';
> > CREATE DATABASE
> > regression=# \l
> >         List of databases
> >     Name    |  Owner   | Encoding
> > ------------+----------+-----------
> >  e          | postgres | UNICODE
> >  regression | postgres | SQL_ASCII
> >  template0  | postgres | SQL_ASCII
> >  template1  | postgres | SQL_ASCII
> > (5 rows)
> >
> > As soon as we decide whether the canonical name is "UTF8" or "UTF-8"
> > ;-) we can fix it.
>
> I checked and it looks like "UTF-8" is the correct usage:
>
> http://www.unicode.org/glossary/
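The character set versus encoding distinction raised above can be made concrete on the Unicode side. The sketch below takes one character, reads its code point (its position in the character set), and serializes it under three encodings. It does not demonstrate the 7-bit ISO-8859-1 case mentioned in the message, and the variable names are chosen only for this illustration.

```python
# One abstract character, identified by its Unicode code point.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Position in the character set, independent of any byte layout.
code_point = ord(ch)

# The same character serialized under three different encodings.
as_utf8 = ch.encode("utf-8")        # two bytes
as_utf16le = ch.encode("utf-16-le")  # two bytes, different values
as_latin1 = ch.encode("latin-1")     # one byte

# Every byte sequence decodes back to the same character.
round_trips = (
    as_utf8.decode("utf-8") == ch
    and as_utf16le.decode("utf-16-le") == ch
    and as_latin1.decode("latin-1") == ch
)
```

The code point is the same in every case while the bytes differ, which is why "UNICODE" alone does not say how the data is laid out and "UTF8" does.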
| |||
Tatsuo Ishii wrote:
> I do not object the changing UNICODE->UTF-8, but all these discussions
> sound a little bit funny to me.
>
> If you want to blame UNICODE, you should blame LATIN1 etc. as
> well. LATIN1(ISO-8859-1) is actually a character set name, not an
> encoding name. ISO-8859-1 can be encoded in 8-bit single byte
> stream. But it can be encoded in 7-bit too. So when we refer to
> LATIN1(ISO-8859-1), it's not clear if it's encoded in 7/8-bit.

Wow, Tatsuo has a point here. Looking at encnames.c, I see:

    "UNICODE", PG_UTF8

but also:

    "WIN", PG_WIN1251
    "LATIN1", PG_LATIN1

and I see conversions for those:

    "iso88591", PG_LATIN1
    "win", PG_WIN1251

so I see what he is saying. We are not consistent in favoring the
official names vs. the common names.

I will work on a patch that people can review and test.

--
Bruce Momjian
| |||
Bruce Momjian wrote:
> We are not consistent in favoring the
> official names vs. the common names.

The problem is rather that there are too many standards and conventions
to choose from.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
| |||
On Thu, 2005-02-24 at 23:51 -0500, Bruce Momjian wrote:
> Tatsuo Ishii wrote:
> > I do not object the changing UNICODE->UTF-8, but all these discussions
> > sound a little bit funny to me.
> >
> > If you want to blame UNICODE, you should blame LATIN1 etc. as
> > well. LATIN1(ISO-8859-1) is actually a character set name, not an
> > encoding name. ISO-8859-1 can be encoded in 8-bit single byte
> > stream. But it can be encoded in 7-bit too. So when we refer to
> > LATIN1(ISO-8859-1), it's not clear if it's encoded in 7/8-bit.
>
> Wow, Tatsuo has a point here. Looking at encnames.c, I see:
>
>     "UNICODE", PG_UTF8
>
> but also:
>
>     "WIN", PG_WIN1251
>     "LATIN1", PG_LATIN1

> so I see what he is saying. We are not consistent in favoring the
> official names vs. the common names.

Yes, I have already said so. For example, "WIN" is an extremely bad
alias. It is all a heritage from old versions.

> I will work on a patch that people can review and test.

Thanks.

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
| |||
On Friday, 25 February 2005 05:51, Bruce Momjian wrote:
> so I see what he is saying. We are not consistent in favoring the
> official names vs. the common names.
>
> I will work on a patch that people can review and test.

I think this is what we should do:

    UNICODE => UTF8
    ALT     => WIN866
    WIN     => WIN1251
    TCVN    => WIN1258

That should clear it up.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
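One way to read the proposal together with the earlier JDBC and pgAdmin concern is that the new names become canonical while the old ones stay accepted as input. The thread does not settle whether the old names would be kept, so that part is an assumption here, and `PROPOSED_RENAMES` and `resolve` are hypothetical names used only for this sketch.

```python
# Renames proposed in the message above: old name -> new canonical name.
PROPOSED_RENAMES = {
    "UNICODE": "UTF8",
    "ALT": "WIN866",
    "WIN": "WIN1251",
    "TCVN": "WIN1258",
}


def resolve(name: str) -> str:
    """Map an old or new name to the new canonical name (hypothetical helper).

    Assumption: old names remain valid input, so a client that still asks
    for UNICODE keeps working and simply gets UTF8.
    """
    upper = name.upper()
    if upper in PROPOSED_RENAMES:
        return PROPOSED_RENAMES[upper]
    if upper in PROPOSED_RENAMES.values():
        return upper
    raise ValueError(f"name not covered by this sketch: {name!r}")
```

With a mapping like this, a client that sends UNICODE before it knows the server version would not break; only the name reported back would change.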
| ||||
Peter Eisentraut wrote:
> On Friday, 25 February 2005 05:51, Bruce Momjian wrote:
> > so I see what he is saying. We are not consistent in favoring the
> > official names vs. the common names.
> >
> > I will work on a patch that people can review and test.
>
> I think this is what we should do:
>
>     UNICODE => UTF8
>     ALT     => WIN866
>     WIN     => WIN1251
>     TCVN    => WIN1258

OK, but what about latin1?

--
Bruce Momjian
|
|