02-28-2008, 08:39 AM
howa

Re: Should I specify the CHARACTER SET & COLLATE for UTF8 contents?


Axel Schwenke wrote:

> "howa" <howachen@gmail.com> wrote:
> > Even if I use, e.g.
> >
> > CREATE DATABASE `test_ascii` DEFAULT CHARACTER SET latin1 COLLATE
> > latin1_swedish_ci;
> >
> > CREATE TABLE `table` (
> > `str` TEXT CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL
> > ) ENGINE = innodb;
> >
> > I can still store UTF8 in the DB. Even better, I don't need to SET NAMES
> > "utf8" on each connection. So is this recommended? Any drawbacks?

>
>
> Howa,
>
> I'm beginning to get tired of answering you the same question over
> and over again. SET NAMES foo is just shorthand for
>
> SET character_set_client = foo;
> SET character_set_results = foo;
> SET character_set_connection = foo;
>
> So if you SET NAMES utf8, you tell the database "everything I send
> is utf8 encoded" and "please encode everything you give back to me
> in utf8".
>
> OTOH, if you declare a column to be latin1 encoded, MySQL does not
> care if you store latin1, latin5 or utf8 in there - for most
> operations. Of course sorting and comparing strings is affected by
> the collation, but INSERT and SELECT are not.
>
> By default, character_set_client is latin1. If you INSERT something
> into a latin1 column, no conversion takes place. If you send utf8,
> you will get back utf8 later. BUT - if you SET NAMES utf8 and then
> insert into a latin1 column, MySQL will convert your data to latin1.
> Of course this will only work for a certain subset of your input -
> those characters that are available in latin1.
>
>
> Lesson to be learned: always be true to your database about the
> encoding you use. If you don't, bad things may happen.
>
>
> XL
> --
> Axel Schwenke, Senior Software Developer, MySQL AB
>
> Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
> MySQL User Forums: http://forums.mysql.com/
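To make the INSERT behavior described above concrete, here is a simplified Python model of what happens to the bytes. It is a simulation, not MySQL code. Assumptions: Python's `latin-1` codec stands in for MySQL's latin1 (which is really closer to cp1252), characters the column cannot hold become `?`, and the hypothetical `store()` function ignores collations and the return path through character_set_results.

```python
# Simplified model (an assumption, not MySQL's actual code) of the conversion
# between character_set_client and a column's character set on INSERT.
# Python's "latin-1" codec stands in for MySQL's latin1 here.

def store(text_bytes: bytes, client_charset: str, column_charset: str) -> bytes:
    """Return the bytes that end up in the column (hypothetical helper)."""
    if client_charset == column_charset:
        # Same declared charset: bytes are stored untouched,
        # whatever encoding they really use.
        return text_bytes
    # Different charsets: decode as the client charset, re-encode as the
    # column charset; unrepresentable characters become '?'.
    return text_bytes.decode(client_charset).encode(column_charset, errors="replace")


original = "café 中文"
utf8_bytes = original.encode("utf-8")

# Case 1: default latin1 client, latin1 column -> no conversion,
# so the utf8 bytes come back exactly as sent.
stored = store(utf8_bytes, "latin-1", "latin-1")
print(stored.decode("utf-8"))    # café 中文

# Case 2: SET NAMES utf8, latin1 column -> converted to latin1,
# and the CJK characters are lost.
stored = store(utf8_bytes, "utf-8", "latin-1")
print(stored.decode("latin-1"))  # café ??
```

Case 1 also hints at the drawback: the server believes the 12 stored bytes are 12 latin1 characters, not the 7 real ones, so length functions and collation-based sorting and comparison work on that latin1 interpretation.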


OK, thanks for your support.

My last two questions:

1. If some columns in our design might need to store UTF8 characters,
should we define the whole DB as UTF8, or only declare UTF8 in those
column definitions? Which one is recommended, for performance and for
reliability?

2. Suppose my DB's default character set is latin1, but one table is UTF8.
How do I mysqlimport a UTF8 text file into that UTF8 table? So far I have
found NO WAY to do so short of altering the DB character set to UTF8:
mysqlimport simply ignores the table's UTF8 definition and uses the DB
default character set instead, which is confusing.

Thanks again!
