02-28-2008, 08:39 AM
Axel Schwenke
 
Re: Should I specify the CHARACTER SET & COLLATE for UTF8 contents?

"howa" <howachen@gmail.com> wrote:
> Since even I use, e.g.
>
> CREATE DATABASE `test_ascii` DEFAULT CHARACTER SET latin1 COLLATE
> latin1_swedish_ci;
>
> CREATE TABLE `table` (
> `str` TEXT CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL
> ) ENGINE = innodb;
>
> I can store UTF8 in the DB, even better, I don't need to set names
> "utf8" in each connection, so it is recommended? Any drawbacks?



Howa,

I'm beginning to get tired of answering you the same question over
and over again. SET NAMES foo is just shorthand for

SET character_set_client = foo;
SET character_set_results = foo;
SET character_set_connection = foo;
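If you want to see this for yourself, a session can read the three variables back before and after SET NAMES. A minimal check, assuming a MySQL server of the 5.0 era discussed here:

```sql
-- Current settings for this connection (latin1 on a default server)
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results;

-- Switch all three at once
SET NAMES utf8;

-- All three now report utf8
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results;
```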

So if you SET NAMES utf8, you tell the database "everything I send
is utf8 encoded" and "please encode everything you give back to me
in utf8".

OTOH, if you declare a column to be latin1 encoded, MySQL does not
care whether you actually store latin1, latin5 or utf8 in there - for
most operations. Of course sorting and comparing strings are affected
by the collation, but INSERT and SELECT are not.
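One way to see that a latin1 column is treated as plain single-byte storage is to push UTF-8 bytes into the `table` from the question through a connection that was never switched with SET NAMES, then inspect them. This is a sketch that assumes the client program sends UTF-8 (for example, the mysql client running in a UTF-8 terminal):

```sql
-- Connection still at the latin1 default; the client is assumed
-- to send the UTF-8 bytes C3 A4 for 'ä'.
INSERT INTO `table` (`str`) VALUES ('ä');

-- MySQL believes each stored byte is one latin1 character, so the
-- two UTF-8 bytes of 'ä' should count as two characters here, and
-- HEX() should show them unchanged as C3A4.
SELECT `str`, LENGTH(`str`), CHAR_LENGTH(`str`), HEX(`str`)
FROM `table`;
```

The SELECT hands the same two bytes back, so a UTF-8 client still displays 'ä'; only character-aware operations (lengths, sorting, comparison) see two latin1 characters.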

By default, character_set_client is latin1. If you INSERT something
into a latin1 column, no conversion takes place: if you send utf8, the
bytes are stored unchanged and you will get the same utf8 back later.
BUT - if you SET NAMES utf8 and then insert into a latin1 column, MySQL
will convert your data to latin1. Of course this will only work for a
certain subset of your input - those characters that are available in
latin1.
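The converting path can be observed the same way. A sketch against the `table` from the question, assuming a UTF-8 client. How unconvertible characters are handled varies with server version and sql_mode, so treat those comments as typical rather than guaranteed. Note that MySQL's latin1 is really Windows cp1252, which is why a Cyrillic letter is used as the character that cannot be converted:

```sql
SET NAMES utf8;

-- MySQL now knows the incoming bytes are UTF-8 and converts them
-- to latin1 before storing: 'ä' becomes the single byte E4.
INSERT INTO `table` (`str`) VALUES ('ä');

-- 'ж' has no latin1 equivalent. Typically it is stored as '?'
-- (possibly with a warning); in strict mode the INSERT may be
-- rejected instead.
INSERT INTO `table` (`str`) VALUES ('ж');
SHOW WARNINGS;

-- HEX() shows the stored latin1 bytes; `str` itself is converted
-- back to utf8 on the way out because of SET NAMES.
SELECT `str`, HEX(`str`) FROM `table`;
```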


Lesson to be learned: always be true to your database about the
encoding you use. If you don't, bad things may happen.
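Put into practice, being true to the database means declaring the column as what it really holds and telling the server what the client sends. A minimal sketch; the table name `t_utf8` is hypothetical:

```sql
-- Hypothetical table: the column is declared as what it really holds
CREATE TABLE `t_utf8` (
  `str` TEXT CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
) ENGINE = InnoDB;

-- Tell the server the truth about the client's encoding
SET NAMES utf8;

INSERT INTO `t_utf8` (`str`) VALUES ('ä'), ('ж');

-- LENGTH() counts bytes (2 for each row), while CHAR_LENGTH()
-- now counts real characters (1 for each row).
SELECT `str`, LENGTH(`str`), CHAR_LENGTH(`str`) FROM `t_utf8`;
```

On MySQL 5.5.3 and later, utf8mb4 is the character set to use if you also need characters outside the Basic Multilingual Plane, which MySQL's 3-byte utf8 cannot store.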


XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/