On Wed, 16 Apr 2008, Jerry Schwartz <jschwartz@the-infoshop.com> wrote:
> I'm running afoul of the UTF8 character set somehow:
>
> mysql> select convert(char(0x96) using utf8);
> +----------------------------------+
> | convert(char(0x96) using utf8) |
> +----------------------------------+
> | NULL |
> +----------------------------------+
> 1 row in set, 1 warning (0.00 sec)
>
> mysql> show warnings;
> +-------+------+-------------------------------------+
> | Level | Code | Message                             |
> +-------+------+-------------------------------------+
> | Error | 1300 | Invalid utf8 character string: '96' |
> +-------+------+-------------------------------------+
> 1 row in set (0.00 sec)
>
> On top of my other problems, I've discovered that pasting the UTF8
> character represented by 0x96 into the MySQL CLI (Windows) somehow
> converts the character to 0x2D (a normal dash); so a lot of my
> testing has been wasted. Pasting it into a Windows-based editor
> preserves the character as 0x96.

In an earlier note, he wrote:
> You may not be able to see it, but that is actually an n-dash
> (\x96).

Actually, in Unicode \x96 (U+0096) is not an en-dash.
<http://www.unicode.org/charts/PDF/U0080.pdf> says that it's
"START OF GUARDED AREA". \x96 sits in the middle of a block of control
characters running from the unnamed control character at \x80 through
APPLICATION PROGRAM COMMAND at \x9F (or arguably NO-BREAK SPACE at
\xA0).
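
If you'd rather not dig through the PDF, Python's unicodedata module
reports the same thing. This is only a sketch, in Python because that
is my choice of language and not anything from this thread, and it
assumes Python 3.3 or later, where unicodedata.lookup() accepts formal
name aliases. C1 control characters have no regular character name,
only aliases such as START OF GUARDED AREA:

```python
import unicodedata

# U+0096 is a control character: general category "Cc".
print(unicodedata.category("\x96"))        # Cc

# Controls have no regular name, so name() falls back to the default...
print(unicodedata.name("\x96", None))      # None

# ...but lookup() resolves their formal aliases (Python 3.3+).
print(unicodedata.lookup("START OF GUARDED AREA") == "\x96")        # True
print(unicodedata.lookup("APPLICATION PROGRAM COMMAND") == "\x9f")  # True

# Every code point from U+0080 through U+009F is a control character.
print(all(unicodedata.category(chr(cp)) == "Cc" for cp in range(0x80, 0xA0)))  # True

# U+00A0, just past the block, is an ordinary space separator.
print(unicodedata.name("\xa0"), unicodedata.category("\xa0"))  # NO-BREAK SPACE Zs
```
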

Microsoft, in some of their Windows code pages, assigned meanings to
those values that differ from the Unicode and ISO-8859-1 standards
(quelle surprise), turning many of them into printable characters.
One of them, I think, is the Windows 1250 code page, at
<http://www.microsoft.com/globaldev/reference/sbcs/1250.mspx>.
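
The same single byte decodes quite differently depending on which table
you read it with. A small Python sketch (the codec names latin-1,
cp1250 and cp1252 are Python's; which code page the pasted text was
really in is a guess on my part, not something established here):

```python
# Decode the single byte 0x96 under ISO-8859-1 and two Windows code pages.
raw = bytes([0x96])

for codec in ("latin-1", "cp1250", "cp1252"):
    ch = raw.decode(codec)
    print(codec, "->", f"U+{ord(ch):04X}")

# latin-1 -> U+0096   (the C1 control START OF GUARDED AREA)
# cp1250 -> U+2013    (EN DASH)
# cp1252 -> U+2013    (EN DASH)
```

So under ISO-8859-1 the byte is the control character, while both
Windows code pages shown turn it into the en-dash.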

As that page and
<http://www.microsoft.com/typography/developers/fdsspec/punc2.htm>
note, the Unicode code point for an en-dash is U+2013 (the U+ notation
is always hexadecimal).

I don't know whether this bears on the problem. Since \x96 is a valid
character, whether under Microsoft's code page or in real Unicode, I
would not expect it to be a problem per se. I just wanted to point out
what it might not mean.
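
One distinction that might be relevant to the error 1300 quoted above:
\x96 is a valid character, but a lone byte 0x96 is not valid UTF-8. Its
bit pattern, 10010110, marks a continuation byte, which can only appear
inside a multi-byte sequence, and an en-dash (U+2013) takes three bytes
in UTF-8. A minimal Python sketch, assuming the source bytes are in
Windows code page 1252 (an assumption on my part):

```python
import unicodedata

en_dash = "\u2013"
print(unicodedata.name(en_dash))        # EN DASH
print(en_dash.encode("utf-8").hex())    # e28093

# 0x96 is 0b10010110, a UTF-8 continuation byte, so it cannot stand alone.
try:
    bytes([0x96]).decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc.reason)  # invalid UTF-8: invalid start byte

# Assuming (hypothetically) the bytes are Windows-1252: decode with that
# code page first, then encode the resulting text as UTF-8.
fixed = bytes([0x96]).decode("cp1252").encode("utf-8")
print(fixed.hex())                       # e28093
```
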
--
Tim McDaniel,
tmcd@panix.com