View Single Post

   
  #2 (permalink)  
Old 05-24-2008, 07:09 AM
Willem Bogaerts
 
Posts: n/a
Default Re: SUBSTRING and UTF-8

Matthias Watermann wrote:
> Hi,
>
> I ran into a problem I couldn't figure out. What I want seems easy enough:
> the first two characters of a string copied to another string where both
> the original string and the destination string should be utf-8 encoded.
> It has to happen in a "stored procedure" (which is used by several other
> SPs).
>
> I've stripped that down to the smallest possible test case:
> -----------------------------------------------------------------------
> DELIMITER $$
>
> DROP FUNCTION IF EXISTS firstTwo$$
> CREATE FUNCTION firstTwo($aNachname VARCHAR(63)) RETURNS VARCHAR(12)
> SQL SECURITY INVOKER
> BEGIN
> DECLARE _kurz VARCHAR(12) DEFAULT '';
>
> SELECT SUBSTRING($aNachname, 1, 2) INTO _kurz;
>
> RETURN _kurz;
> END$$
>
> DELIMITER ;
> -----------------------------------------------------------------------


I think the problem is that the encoding is not passed along with the
parameter, and that the connection encoding is latin1 by default. When I
query:
SELECT SUBSTRING('Müller', 1, 2)
I get the expected result of 'Mü' (and I use an utf-8 encoded
connection: if I put a HEX function around it, I see 3 bytes). So the
function works fine for utf-8 (MySQL version 5.0.51). You could put the
encoding back in by explicitly giving it:

SELECT SUBSTRING(_utf8 $aNachname, 1, 2) INTO _kurz;

This may not be an optimal solution, as it will treat any value in
$aNachname as utf-8 encoded. But if that is the case in your database,
it should work.

>
> Now, calling it by
> SELECT firstTwo('Müller');
> returns "MÃ" but not "Mü" (note the u-umlaut at position two).


Best regards,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/
Reply With Quote