Unix Technical Forum

At my wits' end: LIKE

This is a discussion on At my wits' end: LIKE within the SQL Server forums, part of the Microsoft SQL Server category; --> (SQL Server 2005, express edition) I have a list of table names that I need to translate according to ...


Go Back   Unix Technical Forum > Database Server Software > Microsoft SQL Server > SQL Server

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 03-01-2008, 02:52 PM
bbcworldtour@hotmail.com
 
Posts: n/a
Default At my wits' end: LIKE

(SQL Server 2005, express edition)

I have a list of table names that I need to translate according to a
naming convention. I'm doing this using pattern matching in a LIKE
clause.

In one specific case I get no match where I believe that there should
be one. I must be missing something obvious here, but what?

I have boiled it down to this example (the real one is more complex):

Matching on the first four characters I get a match:

select 'yes'
where 'TBAAA243_D_AFTBEL' like 'TBAA%';

----
yes

(1 row(s) affected)


That is fine, just as I would have expected. But if I try to match
only on the first 3 characters, I get this:

select 'yes'
where 'TBAAA243_D_AFTBEL' like 'TBA%';

(0 row(s) affected)

I have also tried the same on enterprise edition and get the same
strange result. Language is set to us_english

What am I missing here?

Any help appreciated, before I tear out the very last of my remaining
hair

Bo Brunsgaard

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 03-01-2008, 02:52 PM
Dan Guzman
 
Posts: n/a
Default Re: At my wits' end: LIKE

Have you applied any service packs? If not, try installing Express SP2
(http://www.microsoft.com/downloads/d...isplaylang=en).
I get the correct results on my SP2 Developer Edition instance:

select 'yes'
where 'TBAAA243_D_AFTBEL' like 'TBA%';

----
yes

(1 row(s) affected)



--
Hope this helps.

Dan Guzman
SQL Server MVP

<bbcworldtour@hotmail.com> wrote in message
news:1176718533.092593.280820@b75g2000hsg.googlegr oups.com...
> (SQL Server 2005, express edition)
>
> I have a list of table names that I need to translate according to a
> naming convention. I'm doing this using pattern matching in a LIKE
> clause.
>
> In one specific case I get no match where I believe that there should
> be one. I must be missing something obvious here, but what?
>
> I have boiled it down to this example (the real one is more complex):
>
> Matching on the first four characters I get a match:
>
> select 'yes'
> where 'TBAAA243_D_AFTBEL' like 'TBAA%';
>
> ----
> yes
>
> (1 row(s) affected)
>
>
> That is fine, just as I would have expected. But if I try to match
> only on the first 3 characters, I get this:
>
> select 'yes'
> where 'TBAAA243_D_AFTBEL' like 'TBA%';
>
> (0 row(s) affected)
>
> I have also tried the same on enterprise edition and get the same
> strange result. Language is set to us_english
>
> What am I missing here?
>
> Any help appreciated, before I tear out the very last of my remaining
> hair
>
> Bo Brunsgaard
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 03-01-2008, 02:54 PM
bbcworldtour@hotmail.com
 
Posts: n/a
Default Re: At my wits' end: LIKE

On 16 Apr., 13:46, "Dan Guzman" <guzma...@nospam-online.sbcglobal.net>
wrote:
> Have you applied any service packs? If not, try installing Express SP2
> (http://www.microsoft.com/downloads/d...11d5d-725....).
> I get the correct results on my SP2 Developer Edition instance:


I upgraded to SP2, but the problem persisted. It turns out that it is
hidden deep inside the finer points of the database collation. I
thought this was kind of interesting in a low-intense way, so here's
the story:

Our databases are running a collation of Danish_Norwegian_CS_AS (we
are a Danish company).

In Danish we have three special phonemes that are represented in
writing as the letters Æ, Ø and Å. These three letters are
alphabetically placed as the last three letters of the alphabet.

The last one turns out to the culprit (if it doesn't show up proper
imagine an upper-cased A with a small circle superimposed on it).

Using the letter Å for the phoneme [Å] is a fairly recent addition to
Danish (around 1950's). Traditionally it was written as "AA". For
instance, my surname can be written as either "Brunsgård" or
"Brunsgaard", but is still considered the same name.

So in Danish, "AA" can be either the traditional writing of the
phoneme [Å] OR just two "A"s which happen to be consecutive.

Danish_Norwegian_CS_AS collation recognizes "AA" as "Å". This is
usually real neat for sorting. Consider the lastnames "Ågård" and
"Aagaard" - these should be sorted together at the end of a list, and
using any Danish_Norwegian collation will ensure just that.

Consider:

create table taDanishDemo
(
nameInDanish varchar(30)
collate Danish_Norwegian_CS_AS

, nameInEnglish varchar(30)
collate Latin1_General_CS_AS
)
;

Let us insert a couple of rows which contain a case of consecutive
"A"s:

insert
into taDanishDemo (nameInDanish,nameInEnglish)
select 'TBAAA','TBAAA'
union all
select 'TBABA','TBABA'
;

Retrieving the rows ordered will now yield different results depending
on whether we order on the Danish or the Latin1 collated column:

select nameInEnglish
from taDanishDemo
order by nameInEnglish;

nameInEnglish
------------------------------
TBAAA
TBABA

Under Latin1 collation the "AA" is considered just two concecutive
"A"'s and ordered at the beginning of the list.
But, under Danish collation, the "AA" is considered the traditional
writing of [Å], and placed at the end of the list:

select nameInDanish
from taDanishDemo
order by nameInDanish;

nameInDanish
------------------------------
TBABA
TBAAA

So far, so good.

What threw me completely is that this also affect how the string "AA"
is interpreted by the LIKE operator.

select nameInDanish
from taDanishDemo
where nameInDanish like 'TBA%'

nameInDanish
------------------------------
TBABA

The row containing "TBAAA" isn't returned Trying to match "AA" with an
"A" plus a wildcard will yield no match under Danish collation, since
SQL Serve interprets this as trying to match "Å" with "A"!

But under Latin1 collation "AA" does match "A" and a wildcard, as "AA"
is just two "A"'s


select nameInEnglish
from taDanishDemo
where nameInEnglish like 'TBA%'

nameInEnglish
------------------------------
TBAAA
TBABA

I'm still not really sure whether this is a useful feature, an
unintended side effect or a bug :-)

Bo Brunsgaard




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 03-01-2008, 02:54 PM
Dan Guzman
 
Posts: n/a
Default Re: At my wits' end: LIKE

> Our databases are running a collation of Danish_Norwegian_CS_AS (we
> are a Danish company).


I'm glad you were able to identify the root cause. I briefly considered a
possible collation issue but didn't think that would explain your symptoms
since I didn't know that collation rules considered consecutive characters.
Thanks a lot for the detailed analysis.


--
Hope this helps.

Dan Guzman
SQL Server MVP

<bbcworldtour@hotmail.com> wrote in message
news:1176883896.875730.239770@n76g2000hsh.googlegr oups.com...
On 16 Apr., 13:46, "Dan Guzman" <guzma...@nospam-online.sbcglobal.net>
wrote:
> Have you applied any service packs? If not, try installing Express SP2
> (http://www.microsoft.com/downloads/d...711d5d-725...).
> I get the correct results on my SP2 Developer Edition instance:


I upgraded to SP2, but the problem persisted. It turns out that it is
hidden deep inside the finer points of the database collation. I
thought this was kind of interesting in a low-intense way, so here's
the story:

Our databases are running a collation of Danish_Norwegian_CS_AS (we
are a Danish company).

In Danish we have three special phonemes that are represented in
writing as the letters Æ, Ø and Å. These three letters are
alphabetically placed as the last three letters of the alphabet.

The last one turns out to the culprit (if it doesn't show up proper
imagine an upper-cased A with a small circle superimposed on it).

Using the letter Å for the phoneme [Å] is a fairly recent addition to
Danish (around 1950's). Traditionally it was written as "AA". For
instance, my surname can be written as either "Brunsgård" or
"Brunsgaard", but is still considered the same name.

So in Danish, "AA" can be either the traditional writing of the
phoneme [Å] OR just two "A"s which happen to be consecutive.

Danish_Norwegian_CS_AS collation recognizes "AA" as "Å". This is
usually real neat for sorting. Consider the lastnames "Ågård" and
"Aagaard" - these should be sorted together at the end of a list, and
using any Danish_Norwegian collation will ensure just that.

Consider:

create table taDanishDemo
(
nameInDanish varchar(30)
collate Danish_Norwegian_CS_AS

, nameInEnglish varchar(30)
collate Latin1_General_CS_AS
)
;

Let us insert a couple of rows which contain a case of consecutive
"A"s:

insert
into taDanishDemo (nameInDanish,nameInEnglish)
select 'TBAAA','TBAAA'
union all
select 'TBABA','TBABA'
;

Retrieving the rows ordered will now yield different results depending
on whether we order on the Danish or the Latin1 collated column:

select nameInEnglish
from taDanishDemo
order by nameInEnglish;

nameInEnglish
------------------------------
TBAAA
TBABA

Under Latin1 collation the "AA" is considered just two concecutive
"A"'s and ordered at the beginning of the list.
But, under Danish collation, the "AA" is considered the traditional
writing of [Å], and placed at the end of the list:

select nameInDanish
from taDanishDemo
order by nameInDanish;

nameInDanish
------------------------------
TBABA
TBAAA

So far, so good.

What threw me completely is that this also affect how the string "AA"
is interpreted by the LIKE operator.

select nameInDanish
from taDanishDemo
where nameInDanish like 'TBA%'

nameInDanish
------------------------------
TBABA

The row containing "TBAAA" isn't returned Trying to match "AA" with an
"A" plus a wildcard will yield no match under Danish collation, since
SQL Serve interprets this as trying to match "Å" with "A"!

But under Latin1 collation "AA" does match "A" and a wildcard, as "AA"
is just two "A"'s


select nameInEnglish
from taDanishDemo
where nameInEnglish like 'TBA%'

nameInEnglish
------------------------------
TBAAA
TBABA

I'm still not really sure whether this is a useful feature, an
unintended side effect or a bug :-)

Bo Brunsgaard




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:35 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com