This is a discussion on At my wits' end: LIKE within the SQL Server forums, part of the Microsoft SQL Server category; --> (SQL Server 2005, express edition) I have a list of table names that I need to translate according to ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| (SQL Server 2005, express edition) I have a list of table names that I need to translate according to a naming convention. I'm doing this using pattern matching in a LIKE clause. In one specific case I get no match where I believe that there should be one. I must be missing something obvious here, but what? I have boiled it down to this example (the real one is more complex): Matching on the first four characters I get a match: select 'yes' where 'TBAAA243_D_AFTBEL' like 'TBAA%'; ---- yes (1 row(s) affected) That is fine, just as I would have expected. But if I try to match only on the first 3 characters, I get this: select 'yes' where 'TBAAA243_D_AFTBEL' like 'TBA%'; (0 row(s) affected) I have also tried the same on enterprise edition and get the same strange result. Language is set to us_english What am I missing here? Any help appreciated, before I tear out the very last of my remaining hair Bo Brunsgaard |
| |||
| Have you applied any service packs? If not, try installing Express SP2 (http://www.microsoft.com/downloads/d...isplaylang=en). I get the correct results on my SP2 Developer Edition instance: select 'yes' where 'TBAAA243_D_AFTBEL' like 'TBA%'; ---- yes (1 row(s) affected) -- Hope this helps. Dan Guzman SQL Server MVP <bbcworldtour@hotmail.com> wrote in message news:1176718533.092593.280820@b75g2000hsg.googlegr oups.com... > (SQL Server 2005, express edition) > > I have a list of table names that I need to translate according to a > naming convention. I'm doing this using pattern matching in a LIKE > clause. > > In one specific case I get no match where I believe that there should > be one. I must be missing something obvious here, but what? > > I have boiled it down to this example (the real one is more complex): > > Matching on the first four characters I get a match: > > select 'yes' > where 'TBAAA243_D_AFTBEL' like 'TBAA%'; > > ---- > yes > > (1 row(s) affected) > > > That is fine, just as I would have expected. But if I try to match > only on the first 3 characters, I get this: > > select 'yes' > where 'TBAAA243_D_AFTBEL' like 'TBA%'; > > (0 row(s) affected) > > I have also tried the same on enterprise edition and get the same > strange result. Language is set to us_english > > What am I missing here? > > Any help appreciated, before I tear out the very last of my remaining > hair > > Bo Brunsgaard > |
| |||
| On 16 Apr., 13:46, "Dan Guzman" <guzma...@nospam-online.sbcglobal.net> wrote: > Have you applied any service packs? If not, try installing Express SP2 > (http://www.microsoft.com/downloads/d...11d5d-725....). > I get the correct results on my SP2 Developer Edition instance: I upgraded to SP2, but the problem persisted. It turns out that it is hidden deep inside the finer points of the database collation. I thought this was kind of interesting in a low-intense way, so here's the story: Our databases are running a collation of Danish_Norwegian_CS_AS (we are a Danish company). In Danish we have three special phonemes that are represented in writing as the letters Æ, Ø and Å. These three letters are alphabetically placed as the last three letters of the alphabet. The last one turns out to the culprit (if it doesn't show up proper imagine an upper-cased A with a small circle superimposed on it). Using the letter Å for the phoneme [Å] is a fairly recent addition to Danish (around 1950's). Traditionally it was written as "AA". For instance, my surname can be written as either "Brunsgård" or "Brunsgaard", but is still considered the same name. So in Danish, "AA" can be either the traditional writing of the phoneme [Å] OR just two "A"s which happen to be consecutive. Danish_Norwegian_CS_AS collation recognizes "AA" as "Å". This is usually real neat for sorting. Consider the lastnames "Ågård" and "Aagaard" - these should be sorted together at the end of a list, and using any Danish_Norwegian collation will ensure just that. Consider: create table taDanishDemo ( nameInDanish varchar(30) collate Danish_Norwegian_CS_AS , nameInEnglish varchar(30) collate Latin1_General_CS_AS ) ; Let us insert a couple of rows which contain a case of consecutive "A"s: insert into taDanishDemo (nameInDanish,nameInEnglish) select 'TBAAA','TBAAA' union all select 'TBABA','TBABA' ; Retrieving the rows ordered will now yield different results depending on whether we order on the Danish or the Latin1 collated column: select nameInEnglish from taDanishDemo order by nameInEnglish; nameInEnglish ------------------------------ TBAAA TBABA Under Latin1 collation the "AA" is considered just two concecutive "A"'s and ordered at the beginning of the list. But, under Danish collation, the "AA" is considered the traditional writing of [Å], and placed at the end of the list: select nameInDanish from taDanishDemo order by nameInDanish; nameInDanish ------------------------------ TBABA TBAAA So far, so good. What threw me completely is that this also affect how the string "AA" is interpreted by the LIKE operator. select nameInDanish from taDanishDemo where nameInDanish like 'TBA%' nameInDanish ------------------------------ TBABA The row containing "TBAAA" isn't returned Trying to match "AA" with an "A" plus a wildcard will yield no match under Danish collation, since SQL Serve interprets this as trying to match "Å" with "A"! But under Latin1 collation "AA" does match "A" and a wildcard, as "AA" is just two "A"'s select nameInEnglish from taDanishDemo where nameInEnglish like 'TBA%' nameInEnglish ------------------------------ TBAAA TBABA I'm still not really sure whether this is a useful feature, an unintended side effect or a bug :-) Bo Brunsgaard |
| ||||
| > Our databases are running a collation of Danish_Norwegian_CS_AS (we > are a Danish company). I'm glad you were able to identify the root cause. I briefly considered a possible collation issue but didn't think that would explain your symptoms since I didn't know that collation rules considered consecutive characters. Thanks a lot for the detailed analysis. -- Hope this helps. Dan Guzman SQL Server MVP <bbcworldtour@hotmail.com> wrote in message news:1176883896.875730.239770@n76g2000hsh.googlegr oups.com... On 16 Apr., 13:46, "Dan Guzman" <guzma...@nospam-online.sbcglobal.net> wrote: > Have you applied any service packs? If not, try installing Express SP2 > (http://www.microsoft.com/downloads/d...711d5d-725...). > I get the correct results on my SP2 Developer Edition instance: I upgraded to SP2, but the problem persisted. It turns out that it is hidden deep inside the finer points of the database collation. I thought this was kind of interesting in a low-intense way, so here's the story: Our databases are running a collation of Danish_Norwegian_CS_AS (we are a Danish company). In Danish we have three special phonemes that are represented in writing as the letters Æ, Ø and Å. These three letters are alphabetically placed as the last three letters of the alphabet. The last one turns out to the culprit (if it doesn't show up proper imagine an upper-cased A with a small circle superimposed on it). Using the letter Å for the phoneme [Å] is a fairly recent addition to Danish (around 1950's). Traditionally it was written as "AA". For instance, my surname can be written as either "Brunsgård" or "Brunsgaard", but is still considered the same name. So in Danish, "AA" can be either the traditional writing of the phoneme [Å] OR just two "A"s which happen to be consecutive. Danish_Norwegian_CS_AS collation recognizes "AA" as "Å". This is usually real neat for sorting. Consider the lastnames "Ågård" and "Aagaard" - these should be sorted together at the end of a list, and using any Danish_Norwegian collation will ensure just that. Consider: create table taDanishDemo ( nameInDanish varchar(30) collate Danish_Norwegian_CS_AS , nameInEnglish varchar(30) collate Latin1_General_CS_AS ) ; Let us insert a couple of rows which contain a case of consecutive "A"s: insert into taDanishDemo (nameInDanish,nameInEnglish) select 'TBAAA','TBAAA' union all select 'TBABA','TBABA' ; Retrieving the rows ordered will now yield different results depending on whether we order on the Danish or the Latin1 collated column: select nameInEnglish from taDanishDemo order by nameInEnglish; nameInEnglish ------------------------------ TBAAA TBABA Under Latin1 collation the "AA" is considered just two concecutive "A"'s and ordered at the beginning of the list. But, under Danish collation, the "AA" is considered the traditional writing of [Å], and placed at the end of the list: select nameInDanish from taDanishDemo order by nameInDanish; nameInDanish ------------------------------ TBABA TBAAA So far, so good. What threw me completely is that this also affect how the string "AA" is interpreted by the LIKE operator. select nameInDanish from taDanishDemo where nameInDanish like 'TBA%' nameInDanish ------------------------------ TBABA The row containing "TBAAA" isn't returned Trying to match "AA" with an "A" plus a wildcard will yield no match under Danish collation, since SQL Serve interprets this as trying to match "Å" with "A"! But under Latin1 collation "AA" does match "A" and a wildcard, as "AA" is just two "A"'s select nameInEnglish from taDanishDemo where nameInEnglish like 'TBA%' nameInEnglish ------------------------------ TBAAA TBABA I'm still not really sure whether this is a useful feature, an unintended side effect or a bug :-) Bo Brunsgaard |
| Thread Tools | |
| Display Modes | |
|
|