This is a discussion on Differences in Range Scan Performance between Binary and NLS Indexes within the Oracle Database forums, part of the Database Server Software category; --> I have a table set with about 340k rows in it. User's execute queries against this table for arbitrary ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I have a table set with about 340k rows in it. User's execute queries against this table for arbitrary search strings which are resolved against the name table in the column. The table contains mixed case, but the users expect to match their search regardless of case. So, for example, both of these should match a search term of "badg": ARabidBadger somebadglowercase A typical search, for all those entries whose name contains "badg" would look like this: SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) a where ROWNUM <= 250) WHERE rnum > 0 To make this query efficient, I added a covering index create index cmdb_ci_lower_name_sys_id on cmdb_ci (lower("name"), "sys_id"); And our query plan looks like this and completes in about 120 ms. This was worst case since only 1 row was returned. Short search strings like "b" will hit their stop key quickly and exit after finding 250 matches without having to exhaust the index. Still, worst case, this query does a full index scan in about 120ms; SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) a where ROWNUM <= 250) WHERE rnum > 0; Elapsed: 00:00:00.12 Execution Plan ---------------------------------------------------------- Plan hash value: 424525705 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 250 | 11750 | 46 (0)| 00:00:01 | |* 1 | VIEW | | 250 | 11750 | 46 (0)| 00:00:01 | |* 2 | COUNT STOPKEY | | | | | | | 3 | VIEW | | 251 | 8534 | 46 (0)| 00:00:01 | |* 4 | INDEX FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 16997 | 813K| 46 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RNUM">0) 2 - filter(ROWNUM<=250) 4 - filter(LOWER("name") LIKE '%badge%') Statistics ---------------------------------------------------------- 141 recursive calls 0 db block gets 2964 consistent gets 0 physical reads 0 redo size 439 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Recently though, we started storing NLS characters in this column (european customers), so we'll have values like: badger bdger <-- note the accented a The users want both of the above to collate next to each other. So we, not unnaturally, put the database in linguisitc mode: alter session set nls_sort=german; alter session set nls_comp=linguistic; Then we issue the same query. The query plan here involves a sort, but the query returns a grand total of 1 record, so the sort isn't material to the response time. Note that the same query, running the same index full scan against the same index, now takes 1.23 second, roughly 10X as long as it took when running in binary mode. SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) a where ROWNUM <= 250) WHERE rnum > 0; Elapsed: 00:00:01.23 Execution Plan ---------------------------------------------------------- Plan hash value: 3626452865 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 250 | 11750 | | 870 (3)| 00:00:11 | |* 1 | VIEW | | 250 | 11750 | | 870 (3)| 00:00:11 | |* 2 | COUNT STOPKEY | | | | | | | | 3 | VIEW | | 16997 | 564K| | 870 (3)| 00:00:11 | |* 4 | SORT ORDER BY STOPKEY| | 16997 | 813K| 2008K| 870 (3)| 00:00:11 | |* 5 | INDEX FAST FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 16997 | 813K| | 659 (3)| 00:00:08 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RNUM">0) 2 - filter(ROWNUM<=250) 4 - filter(ROWNUM<=250) 5 - filter(LOWER("name") LIKE '%badge%') Statistics ---------------------------------------------------------- 134 recursive calls 0 db block gets 2978 consistent gets 2929 physical reads 0 redo size 439 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 1 rows processed If I take off the order by clause so we use exactly the same plan as our original query, we still get: SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ) a where ROWNUM <= 250) WHERE rnum > 0; Elapsed: 00:00:01.20 Execution Plan ---------------------------------------------------------- Plan hash value: 1941712263 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 250 | 11750 | 11 (0)| 00:00:01 | |* 1 | VIEW | | 250 | 11750 | 11 (0)| 00:00:01 | |* 2 | COUNT STOPKEY | | | | | | |* 3 | INDEX FAST FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 251 | 12299 | 11 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RNUM">0) 2 - filter(ROWNUM<=250) 3 - filter(LOWER("name") LIKE '%badge%') Statistics ---------------------------------------------------------- 134 recursive calls 0 db block gets 2979 consistent gets 0 physical reads 0 redo size 439 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed So my questions are is: 1) Am I doing something wrong here? Why does putting the system into NLS_SORT and NLS_COMP make this big a difference on performance? 2) Is there an index I should add here that would allow some form of efficient querying across this set? I know I'm going to have to do a full scan here (contains queries being what they are), but why is my full scan 10X slower in linguistic mode? 3) Is there some other recommended approach to getting linguistic collation working efficiently? Shadow columns aren't really practical here (and even then I'm not sure what I'd put in the shadow unless I wrote my own german to english unaccenter). Any help would be much appreciated. |
| |||
| On May 6, 1:16*pm, Pat <pat.ca...@service-now.com> wrote: > I have a table set with about 340k rows in it. User's execute queries > against this table for arbitrary search strings which are resolved > against the name table in the column. The table contains mixed case, > but the users expect to match their search regardless of case. So, for > example, both of these should match a search term of "badg": > > ARabidBadger > somebadglowercase > > A typical search, for all those entries whose name contains "badg" > would look like this: > > SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum > FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 *WHERE > lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) > a where ROWNUM <= 250) WHERE rnum > 0 > > To make this query efficient, I added a covering index > > create index cmdb_ci_lower_name_sys_id on cmdb_ci (lower("name"), > "sys_id"); > > And our query plan looks like this and completes in about 120 ms. This > was worst case since only 1 row was returned. Short search strings > like "b" will hit their stop key quickly and exit after finding 250 > matches without having to exhaust the index. > > Still, worst case, this query does a full index scan in about 120ms; > > SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM > rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 *WHERE > lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) > a where ROWNUM <= 250) WHERE rnum > 0; > > Elapsed: 00:00:00.12 > > Execution Plan > ---------------------------------------------------------- > Plan hash value: 424525705 > > ---------------------------------------------------------------------------*--------------------- > | Id *| Operation * * * * *| Name * * * * * * * * * * *| Rows *| Bytes | Cost (%CPU)| > Time * * | > ---------------------------------------------------------------------------*--------------------- > | * 0 | SELECT STATEMENT * | * * * * * * * * * * * * * | * 250 | 11750 | * *46 * (0)| > 00:00:01 | > |* *1 | *VIEW * * * * * * *| * * * * * * ** * * * * * | * 250 | 11750 | * *46 * (0)| 00:00:01 | > |* *2 | * COUNT STOPKEY * *| * * * * * * * * * * * * * | * * * | * * * | * * * * * *| * * * * *| > | * 3 | * *VIEW * * * * * *| * * * * * * ** * * * * * | * 251 | *8534 | * *46 * (0)| 00:00:01 > | > |* *4 | * * INDEX FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 16997 | > 813K| * *46 * * *(0)| 00:00:01 | > ---------------------------------------------------------------------------*--------------------- > > Predicate Information (identified by operation id): > --------------------------------------------------- > > * *1 - filter("RNUM">0) > * *2 - filter(ROWNUM<=250) > * *4 - filter(LOWER("name") LIKE '%badge%') > > Statistics > ---------------------------------------------------------- > * * * * 141 *recursive calls > * * * * * 0 *db block gets > * * * *2964 *consistent gets > * * * * * 0 *physical reads > * * * * * 0 *redo size > * * * * 439 *bytes sent via SQL*Net to client > * * * * 400 *bytes received via SQL*Net from client > * * * * * 2 *SQL*Net roundtrips to/from client > * * * * * 0 *sorts (memory) > * * * * * 0 *sorts (disk) > * * * * * 1 *rows processed > > Recently though, we started storing NLS characters in this column > (european customers), so we'll have values like: > > badger > bdger <-- note the accented a > > The users want both of the above to collate next to each other. > > So we, not unnaturally, put the database in linguisitc mode: > > alter session set nls_sort=german; > alter session set nls_comp=linguistic; > > Then we issue the same query. The query plan here involves a sort, but > the query returns a grand total of 1 record, so the sort isn't > material to the response time. ... > 1) Am I doing something wrong here? Why does putting the system into > NLS_SORT and NLS_COMP make this big a difference on performance? > 2) Is there an index I should add here that would allow some form of > efficient querying across this set? I know I'm going to have to do a > full scan here (contains queries being what they are), but why is my > full scan 10X slower in linguistic mode? > 3) Is there some other recommended approach to getting linguistic > collation working efficiently? Shadow columns aren't really practical > here (and even then I'm not sure what I'd put in the shadow unless I > wrote my own german to english unaccenter). > > Any help would be much appreciated. See metalink Note:30779.1 "Setting NLS_SORT to anything other than BINARY causes a sort to use a full table scan, regardless of the path chosen by the optimizer. BINARY is the exception because indexes are built according to a binary order of keys. Thus the optimizer can use an index to satisfy the ORDER BY clause when NLS_SORT is set to BINARY. If NLS_SORT is set to any linguistic sort, the optimizer must include a full table scan and a full sort in the execution plan. " Things may be different in different versions, I wouldn't know. jg -- @home.com is bogus. http://forums.oracle.com/forums/thre...07441� |
| |||
| On May 6, 2:26 pm, joel garry <joel-ga...@home.com> wrote: > On May 6, 1:16 pm, Pat <pat.ca...@service-now.com> wrote: > > > I have a table set with about 340k rows in it. User's execute queries > > against this table for arbitrary search strings which are resolved > > against the name table in the column. The table contains mixed case, > > but the users expect to match their search regardless of case. So, for > > example, both of these should match a search term of "badg": > > > ARabidBadger > > somebadglowercase > > > A typical search, for all those entries whose name contains "badg" > > would look like this: > > > SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum > > FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE > > lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) > > a where ROWNUM <= 250) WHERE rnum > 0 > > > To make this query efficient, I added a covering index > > > create index cmdb_ci_lower_name_sys_id on cmdb_ci (lower("name"), > > "sys_id"); > > > And our query plan looks like this and completes in about 120 ms. This > > was worst case since only 1 row was returned. Short search strings > > like "b" will hit their stop key quickly and exit after finding 250 > > matches without having to exhaust the index. > > > Still, worst case, this query does a full index scan in about 120ms; > > > SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM > > rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE > > lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) > > a where ROWNUM <= 250) WHERE rnum > 0; > > > Elapsed: 00:00:00.12 > > > Execution Plan > > ---------------------------------------------------------- > > Plan hash value: 424525705 > > > ---------------------------------------------------------------------------*--------------------- > > | Id | Operation | Name | Rows | Bytes |Cost (%CPU)| > > Time | > > ---------------------------------------------------------------------------*--------------------- > > | 0 | SELECT STATEMENT | | 250 | 11750 |46 (0)| > > 00:00:01 | > > |* 1 | VIEW | | 250 | 11750 |46 (0)| 00:00:01 | > > |* 2 | COUNT STOPKEY | | | || | > > | 3 | VIEW | | 251 | 8534 |46 (0)| 00:00:01 > > | > > |* 4 | INDEX FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 16997 | > > 813K| 46 (0)| 00:00:01 | > > ---------------------------------------------------------------------------*--------------------- > > > Predicate Information (identified by operation id): > > --------------------------------------------------- > > > 1 - filter("RNUM">0) > > 2 - filter(ROWNUM<=250) > > 4 - filter(LOWER("name") LIKE '%badge%') > > > Statistics > > ---------------------------------------------------------- > > 141 recursive calls > > 0 db block gets > > 2964 consistent gets > > 0 physical reads > > 0 redo size > > 439 bytes sent via SQL*Net to client > > 400 bytes received via SQL*Net from client > > 2 SQL*Net roundtrips to/from client > > 0 sorts (memory) > > 0 sorts (disk) > > 1 rows processed > > > Recently though, we started storing NLS characters in this column > > (european customers), so we'll have values like: > > > badger > > bädger <-- note the accented a > > > The users want both of the above to collate next to each other. > > > So we, not unnaturally, put the database in linguisitc mode: > > > alter session set nls_sort=german; > > alter session set nls_comp=linguistic; > > > Then we issue the same query. The query plan here involves a sort, but > > the query returns a grand total of 1 record, so the sort isn't > > material to the response time. > > ... > > > 1) Am I doing something wrong here? Why does putting the system into > > NLS_SORT and NLS_COMP make this big a difference on performance? > > 2) Is there an index I should add here that would allow some form of > > efficient querying across this set? I know I'm going to have to do a > > full scan here (contains queries being what they are), but why is my > > full scan 10X slower in linguistic mode? > > 3) Is there some other recommended approach to getting linguistic > > collation working efficiently? Shadow columns aren't really practical > > here (and even then I'm not sure what I'd put in the shadow unless I > > wrote my own german to english unaccenter). > > > Any help would be much appreciated. > > See metalink Note:30779.1 > > "Setting NLS_SORT to anything other than BINARY causes a sort to use a > full table scan, regardless of the path chosen by the optimizer. > BINARY is the exception because indexes are built according to a > binary order of keys. Thus the optimizer can use an index to satisfy > the ORDER BY clause when NLS_SORT is set to BINARY. If NLS_SORT is set > to any linguistic sort, the optimizer must include a full table scan > and a full sort in the execution plan. " > > Things may be different in different versions, I wouldn't know. > > jg > -- > @home.com is bogus.http://forums.oracle.com/forums/thre...ID=2507441? Even though my query plan says INDEX FAST FULL SCAN he's actually doing TABLE ACCESS FULL? SQL> select "name" from cmdb_ci where "sys_id" like '%abc%' order by "name"; 1530 rows selected. Elapsed: 00:00:00.20 Execution Plan ---------------------------------------------------------- Plan hash value: 502630801 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 16997 | 813K| | 3190 (3)| 00:00:39 | | 1 | SORT ORDER BY | | 16997 | 813K| 2008K| 3190 (3)| 00:00:39 | |* 2 | TABLE ACCESS FULL| CMDB_CI | 16997 | 813K| | 2979 (3)| 00:00:36 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("sys_id" LIKE '%abc%') Statistics ---------------------------------------------------------- 148 recursive calls 0 db block gets 13165 consistent gets 0 physical reads 0 redo size 40810 bytes sent via SQL*Net to client 1511 bytes received via SQL*Net from client 103 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 1530 rows processed So in this case, a TABLE ACCESS FULL, followed by a sort in binary mode is 5x faster than an index full scan in linguistic mode? If the only problem was that linguistic forced him to table scan, I'd expect him to return data in 200ms or so (the time a table scan takes) as opposed to 1.3 seconds, neh? |
| |||
| On May 6, 2:36*pm, Pat <pat.ca...@service-now.com> wrote: > On May 6, 2:26 pm, joel garry <joel-ga...@home.com> wrote: > > > > > > > On May 6, 1:16 pm, Pat <pat.ca...@service-now.com> wrote: > > > > I have a table set with about 340k rows in it. User's execute queries > > > against this table for arbitrary search strings which are resolved > > > against the name table in the column. The table contains mixed case, > > > but the users expect to match their search regardless of case. So, for > > > example, both of these should match a search term of "badg": > > > > ARabidBadger > > > somebadglowercase > > > > A typical search, for all those entries whose name contains "badg" > > > would look like this: > > > > SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum > > > FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 *WHERE > > > lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) > > > a where ROWNUM <= 250) WHERE rnum > 0 > > > > To make this query efficient, I added a covering index > > > > create index cmdb_ci_lower_name_sys_id on cmdb_ci (lower("name"), > > > "sys_id"); > > > > And our query plan looks like this and completes in about 120 ms. This > > > was worst case since only 1 row was returned. Short search strings > > > like "b" will hit their stop key quickly and exit after finding 250 > > > matches without having to exhaust the index. > > > > Still, worst case, this query does a full index scan in about 120ms; > > > > SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM > > > rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 *WHERE > > > lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) > > > a where ROWNUM <= 250) WHERE rnum > 0; > > > > Elapsed: 00:00:00.12 > > > > Execution Plan > > > ---------------------------------------------------------- > > > Plan hash value: 424525705 > > > > ---------------------------------------------------------------------------**--------------------- > > > | Id *| Operation * * * * *| Name ** * * * * * * * * *| Rows*| Bytes | Cost (%CPU)| > > > Time * * | > > > ---------------------------------------------------------------------------**--------------------- > > > | * 0 | SELECT STATEMENT * | * * * * * * * * * * * * * | * 250 | 11750 | * *46 * (0)| > > > 00:00:01 | > > > |* *1 | *VIEW * * * * * * *| * * * * * * * * * * * * * | * 250 | 11750 | * *46 * (0)| 00:00:01 | > > > |* *2 | * COUNT STOPKEY * *| * * * * * * * * * * * * * | * * * | * * * | * * * * * *| * * * * *| > > > | * 3 | * *VIEW * * * * * *| * * * * * * * * * * * * * | * 251 | *8534 | * *46 * (0)|00:00:01 > > > | > > > |* *4 | * * INDEX FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 16997 | > > > 813K| * *46 * * *(0)| 00:00:01 | > > > ---------------------------------------------------------------------------**--------------------- > > > > Predicate Information (identified by operation id): > > > --------------------------------------------------- > > > > * *1 - filter("RNUM">0) > > > * *2 - filter(ROWNUM<=250) > > > * *4 - filter(LOWER("name") LIKE '%badge%') > > > > Statistics > > > ---------------------------------------------------------- > > > * * * * 141 *recursive calls > > > * * * * * 0 *db block gets > > > * * * *2964 *consistent gets > > > * * * * * 0 *physical reads > > > * * * * * 0 *redo size > > > * * * * 439 *bytes sent via SQL*Net to client > > > * * * * 400 *bytes received via SQL*Net from client > > > * * * * * 2 *SQL*Net roundtrips to/from client > > > * * * * * 0 *sorts (memory) > > > * * * * * 0 *sorts (disk) > > > * * * * * 1 *rows processed > > > > Recently though, we started storing NLS characters in this column > > > (european customers), so we'll have values like: > > > > badger > > > bädger <-- note the accented a > > > > The users want both of the above to collate next to each other. > > > > So we, not unnaturally, put the database in linguisitc mode: > > > > alter session set nls_sort=german; > > > alter session set nls_comp=linguistic; > > > > Then we issue the same query. The query plan here involves a sort, but > > > the query returns a grand total of 1 record, so the sort isn't > > > material to the response time. > > > ... > > > > 1) Am I doing something wrong here? Why does putting the system into > > > NLS_SORT and NLS_COMP make this big a difference on performance? > > > 2) Is there an index I should add here that would allow some form of > > > efficient querying across this set? I know I'm going to have to do a > > > full scan here (contains queries being what they are), but why is my > > > full scan 10X slower in linguistic mode? > > > 3) Is there some other recommended approach to getting linguistic > > > collation working efficiently? Shadow columns aren't really practical > > > here (and even then I'm not sure what I'd put in the shadow unless I > > > wrote my own german to english unaccenter). > > > > Any help would be much appreciated. > > > See metalink Note:30779.1 > > > "Setting NLS_SORT to anything other than BINARY causes a sort to use a > > full table scan, regardless of the path chosen by the optimizer. > > BINARY is the exception because indexes are built according to a > > binary order of keys. Thus the optimizer can use an index to satisfy > > the ORDER BY clause when NLS_SORT is set to BINARY. If NLS_SORT is set > > to any linguistic sort, the optimizer must include a full table scan > > and a full sort in the execution plan. *" > > > Things may be different in different versions, I wouldn't know. > > > jg > > -- > > @home.com is bogus.http://forums.oracle.com/forums/thre...ID=2507441? > > Even though my query plan says INDEX FAST FULL SCAN he's actually > doing TABLE ACCESS FULL? I dunno, I think tracing would be called for to see for sure. The idea of lying optimizer plans that are obtained for things actually running is a bit of a thought-provoker for me. > SQL> select "name" from cmdb_ci where "sys_id" like '%abc%' order by > "name"; > > 1530 rows selected. > > Elapsed: 00:00:00.20 > > Execution Plan > ---------------------------------------------------------- > Plan hash value: 502630801 > > ---------------------------------------------------------------------------*----------- > | Id *| Operation * * * * *| Name * *| Rows *| Bytes |TempSpc| Cost (%CPU)| > Time * * | > ---------------------------------------------------------------------------*----------- > | * 0 | SELECT STATEMENT * | * * * * | 16997| * 813K| * * * | *3190 * (3)| > 00:00:39 | > | * 1 | *SORT ORDER BY * * | * * * * | 16997 | * 813K| *2008K| *3190 * (3)| > 00:00:39 | > |* *2 | * TABLE ACCESS FULL| CMDB_CI | 16997 | * 813K| * * * | *2979 > (3)| 00:00:36 | > ---------------------------------------------------------------------------*----------- > > Predicate Information (identified by operation id): > --------------------------------------------------- > > * *2 - filter("sys_id" LIKE '%abc%') > > Statistics > ---------------------------------------------------------- > * * * * 148 *recursive calls > * * * * * 0 *db block gets > * * * 13165 *consistent gets > * * * * * 0 *physical reads > * * * * * 0 *redo size > * * * 40810 *bytes sent via SQL*Net to client > * * * *1511 *bytes received via SQL*Net from client > * * * * 103 *SQL*Net roundtrips to/from client > * * * * * 1 *sorts (memory) > * * * * * 0 *sorts (disk) > * * * *1530 *rows processed > > So in this case, a TABLE ACCESS FULL, followed by a sort in binary > mode is 5x faster than an index full scan in linguistic mode? > > If the only problem was that linguistic forced him to table scan, I'd > expect him to return data in 200ms or so (the time a table scan takes) > as opposed to 1.3 seconds, neh?- Well, maybe something else is going on, like applying some semantics to every row... been too long since I worked with this stuff to do anything but guess. jg -- @home.com is bogus. Now not only do you drop your keys or phone in the toilet, it automatically flushes if you stand up to get them. |
| |||
| On May 7, 11:29 am, joel garry <joel-ga...@home.com> wrote: > > Well, maybe something else is going on, like applying some semantics > to every row... been too long since I worked with this stuff to do > anything but guess. > > jg > -- > @home.com is bogus. > Now not only do you drop your keys or phone in the toilet, it > automatically flushes if you stand up to get them. Thanks for the update, I was wondering the same thing e.g. if running in linguistic mode was dramatically increasing the cost of any index comparison. Almost smells like he's running a regex or something instead of some form of bitwise compare (pure specualation on my part). Main thing that's bugging me is that running in nls_mode, even with nls_indexes, seems to utterly torch my performance (order of magnitude slower). I've seen a number of other comments out of the web to the effect of "oh, yea, don't use nls mode, performance sucks". All the Oracle doc though seems to be saying "use nls mode, it's a wonderful feature that we're recommending". I can't seem to find any kind of official statement from Oracle as to whether I should *expect* bad performance like this in nls mode, or if this is a bug I'm hitting, or if I'm doing something wrong. At this point, I've tentatively decided that performance in nls mode is just plain bad and there's nothing I can do about it other than running the session in nls_comp=binary mode and using lower() on my various query terms to get the right results back (in us english at least). I'd love to be proven wrong here though. |
| ||||
| "Pat" <pat.casey@service-now.com> wrote in message news:b0cf7d61-3547-48e0-a6e4-f2cdbfcde83a@27g2000hsf.googlegroups.com... I have a table set with about 340k rows in it. User's execute queries against this table for arbitrary search strings which are resolved against the name table in the column. The table contains mixed case, but the users expect to match their search regardless of case. So, for example, both of these should match a search term of "badg": ARabidBadger somebadglowercase A typical search, for all those entries whose name contains "badg" would look like this: SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) a where ROWNUM <= 250) WHERE rnum > 0 To make this query efficient, I added a covering index create index cmdb_ci_lower_name_sys_id on cmdb_ci (lower("name"), "sys_id"); And our query plan looks like this and completes in about 120 ms. This was worst case since only 1 row was returned. Short search strings like "b" will hit their stop key quickly and exit after finding 250 matches without having to exhaust the index. Still, worst case, this query does a full index scan in about 120ms; SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) a where ROWNUM <= 250) WHERE rnum > 0; Elapsed: 00:00:00.12 Execution Plan ---------------------------------------------------------- Plan hash value: 424525705 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 250 | 11750 | 46 (0)| 00:00:01 | |* 1 | VIEW | | 250 | 11750 | 46 (0)| 00:00:01 | |* 2 | COUNT STOPKEY | | | | | | | 3 | VIEW | | 251 | 8534 | 46 (0)| 00:00:01 | |* 4 | INDEX FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 16997 | 813K| 46 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RNUM">0) 2 - filter(ROWNUM<=250) 4 - filter(LOWER("name") LIKE '%badge%') Statistics ---------------------------------------------------------- 141 recursive calls 0 db block gets 2964 consistent gets 0 physical reads 0 redo size 439 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Recently though, we started storing NLS characters in this column (european customers), so we'll have values like: badger bdger <-- note the accented a The users want both of the above to collate next to each other. So we, not unnaturally, put the database in linguisitc mode: alter session set nls_sort=german; alter session set nls_comp=linguistic; Then we issue the same query. The query plan here involves a sort, but the query returns a grand total of 1 record, so the sort isn't material to the response time. Note that the same query, running the same index full scan against the same index, now takes 1.23 second, roughly 10X as long as it took when running in binary mode. SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ORDER BY lower(cmdb_ci0."name")) a where ROWNUM <= 250) WHERE rnum > 0; Elapsed: 00:00:01.23 Execution Plan ---------------------------------------------------------- Plan hash value: 3626452865 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 250 | 11750 | | 870 (3)| 00:00:11 | |* 1 | VIEW | | 250 | 11750 | | 870 (3)| 00:00:11 | |* 2 | COUNT STOPKEY | | | | | | | | 3 | VIEW | | 16997 | 564K| | 870 (3)| 00:00:11 | |* 4 | SORT ORDER BY STOPKEY| | 16997 | 813K| 2008K| 870 (3)| 00:00:11 | |* 5 | INDEX FAST FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 16997 | 813K| | 659 (3)| 00:00:08 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RNUM">0) 2 - filter(ROWNUM<=250) 4 - filter(ROWNUM<=250) 5 - filter(LOWER("name") LIKE '%badge%') Statistics ---------------------------------------------------------- 134 recursive calls 0 db block gets 2978 consistent gets 2929 physical reads 0 redo size 439 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 1 rows processed If I take off the order by clause so we use exactly the same plan as our original query, we still get: SQL> SELECT "sys_id" FROM (SELECT /*+ FIRST_ROWS(250) */ a.*, ROWNUM rnum FROM (SELECT cmdb_ci0."sys_id" FROM cmdb_ci cmdb_ci0 WHERE lower(cmdb_ci0."name") LIKE '%badge%' ) a where ROWNUM <= 250) WHERE rnum > 0; Elapsed: 00:00:01.20 Execution Plan ---------------------------------------------------------- Plan hash value: 1941712263 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 250 | 11750 | 11 (0)| 00:00:01 | |* 1 | VIEW | | 250 | 11750 | 11 (0)| 00:00:01 | |* 2 | COUNT STOPKEY | | | | | | |* 3 | INDEX FAST FULL SCAN| CMDB_CI_LOWER_NAME_SYS_ID | 251 | 12299 | 11 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RNUM">0) 2 - filter(ROWNUM<=250) 3 - filter(LOWER("name") LIKE '%badge%') Statistics ---------------------------------------------------------- 134 recursive calls 0 db block gets 2979 consistent gets 0 physical reads 0 redo size 439 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed So my questions are is: 1) Am I doing something wrong here? Why does putting the system into NLS_SORT and NLS_COMP make this big a difference on performance? 2) Is there an index I should add here that would allow some form of efficient querying across this set? I know I'm going to have to do a full scan here (contains queries being what they are), but why is my full scan 10X slower in linguistic mode? 3) Is there some other recommended approach to getting linguistic collation working efficiently? Shadow columns aren't really practical here (and even then I'm not sure what I'd put in the shadow unless I wrote my own german to english unaccenter). Any help would be much appreciated. A couple of points. In starting case, you are doing an "INDEX FULL SCAN". Because the "order by" clause matches an index that can be used for the "like" - Oracle knows that if it walks the index in order then it can stop as soon as it has found enough data without sorting. In the second case, Oracle has to do the "table scan" - but in fact, since all the necessary data is available in an index, it can do an "index FAST full scan", which is the tablescan mechanism (multiblock read) applied to an index segment. Because of this, it has to read the whole data set, and then sort it to get the first 250. This gives you three possible reasons for the extra time: a) time for reading all the data b) time for sorting c) physical reads - a 'full scan' will buffer the index blocks, a 'fast full scan' may not, unless the index is a small one. This means you have to check your test conditions very carefully to see that you are comparing like with like, and also creating a test that will reflect the production behaviour closely. In the last case (without the order by), Oracle could stop the fast full scan early (after finding 250) rows, and doesn't have to sort. It also happens that you don't do any physical reads - so the index blocks were buffered for this test, when they weren't for the previous test. Note - your third test wasn' the same as your first - the third one is still doing the FAST full scan of the second test. -- Regards Jonathan Lewis http://jonathanlewis.wordpress.com Author: Cost Based Oracle: Fundamentals http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html The Co-operative Oracle Users' FAQ http://www.jlcomp.demon.co.uk/faq/ind_faq.html |