This is a discussion on simple sql using between startnumber and endnumber not performing within the Oracle Miscellaneous forums, part of the Oracle Database category; --> Hi all I have a simple sql which I can't make perform. In its simplest form it has two ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi all I have a simple sql which I can't make perform. In its simplest form it has two cols, a startnumber and an endnumber. None of the rows overlap with another. I am using an sql which looks like this: select startnumber, endnumber from table where '12345678' between startnumber and endnumber I have also tried select startnumber, endnumber from table where '12345678' >= startnumber and '12345678' <= endnumber The latter sometimes performes a little better. The table has about 6 mill rows. I have tried with indexes on both rows, both with desc and asc as FBI in any combination that i can think of and also hinted the sql. The explain plan shows that it will use the index but the query takes around 6 to 9 seconds which is way to much in respect of the needs. Does any one of you have any suggestions ? Regards Benjamin |
| |||
| =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelberg@gmail.com) wrote: : Hi all : I have a simple sql which I can't make perform. : In its simplest form it has two cols, a startnumber and an endnumber. : None of the rows overlap with another. : I am using an sql which looks like this: : select startnumber, endnumber from table where '12345678' between : startnumber and endnumber : I have also tried : select startnumber, endnumber from table where '12345678' >= : startnumber and '12345678' <= endnumber : The latter sometimes performes a little better. : The table has about 6 mill rows. : I have tried with indexes on both rows, both with desc and asc as FBI : in any combination that i can think of and also hinted the sql. : The explain plan shows that it will use the index but the query takes : around 6 to 9 seconds which is way to much in respect of the needs. : Does any one of you have any suggestions ? create an index on four columns, where the first two columns contain truncations of the two numbers. indexed columns would be ( integer portion of startnumber/1000 integer portion of endnumber/1000 startnumber endnumber ) the 1000 is just an example, some other number is probably better you might have to query on the truncated number as well as the real number. WHERE integer portion of 12345678/1000 between integer portion of startnumber/1000 and integer portion of endnumber/1000 AND 12345678 between startnumber and endnumber $0.10 |
| |||
| On 27 Jun., 05:45, y...@vtn1.victoria.tc.ca (Malcolm Dew-Jones) wrote: > =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelb...@gmail.com) wrote: > > : Hi all > > : I have a simple sql which I can't make perform. > > : In its simplest form it has two cols, a startnumber and an endnumber. > : None of the rows overlap with another. > > : I am using an sql which looks like this: > > : select startnumber, endnumber from table where '12345678' between > : startnumber and endnumber > > : I have also tried > : select startnumber, endnumber from table where '12345678' >= > : startnumber and '12345678' <= endnumber > > : The latter sometimes performes a little better. > > : The table has about 6 mill rows. > : I have tried with indexes on both rows, both with desc and asc as FBI > : in any combination that i can think of and also hinted the sql. > > : The explain plan shows that it will use the index but the query takes > : around 6 to 9 seconds which is way to much in respect of the needs. > > : Does any one of you have any suggestions ? > > create an index on four columns, where the first two columns contain > truncations of the two numbers. > > indexed columns would be > ( integer portion of startnumber/1000 > integer portion of endnumber/1000 > startnumber > endnumber > ) > > the 1000 is just an example, some other number is probably better > > you might have to query on the truncated number as well as the real > number. > > WHERE integer portion of 12345678/1000 between > integer portion of startnumber/1000 and > integer portion of endnumber/1000 > AND 12345678 between startnumber and endnumber > > $0.10 Hello Malcolm, Thank you for your fast reply. I was thinking about something like this, not an index based but just the first digit as each number are all 8 digits. But your idea seems like a better way. One question: If there are more than 9999 between fromnumber and tonumber will this still work ? Say i select from 20000000 to 30000000. I belive not and in this case i should select using only startnumber and endnumber right ? Regards Benjamin |
| |||
| -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Benjamin Sølberg wrote: > I am using an sql which looks like this: > > select startnumber, endnumber from table where '12345678' between > startnumber and endnumber What is it - numbers or strings?!? If it's a number, loose the quotes. (If it's a string, I'd make you rename the columns in the design!) > > Does any one of you have any suggestions ? > Yes - -- Regards, Frank van Bortel Top-posting is one way to shut me up... -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (MingW32) iD8DBQFGgrFiLw8L4IAs830RAtkmAKCSRdO2Ab4yEwWHIN4FWL M4gTcUmgCfWlzb MU7zlqalHvZFFTStEVDV42o= =ZFK7 -----END PGP SIGNATURE----- |
| |||
| On 27 Jun., 20:50, Frank van Bortel <frank.van.bor...@gmail.com> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Benjamin Sølberg wrote: > > I am using an sql which looks like this: > > > select startnumber, endnumber from table where '12345678' between > > startnumber and endnumber > > What is it - numbers or strings?!? If it's a number, loose > the quotes. (If it's a string, I'd make you rename the > columns in the design!) > > > Does any one of you have any suggestions ? > > Yes > > - -- > Regards, > Frank van Bortel > > Top-posting is one way to shut me up... > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.1 (MingW32) > > iD8DBQFGgrFiLw8L4IAs830RAtkmAKCSRdO2Ab4yEwWHIN4FWL M4gTcUmgCfWlzb > MU7zlqalHvZFFTStEVDV42o= > =ZFK7 > -----END PGP SIGNATURE----- Hi Frank Yes the numbers are infact Strings.. actually varchar2(8) I would very much like to change it but its an old and huge system of which I only have partly control. Anyway each start and endnumber is a telephonenumber (just 8 digits where I live) Normal start and endnumber is the same (in 80% of the rows) but often its a series of numbers. I tried the FBI part with substr(startnumber, 1, 4) and the same on endnumber as well as using start and endnumber in the query as suggested. I even hinted the SQL and the index is used but is infact 50% slower. So to sum up: The oracle version is 8.1.7 (yes it is old, yes it has reached EOL) I have a table: NUMBERS (startnumber(varchar2(8)) not null, endnumber(varchar2(8)) not null) I do the query: select * from numbers where '12345678' between startnumber and endnumber and even with a hint on startnumber, endnumber index, it performes bad. Number of rows in the numbers table is 5-6 millions. There is a normal index on startnumber, endnumber (both asc) There is a function based index on substr(startnumber, 1, 4), substr(endnumber, 1, 4), startnumber, endnumber And even if I do a select that uses the FBI it still performes bad, actually about 50% more bad :-( You wrote that you might had any ideas or did i misunderstood you ? Regards Benjamin |
| |||
| =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelberg@gmail.com) wrote: : On 27 Jun., 05:45, y...@vtn1.victoria.tc.ca (Malcolm Dew-Jones) wrote: : > =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelb...@gmail.com) wrote: : > : > : Hi all : > : > : I have a simple sql which I can't make perform. : > : > : In its simplest form it has two cols, a startnumber and an endnumber. : > : None of the rows overlap with another. : > : > : I am using an sql which looks like this: : > : > : select startnumber, endnumber from table where '12345678' between : > : startnumber and endnumber : > : > : I have also tried : > : select startnumber, endnumber from table where '12345678' >= : > : startnumber and '12345678' <= endnumber : > : > : The latter sometimes performes a little better. : > : > : The table has about 6 mill rows. : > : I have tried with indexes on both rows, both with desc and asc as FBI : > : in any combination that i can think of and also hinted the sql. : > : > : The explain plan shows that it will use the index but the query takes : > : around 6 to 9 seconds which is way to much in respect of the needs. : > : > : Does any one of you have any suggestions ? : > : > create an index on four columns, where the first two columns contain : > truncations of the two numbers. : > : > indexed columns would be : > ( integer portion of startnumber/1000 : > integer portion of endnumber/1000 : > startnumber : > endnumber : > ) : > : > the 1000 is just an example, some other number is probably better : > : > you might have to query on the truncated number as well as the real : > number. : > : > WHERE integer portion of 12345678/1000 between : > integer portion of startnumber/1000 and : > integer portion of endnumber/1000 : > AND 12345678 between startnumber and endnumber : > : > $0.10 : Hello Malcolm, : Thank you for your fast reply. : I was thinking about something like this, not an index based but just : the first digit as each number are all 8 digits. : But your idea seems like a better way. If they are actually strings then using the first digit would be the same general idea. : One question: : If there are more than 9999 between fromnumber and tonumber will this : still work ? Are you saying that there is a correspondence between the two numbers in each row? If so then some function of the two numbers should perhaps be indexed instead of the two numbers independently. : Say i select from 20000000 to 30000000. : I belive not and in this case i should select using only startnumber : and endnumber right ? No, the underlying issue is the same. The point of the index can be determined by trying to sort a list of pairs of numbers. The following is sorted on the first number. 1 10 2 5 3 8 4 1 5 1 6 3 notice that sorting on the first number helps you find the first number efficiently, but it will not help in finding the second number. On average, the first number is in the middle of the list, but then you have to scan through all the rows _starting_ at that point in the list to find the second number. (i.e. on average you still half to scan through half the rows). What happens if you sort the above on both numbers? - well actually the list is already sorted on both numbers! It doesn't help because the first numbers are all unique so each first number corresponds to a single second number. The point of the suggested index is to partition the first numbers into lumps. You can efficiently find which lumps probably have the first number because that is sorted, but within each lump there is a sorted set of second numbers to help you efficiently find the second numbers in each lump. Each level of partitition should, on average, half the rows to be examined. The original numbers must be examined at some point, so you put them are in the index so that only the index needs to be read, not the table. HOWEVER, the indexing adds over head, so at some point the technique starts to become less efficient, therefore experimentation is required to see if this will even help in your situation. Also, if the numbers already have many repetitions then the above will do nothing except add over head, so as I said, experimentation and understanding is required to see if this will even help in your situation. |
| |||
| On 27 Jun., 23:41, y...@vtn1.victoria.tc.ca (Malcolm Dew-Jones) wrote: > =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelb...@gmail.com) wrote: > > : On 27 Jun., 05:45, y...@vtn1.victoria.tc.ca (Malcolm Dew-Jones) wrote: > : > =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelb...@gmail.com) wrote: > : > SNIP > : > $0.10 > > : Hello Malcolm, > > : Thank you for your fast reply. > : I was thinking about something like this, not an index based but just > : the first digit as each number are all 8 digits. > : But your idea seems like a better way. > > If they are actually strings then using the first digit would be the same > general idea. > > : One question: > : If there are more than 9999 between fromnumber and tonumber will this > : still work ? > > Are you saying that there is a correspondence between the two numbers in > each row? If so then some function of the two numbers should perhaps be > indexed instead of the two numbers independently. > > : Say i select from 20000000 to 30000000. > : I belive not and in this case i should select using only startnumber > : and endnumber right ? > > No, the underlying issue is the same. > > The point of the index can be determined by trying to sort a list of pairs > of numbers. The following is sorted on the first number. > > 1 10 > 2 5 > 3 8 > 4 1 > 5 1 > 6 3 > > notice that sorting on the first number helps you find the first number > efficiently, but it will not help in finding the second number. On > average, the first number is in the middle of the list, but then you have > to scan through all the rows _starting_ at that point in the list to find > the second number. (i.e. on average you still half to scan through half > the rows). > > What happens if you sort the above on both numbers? - well actually the > list is already sorted on both numbers! It doesn't help because the first > numbers are all unique so each first number corresponds to a single second > number. > > The point of the suggested index is to partition the first numbers into > lumps. You can efficiently find which lumps probably have the first > number because that is sorted, but within each lump there is a sorted set > of second numbers to help you efficiently find the second numbers in each > lump. Each level of partitition should, on average, half the rows to be > examined. > > The original numbers must be examined at some point, so you put them are > in the index so that only the index needs to be read, not the table. > > HOWEVER, the indexing adds over head, so at some point the technique > starts to become less efficient, therefore experimentation is required to > see if this will even help in your situation. > > Also, if the numbers already have many repetitions then the above will do > nothing except add over head, so as I said, experimentation and > understanding is required to see if this will even help in your situation. Yes each startnumber is either less or equal to the endnumber. Here is an example: startnumber - endnumber 20000000 - 20000000 20000010 - 20000545 31234567 - 31234567 47462832 - 47469999 and so on up til the value 99999999. Not all numbers exists and some may cross bounderies of others. I belive i understand the "lump" part and in my case I should do the following: Create an index on substr(startnumber, 1, 4), substr(endnumber, 1, 4), startnumber, endnumber Then do a SQL like this : SELECT * FROM numbers WHERE '3131' BETWEEN SUBSTR (startnumber, 1, 4) AND SUBSTR (endnumber, 1, 4) AND '31313131' BETWEEN startnumber AND endnumber; Is this correct ? Regards Benjamin |
| |||
| =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelberg@gmail.com) wrote: : Create an index on substr(startnumber, 1, 4), substr(endnumber, 1, 4), : startnumber, endnumber : Then do a SQL like this : : SELECT * : FROM numbers : WHERE '3131' BETWEEN SUBSTR (startnumber, 1, 4) AND SUBSTR : (endnumber, 1, 4) : AND '31313131' BETWEEN startnumber AND endnumber; : Is this correct ? I can't say off hand. I think if you use a functional index then the syntax used within the where must be identical to the syntax used in the functional index. The "traditional" way would be to add columns that have derived values and include those columns in the query table X startnumber NUMBER endnumber NUMBER start_helper NUMBER (always has part of startnumnber) end_helper NUMBER (always has part of endnumnber) index on all four columns SELECT * FROM X WHERE '3131' BETWEEN start_helper and end_helper AND '31313131' BETWEEN startnumber AND endnumber; More than this I cannot say, at this stage I would be experimenting to check my understanding. |
| |||
| On 27 jun, 22:19, Benjamin Sølberg <benjamin.soelb...@gmail.com> wrote: > On 27 Jun., 20:50, Frank van Bortel <frank.van.bor...@gmail.com> > wrote: > > > > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > Benjamin Sølberg wrote: > > > I am using an sql which looks like this: > > > > select startnumber, endnumber from table where '12345678' between > > > startnumber and endnumber > > > What is it - numbers or strings?!? If it's a number, loose > > the quotes. (If it's a string, I'd make you rename the > > columns in the design!) > > > > Does any one of you have any suggestions ? > > > Yes > > > - -- > > Regards, > > Frank van Bortel > > > Top-posting is one way to shut me up... > > -----BEGIN PGP SIGNATURE----- > > Version: GnuPG v1.4.1 (MingW32) > > > iD8DBQFGgrFiLw8L4IAs830RAtkmAKCSRdO2Ab4yEwWHIN4FWL M4gTcUmgCfWlzb > > MU7zlqalHvZFFTStEVDV42o= > > =ZFK7 > > -----END PGP SIGNATURE----- > > Hi Frank > Yes the numbers are infact Strings.. actually varchar2(8) > I would very much like to change it but its an old and huge system of > which I only have partly control. > Which part of " (If it's a string, I'd make you rename the columns in the design!) " did you not understand?!? Do us a favor: take a copy of your table: create table frank as select to_number(startnumber) "STARTNUMBER" , to_number(endnumber) "ENDNUMBER" from numbers; Now rerun you between query with numbers (meaning: same statement, no quotes). Explain the difference in speed. Due, monday, before 09:00. |
| ||||
| -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Malcolm Dew-Jones wrote: > > table X > startnumber NUMBER [snip] > index on all four columns > > SELECT * > FROM X > WHERE '3131' BETWEEN start_helper and end_helper Not using index - implicit conversion - BAD! Don't tell - but comparing numbers is much faster that comparing strings - hence my homework assignment. - -- Regards, Frank van Bortel Top-posting is one way to shut me up... -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (MingW32) iD8DBQFGhWnSLw8L4IAs830RAmCeAJ9V/Qw4HbZ9o7jucRYOSyB77eYnsQCeN5cJ JjVC9cFksXiIaO+AojSYUJo= =uILO -----END PGP SIGNATURE----- |