Unix Technical Forum

simple sql using between startnumber and endnumber not performing

This is a discussion on simple sql using between startnumber and endnumber not performing within the Oracle Miscellaneous forums, part of the Oracle Database category; --> Hi all I have a simple sql which I can't make perform. In its simplest form it has two ...


Go Back   Unix Technical Forum > Database Server Software > Oracle Database > Oracle Miscellaneous

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-08-2008, 01:07 PM
=?iso-8859-1?q?Benjamin_S=F8lberg?=
 
Posts: n/a
Default simple sql using between startnumber and endnumber not performing

Hi all

I have a simple sql which I can't make perform.

In its simplest form it has two cols, a startnumber and an endnumber.
None of the rows overlap with another.

I am using an sql which looks like this:

select startnumber, endnumber from table where '12345678' between
startnumber and endnumber

I have also tried
select startnumber, endnumber from table where '12345678' >=
startnumber and '12345678' <= endnumber

The latter sometimes performes a little better.

The table has about 6 mill rows.
I have tried with indexes on both rows, both with desc and asc as FBI
in any combination that i can think of and also hinted the sql.

The explain plan shows that it will use the index but the query takes
around 6 to 9 seconds which is way to much in respect of the needs.

Does any one of you have any suggestions ?

Regards
Benjamin

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-08-2008, 01:07 PM
Malcolm Dew-Jones
 
Posts: n/a
Default Re: simple sql using between startnumber and endnumber not performing

=?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelberg@gmail.com) wrote:
: Hi all

: I have a simple sql which I can't make perform.

: In its simplest form it has two cols, a startnumber and an endnumber.
: None of the rows overlap with another.

: I am using an sql which looks like this:

: select startnumber, endnumber from table where '12345678' between
: startnumber and endnumber

: I have also tried
: select startnumber, endnumber from table where '12345678' >=
: startnumber and '12345678' <= endnumber

: The latter sometimes performes a little better.

: The table has about 6 mill rows.
: I have tried with indexes on both rows, both with desc and asc as FBI
: in any combination that i can think of and also hinted the sql.

: The explain plan shows that it will use the index but the query takes
: around 6 to 9 seconds which is way to much in respect of the needs.

: Does any one of you have any suggestions ?

create an index on four columns, where the first two columns contain
truncations of the two numbers.

indexed columns would be
( integer portion of startnumber/1000
integer portion of endnumber/1000
startnumber
endnumber
)

the 1000 is just an example, some other number is probably better

you might have to query on the truncated number as well as the real
number.

WHERE integer portion of 12345678/1000 between
integer portion of startnumber/1000 and
integer portion of endnumber/1000
AND 12345678 between startnumber and endnumber

$0.10

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-08-2008, 01:07 PM
=?iso-8859-1?q?Benjamin_S=F8lberg?=
 
Posts: n/a
Default Re: simple sql using between startnumber and endnumber not performing

On 27 Jun., 05:45, y...@vtn1.victoria.tc.ca (Malcolm Dew-Jones) wrote:
> =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelb...@gmail.com) wrote:
>
> : Hi all
>
> : I have a simple sql which I can't make perform.
>
> : In its simplest form it has two cols, a startnumber and an endnumber.
> : None of the rows overlap with another.
>
> : I am using an sql which looks like this:
>
> : select startnumber, endnumber from table where '12345678' between
> : startnumber and endnumber
>
> : I have also tried
> : select startnumber, endnumber from table where '12345678' >=
> : startnumber and '12345678' <= endnumber
>
> : The latter sometimes performes a little better.
>
> : The table has about 6 mill rows.
> : I have tried with indexes on both rows, both with desc and asc as FBI
> : in any combination that i can think of and also hinted the sql.
>
> : The explain plan shows that it will use the index but the query takes
> : around 6 to 9 seconds which is way to much in respect of the needs.
>
> : Does any one of you have any suggestions ?
>
> create an index on four columns, where the first two columns contain
> truncations of the two numbers.
>
> indexed columns would be
> ( integer portion of startnumber/1000
> integer portion of endnumber/1000
> startnumber
> endnumber
> )
>
> the 1000 is just an example, some other number is probably better
>
> you might have to query on the truncated number as well as the real
> number.
>
> WHERE integer portion of 12345678/1000 between
> integer portion of startnumber/1000 and
> integer portion of endnumber/1000
> AND 12345678 between startnumber and endnumber
>
> $0.10


Hello Malcolm,

Thank you for your fast reply.
I was thinking about something like this, not an index based but just
the first digit as each number are all 8 digits.
But your idea seems like a better way.

One question:
If there are more than 9999 between fromnumber and tonumber will this
still work ?
Say i select from 20000000 to 30000000.
I belive not and in this case i should select using only startnumber
and endnumber right ?

Regards
Benjamin

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-08-2008, 01:07 PM
Frank van Bortel
 
Posts: n/a
Default Re: simple sql using between startnumber and endnumber not performing

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Benjamin Sølberg wrote:

> I am using an sql which looks like this:
>
> select startnumber, endnumber from table where '12345678' between
> startnumber and endnumber


What is it - numbers or strings?!? If it's a number, loose
the quotes. (If it's a string, I'd make you rename the
columns in the design!)
>
> Does any one of you have any suggestions ?
>

Yes

- --
Regards,
Frank van Bortel

Top-posting is one way to shut me up...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (MingW32)

iD8DBQFGgrFiLw8L4IAs830RAtkmAKCSRdO2Ab4yEwWHIN4FWL M4gTcUmgCfWlzb
MU7zlqalHvZFFTStEVDV42o=
=ZFK7
-----END PGP SIGNATURE-----
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-08-2008, 01:07 PM
=?iso-8859-1?q?Benjamin_S=F8lberg?=
 
Posts: n/a
Default Re: simple sql using between startnumber and endnumber not performing

On 27 Jun., 20:50, Frank van Bortel <frank.van.bor...@gmail.com>
wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Benjamin Sølberg wrote:
> > I am using an sql which looks like this:

>
> > select startnumber, endnumber from table where '12345678' between
> > startnumber and endnumber

>
> What is it - numbers or strings?!? If it's a number, loose
> the quotes. (If it's a string, I'd make you rename the
> columns in the design!)
>
> > Does any one of you have any suggestions ?

>
> Yes
>
> - --
> Regards,
> Frank van Bortel
>
> Top-posting is one way to shut me up...
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (MingW32)
>
> iD8DBQFGgrFiLw8L4IAs830RAtkmAKCSRdO2Ab4yEwWHIN4FWL M4gTcUmgCfWlzb
> MU7zlqalHvZFFTStEVDV42o=
> =ZFK7
> -----END PGP SIGNATURE-----



Hi Frank
Yes the numbers are infact Strings.. actually varchar2(8)
I would very much like to change it but its an old and huge system of
which I only have partly control.

Anyway each start and endnumber is a telephonenumber (just 8 digits
where I live)
Normal start and endnumber is the same (in 80% of the rows) but often
its a series of numbers.
I tried the FBI part with substr(startnumber, 1, 4) and the same on
endnumber as well as using start and endnumber in the query as
suggested.
I even hinted the SQL and the index is used but is infact 50% slower.

So to sum up:

The oracle version is 8.1.7 (yes it is old, yes it has reached EOL)

I have a table:
NUMBERS (startnumber(varchar2(8)) not null, endnumber(varchar2(8)) not
null)

I do the query:
select * from numbers where '12345678' between startnumber and
endnumber and even with a hint on startnumber, endnumber index, it
performes bad.

Number of rows in the numbers table is 5-6 millions.
There is a normal index on startnumber, endnumber (both asc)
There is a function based index on substr(startnumber, 1, 4),
substr(endnumber, 1, 4), startnumber, endnumber

And even if I do a select that uses the FBI it still performes bad,
actually about 50% more bad :-(

You wrote that you might had any ideas or did i misunderstood you ?

Regards
Benjamin

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-08-2008, 01:07 PM
Malcolm Dew-Jones
 
Posts: n/a
Default Re: simple sql using between startnumber and endnumber not performing

=?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelberg@gmail.com) wrote:
: On 27 Jun., 05:45, y...@vtn1.victoria.tc.ca (Malcolm Dew-Jones) wrote:
: > =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelb...@gmail.com) wrote:
: >
: > : Hi all
: >
: > : I have a simple sql which I can't make perform.
: >
: > : In its simplest form it has two cols, a startnumber and an endnumber.
: > : None of the rows overlap with another.
: >
: > : I am using an sql which looks like this:
: >
: > : select startnumber, endnumber from table where '12345678' between
: > : startnumber and endnumber
: >
: > : I have also tried
: > : select startnumber, endnumber from table where '12345678' >=
: > : startnumber and '12345678' <= endnumber
: >
: > : The latter sometimes performes a little better.
: >
: > : The table has about 6 mill rows.
: > : I have tried with indexes on both rows, both with desc and asc as FBI
: > : in any combination that i can think of and also hinted the sql.
: >
: > : The explain plan shows that it will use the index but the query takes
: > : around 6 to 9 seconds which is way to much in respect of the needs.
: >
: > : Does any one of you have any suggestions ?
: >
: > create an index on four columns, where the first two columns contain
: > truncations of the two numbers.
: >
: > indexed columns would be
: > ( integer portion of startnumber/1000
: > integer portion of endnumber/1000
: > startnumber
: > endnumber
: > )
: >
: > the 1000 is just an example, some other number is probably better
: >
: > you might have to query on the truncated number as well as the real
: > number.
: >
: > WHERE integer portion of 12345678/1000 between
: > integer portion of startnumber/1000 and
: > integer portion of endnumber/1000
: > AND 12345678 between startnumber and endnumber
: >
: > $0.10

: Hello Malcolm,

: Thank you for your fast reply.
: I was thinking about something like this, not an index based but just
: the first digit as each number are all 8 digits.
: But your idea seems like a better way.

If they are actually strings then using the first digit would be the same
general idea.

: One question:
: If there are more than 9999 between fromnumber and tonumber will this
: still work ?

Are you saying that there is a correspondence between the two numbers in
each row? If so then some function of the two numbers should perhaps be
indexed instead of the two numbers independently.


: Say i select from 20000000 to 30000000.
: I belive not and in this case i should select using only startnumber
: and endnumber right ?

No, the underlying issue is the same.

The point of the index can be determined by trying to sort a list of pairs
of numbers. The following is sorted on the first number.

1 10
2 5
3 8
4 1
5 1
6 3

notice that sorting on the first number helps you find the first number
efficiently, but it will not help in finding the second number. On
average, the first number is in the middle of the list, but then you have
to scan through all the rows _starting_ at that point in the list to find
the second number. (i.e. on average you still half to scan through half
the rows).

What happens if you sort the above on both numbers? - well actually the
list is already sorted on both numbers! It doesn't help because the first
numbers are all unique so each first number corresponds to a single second
number.

The point of the suggested index is to partition the first numbers into
lumps. You can efficiently find which lumps probably have the first
number because that is sorted, but within each lump there is a sorted set
of second numbers to help you efficiently find the second numbers in each
lump. Each level of partitition should, on average, half the rows to be
examined.

The original numbers must be examined at some point, so you put them are
in the index so that only the index needs to be read, not the table.

HOWEVER, the indexing adds over head, so at some point the technique
starts to become less efficient, therefore experimentation is required to
see if this will even help in your situation.

Also, if the numbers already have many repetitions then the above will do
nothing except add over head, so as I said, experimentation and
understanding is required to see if this will even help in your situation.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-08-2008, 01:07 PM
=?iso-8859-1?q?Benjamin_S=F8lberg?=
 
Posts: n/a
Default Re: simple sql using between startnumber and endnumber not performing

On 27 Jun., 23:41, y...@vtn1.victoria.tc.ca (Malcolm Dew-Jones) wrote:
> =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelb...@gmail.com) wrote:
>
> : On 27 Jun., 05:45, y...@vtn1.victoria.tc.ca (Malcolm Dew-Jones) wrote:
> : > =?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelb...@gmail.com) wrote:
> : >

SNIP
> : > $0.10
>
> : Hello Malcolm,
>
> : Thank you for your fast reply.
> : I was thinking about something like this, not an index based but just
> : the first digit as each number are all 8 digits.
> : But your idea seems like a better way.
>
> If they are actually strings then using the first digit would be the same
> general idea.
>
> : One question:
> : If there are more than 9999 between fromnumber and tonumber will this
> : still work ?
>
> Are you saying that there is a correspondence between the two numbers in
> each row? If so then some function of the two numbers should perhaps be
> indexed instead of the two numbers independently.
>
> : Say i select from 20000000 to 30000000.
> : I belive not and in this case i should select using only startnumber
> : and endnumber right ?
>
> No, the underlying issue is the same.
>
> The point of the index can be determined by trying to sort a list of pairs
> of numbers. The following is sorted on the first number.
>
> 1 10
> 2 5
> 3 8
> 4 1
> 5 1
> 6 3
>
> notice that sorting on the first number helps you find the first number
> efficiently, but it will not help in finding the second number. On
> average, the first number is in the middle of the list, but then you have
> to scan through all the rows _starting_ at that point in the list to find
> the second number. (i.e. on average you still half to scan through half
> the rows).
>
> What happens if you sort the above on both numbers? - well actually the
> list is already sorted on both numbers! It doesn't help because the first
> numbers are all unique so each first number corresponds to a single second
> number.
>
> The point of the suggested index is to partition the first numbers into
> lumps. You can efficiently find which lumps probably have the first
> number because that is sorted, but within each lump there is a sorted set
> of second numbers to help you efficiently find the second numbers in each
> lump. Each level of partitition should, on average, half the rows to be
> examined.
>
> The original numbers must be examined at some point, so you put them are
> in the index so that only the index needs to be read, not the table.
>
> HOWEVER, the indexing adds over head, so at some point the technique
> starts to become less efficient, therefore experimentation is required to
> see if this will even help in your situation.
>
> Also, if the numbers already have many repetitions then the above will do
> nothing except add over head, so as I said, experimentation and
> understanding is required to see if this will even help in your situation.



Yes each startnumber is either less or equal to the endnumber.

Here is an example:

startnumber - endnumber

20000000 - 20000000
20000010 - 20000545
31234567 - 31234567
47462832 - 47469999

and so on up til the value 99999999.
Not all numbers exists and some may cross bounderies of others.

I belive i understand the "lump" part and in my case I should do the
following:

Create an index on substr(startnumber, 1, 4), substr(endnumber, 1, 4),
startnumber, endnumber

Then do a SQL like this :
SELECT *
FROM numbers
WHERE '3131' BETWEEN SUBSTR (startnumber, 1, 4) AND SUBSTR
(endnumber, 1, 4)
AND '31313131' BETWEEN startnumber AND endnumber;

Is this correct ?

Regards
Benjamin

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-08-2008, 01:07 PM
Malcolm Dew-Jones
 
Posts: n/a
Default Re: simple sql using between startnumber and endnumber not performing

=?iso-8859-1?q?Benjamin_S=F8lberg?= (benjamin.soelberg@gmail.com) wrote:

: Create an index on substr(startnumber, 1, 4), substr(endnumber, 1, 4),
: startnumber, endnumber

: Then do a SQL like this :
: SELECT *
: FROM numbers
: WHERE '3131' BETWEEN SUBSTR (startnumber, 1, 4) AND SUBSTR
: (endnumber, 1, 4)
: AND '31313131' BETWEEN startnumber AND endnumber;

: Is this correct ?

I can't say off hand. I think if you use a functional index then the
syntax used within the where must be identical to the syntax used in the
functional index.

The "traditional" way would be to add columns that have derived values and
include those columns in the query

table X
startnumber NUMBER
endnumber NUMBER
start_helper NUMBER (always has part of startnumnber)
end_helper NUMBER (always has part of endnumnber)

index on all four columns

SELECT *
FROM X
WHERE '3131' BETWEEN start_helper and end_helper
AND '31313131' BETWEEN startnumber AND endnumber;


More than this I cannot say, at this stage I would be experimenting to
check my understanding.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-08-2008, 01:07 PM
Frank van Bortel
 
Posts: n/a
Default Re: simple sql using between startnumber and endnumber not performing

On 27 jun, 22:19, Benjamin Sølberg <benjamin.soelb...@gmail.com>
wrote:
> On 27 Jun., 20:50, Frank van Bortel <frank.van.bor...@gmail.com>
> wrote:
>
>
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1

>
> > Benjamin Sølberg wrote:
> > > I am using an sql which looks like this:

>
> > > select startnumber, endnumber from table where '12345678' between
> > > startnumber and endnumber

>
> > What is it - numbers or strings?!? If it's a number, loose
> > the quotes. (If it's a string, I'd make you rename the
> > columns in the design!)

>
> > > Does any one of you have any suggestions ?

>
> > Yes

>
> > - --
> > Regards,
> > Frank van Bortel

>
> > Top-posting is one way to shut me up...
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.1 (MingW32)

>
> > iD8DBQFGgrFiLw8L4IAs830RAtkmAKCSRdO2Ab4yEwWHIN4FWL M4gTcUmgCfWlzb
> > MU7zlqalHvZFFTStEVDV42o=
> > =ZFK7
> > -----END PGP SIGNATURE-----

>
> Hi Frank
> Yes the numbers are infact Strings.. actually varchar2(8)
> I would very much like to change it but its an old and huge system of
> which I only have partly control.
>

Which part of " (If it's a string, I'd make you rename the
columns in the design!) "
did you not understand?!?

Do us a favor: take a copy of your table:
create table frank as select to_number(startnumber) "STARTNUMBER" ,
to_number(endnumber) "ENDNUMBER" from
numbers;

Now rerun you between query with numbers (meaning: same statement, no
quotes).

Explain the difference in speed.

Due, monday, before 09:00.



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-08-2008, 01:08 PM
Frank van Bortel
 
Posts: n/a
Default Re: simple sql using between startnumber and endnumber not performing

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Malcolm Dew-Jones wrote:
>
> table X
> startnumber NUMBER

[snip]
> index on all four columns
>
> SELECT *
> FROM X
> WHERE '3131' BETWEEN start_helper and end_helper


Not using index - implicit conversion - BAD!

Don't tell - but comparing numbers is much faster that
comparing strings - hence my homework assignment.
- --
Regards,
Frank van Bortel

Top-posting is one way to shut me up...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (MingW32)

iD8DBQFGhWnSLw8L4IAs830RAmCeAJ9V/Qw4HbZ9o7jucRYOSyB77eYnsQCeN5cJ
JjVC9cFksXiIaO+AojSYUJo=
=uILO
-----END PGP SIGNATURE-----
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:27 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com