Unix Technical Forum

GiST indexing tuples

This is a discussion on GiST indexing tuples within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi all. I'm wanting to write a new GiST index system, to improve performance on some queries I am ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 11:42 AM
Matthew
 
Posts: n/a
Default GiST indexing tuples


Hi all.

I'm wanting to write a new GiST index system, to improve performance on
some queries I am running. I have had quite a look through the docs and
code, and I'm not convinced that it is possible to do what I want. This is
what I am wanting to index:

CREATE INDEX range_index ON table(a, b) USING fancy_new_index;

and then:

SELECT * FROM table WHERE a > 1 AND b < 4;

and have that answered by the index.

Now, generating an index format that can answer that particular
arrangement of constraints is easy. I can do that. However, getting
multiple values into the GiST functions is something I don't know how to
do. As far as I can see, I would need to create a composite type and index
that, like the contrib package seg does. This would change my SQL to:

CREATE INDEX range_index ON table(fancy_type(a, b)) USING fancy_index;

SELECT * FROM table WHERE fancy_type(a, b) &^£@! fancy_type(1, 4);

which I don't want to do.

So, has this problem been solved before? Is there an already-existing
index that will speed up my query? Is there a way to get more than one
value into a GiST index?

Thanks,

Matthew

--
If you let your happiness depend upon how somebody else feels about you,
now you have to control how somebody else feels about you. -- Abraham Hicks

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 11:42 AM
Steinar H. Gunderson
 
Posts: n/a
Default Re: GiST indexing tuples

On Tue, Nov 27, 2007 at 06:28:23PM +0000, Matthew wrote:
> SELECT * FROM table WHERE a > 1 AND b < 4;


This sounds like something an R-tree can do.

/* Steinar */
--
Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 11:42 AM
Matthew
 
Posts: n/a
Default Re: GiST indexing tuples

On Tue, 27 Nov 2007, Steinar H. Gunderson wrote:
> On Tue, Nov 27, 2007 at 06:28:23PM +0000, Matthew wrote:
> > SELECT * FROM table WHERE a > 1 AND b < 4;

>
> This sounds like something an R-tree can do.


I *know* that. However, Postgres (as far as I can see) doesn't provide a
simple R-tree index that will index two integers. This means I have to
write one myself. I'm asking whether it is possible to get two values into
a GiST index, which would allow me to implement this.

Matthew

--
It is better to keep your mouth closed and let people think you are a fool
than to open it and remove all doubt. -- Mark Twain

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 11:42 AM
Gregory Stark
 
Posts: n/a
Default Re: GiST indexing tuples

"Matthew" <matthew@flymine.org> writes:

> On Tue, 27 Nov 2007, Steinar H. Gunderson wrote:
>> On Tue, Nov 27, 2007 at 06:28:23PM +0000, Matthew wrote:
>> > SELECT * FROM table WHERE a > 1 AND b < 4;

>>
>> This sounds like something an R-tree can do.

>
> I *know* that. However, Postgres (as far as I can see) doesn't provide a
> simple R-tree index that will index two integers. This means I have to
> write one myself. I'm asking whether it is possible to get two values into
> a GiST index, which would allow me to implement this.


The database is capable of determining that a>1 and b<4 are both conditions
which a single index can satisfy.

However GIST itself treats each column of the index independently applying the
first column then the second one and so on like a traditional btree index, so
it doesn't really do what you would want.

I did propose a while back that GIST should consider all columns
simultaneously in the same style as rtree.

However this would require making GIST somewhat less general in another sense.
Currently page splits can be handled arbitrarily but if you have to be able to
combine different datatypes it would mean having to come up with a standard
algorithm which would work everywhere. (I suggested making everything work in
terms of "distance" and then using the n-space vector distance (ie
sqrt((a1-b1)^2+(a2-b2)^2+...).) This means GIST wouldn't be as general as
it is now but it would allow us to handle cases like yours automatically.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 11:42 AM
Tom Lane
 
Posts: n/a
Default Re: GiST indexing tuples

Matthew <matthew@flymine.org> writes:
>> This sounds like something an R-tree can do.


> I *know* that. However, Postgres (as far as I can see) doesn't provide a
> simple R-tree index that will index two integers. This means I have to
> write one myself. I'm asking whether it is possible to get two values into
> a GiST index, which would allow me to implement this.


Have you looked at contrib/seg/ ?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 11:42 AM
Matthew
 
Posts: n/a
Default Re: GiST indexing tuples

On Wed, 28 Nov 2007, Tom Lane wrote:
> Have you looked at contrib/seg/ ?


Yes, I had a pretty good look at that. However, I believe that in order to
use seg's indexes, I would need to put my data into seg's data type, and
reformat my query, as I stated in my original message. What I'm looking
for is a general R-tree (or similar) index that will index multiple
columns of normal data types.

For instance, the normal B-tree index on (a, b) is able to answer queries
like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these,
plus queries like "a > 5 AND b < 1".

As far as I can see, it is not possible at the moment to write such an
index system for GiST, which is a shame because the actual R-tree
algorithm is very simple. It's just a matter of communicating both values
from the query to the index code.

Matthew

--
I have an inferiority complex. But it's not a very good one.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 11:42 AM
Matthew T. O'Connor
 
Posts: n/a
Default Re: GiST indexing tuples

Matthew wrote:
> For instance, the normal B-tree index on (a, b) is able to answer queries
> like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these,
> plus queries like "a > 5 AND b < 1".


Sorry in advance if this is a stupid question, but how is this better
than two index, one on "a" and one on "b"? I supposed there could be a
space savings but beyond that?


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 11:42 AM
Steinar H. Gunderson
 
Posts: n/a
Default Re: GiST indexing tuples

On Thu, Nov 29, 2007 at 03:23:10PM -0500, Matthew T. O'Connor wrote:
> Sorry in advance if this is a stupid question, but how is this better than
> two index, one on "a" and one on "b"? I supposed there could be a space
> savings but beyond that?


You could index on both columns simultaneously without a bitmap index scan.

/* Steinar */
--
Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-19-2008, 11:42 AM
Matthew
 
Posts: n/a
Default Re: GiST indexing tuples

On Thu, 29 Nov 2007, Matthew T. O'Connor wrote:
> Matthew wrote:
> > For instance, the normal B-tree index on (a, b) is able to answer queries
> > like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these,
> > plus queries like "a > 5 AND b < 1".

>
> Sorry in advance if this is a stupid question, but how is this better
> than two index, one on "a" and one on "b"? I supposed there could be a
> space savings but beyond that?


Imagine you have a table with columns "a" and "b". The table has a
bazillion rows, and the values of "a" and "b" both range from a negative
bazillion to a positive bazillion. (Note this is exactly the case in our
database, for some value of a bazillion.) Then, you run the query:

SELECT * FROM table WHERE a > 5 AND b < 1;

So, an index on "a" will return half a bazillion results for the
constraint "a > 5". Likewise, the index on "b" will return half a
bazillion results for the constraint "b < 1". However, the intersection of
these two constraints could be just a few rows. (Note this is exactly the
case in our database.)

Now, Postgres has two options. It could use just the one index and filter
half a bazillion rows (which is what it used to do), or it could create a
bitmap with a bazillion bits from each index, and do a logical AND
operation on them to create a new bitmap with just a few bits set (which
it now can do under some circumstances). Either way, it's going to be a
heavy operation.

An R-tree index on "a, b" would instantly return just those few rows,
without using significant amounts of memory or time.

Hope that helps,

Matthew

--
Patron: "I am looking for a globe of the earth."
Librarian: "We have a table-top model over here."
Patron: "No, that's not good enough. Don't you have a life-size?"
Librarian: (pause) "Yes, but it's in use right now."

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:09 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com