Unix Technical Forum

rtree/gist index taking enormous amount of space in 8.2.3

This is a discussion on rtree/gist index taking enormous amount of space in 8.2.3 within the Pgsql Performance forums, part of the PostgreSQL category; --> The application need is to determine genomic features present in a user-defined portion of a chromosome. My guess is ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #11 (permalink)  
Old 04-19-2008, 11:09 AM
Dolafi, Tom
 
Posts: n/a
Default Re: rtree/gist index taking enormous amount of space in 8.2.3

The application need is to determine genomic features present in a
user-defined portion of a chromosome. My guess is that features (boxes)
are overlapping along a line (chromosome), and there is a need to
represent them as being stacked. Since I'm not certain of its exact
use, I've emailed the application owner to find the motivation as to why
a geometric index structure is used, and why the boxes are tall and
overlapping. As a side note, the data model for our application is
based on a popular bioinformatics open source project called chado.

Thanks,
Tom
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, June 29, 2007 2:38 PM
To: Dolafi, Tom
Cc: pgsql-performance@postgresql.org; Oleg Bartunov; Teodor Sigaev
Subject: Re: [PERFORM] rtree/gist index taking enormous amount of space
in 8.2.3

"Dolafi, Tom" <dolafit@janelia.hhmi.org> writes:
> In the mean time I've dropped the index which has resulted in overall
> performance gain on queries against the table, but we have not tested
> the part of the application which would utilize this index.


I noted that with the same (guessed-at) distribution of fmin/fmax, the
index size remains reasonable if you change the derived boxes to

CREATE OR REPLACE FUNCTION boxrange(integer, integer)
RETURNS box AS
'SELECT box (point($1, $1), point($2, $2))'
LANGUAGE 'sql' STRICT IMMUTABLE;

which makes sense from the point of view of geometric intuition: instead
of a bunch of very tall, mostly very narrow, mostly overlapping boxes,
you have a bunch of small square boxes spread out along a line. So it
stands to reason that a geometrically-motivated index structure would
work a lot better on the latter. I don't know though whether your
queries can be adapted to work with this. What was the index being used
for, exactly?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #12 (permalink)  
Old 04-19-2008, 11:10 AM
Dolafi, Tom
 
Posts: n/a
Default Re: rtree/gist index taking enormous amount of space in 8.2.3

Thank you for the patch. The index size is back down to 500MB and there
are no performance issues with queries against the table.

-----Original Message-----
From: Teodor Sigaev [mailto:teodor@sigaev.ru]
Sent: Friday, July 06, 2007 8:08 AM
To: Tom Lane
Cc: Dolafi, Tom; pgsql-performance@postgresql.org; Oleg Bartunov
Subject: Re: [PERFORM] rtree/gist index taking enormous amount of space
in 8.2.3

> Oleg, Teodor, can this be improved?

Attached patch improves creation of index for similar corner cases. And
split
algorithm still demonstrates O(n).

It possible to make fallback to Guttman's split algorithm in corner
cases, but I
don't like this: used linear algorithm is much faster and usually has
better
performance in search.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW:
http://www.sigaev.ru/

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:07 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com