This is a discussion on rtree/gist index taking enormous amount of space in 8.2.3 within the Pgsql Performance forums, part of the PostgreSQL category; --> The application need is to determine genomic features present in a user-defined portion of a chromosome. My guess is ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| The application need is to determine genomic features present in a user-defined portion of a chromosome. My guess is that features (boxes) are overlapping along a line (chromosome), and there is a need to represent them as being stacked. Since I'm not certain of its exact use, I've emailed the application owner to find the motivation as to why a geometric index structure is used, and why the boxes are tall and overlapping. As a side note, the data model for our application is based on a popular bioinformatics open source project called chado. Thanks, Tom -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Friday, June 29, 2007 2:38 PM To: Dolafi, Tom Cc: pgsql-performance@postgresql.org; Oleg Bartunov; Teodor Sigaev Subject: Re: [PERFORM] rtree/gist index taking enormous amount of space in 8.2.3 "Dolafi, Tom" <dolafit@janelia.hhmi.org> writes: > In the mean time I've dropped the index which has resulted in overall > performance gain on queries against the table, but we have not tested > the part of the application which would utilize this index. I noted that with the same (guessed-at) distribution of fmin/fmax, the index size remains reasonable if you change the derived boxes to CREATE OR REPLACE FUNCTION boxrange(integer, integer) RETURNS box AS 'SELECT box (point($1, $1), point($2, $2))' LANGUAGE 'sql' STRICT IMMUTABLE; which makes sense from the point of view of geometric intuition: instead of a bunch of very tall, mostly very narrow, mostly overlapping boxes, you have a bunch of small square boxes spread out along a line. So it stands to reason that a geometrically-motivated index structure would work a lot better on the latter. I don't know though whether your queries can be adapted to work with this. What was the index being used for, exactly? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| Thank you for the patch. The index size is back down to 500MB and there are no performance issues with queries against the table. -----Original Message----- From: Teodor Sigaev [mailto:teodor@sigaev.ru] Sent: Friday, July 06, 2007 8:08 AM To: Tom Lane Cc: Dolafi, Tom; pgsql-performance@postgresql.org; Oleg Bartunov Subject: Re: [PERFORM] rtree/gist index taking enormous amount of space in 8.2.3 > Oleg, Teodor, can this be improved? Attached patch improves creation of index for similar corner cases. And split algorithm still demonstrates O(n). It possible to make fallback to Guttman's split algorithm in corner cases, but I don't like this: used linear algorithm is much faster and usually has better performance in search. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/ ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |