This is a discussion on Fixing r-tree semantics within the pgsql Hackers forums, part of the PostgreSQL category; --> I looked into the r-tree breakage discussed in this thread: http://archives.postgresql.org/pgsql...3/msg01135.php The executive summary: r-tree index opclasses contain four ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I looked into the r-tree breakage discussed in this thread: http://archives.postgresql.org/pgsql...3/msg01135.php The executive summary: r-tree index opclasses contain four two-dimensional operators, which behave correctly, and four one-dimensional operators which do not. There is a basic logic error in the handling of the 1-D operators. One could also legitimately ask why the opclasses don't cover both directions (X and Y) for 1-D inquiries. The same problems exist in the contrib/rtree_gist implementation, because it just copied the r-tree logic without inquiring too closely into it :-( We currently have built-in opclasses for types "box" and "polygon", both of which are fundamentally 2-D objects. The 2-D operators that the r-tree opclasses handle are: ~= same (ordinary equality) && overlaps ~ contains @ is contained by with pretty much the intuitive definitions of these things. The 1-D operators in the opclasses are << left l.max_x < r.min_x >> right l.min_x > r.max_x &< overleft l.max_x <= r.max_x &> overright l.min_x >= r.min_x (I'm not here to argue about whether these definitions are right in detail, particularly about the equality cases; that's the way it's been since Berkeley and I'm not proposing to change them.) Now the problem is that given a query box, say "index_col << some_box", the rtree code has to decide whether to descend to a child page of the index tree based on what is in the parent index page's entry for the child --- and what is there is the union, or minimum combined bounding box, of the boxes or polygons in the child. So the test for descending is not the same as the test for whether a leaf index entry actually matches the query. Fine ... but somebody got this wrong long ago. If you think about it, the criterion for descending during a << (left) search is properly union_box.min_x < query.min_x If this is true, there might be boxes within the union that satisfy the << requirement (box.max_x < query.min_x); if it is not true then there clearly can be no such boxes. So, given the available operators, the correct test for descending is "!overright". But the actual test being done, according to rtstrat.c, is "overleft". This causes the search to fail to search child pages that should be searched (and probably also to waste time searching pages that shouldn't be searched). The observed result is, not surprisingly, that the indexscan fails to find some rows it should find. In the same way, the correct descent tests for the other operators are overleft: !right right: !overleft overright: !left overlaps: overlaps same: contains contains: contains contained: overlaps rtstrat.c gets the first three of these wrong, but the last four cases covering the 2-D operators are correct. This analysis explains why we've not heard more complaints about such a fundamental breakage: the cases most people care about, when using an r-tree, are 2-D inquiries. And what's more, the default selectivity estimates for 1-D inquiries are so low that the index never got used. In 8.1 this will change, because a bitmap index scan looks attractive to the planner even at rather low selectivity --- which means that we are probably going to hear more complaints, if we don't fix it. Fixing the existing operators seems relatively straightforward; there will need to be some extension to rtstrat.c to represent "NOT this operator" as well as "this operator", but that's not hard. I plan to do this, and make the corresponding fixes in contrib/rtree_gist as well. What needs more discussion is that it seems to me to make sense to extend the standard opclasses to handle the four Y-direction operators comparable to the X-direction operators that are already there, that is "above", "below", "overabove", and "overbelow". The polygon type has none of these operators at the moment. Box has <^ below l.max_y <= r.min_y >^ above l.min_y >= r.max_y but not the overlap variants. If you compare these to the X-direction versions: << left l.max_x < r.min_x >> right l.min_x > r.max_x there are two obvious discrepancies: the names aren't very similar and the equality cases are handled differently. We could incorporate the existing box_above and box_below operators into an r-tree opclass if we defined overabove and overbelow to not match on the equality case: overbelow l.max_y < r.max_y overabove l.min_y > r.min_y However, it seems just plain weird to me to have different edge-case behaviors in the X and Y directions. So I don't much care for that. I would prefer that the operators added to the opclasses act the same in both directions. We could avoid any backwards-compatibility complaints if we left those two operators alone (maybe redocumenting them as "below or touching" and "above or touching", though these descriptions are a bit misleading) and invented four new operators to be the Y-direction opclass members, say <<^ below l.max_y < r.min_y >>^ above l.min_y > r.max_y &<^ overbelow l.max_y <= r.max_y &>^ overabove l.min_y >= r.min_y This has a lot to recommend it: backwards compatibility and operator names that line up with the X-direction case. On the other hand, it'll confuse people to have operators that are so close in function, and having one be indexable and the other not seems like a gotcha. Plan C would be to just change the above and below operators, on the grounds that it is an obvious typo that they don't match left and right to begin with. We have made greater changes in the behavior of geometric operators in the past, so this isn't an obviously bogus choice. Not quite sure what to do, but I'd like to do something with it. Thoughts? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| |||
| On 2005-06-23, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I looked into the r-tree breakage discussed in this thread: > http://archives.postgresql.org/pgsql...3/msg01135.php See also http://archives.postgresql.org/pgsql...1/msg00328.php in which I made most of the same points. Notice also that contrib/seg and contrib/cube have their own, and incompatible, idea of what the semantics of &< and &> should be. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services |
| |||
| On Thu, Jun 23, 2005 at 05:59:25PM -0400, Tom Lane wrote: > > Fixing the existing operators seems relatively straightforward; there will > need to be some extension to rtstrat.c to represent "NOT this operator" > as well as "this operator", but that's not hard. I plan to do this, and > make the corresponding fixes in contrib/rtree_gist as well. Excellent. If the fix is straightforward, is it going to be backpatched at least to 8.0? Or is backpatching not worthwhile, considering that hardly anybody stumbles across the problem or complains about it? -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Michael Fuhr <mike@fuhr.org> writes: > On Thu, Jun 23, 2005 at 05:59:25PM -0400, Tom Lane wrote: >> Fixing the existing operators seems relatively straightforward; there will >> need to be some extension to rtstrat.c to represent "NOT this operator" >> as well as "this operator", but that's not hard. I plan to do this, and >> make the corresponding fixes in contrib/rtree_gist as well. > Excellent. If the fix is straightforward, is it going to be > backpatched at least to 8.0? Or is backpatching not worthwhile, > considering that hardly anybody stumbles across the problem or > complains about it? In principle it could be backpatched, because this is just a change in the search behavior and not a change in either catalog entries or rtree index contents; hence no initdb needed. However, given that the behavior has been broken since the rtree code was written and nobody noticed except bwhite, I think it's pretty low-priority to back-patch. I find it more significant for 8.1 because (a) it's now more likely that indexscans will get used for these queries, and (b) I'm thinking we really ought to fold rtree_gist into the core so that we have at least some built-in gist opclasses (for testing purposes if nothing else). I started out looking for the bug in rtree_gist, and eventually realized that it had slavishly copied rtree's bug... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
| |||
| Andrew - Supernews <andrew+nonews@supernews.com> writes: > On 2005-06-23, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I looked into the r-tree breakage discussed in this thread: >> http://archives.postgresql.org/pgsql...3/msg01135.php > See also http://archives.postgresql.org/pgsql...1/msg00328.php > in which I made most of the same points. So you did --- I had forgotten. Good to see that we arrived at the same conclusions. > Notice also that contrib/seg and contrib/cube have their own, and > incompatible, idea of what the semantics of &< and &> should be. Um. Not sure what to do about these ... any opinions? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
| |||
| Hmmm ... just when you thought it was safe to go back in the water ... I was only looking closely at the "box" case earlier today, assuming that the polygon code was set up identically. Well, it isn't. In particular it appears that the poly_overleft and poly_overright definitions are different from box's, which means that rtrees are still broken for polygon searches. I'm of the opinion that this is a flat-out bug and we should just fix it, ie, change the operator definitions. The polygon definitions aren't even self-consistent (overleft accepts equality and overright doesn't). poly_left result = polya->boundbox.high.x < polyb->boundbox.low.x; poly_overleft: result = polya->boundbox.low.x <= polyb->boundbox.high.x; poly_right: result = polya->boundbox.low.x > polyb->boundbox.high.x; poly_overright: result = polya->boundbox.high.x > polyb->boundbox.low.x; By analogy to the box case these should be poly_overleft: result = polya->boundbox.high.x <= polyb->boundbox.high.x; poly_overright: result = polya->boundbox.low.x >= polyb->boundbox.low.x; regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| FYI, TODO has: * Fix incorrect rtree results due to wrong assumptions about "over" operator semantics [rtree] --------------------------------------------------------------------------- Tom Lane wrote: > Hmmm ... just when you thought it was safe to go back in the water ... > > I was only looking closely at the "box" case earlier today, assuming > that the polygon code was set up identically. Well, it isn't. In > particular it appears that the poly_overleft and poly_overright > definitions are different from box's, which means that rtrees are > still broken for polygon searches. > > I'm of the opinion that this is a flat-out bug and we should just > fix it, ie, change the operator definitions. The polygon definitions > aren't even self-consistent (overleft accepts equality and overright > doesn't). > > poly_left > result = polya->boundbox.high.x < polyb->boundbox.low.x; > poly_overleft: > result = polya->boundbox.low.x <= polyb->boundbox.high.x; > poly_right: > result = polya->boundbox.low.x > polyb->boundbox.high.x; > poly_overright: > result = polya->boundbox.high.x > polyb->boundbox.low.x; > > By analogy to the box case these should be > > poly_overleft: > result = polya->boundbox.high.x <= polyb->boundbox.high.x; > poly_overright: > result = polya->boundbox.low.x >= polyb->boundbox.low.x; > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| |||
| On Thu, 23 Jun 2005, Tom Lane wrote: > both directions (X and Y) for 1-D inquiries. The same problems exist in > the contrib/rtree_gist implementation, because it just copied the r-tree > logic without inquiring too closely into it :-( you're right, it was the beginning of our GiST experiments. Later we were interested in split algorithm and we never actually used rtree because we have used pg_sphere for working with spherical data. Glad to see you fixed these longstanding problem ! Does it means we could expect rtree_gist opclasses moved to core ? Regards, Oleg __________________________________________________ ___________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83 ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
| |||
| Tom Lane wrote: > However, given that the > behavior has been broken since the rtree code was written and nobody > noticed except bwhite, I think it's pretty low-priority to back-patch. Well, leave it to me to find the obscure bugs in other people's code, and miss the blatant ones in my own. It's been awhile since I've looked at this and I've pretty much swapped my PG interals knowledge out of my brain. As I recall you can (or could) demonstrate the error with the default test suite, but you have to forcibly override the search strategy cost constants so that PG will actually do r-tree index searches (or maybe it was comparisons, it's been awhile) instead of sequential scan. Check the thread, I think I did send in a test case. In reality, with the default constants, you'd need a rather large set before you saw any problems. If anyone is curious, my intended application was time intervals. That is to say, *real* mathematical intervals with two rational numbers as endpoints, not just durations (displacements) which as I recall is what SQL time "intervals" actually are. Frankly, I've always considered it a serious oversight that PG doesn't provide this as a native type with its own indexing and operators, especially given that the default r-tree operators don't really work with right-open intervals (though that's another rant). In any case 1D was pretty much universal, barring radical changes to the spacetime continuum. I abandoned the project, but not before writing a general set of 1D interval operators to handle all cases of open and closed endpoints. I was under the impression that a fix had already been applied, but it's nice to see it happen. I say this because we discussed possible fixes -- either changes to operator semantics or new operators -- and someone with a wizard hat nodded in agreement. -- Bill ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
| ||||
| I wrote: > Andrew - Supernews <andrew+nonews@supernews.com> writes: >> Notice also that contrib/seg and contrib/cube have their own, and >> incompatible, idea of what the semantics of &< and &> should be. > Um. Not sure what to do about these ... any opinions? Having looked at this, I propose the following: contrib/seg: fix the semantics of &< and &> to agree with box's semantics. There's no obvious usefulness to the way these operators are defined now, and since the code is using the former rtree indexing logic, they are clearly broken as-is. contrib/cube: I quote from cube.c: /* The following four methods compare the projections of the boxes onto the 0-th coordinate axis. These methods are useless for dimensions larger than 2, but it seems that R-tree requires all its strategies map to real functions that return something */ Now that the module uses GIST instead of r-tree, there's no very strong reason why it should provide these operators at all. I propose removing all of << >> &< &> from contrib/cube, leaving only the four n-dimensional indexing operators (&& ~= ~ @). Any objections? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |