This is a discussion on poor use of indexes / feature: add OnlyUseIndex join/query plan hint? within the Pgsql Performance forums, part of the PostgreSQL category; --> Say (regardless of dbms) we have a tables as follows: order_assignee (orderid, userid) order_units (orderid, unitid) There are covering ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Say (regardless of dbms) we have a tables as follows: order_assignee (orderid, userid) order_units (orderid, unitid) There are covering indices as follows: userid, orderid on order_assignee and unitid, orderid on order_units (column order could be rdmbs dependent, index is first sorted by unitis/userid and inside those groups, by orderid) we want to select all the orders that included widgets (ID 23054) and were, at some point, handled by John (ID 7777). To provide some background, many people order widgets so they are present in no less than 200k orders, whereas John is a new hire who has handled 50 orders at most. so we write the following query SELECT order_assignees.orderid from order_assignees inner join order_units ON userid = 7777 and unitid = 23054 and order_assignees.orderid = order_units.orderid; (in future, we also rewrite it with subqueries, reverse subqueries, reverse join order, move conditions around, recreate indexes with all conceivable column orders, and for one or two columns, for a certain rdmbs, but bear with me). Now, what should happen in an ideal database in my head is that we (rather quickly) fetch two chunks of data (sorted, too) using indexes (not touching tables) and then merge join them with a lightning speed, getting 12 rows of results. Judging by query speed, that (or something similar) happens in both MySQL and MSSQL. In PostgreSQL, however, what happens is a quick bitmap scan on order_assignees, then a huge bitmap scan on order_units (I suppose it is checking the table, cause noone could possibly fetch 200k contiguous rows consisting of int, bigint and index overhead, for 180-200 seconds on a very powerful server) and then/at the same time we see a huge hash join. Why is that? How do I make it use indexes only for the first stage? I heard a lot about transactions / invisible rows before, esp. in count() slowness threads, but 1) MSSQL (and Oracle too , I suppose, don't have it here to test) handles this quickly using indexes and keeping transactions, too. 2) heck, I'm ok with MySQL way (as far as I understand it) - what if I don't use transactions (you can do that), or what if I am willing to accept potential inaccuracy for xNNN performance increase in some cases? I suggest that, even if PG authors insist on keeping "recheck the table" attitude to fetching stuff by default, they could add a hint to allow us mortals to use index only and nothing but index where appropriate (that would be most cases for a typical application I suppose). That would also solve a problem with slow count()... |