Unix Technical Forum

poor use of indexes / feature: add OnlyUseIndex join/query plan hint?

This is a discussion on poor use of indexes / feature: add OnlyUseIndex join/query plan hint? within the Pgsql Performance forums, part of the PostgreSQL category; --> Say (regardless of dbms) we have a tables as follows: order_assignee (orderid, userid) order_units (orderid, unitid) There are covering ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 11:09 AM
Sergei Shelukhin
 
Posts: n/a
Default poor use of indexes / feature: add OnlyUseIndex join/query plan hint?

Say (regardless of dbms) we have a tables as follows:
order_assignee (orderid, userid)
order_units (orderid, unitid)

There are covering indices as follows: userid, orderid on
order_assignee and unitid, orderid on order_units (column order could
be rdmbs dependent, index is first sorted by unitis/userid and inside
those groups, by orderid)

we want to select all the orders that included widgets (ID 23054) and
were, at some point, handled by John (ID 7777).
To provide some background, many people order widgets so they are
present in no less than 200k orders, whereas John is a new hire who
has handled 50 orders at most.

so we write the following query
SELECT order_assignees.orderid from order_assignees inner join
order_units ON userid = 7777 and unitid = 23054 and
order_assignees.orderid = order_units.orderid;
(in future, we also rewrite it with subqueries, reverse subqueries,
reverse join order, move conditions around, recreate indexes with all
conceivable column orders, and for one or two columns, for a certain
rdmbs, but bear with me).

Now, what should happen in an ideal database in my head is that we
(rather quickly) fetch two chunks of data (sorted, too) using indexes
(not touching tables) and then merge join them with a lightning speed,
getting 12 rows of results.
Judging by query speed, that (or something similar) happens in both
MySQL and MSSQL.
In PostgreSQL, however, what happens is a quick bitmap scan on
order_assignees, then a huge bitmap scan on order_units (I suppose it
is checking the table, cause noone could possibly fetch 200k
contiguous rows consisting of int, bigint and index overhead, for
180-200 seconds on a very powerful server) and then/at the same time
we see a huge hash join.
Why is that? How do I make it use indexes only for the first stage?

I heard a lot about transactions / invisible rows before, esp. in
count() slowness threads, but
1) MSSQL (and Oracle too , I suppose, don't have it here to test)
handles this quickly using indexes and keeping transactions, too.
2) heck, I'm ok with MySQL way (as far as I understand it) - what if I
don't use transactions (you can do that), or what if I am willing to
accept potential inaccuracy for xNNN performance increase in some
cases?


I suggest that, even if PG authors insist on keeping "recheck the
table" attitude to fetching stuff by default, they could add a hint to
allow us mortals to use index only and nothing but index where
appropriate (that would be most cases for a typical application I
suppose).
That would also solve a problem with slow count()...

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:07 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com