Unix Technical Forum

any way to get rid of Bitmap Heap Scan recheck?

This is a discussion on any way to get rid of Bitmap Heap Scan recheck? within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi. I have the following join condition in a query "posttag inner join tag ON posttag.tagid = tag.id and ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 10:46 AM
Sergei Shelukhin
 
Posts: n/a
Default any way to get rid of Bitmap Heap Scan recheck?

Hi.
I have the following join condition in a query
"posttag inner join tag ON posttag.tagid = tag.id and tag.name =
'blah'"
tag.id is PK, I have indexes on posttag.tagid and tag.name both
created with all the options set to default.
PG version is 8.1.


The query is very slow (3 minutes on test data), here's what takes all
the time, from explain results:

> Bitmap Heap Scan on tag (cost=897.06..345730.89 rows=115159 width=8)

Recheck Cond: ((name)::text = 'blah'::text)
-> Bitmap Index Scan on tag_idxn
(cost=0.00..897.06 rows=115159 width=0)
Index Cond: ((name)::text =
'blah'::text)

What is recheck? I googled some and found something about lossy
indexes but no fixes for this issue.
The only reason I ever have this index is to do joins like this one;
how do I make it not lossy?

If I cannot make it not lossy, is there any way to make it skip
recheck and say to hell with the losses?
The query without recheck will run like up to 100 times faster
according to overall query plan.

I'm pondering encoding the tag name to int or bytea field(s) and
joining on them but that's kinda ugly.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 10:48 AM
Sergei Shelukhin
 
Posts: n/a
Default Re: any way to get rid of Bitmap Heap Scan recheck?

Any ideas?

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 10:48 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: any way to get rid of Bitmap Heap Scan recheck?

Sergei Shelukhin wrote:
> Hi.
> I have the following join condition in a query
> "posttag inner join tag ON posttag.tagid = tag.id and tag.name =
> 'blah'"
> tag.id is PK, I have indexes on posttag.tagid and tag.name both
> created with all the options set to default.
> PG version is 8.1.
>
>
> The query is very slow (3 minutes on test data), here's what takes all
> the time, from explain results:
>
>> Bitmap Heap Scan on tag (cost=897.06..345730.89 rows=115159 width=8)

> Recheck Cond: ((name)::text = 'blah'::text)
> -> Bitmap Index Scan on tag_idxn
> (cost=0.00..897.06 rows=115159 width=0)
> Index Cond: ((name)::text =
> 'blah'::text)
>
> What is recheck? I googled some and found something about lossy
> indexes but no fixes for this issue.
> The only reason I ever have this index is to do joins like this one;
> how do I make it not lossy?
>
> If I cannot make it not lossy, is there any way to make it skip
> recheck and say to hell with the losses?
> The query without recheck will run like up to 100 times faster
> according to overall query plan.


A bitmapped index scan works in two stages. First the index or indexes
are scanned to create a bitmap representing matching tuples. That shows
up as Bitmap Index Scan in explain. Then all the matching tuples are
fetched from the heap, that's the Bitmap Heap Scan.

If the bitmap is larger than work_mem (because there's a lot of matching
tuples), it's stored in memory as lossy. In lossy mode, we don't store
every tuple in the bitmap, but each page with any matching tuples on it
is represented as a single bit. When performing the Bitmap Heap Scan
phase with a lossy bitmap, the pages need to be scanned, using the
Recheck condition, to see which tuples match.

The Recheck condition is always shown, even if the bitmap is not stored
as lossy and no rechecking is done.

Now let's get to your situation. The problem is almost certainly not the
rechecking or lossy bitmaps, but you can increase your work_mem to make
sure.

I'd suggest you do the usual drill: ANALYZE all relevant tables. If that
doesn't solve the problem, run EXPLAIN ANALYZE instead of just EXPLAIN.
See if you can figure something out of that, and if you need more help,
send the output back to the list together with the table definitions and
indexes of all tables involved in the query.

> I'm pondering encoding the tag name to int or bytea field(s) and
> joining on them but that's kinda ugly.


I doubt that helps, but it's hard to say without seeing the schema.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 10:48 AM
Tom Lane
 
Posts: n/a
Default Re: any way to get rid of Bitmap Heap Scan recheck?

Sergei Shelukhin <realgeek@gmail.com> writes:
> The query is very slow (3 minutes on test data), here's what takes all
> the time, from explain results:


>> Bitmap Heap Scan on tag (cost=897.06..345730.89 rows=115159 width=8)

> Recheck Cond: ((name)::text = 'blah'::text)
> -> Bitmap Index Scan on tag_idxn
> (cost=0.00..897.06 rows=115159 width=0)
> Index Cond: ((name)::text =
> 'blah'::text)


It's usually a good idea to do EXPLAIN ANALYZE on troublesome queries,
rather than trusting that the planner's estimates reflect reality.

> The query without recheck will run like up to 100 times faster
> according to overall query plan.


Sorry, but you have no concept what you're talking about. The
difference between indexscan and heap scan estimates here reflects
fetching rows from the heap, not recheck costs. Even if it were
a good idea to get rid of the recheck (which it is not), it wouldn't
reduce the costs materially.

If the table is fairly static then it might help to CLUSTER on that
index, so that the rows needed are brought together physically.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 10:49 AM
Sergei Shelukhin
 
Posts: n/a
Default Re: any way to get rid of Bitmap Heap Scan recheck?

The table is inserted into frequently so cluster will not be very
useful I guess.
The schema is as follows: there's posts table with many fields; tags
table with id, userid and name, and posttag table that stored postid
and tagid (many2many relationship of tags to posts). I need to find
post by exact tag match; there can be many records in tags table with
the same tag name because of the userid field.
I understand that I do not quite understand how the scan originally
described works; then, what would be the fastest way to perform the
query that selects by exact varchar col match?

Will adding an additional "index" column that will be calculated on
the webserver to artifically divide tags into "buckets" by creating an
index on (thisnewcolumn, name) and then, when the tag is searched for,
calculating it again and making a condition like thisnewcolumn = 5 and
tag.name = 'cats' help?

MS SQL server does similar queries very fast somehow on a similar
server, but postgres seems to be doing the scan directly from HD. For
popular tags like tags the time required to run the query on test data
can be up to ten minutes.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 10:49 AM
Sergei Shelukhin
 
Posts: n/a
Default Re: any way to get rid of Bitmap Heap Scan recheck?

Also I did explain analyze for these queries and query plans are
unchanged, and all tables are vacuum analyzed.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 10:49 AM
Sergei Shelukhin
 
Posts: n/a
Default Re: any way to get rid of Bitmap Heap Scan recheck?

Btw, I also started noticing bitmap heap scan on some queries that
join very large tables on int columns, and everywhere it shows up the
query runs slow (I just can't believe in 15 minutes required to join
two tables resulting in 180k rows with one int column each using only
indices in explain analyze, even if it runs entirely from swap!!)
Is disabling bitmap scans generally a good idea? I disabled it on one
query and it seeems to be running faster with simple seq scan, bgut
for my varchar query it will probably be suboptimal...

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 11:04 AM
Scott Marlowe
 
Posts: n/a
Default Re: any way to get rid of Bitmap Heap Scan recheck?

Sergei Shelukhin wrote:
> Hi.
> I have the following join condition in a query
> "posttag inner join tag ON posttag.tagid = tag.id and tag.name =
> 'blah'"
> tag.id is PK, I have indexes on posttag.tagid and tag.name both
> created with all the options set to default.
> PG version is 8.1.
>
>
> The query is very slow (3 minutes on test data), here's what takes all
> the time, from explain results:Any ideas?


Yes, post the output of

explain analyze select ... (rest of query here)

for starters

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:38 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com