Unix Technical Forum

Implementing Bitmap Indexes

  #1
04-11-2008, 03:31 AM
Victor Y. Yegorov
Implementing Bitmap Indexes

Hello.

I'd like to implement bitmap indexes and want your comments. Here is
the essence of what I've found out about bitmaps over the last month.

Consider the following table with 1 attribute A (int2):

  # | A
----+---
  1 | 1
  2 | 1
  3 | 2
  4 | 1
  5 | 1
  6 | 2
  7 | 3
  8 | 1
  9 | 2
 10 | 1
 11 | 1
 12 | 1

So, the bitmap for attribute A will be the following:

 Val | Bitmap(s)
-----+---------------
   1 | 11011001 0111
   2 | 00100100 1000
   3 | 00000010 0000
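
As a rough illustration (not part of the original post), the bitmaps above can be derived from the column values with a few lines of Python. The function name build_bitmaps is made up for this sketch; it simply sets bit i of a value's bitmap when row i holds that value.

```python
from collections import defaultdict

def build_bitmaps(column):
    """Return {value: bit string}, with one bitmap per distinct value."""
    bitmaps = defaultdict(lambda: [0] * len(column))
    for row, value in enumerate(column):
        bitmaps[value][row] = 1
    return {v: "".join(map(str, bits)) for v, bits in bitmaps.items()}

# The 12 rows of attribute A from the table above.
column_a = [1, 1, 2, 1, 1, 2, 3, 1, 2, 1, 1, 1]
for value, bits in sorted(build_bitmaps(column_a).items()):
    print(value, bits[:8], bits[8:])
```

The printed rows match the Val/Bitmap(s) listing above.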

Some points:
1) If a new value (say, 4) is inserted at some point in time, a new
bitmap will be added for it. The same goes for NULLs (if the attribute has no
NOT NULL constraint): one more bitmap. Or should we require "NOT NULL" for
bitmapped attributes?

2) Queries like "where A = 1" or "where A != 2" will require only 1 scan of
the index, while "where A < 3" will require 2 stages: first, create a list of
values less than 3; second, OR together the bitmaps for those values.
For high-cardinality attributes, this can take a lot of time.

3) Each bitmap is only a bitmap, so there should be an array of corresponding
ctid pointers. Maybe some more arrays as well (pages? I don't know).
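
A minimal sketch of points 2) and 3), assuming each bitmap is held as a Python integer and that a parallel array maps bit positions to ctids. The ctid layout here (4 rows per page) is invented for the example, and NULLs are left out.

```python
# Hypothetical data: ctids[i] is the (page, offset) of the row behind bit i.
column_a = [1, 1, 2, 1, 1, 2, 3, 1, 2, 1, 1, 1]
ctids = [(i // 4, i % 4 + 1) for i in range(len(column_a))]  # made-up layout
ALL_ROWS = (1 << len(column_a)) - 1

# One integer per distinct value; bit i is set when row i holds that value.
bitmaps = {}
for i, v in enumerate(column_a):
    bitmaps[v] = bitmaps.get(v, 0) | (1 << i)

def eq(value):
    """A = value: a single bitmap lookup."""
    return bitmaps.get(value, 0)

def ne(value):
    """A != value: complement of a single bitmap (no NULLs in this sketch)."""
    return ALL_ROWS & ~eq(value)

def lt(bound):
    """A < bound: stage 1 lists the qualifying values, stage 2 ORs their bitmaps."""
    qualifying = [v for v in bitmaps if v < bound]
    result = 0
    for v in qualifying:
        result |= bitmaps[v]
    return result

def to_ctids(bitmap):
    """Translate set bits into ctids through the parallel array."""
    return [ctids[i] for i in range(len(ctids)) if (bitmap >> i) & 1]

print(to_ctids(eq(2)))
print(len(to_ctids(lt(3))))
```

The cost of lt() grows with the number of distinct values below the bound, which is the high-cardinality concern raised in point 2).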

For 2), there are techniques that give better performance for "A < 3"
queries at the cost of increased storage space (see here for details:
http://delab.csd.auth.gr/papers/ADBIS03mmnm.pdf) and increased response time
for simple queries. I don't know whether they should be implemented; maybe later.
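
One well-known technique with this kind of trade-off is range encoding, sketched below. This is an illustration only and not necessarily the method of the linked paper. Each bitmap marks the rows whose value is less than or equal to a given value, so "A < 3" reads one bitmap, while "A = 2" now has to combine two. The resulting bitmaps are denser, which tends to make them harder to compress.

```python
column_a = [1, 1, 2, 1, 1, 2, 3, 1, 2, 1, 1, 1]
values = sorted(set(column_a))

# le[v] has bit i set when row i holds a value <= v (range encoding).
le = {v: sum(1 << i for i, a in enumerate(column_a) if a <= v) for v in values}

def lt(bound):
    """A < bound: one bitmap, the one for the largest value below the bound."""
    smaller = [v for v in values if v < bound]
    return le[smaller[-1]] if smaller else 0

def eq(value):
    """A = value: now needs two bitmaps, le[value] AND NOT le[previous value]."""
    if value not in le:
        return 0
    idx = values.index(value)
    below = le[values[idx - 1]] if idx > 0 else 0
    return le[value] & ~below

print(bin(lt(3)).count("1"))   # number of rows with A < 3
print(bin(eq(2)).count("1"))   # number of rows with A = 2
```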

The trickiest part will be combining multiple index scans on several
attributes. As Neil Conway said on #postgresql, this will be tricky, as some
modifications will be needed in the index scan API. I remember, Tom Lane
suggested on-disk bitmaps. Implementing a bitmap index access method
would be of much use not only for bitmap indexes, I think.
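
To show why combining scans on several attributes is attractive with bitmaps, here is a small sketch that ANDs and ORs the results of two per-attribute scans. The second column B and the helper bitmap_for are hypothetical, and the "scan" is simulated by walking the column rather than by reading a real index.

```python
def bitmap_for(column, predicate):
    """Simulate an index scan: set bit i when row i satisfies the predicate."""
    return sum(1 << i for i, v in enumerate(column) if predicate(v))

# Hypothetical two-attribute table; rows are numbered from 0 here.
col_a = [1, 1, 2, 1, 1, 2, 3, 1, 2, 1, 1, 1]
col_b = ["x", "y", "x", "x", "y", "y", "x", "x", "x", "y", "x", "y"]

a_is_1 = bitmap_for(col_a, lambda v: v == 1)
b_is_x = bitmap_for(col_b, lambda v: v == "x")

both = a_is_1 & b_is_x      # WHERE A = 1 AND B = 'x'
either = a_is_1 | b_is_x    # WHERE A = 1 OR  B = 'x'

rows = [i for i in range(len(col_a)) if (both >> i) & 1]
print(rows)
```

Each scan produces its bitmap independently, and the combination is a single bitwise operation before any heap row is fetched.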

The WAH compression method should be used for bitmaps (to my mind). Also,
there is a method of reordering heap tuples for better compression of bitmaps.
I thought it might be possible to implement it as an option to the existing
CLUSTER command. Papers:
WAH: http://www-library.lbl.gov/docs/LBNL...LBNL-49626.pdf
CLUSTER: http://www.cse.ohio-state.edu/~hakan...reordering.pdf
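
For readers unfamiliar with WAH (word-aligned hybrid) compression, the following is a simplified sketch of the idea, assuming 32-bit words that each carry a 31-bit group. A word with the top bit clear is a literal group; a word with the top bit set is a fill, with the next bit giving the fill value and the low 30 bits counting how many identical groups it stands for. The function names are made up, and a real implementation would work on machine words rather than strings.

```python
GROUP = 31  # payload bits per 32-bit word

def wah_encode(bits):
    """Compress a '0'/'1' string into a list of 32-bit WAH-style words."""
    bits = bits + "0" * (-len(bits) % GROUP)  # pad the last group with zeros
    words = []
    for start in range(0, len(bits), GROUP):
        group = int(bits[start:start + GROUP], 2)
        if group in (0, (1 << GROUP) - 1):
            # All-zero or all-one group: store it as (part of) a fill word.
            header = (1 << 31) | ((group & 1) << 30)
            same_fill = bool(words) and words[-1] >> 30 == header >> 30
            if same_fill and (words[-1] & 0x3FFFFFFF) < 0x3FFFFFFF:
                words[-1] += 1            # extend the running fill by one group
            else:
                words.append(header | 1)  # start a new fill of one group
        else:
            words.append(group)           # literal word, top bit is 0
    return words

def wah_decode(words, length):
    """Expand WAH-style words back into a bit string of the given length."""
    out = []
    for w in words:
        if w >> 31:
            out.append(str((w >> 30) & 1) * (GROUP * (w & 0x3FFFFFFF)))
        else:
            out.append(format(w, "031b"))
    return "".join(out)[:length]

sparse = "1" + "0" * 92 + "1"   # 94 bits with a long run of zeros
words = wah_encode(sparse)
print(len(words), [hex(w) for w in words])
```

Long runs of identical bits collapse into a single fill word, which is why reordering heap tuples so that equal values sit together can improve compression.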

I'd like to hear from you before I start on anything.

--

Victor

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

  #2
04-11-2008, 03:31 AM
Tom Lane
Re: Implementing Bitmap Indexes

"Victor Y. Yegorov" <viy@mits.lv> writes:
> I remember, Tom Lane suggested on-disk bitmaps


I have suggested no such thing, and in fact believe that the sort of
index structure you are proposing would be of very little use. What
I've been hoping to look into is *in memory* bitmaps used as an
interface between index scans and the subsequent heap lookups.
See eg this thread:
http://archives.postgresql.org/pgsql...0/msg00439.php
particularly
http://archives.postgresql.org/pgsql...0/msg00668.php
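
To make the distinction concrete, here is a rough sketch of an in-memory bitmap sitting between index scans and heap lookups. It is an illustration under assumptions, not the design from the linked threads: TIDs are modelled as (page, offset) pairs, the bitmap is a dict of per-page bitmasks, and all function names are invented.

```python
from collections import defaultdict

def tid_bitmap(tids):
    """Collect (page, offset) TIDs from an index scan into {page: bitmask}."""
    pages = defaultdict(int)
    for page, offset in tids:
        pages[page] |= 1 << offset
    return dict(pages)

def bitmap_and(a, b):
    """Intersect two TID bitmaps, dropping pages with no surviving rows."""
    out = {p: a[p] & b[p] for p in a.keys() & b.keys()}
    return {p: m for p, m in out.items() if m}

def heap_visit_order(bitmap):
    """Yield TIDs page by page, so each heap page is visited only once."""
    for page in sorted(bitmap):
        mask, offset = bitmap[page], 0
        while mask:
            if mask & 1:
                yield (page, offset)
            mask >>= 1
            offset += 1

# Hypothetical TID streams, in the arbitrary order two index scans might return them.
scan_a = [(7, 2), (3, 1), (7, 5), (3, 4), (9, 1)]
scan_b = [(3, 4), (9, 1), (7, 2), (2, 6)]
combined = bitmap_and(tid_bitmap(scan_a), tid_bitmap(scan_b))
print(list(heap_visit_order(combined)))
```

Nothing here is stored on disk: the bitmap exists only for the duration of the query, and the underlying indexes can be of any type.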

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

  #3
04-11-2008, 03:32 AM
Victor Y. Yegorov
Re: Implementing Bitmap Indexes

* Tom Lane <tgl@sss.pgh.pa.us> [29.01.2005 18:24]:
> "Victor Y. Yegorov" <viy@mits.lv> writes:
> > I remember, Tom Lane suggested on-disk bitmaps

>
> I have suggested no such thing, and in fact believe that the sort of
> index structure you are proposing would be of very little use.


Why? I thought they would be useful for data warehouse databases.

Maybe I put it the wrong way, but what I'm trying to implement
is exactly what is described in the first link you posted:
http://archives.postgresql.org/pgsql...0/msg00439.php

Or am I misunderstanding the point?


> What I've been hoping to look into is *in memory* bitmaps used as an
> interface between index scans and the subsequent heap lookups.


Sorry, that is what I was talking about.

Anyway, the bitmap index API could be used for the in-memory bitmaps you're
talking about.


--

Victor Y. Yegorov

  #4
04-11-2008, 03:32 AM
Jim C. Nasby
Re: Implementing Bitmap Indexes

On Sat, Jan 29, 2005 at 01:56:12PM +0200, Victor Y. Yegorov wrote:
> 2) Queries, like "where A = 1" or "where A != 2" will require only 1 scan of
> the index, while "where A < 3" will require 2 stages: 1st create a list of
> values lesser then 3, 2nd --- do OR of all bitmaps for that values.
> For high cardinality attributes, this can take a lot of time;
>
> 3) Each bitmap is only a bitmap, so there should be an array of corresponding
> ctids pointers. Maybe, some more arrays (pages, don't know).
>
> For 2)nd --- there are techniques, allowing better performance for "A < 3"
> queries via increased storage space (see here for details:
> http://delab.csd.auth.gr/papers/ADBIS03mmnm.pdf) and increased reaction time
> for simple queries. I don't know, if they should be implemented, may later.


Sorry if this is in the PDF but I didn't want to read 17 pages to find
out... for the example where 1 >= A >= 4, couldn't you just do NOT (A >=
3)? Granted, in this example it wouldn't matter, but it would be faster
to do this if you asked for A < 4. One downside is that you'd also have
to consider the NULL bitmap, if the field is nullable.
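
A small sketch of this complement idea, assuming a hypothetical nullable column with values 1 to 4. Instead of ORing the bitmaps for every value below the bound, it ORs the (fewer) bitmaps at or above the bound, negates the result, and masks out the NULL bitmap.

```python
# Hypothetical nullable column with values 1..4; None stands for NULL.
column_a = [1, 4, 2, None, 3, 1, 4, None, 2, 1]
ALL_ROWS = (1 << len(column_a)) - 1

bitmaps = {}
for i, v in enumerate(column_a):
    bitmaps[v] = bitmaps.get(v, 0) | (1 << i)   # key None is the NULL bitmap

def lt_by_or(bound):
    """A < bound by ORing every bitmap below the bound."""
    result = 0
    for v, bm in bitmaps.items():
        if v is not None and v < bound:
            result |= bm
    return result

def lt_by_complement(bound):
    """A < bound as NOT (A >= bound), excluding NULL rows."""
    ge = 0
    for v, bm in bitmaps.items():
        if v is not None and v >= bound:
            ge |= bm
    return ALL_ROWS & ~ge & ~bitmaps.get(None, 0)

# A < 4 touches three bitmaps the first way, but only two (A = 4 and NULL)
# the second way.
print(lt_by_or(4) == lt_by_complement(4))
```

Without the NULL mask, the negation would wrongly include the rows where A is NULL.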
--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
