Unix Technical Forum

Create Index (Hash) on a Large Table Taking Days...

This is a discussion on Create Index (Hash) on a Large Table Taking Days... within the Pgsql General forums, part of the PostgreSQL category; --> I have a very large table (~5GB, 100mm rows) on which I am creating an index on an int4 ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-10-2008, 12:17 AM
ProfKheel
 
Posts: n/a
Default Create Index (Hash) on a Large Table Taking Days...

I have a very large table (~5GB, 100mm rows) on which I am creating an
index on an int4 column:

CREATE INDEX CONCURRENTLY rums_idx2
ON rums
USING hash
(user_id);

The process has been running for about two days, and the index has
been growing; when I run du i can see that it is ~2GB on the
filesystem, though it is incrementing at a snail's pace; maybe 500kB
per minute or so. From posts I've read it seems abnormal that it
should be taking this long. My suspicion is that the hash function is
what is slowing it down or maybe I should have fsync=off.

I have shmmax at about 256MB and my conf is as follows:

shared_buffers = 224MB # min 128kB or
max_connections*16kB
work_mem = 112MB # min 64kB
maintenance_work_mem = 212MB # min 1MB

Info:

postgresql-8.2
Linux vostro200 2.6.20-15-server #2 SMP Sun Apr 15 07:41:34 UTC 2007
i686 GNU/Linux
1GB RAM
Intel CoreDuo

Any ideas?
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-10-2008, 12:17 AM
Harald Armin Massa
 
Posts: n/a
Default Re: Create Index (Hash) on a Large Table Taking Days...

>I have a very large table (~5GB, 100mm rows) on which I am creating an
>index on an int4 column:


>CREATE INDEX CONCURRENTLY rums_idx2
> ON rums
> USING hash
> (user_id);


Why are you using a hash index on an int4 column?

as you have 100mm (mm= Million?) rows, only 1 GIG ram and the
column_name is "user_id", I suspect:
- you have far less then 100*10E6 Users
- there are less distince user_ids than rows

I just know "hash" from general programming, thatfor my belly is
announcing "hash collisions, hash collisions" ...

(within PostgreSQL I once read that hash indices are only for some
cases I could not match to something that happens in my world; since
then I am using default btree and am VERY happy.)

Best wishes,

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:52 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com