Unix Technical Forum

horizontal partition

This is a discussion on horizontal partition within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi all, I have a big table with ~ 10 Milion rows, and is a very pain administer it, ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 12:03 PM
Gaetano Mendola
 
Posts: n/a
Default horizontal partition

Hi all,
I have a big table with ~ 10 Milion rows, and is a very
pain administer it, so after years I convinced my self
to partition it and replace the table usage ( only for reading )
with a view.

Now my user_logs table is splitted in 4:

user_logs
user_logs_2002
user_logs_2003
user_logs_2004

and the view v_user_logs is builded on top of these tables:


CREATE OR REPLACE VIEW v_user_logs AS
SELECT * FROM user_logs
UNION ALL
SELECT * FROM user_logs_2002
UNION ALL
SELECT * FROM user_logs_2003
UNION ALL
SELECT * FROM user_logs_2004
;


the view is performing really well:


empdb=# explain analyze select * from v_user_logs where id_user = sp_id_user('kalman');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan v_user_logs (cost=0.00..895.45 rows=645 width=88) (actual time=17.039..2345.388 rows=175 loops=1)
-> Append (cost=0.00..892.23 rows=645 width=67) (actual time=17.030..2344.195 rows=175 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..120.70 rows=60 width=67) (actual time=17.028..17.036 rows=1 loops=1)
-> Index Scan using idx_user_user_logs on user_logs (cost=0.00..120.40 rows=60 width=67) (actual time=17.012..17.018 rows=1 loops=1)
Index Cond: (id_user = 4185)
-> Subquery Scan "*SELECT* 2" (cost=0.00..475.44 rows=316 width=67) (actual time=49.406..1220.400 rows=79 loops=1)
-> Index Scan using idx_user_user_logs_2004 on user_logs_2004 (cost=0.00..473.86 rows=316 width=67) (actual time=49.388..1219.386 rows=79 loops=1)
Index Cond: (id_user = 4185)
-> Subquery Scan "*SELECT* 3" (cost=0.00..204.33 rows=188 width=67) (actual time=59.375..1068.806 rows=95 loops=1)
-> Index Scan using idx_user_user_logs_2003 on user_logs_2003 (cost=0.00..203.39 rows=188 width=67) (actual time=59.356..1067.934 rows=95 loops=1)
Index Cond: (id_user = 4185)
-> Subquery Scan "*SELECT* 4" (cost=0.00..91.75 rows=81 width=67) (actual time=37.623..37.623 rows=0 loops=1)
-> Index Scan using idx_user_user_logs_2002 on user_logs_2002 (cost=0.00..91.35 rows=81 width=67) (actual time=37.618..37.618 rows=0 loops=1)
Index Cond: (id_user = 4185)
Total runtime: 2345.917 ms
(15 rows)



the problem is now if this view is used in others views like this:



CREATE OR REPLACE VIEW v_ua_user_login_logout_tmp AS
SELECT
u.login,
ul.*
FROM user_login u,
v_user_logs ul
WHERE
u.id_user = ul.id_user
;



empdb=# explain analyze select * from v_ua_user_login_logout_tmp where login = 'kalman';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=4.01..228669.81 rows=173 width=100) (actual time=1544.784..116490.363 rows=175 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Subquery Scan ul (cost=0.00..193326.71 rows=7067647 width=88) (actual time=5.677..108190.096 rows=7067831 loops=1)
-> Append (cost=0.00..157988.47 rows=7067647 width=67) (actual time=5.669..77109.995 rows=7067831 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..8158.48 rows=362548 width=67) (actual time=5.666..3379.178 rows=362862 loops=1)
-> Seq Scan on user_logs (cost=0.00..6345.74 rows=362548 width=67) (actual time=5.645..1395.673 rows=362862 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..93663.88 rows=4191588 width=67) (actual time=9.149..35094.798 rows=4191580 loops=1)
-> Seq Scan on user_logs_2004 (cost=0.00..72705.94 rows=4191588 width=67) (actual time=9.117..16531.486 rows=4191580 loops=1)
-> Subquery Scan "*SELECT* 3" (cost=0.00..44875.33 rows=2008233 width=67) (actual time=0.562..24017.680 rows=2008190 loops=1)
-> Seq Scan on user_logs_2003 (cost=0.00..34834.17 rows=2008233 width=67) (actual time=0.542..13224.265 rows=2008190 loops=1)
-> Subquery Scan "*SELECT* 4" (cost=0.00..11290.78 rows=505278 width=67) (actual time=7.100..3636.163 rows=505199 loops=1)
-> Seq Scan on user_logs_2002 (cost=0.00..8764.39 rows=505278 width=67) (actual time=6.446..1474.709 rows=505199 loops=1)
-> Hash (cost=4.00..4.00 rows=1 width=16) (actual time=0.083..0.083 rows=0 loops=1)
-> Index Scan using user_login_login_key on user_login u (cost=0.00..4.00 rows=1 width=16) (actual time=0.064..0.066 rows=1 loops=1)
Index Cond: ((login)::text = 'kalman'::text)
Total runtime: 116491.056 ms
(16 rows)



as you can see the index scan is not used anymore.
Do you see any problem on this approach ?



Regards
Gaetano Mendola
















---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 12:04 PM
Gaetano Mendola
 
Posts: n/a
Default Re: horizontal partition

Josh Berkus wrote:
> Gaetano,
>
>
>>I did in that way just to not use postgresql specific feature.
>>I can give it a try and I let you know, however the question remain,
>>why the index usage is lost if used in that way ?

>
>
> Because PostgreSQL is materializing the entire UNION data set in the
> subselect. What Postgres version are you using? I thought this was fixed
> in 7.4, but maybe not ...
>


Yes, I'm using with 7.4.x, so it was not fixed...



Regards
Gaetano Mendola



---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 12:04 PM
Gaetano Mendola
 
Posts: n/a
Default Re: horizontal partition

Josh Berkus wrote:
> Gaetano,
>
>
>>I have a big table with ~ 10 Milion rows, and is a very
>>pain administer it, so after years I convinced my self
>>to partition it and replace the table usage ( only for reading )
>>with a view.
>>
>>Now my user_logs table is splitted in 4:
>>
>>user_logs
>>user_logs_2002
>>user_logs_2003
>>user_logs_2004

>
>
> Any reason you didn't use inheritance?


I did in that way just to not use postgresql specific feature.
I can give it a try and I let you know, however the question remain,
why the index usage is lost if used in that way ?

Regards
Gaetano Mendola











---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 12:04 PM
Josh Berkus
 
Posts: n/a
Default Re: horizontal partition

Gaetano,

> I have a big table with ~ 10 Milion rows, and is a very
> pain administer it, so after years I convinced my self
> to partition it and replace the table usage ( only for reading )
> with a view.
>
> Now my user_logs table is splitted in 4:
>
> user_logs
> user_logs_2002
> user_logs_2003
> user_logs_2004


Any reason you didn't use inheritance?

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 12:04 PM
Josh Berkus
 
Posts: n/a
Default Re: horizontal partition

Gaetano,

> I did in that way just to not use postgresql specific feature.
> I can give it a try and I let you know, however the question remain,
> why the index usage is lost if used in that way ?


Because PostgreSQL is materializing the entire UNION data set in the
subselect. What Postgres version are you using? I thought this was fixed
in 7.4, but maybe not ...

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 12:05 PM
Gaetano Mendola
 
Posts: n/a
Default Re: horizontal partition

Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>
>>The issue here is that the planner is capable of "pushing down" the WHERE
>>criteria into the first view, but not into the second, "nested" view, and so
>>postgres materializes the UNIONed data set before perfoming the join.

>
>
>>Thing is, I seem to recall that this particular issue was something Tom fixed
>>a while ago. Which is why I wanted to know what version Gaetano is using.

>
>
> It's still true that we can't generate a nestloop-with-inner-indexscan
> join plan if the inner side is anything more complex than a single table
> scan. Since that's the only plan that gives you any chance of not
> scanning the whole partitioned table, it's rather a hindrance :-(
>
> It might be possible to fix this by treating the nestloop's join
> conditions as "push down-able" criteria, instead of the present rather
> ad hoc method for generating nestloop/indexscan plans. It'd be quite
> a deal of work though, and I'm concerned about how slow the planner
> might run if we did do it like that.
>


I don't know if this will help my attempt to perform an horizontal
partition, if it do I think that it can solve lot of problems out there,
I tried the inheritance technique too:

The table user_logs is the original one, I created two tables extending this one:

CREATE TABLE user_logs_2003_h () inherits (user_logs);
CREATE TABLE user_logs_2002_h () inherits (user_logs);

I defined on this table the index already defined on user_logs.


And this is the result:

empdb=# explain analyze select * from user_logs where id_user = sp_id_user('kalman');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..426.33 rows=335 width=67) (actual time=20.891..129.218 rows=98 loops=1)
-> Append (cost=0.00..426.33 rows=335 width=67) (actual time=20.871..128.643 rows=98 loops=1)
-> Index Scan using idx_user_user_logs on user_logs (cost=0.00..133.11 rows=66 width=67) (actual time=20.864..44.594 rows=3 loops=1)
Index Cond: (id_user = 4185)
-> Index Scan using idx_user_user_logs_2003_h on user_logs_2003_h user_logs (cost=0.00..204.39 rows=189 width=67) (actual time=1.507..83.662 rows=95 loops=1)
Index Cond: (id_user = 4185)
-> Index Scan using idx_user_user_logs_2002_h on user_logs_2002_h user_logs (cost=0.00..88.83 rows=80 width=67) (actual time=0.206..0.206 rows=0 loops=1)
Index Cond: (id_user = 4185)
Total runtime: 129.500 ms
(9 rows)

that is good, but now look what happen in a view like this one:


create view to_delete AS
SELECT v.login,
u.*
from user_login v,
user_logs u
where v.id_user = u.id_user;



empdb=# explain analyze select * from to_delete where login = 'kalman';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=4.01..65421.05 rows=143 width=79) (actual time=1479.738..37121.511 rows=98 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Append (cost=0.00..50793.17 rows=2924633 width=67) (actual time=21.391..33987.363 rows=2927428 loops=1)
-> Seq Scan on user_logs u (cost=0.00..7195.22 rows=411244 width=67) (actual time=21.385..5641.307 rows=414039 loops=1)
-> Seq Scan on user_logs_2003_h u (cost=0.00..34833.95 rows=2008190 width=67) (actual time=0.024..18031.218 rows=2008190 loops=1)
-> Seq Scan on user_logs_2002_h u (cost=0.00..8764.00 rows=505199 width=67) (actual time=0.005..5733.554 rows=505199 loops=1)
-> Hash (cost=4.00..4.00 rows=2 width=16) (actual time=0.195..0.195 rows=0 loops=1)
-> Index Scan using user_login_login_key on user_login v (cost=0.00..4.00 rows=2 width=16) (actual time=0.155..0.161 rows=1 loops=1)
Index Cond: ((login)::text = 'kalman'::text)
Total runtime: 37122.069 ms
(10 rows)



and how you can see this path is not applicable too :-(

Any other suggestion ?



Regards
Gaetano Mendola








---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:23 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com