This is a discussion on horizontal partition within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi all, I have a big table with ~ 10 Milion rows, and is a very pain administer it, ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi all, I have a big table with ~ 10 Milion rows, and is a very pain administer it, so after years I convinced my self to partition it and replace the table usage ( only for reading ) with a view. Now my user_logs table is splitted in 4: user_logs user_logs_2002 user_logs_2003 user_logs_2004 and the view v_user_logs is builded on top of these tables: CREATE OR REPLACE VIEW v_user_logs AS SELECT * FROM user_logs UNION ALL SELECT * FROM user_logs_2002 UNION ALL SELECT * FROM user_logs_2003 UNION ALL SELECT * FROM user_logs_2004 ; the view is performing really well: empdb=# explain analyze select * from v_user_logs where id_user = sp_id_user('kalman'); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan v_user_logs (cost=0.00..895.45 rows=645 width=88) (actual time=17.039..2345.388 rows=175 loops=1) -> Append (cost=0.00..892.23 rows=645 width=67) (actual time=17.030..2344.195 rows=175 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..120.70 rows=60 width=67) (actual time=17.028..17.036 rows=1 loops=1) -> Index Scan using idx_user_user_logs on user_logs (cost=0.00..120.40 rows=60 width=67) (actual time=17.012..17.018 rows=1 loops=1) Index Cond: (id_user = 4185) -> Subquery Scan "*SELECT* 2" (cost=0.00..475.44 rows=316 width=67) (actual time=49.406..1220.400 rows=79 loops=1) -> Index Scan using idx_user_user_logs_2004 on user_logs_2004 (cost=0.00..473.86 rows=316 width=67) (actual time=49.388..1219.386 rows=79 loops=1) Index Cond: (id_user = 4185) -> Subquery Scan "*SELECT* 3" (cost=0.00..204.33 rows=188 width=67) (actual time=59.375..1068.806 rows=95 loops=1) -> Index Scan using idx_user_user_logs_2003 on user_logs_2003 (cost=0.00..203.39 rows=188 width=67) (actual time=59.356..1067.934 rows=95 loops=1) Index Cond: (id_user = 4185) -> Subquery Scan "*SELECT* 4" (cost=0.00..91.75 rows=81 width=67) (actual time=37.623..37.623 rows=0 loops=1) -> Index Scan using idx_user_user_logs_2002 on user_logs_2002 (cost=0.00..91.35 rows=81 width=67) (actual time=37.618..37.618 rows=0 loops=1) Index Cond: (id_user = 4185) Total runtime: 2345.917 ms (15 rows) the problem is now if this view is used in others views like this: CREATE OR REPLACE VIEW v_ua_user_login_logout_tmp AS SELECT u.login, ul.* FROM user_login u, v_user_logs ul WHERE u.id_user = ul.id_user ; empdb=# explain analyze select * from v_ua_user_login_logout_tmp where login = 'kalman'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=4.01..228669.81 rows=173 width=100) (actual time=1544.784..116490.363 rows=175 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Subquery Scan ul (cost=0.00..193326.71 rows=7067647 width=88) (actual time=5.677..108190.096 rows=7067831 loops=1) -> Append (cost=0.00..157988.47 rows=7067647 width=67) (actual time=5.669..77109.995 rows=7067831 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..8158.48 rows=362548 width=67) (actual time=5.666..3379.178 rows=362862 loops=1) -> Seq Scan on user_logs (cost=0.00..6345.74 rows=362548 width=67) (actual time=5.645..1395.673 rows=362862 loops=1) -> Subquery Scan "*SELECT* 2" (cost=0.00..93663.88 rows=4191588 width=67) (actual time=9.149..35094.798 rows=4191580 loops=1) -> Seq Scan on user_logs_2004 (cost=0.00..72705.94 rows=4191588 width=67) (actual time=9.117..16531.486 rows=4191580 loops=1) -> Subquery Scan "*SELECT* 3" (cost=0.00..44875.33 rows=2008233 width=67) (actual time=0.562..24017.680 rows=2008190 loops=1) -> Seq Scan on user_logs_2003 (cost=0.00..34834.17 rows=2008233 width=67) (actual time=0.542..13224.265 rows=2008190 loops=1) -> Subquery Scan "*SELECT* 4" (cost=0.00..11290.78 rows=505278 width=67) (actual time=7.100..3636.163 rows=505199 loops=1) -> Seq Scan on user_logs_2002 (cost=0.00..8764.39 rows=505278 width=67) (actual time=6.446..1474.709 rows=505199 loops=1) -> Hash (cost=4.00..4.00 rows=1 width=16) (actual time=0.083..0.083 rows=0 loops=1) -> Index Scan using user_login_login_key on user_login u (cost=0.00..4.00 rows=1 width=16) (actual time=0.064..0.066 rows=1 loops=1) Index Cond: ((login)::text = 'kalman'::text) Total runtime: 116491.056 ms (16 rows) as you can see the index scan is not used anymore. Do you see any problem on this approach ? Regards Gaetano Mendola ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
| |||
| Josh Berkus wrote: > Gaetano, > > >>I did in that way just to not use postgresql specific feature. >>I can give it a try and I let you know, however the question remain, >>why the index usage is lost if used in that way ? > > > Because PostgreSQL is materializing the entire UNION data set in the > subselect. What Postgres version are you using? I thought this was fixed > in 7.4, but maybe not ... > Yes, I'm using with 7.4.x, so it was not fixed... Regards Gaetano Mendola ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Josh Berkus wrote: > Gaetano, > > >>I have a big table with ~ 10 Milion rows, and is a very >>pain administer it, so after years I convinced my self >>to partition it and replace the table usage ( only for reading ) >>with a view. >> >>Now my user_logs table is splitted in 4: >> >>user_logs >>user_logs_2002 >>user_logs_2003 >>user_logs_2004 > > > Any reason you didn't use inheritance? I did in that way just to not use postgresql specific feature. I can give it a try and I let you know, however the question remain, why the index usage is lost if used in that way ? Regards Gaetano Mendola ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Gaetano, > I have a big table with ~ 10 Milion rows, and is a very > pain administer it, so after years I convinced my self > to partition it and replace the table usage ( only for reading ) > with a view. > > Now my user_logs table is splitted in 4: > > user_logs > user_logs_2002 > user_logs_2003 > user_logs_2004 Any reason you didn't use inheritance? -- Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Gaetano, > I did in that way just to not use postgresql specific feature. > I can give it a try and I let you know, however the question remain, > why the index usage is lost if used in that way ? Because PostgreSQL is materializing the entire UNION data set in the subselect. What Postgres version are you using? I thought this was fixed in 7.4, but maybe not ... -- Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org |
| ||||
| Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > >>The issue here is that the planner is capable of "pushing down" the WHERE >>criteria into the first view, but not into the second, "nested" view, and so >>postgres materializes the UNIONed data set before perfoming the join. > > >>Thing is, I seem to recall that this particular issue was something Tom fixed >>a while ago. Which is why I wanted to know what version Gaetano is using. > > > It's still true that we can't generate a nestloop-with-inner-indexscan > join plan if the inner side is anything more complex than a single table > scan. Since that's the only plan that gives you any chance of not > scanning the whole partitioned table, it's rather a hindrance :-( > > It might be possible to fix this by treating the nestloop's join > conditions as "push down-able" criteria, instead of the present rather > ad hoc method for generating nestloop/indexscan plans. It'd be quite > a deal of work though, and I'm concerned about how slow the planner > might run if we did do it like that. > I don't know if this will help my attempt to perform an horizontal partition, if it do I think that it can solve lot of problems out there, I tried the inheritance technique too: The table user_logs is the original one, I created two tables extending this one: CREATE TABLE user_logs_2003_h () inherits (user_logs); CREATE TABLE user_logs_2002_h () inherits (user_logs); I defined on this table the index already defined on user_logs. And this is the result: empdb=# explain analyze select * from user_logs where id_user = sp_id_user('kalman'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Result (cost=0.00..426.33 rows=335 width=67) (actual time=20.891..129.218 rows=98 loops=1) -> Append (cost=0.00..426.33 rows=335 width=67) (actual time=20.871..128.643 rows=98 loops=1) -> Index Scan using idx_user_user_logs on user_logs (cost=0.00..133.11 rows=66 width=67) (actual time=20.864..44.594 rows=3 loops=1) Index Cond: (id_user = 4185) -> Index Scan using idx_user_user_logs_2003_h on user_logs_2003_h user_logs (cost=0.00..204.39 rows=189 width=67) (actual time=1.507..83.662 rows=95 loops=1) Index Cond: (id_user = 4185) -> Index Scan using idx_user_user_logs_2002_h on user_logs_2002_h user_logs (cost=0.00..88.83 rows=80 width=67) (actual time=0.206..0.206 rows=0 loops=1) Index Cond: (id_user = 4185) Total runtime: 129.500 ms (9 rows) that is good, but now look what happen in a view like this one: create view to_delete AS SELECT v.login, u.* from user_login v, user_logs u where v.id_user = u.id_user; empdb=# explain analyze select * from to_delete where login = 'kalman'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Hash Join (cost=4.01..65421.05 rows=143 width=79) (actual time=1479.738..37121.511 rows=98 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Append (cost=0.00..50793.17 rows=2924633 width=67) (actual time=21.391..33987.363 rows=2927428 loops=1) -> Seq Scan on user_logs u (cost=0.00..7195.22 rows=411244 width=67) (actual time=21.385..5641.307 rows=414039 loops=1) -> Seq Scan on user_logs_2003_h u (cost=0.00..34833.95 rows=2008190 width=67) (actual time=0.024..18031.218 rows=2008190 loops=1) -> Seq Scan on user_logs_2002_h u (cost=0.00..8764.00 rows=505199 width=67) (actual time=0.005..5733.554 rows=505199 loops=1) -> Hash (cost=4.00..4.00 rows=2 width=16) (actual time=0.195..0.195 rows=0 loops=1) -> Index Scan using user_login_login_key on user_login v (cost=0.00..4.00 rows=2 width=16) (actual time=0.155..0.161 rows=1 loops=1) Index Cond: ((login)::text = 'kalman'::text) Total runtime: 37122.069 ms (10 rows) and how you can see this path is not applicable too :-( Any other suggestion ? Regards Gaetano Mendola ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |