Unix Technical Forum

Re: Why is plan (and performance) different on partitioned table?

This is a discussion on Re: Why is plan (and performance) different on partitioned table? within the Pgsql Performance forums, part of the PostgreSQL category; --> >If you don't have anything in the parent table br_1min, then deleting >the (presumably obsolete) pg_statistic rows for it ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 08:42 AM
Mark Liberman
 
Posts: n/a
Default Re: Why is plan (and performance) different on partitioned table?

>If you don't have anything in the parent table br_1min, then deleting
>the (presumably obsolete) pg_statistic rows for it should fix your
>immediate problem. Otherwise, consider applying the attached.


Tom, thanks alot for your reply. A few follow-up questions, and one potential "bug"?

I've been experimenting with deleting the rows from pg_statistics. FYI, there were statistics for all master tables prior to us partioning the data. We then manually inserted the rows into each inherited partition and, when done - did a truncate of the master table.

So, here's what I'm finding.

1) When I delete the rows from pg_statistics, the new plan is, indeed, a hash join.

explain analyze
SELECT *
FROM br_1min br1 JOIN br_mods mod on br1.modules_id = mod.id
WHERE ((end_time >= '2006-05-01 17:12:18-07' AND end_time < '2006-05-01 17:13:18-07'))
AND mod.downloads_id IN (153226,153714,153730,153728,153727,153724,153713, 153725,153739,153722) ;

Hash Join (cost=763.35..807.35 rows=1 width=109) (actual time=3.631..36.181 rows=45 loops=1)
Hash Cond: ("outer".modules_id = "inner".id)
-> Append (cost=1.04..40.64 rows=877 width=32) (actual time=0.198..34.872 rows=910 loops=1)
-> Bitmap Heap Scan on br_1min bfs1 (cost=1.04..8.70 rows=6 width=32) (actual time=0.060..0.060 rows=0 loops=1)
Recheck Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone))
-> Bitmap Index Scan on br_1min_end_idx (cost=0.00..1.04 rows=6 width=0) (actual time=0.054..0.054 rows=0 loops=1)
Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone))
-> Index Scan using br_1min_20557_end_idx on br_1min_20557 bfs1 (cost=0.00..25.91 rows=869 width=32) (actual time=0.136..1.858 rows=910 loops=1)
Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone))
-> Index Scan using br_1min_20570_end_idx on br_1min_20570 bfs1 (cost=0.00..3.02 rows=1 width=32) (actual time=0.092..0.092 rows=0 loops=1)
Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone))
-> Index Scan using br_1min_20583_end_idx on br_1min_20583 bfs1 (cost=0.00..3.02 rows=1 width=32) (actual time=32.034..32.034 rows=0 loops=1)
Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone))
-> Hash (cost=761.61..761.61 rows=281 width=77) (actual time=0.487..0.487 rows=45 loops=1)
-> Bitmap Heap Scan on br_mods mod (cost=20.98..761.61 rows=281 width=77) (actual time=0.264..0.435 rows=45 loops=1)
Recheck Cond: ((downloads_id = 153226) OR (downloads_id = 153714) OR (downloads_id = 153730) OR (downloads_id = 153728) OR (downloads_id = 153727) OR (downloads_id = 153724) OR (downloads_id = 153713) OR (downloads_id = 153725) OR (downloads_id = 153739) OR (downloads_id = 153722))
-> BitmapOr (cost=20.98..20.98 rows=281 width=0) (actual time=0.223..0.223 rows=0 loops=1)
-> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.091..0.091 rows=14 loops=1)
Index Cond: (downloads_id = 153226)
-> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.037..0.037 rows=2 loops=1)
Index Cond: (downloads_id = 153714)
-> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.010..0.010 rows=2 loops=1)
Index Cond: (downloads_id = 153730)
-> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1)
Index Cond: (downloads_id = 153728)
-> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1)
Index Cond: (downloads_id = 153727)
-> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
Index Cond: (downloads_id = 153724)
-> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
Index Cond: (downloads_id = 153713)
-> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1)
Index Cond: (downloads_id = 153725)
-> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.031..0.031 rows=16 loops=1)
Index Cond: (downloads_id = 153739)
-> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.009..0.009 rows=1 loops=1)
Index Cond: (downloads_id = 153722)
Total runtime: 36.605 ms
(38 rows)


Note: there are 2 new partitions that our cron jobs automatically created yesterday that are being scanned, but they do not return any rows.

2) When I re-analyze the br_1min table, new rows do not appear in pg_statistics for that table.

Now, my questions:

1) If there are no statistics for the master table, does postgres use the statistics for any of the partitions, or does it create a plan without any statistics related to the partitioned tables (e.g. some default plan.)?

2) I'm curious where it got an estimate of 6 rows for br_1min in "Bitmap Heap Scan on br_1min bfs1 (cost=1.04..8.70 rows=6 width=32)" Any insight?

3) Basically, I'm wondering if this strategy of deleting the rows in pg_statistics for the master tables will work in all conditions, or if it runs the risk of again using faulty statistics and choosing a bad plan. Would I be better off setting enable_mergejoin = f in the session right before I issue this query and then resetting it after? What are the risks of that approach?


Now, the potentital bug:

It appears that after you truncate a table, the statistics for that table still remain in pg_statistics. And, as long as there are no rows added back to that table, the same statistics remain for that table, after an ANALYZE, - and are used by queries. Once, you re-insert any rows in the table, however, new statistics will be computed. So, the bug appears to be that after a truncate, if there are no rows in a table, the old, out-dated statistics do not get overwritten. To follow are some simple tests I did to illustrate that. Maybe this is by design, or, should I post this on pg-hackers? It might be that in my case, it's better that new statitics ARE NOT inserted into pg_statistics for empty tables, but maybe the fix could be to delete the old statistics for analyzes to an empty table.

Thanks again Tom for your feedback,

- Mark


prdb=# create table mark_temp (col1 int, col2 int);
CREATE TABLE
prdb=# create index mark_temp_idx on mark_temp(col1);
CREATE INDEX

.... I then inserted several thousand rows ....

prdb=# analyze mark_temp;
ANALYZE
prdb=# select staattnum,stadistinct from pg_statistic where starelid = (select oid from pg_class where relname = 'mark_temp');
staattnum | stadistinct
-----------+-------------
1 | 9671
2 | 1
(2 rows)

prdb=# explain analyze select * from mark_temp where col1 = 1045;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Scan using mark_temp_idx on mark_temp (cost=0.00..51.35 rows=27 width=8) (actual time=0.013..0.015 rows=1 loops=1)
Index Cond: (col1 = 1045)
Total runtime: 0.048 ms
(3 rows)

prdb=# truncate table mark_temp;
TRUNCATE TABLE
prdb=# analyze mark_temp;
ANALYZE

NOTE: STATISTICS ARE THE SAME AND IT'S STILL DOING AN INDEX SCAN INSTEAD OF A SEQ SCAN

prdb=# explain analyze select * from mark_temp where col1 = 1045;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Index Scan using mark_temp_idx on mark_temp (cost=0.00..3.14 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (col1 = 1045)
Total runtime: 0.031 ms
(3 rows)

prdb=# select staattnum,stadistinct from pg_statistic where starelid = (select oid from pg_class where relname = 'mark_temp');
staattnum | stadistinct
-----------+-------------
1 | 9671
2 | 1
(2 rows)

prdb=# insert into mark_temp (col1,col2) values (1,100);
INSERT 0 1
prdb=# analyze mark_temp;

NOTE: AFTER INSERT, THERE ARE NEW STATISTICS AND IT'S DOING A SEQ SCAN NOW

ANALYZE
prdb=# select staattnum,stadistinct from pg_statistic where starelid = (select oid from pg_class where relname = 'mark_temp');
staattnum | stadistinct
-----------+-------------
1 | -1
2 | -1
(2 rows)

prdb=# explain analyze select * from mark_temp where col1 = 1045;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on mark_temp (cost=0.00..1.01 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=1)
Filter: (col1 = 1045)
Total runtime: 0.029 ms
(3 rows)








Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 08:43 AM
Tom Lane
 
Posts: n/a
Default Re: Why is plan (and performance) different on partitioned table?

"Mark Liberman" <mliberman@mixedsignals.com> writes:
> Now, the potentital bug:
> It appears that after you truncate a table, the statistics for that =
> table still remain in pg_statistics.


That's intentional, on the theory that when the table is re-populated
the new contents will probably resemble the old.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:18 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com