This is a discussion on Re: Why is plan (and performance) different on partitioned table? within the Pgsql Performance forums, part of the PostgreSQL category; --> >If you don't have anything in the parent table br_1min, then deleting >the (presumably obsolete) pg_statistic rows for it ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| >If you don't have anything in the parent table br_1min, then deleting >the (presumably obsolete) pg_statistic rows for it should fix your >immediate problem. Otherwise, consider applying the attached. Tom, thanks alot for your reply. A few follow-up questions, and one potential "bug"? I've been experimenting with deleting the rows from pg_statistics. FYI, there were statistics for all master tables prior to us partioning the data. We then manually inserted the rows into each inherited partition and, when done - did a truncate of the master table. So, here's what I'm finding. 1) When I delete the rows from pg_statistics, the new plan is, indeed, a hash join. explain analyze SELECT * FROM br_1min br1 JOIN br_mods mod on br1.modules_id = mod.id WHERE ((end_time >= '2006-05-01 17:12:18-07' AND end_time < '2006-05-01 17:13:18-07')) AND mod.downloads_id IN (153226,153714,153730,153728,153727,153724,153713, 153725,153739,153722) ; Hash Join (cost=763.35..807.35 rows=1 width=109) (actual time=3.631..36.181 rows=45 loops=1) Hash Cond: ("outer".modules_id = "inner".id) -> Append (cost=1.04..40.64 rows=877 width=32) (actual time=0.198..34.872 rows=910 loops=1) -> Bitmap Heap Scan on br_1min bfs1 (cost=1.04..8.70 rows=6 width=32) (actual time=0.060..0.060 rows=0 loops=1) Recheck Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone)) -> Bitmap Index Scan on br_1min_end_idx (cost=0.00..1.04 rows=6 width=0) (actual time=0.054..0.054 rows=0 loops=1) Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone)) -> Index Scan using br_1min_20557_end_idx on br_1min_20557 bfs1 (cost=0.00..25.91 rows=869 width=32) (actual time=0.136..1.858 rows=910 loops=1) Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone)) -> Index Scan using br_1min_20570_end_idx on br_1min_20570 bfs1 (cost=0.00..3.02 rows=1 width=32) (actual time=0.092..0.092 rows=0 loops=1) Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone)) -> Index Scan using br_1min_20583_end_idx on br_1min_20583 bfs1 (cost=0.00..3.02 rows=1 width=32) (actual time=32.034..32.034 rows=0 loops=1) Index Cond: ((end_time >= '2006-05-01 17:12:18-07'::timestamp with time zone) AND (end_time < '2006-05-01 17:13:18-07'::timestamp with time zone)) -> Hash (cost=761.61..761.61 rows=281 width=77) (actual time=0.487..0.487 rows=45 loops=1) -> Bitmap Heap Scan on br_mods mod (cost=20.98..761.61 rows=281 width=77) (actual time=0.264..0.435 rows=45 loops=1) Recheck Cond: ((downloads_id = 153226) OR (downloads_id = 153714) OR (downloads_id = 153730) OR (downloads_id = 153728) OR (downloads_id = 153727) OR (downloads_id = 153724) OR (downloads_id = 153713) OR (downloads_id = 153725) OR (downloads_id = 153739) OR (downloads_id = 153722)) -> BitmapOr (cost=20.98..20.98 rows=281 width=0) (actual time=0.223..0.223 rows=0 loops=1) -> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.091..0.091 rows=14 loops=1) Index Cond: (downloads_id = 153226) -> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.037..0.037 rows=2 loops=1) Index Cond: (downloads_id = 153714) -> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.010..0.010 rows=2 loops=1) Index Cond: (downloads_id = 153730) -> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1) Index Cond: (downloads_id = 153728) -> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.008..0.008 rows=2 loops=1) Index Cond: (downloads_id = 153727) -> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1) Index Cond: (downloads_id = 153724) -> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1) Index Cond: (downloads_id = 153713) -> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.007..0.007 rows=2 loops=1) Index Cond: (downloads_id = 153725) -> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.031..0.031 rows=16 loops=1) Index Cond: (downloads_id = 153739) -> Bitmap Index Scan on br_mods_downloads_id_idx (cost=0.00..2.10 rows=28 width=0) (actual time=0.009..0.009 rows=1 loops=1) Index Cond: (downloads_id = 153722) Total runtime: 36.605 ms (38 rows) Note: there are 2 new partitions that our cron jobs automatically created yesterday that are being scanned, but they do not return any rows. 2) When I re-analyze the br_1min table, new rows do not appear in pg_statistics for that table. Now, my questions: 1) If there are no statistics for the master table, does postgres use the statistics for any of the partitions, or does it create a plan without any statistics related to the partitioned tables (e.g. some default plan.)? 2) I'm curious where it got an estimate of 6 rows for br_1min in "Bitmap Heap Scan on br_1min bfs1 (cost=1.04..8.70 rows=6 width=32)" Any insight? 3) Basically, I'm wondering if this strategy of deleting the rows in pg_statistics for the master tables will work in all conditions, or if it runs the risk of again using faulty statistics and choosing a bad plan. Would I be better off setting enable_mergejoin = f in the session right before I issue this query and then resetting it after? What are the risks of that approach? Now, the potentital bug: It appears that after you truncate a table, the statistics for that table still remain in pg_statistics. And, as long as there are no rows added back to that table, the same statistics remain for that table, after an ANALYZE, - and are used by queries. Once, you re-insert any rows in the table, however, new statistics will be computed. So, the bug appears to be that after a truncate, if there are no rows in a table, the old, out-dated statistics do not get overwritten. To follow are some simple tests I did to illustrate that. Maybe this is by design, or, should I post this on pg-hackers? It might be that in my case, it's better that new statitics ARE NOT inserted into pg_statistics for empty tables, but maybe the fix could be to delete the old statistics for analyzes to an empty table. Thanks again Tom for your feedback, - Mark prdb=# create table mark_temp (col1 int, col2 int); CREATE TABLE prdb=# create index mark_temp_idx on mark_temp(col1); CREATE INDEX .... I then inserted several thousand rows .... prdb=# analyze mark_temp; ANALYZE prdb=# select staattnum,stadistinct from pg_statistic where starelid = (select oid from pg_class where relname = 'mark_temp'); staattnum | stadistinct -----------+------------- 1 | 9671 2 | 1 (2 rows) prdb=# explain analyze select * from mark_temp where col1 = 1045; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Index Scan using mark_temp_idx on mark_temp (cost=0.00..51.35 rows=27 width=8) (actual time=0.013..0.015 rows=1 loops=1) Index Cond: (col1 = 1045) Total runtime: 0.048 ms (3 rows) prdb=# truncate table mark_temp; TRUNCATE TABLE prdb=# analyze mark_temp; ANALYZE NOTE: STATISTICS ARE THE SAME AND IT'S STILL DOING AN INDEX SCAN INSTEAD OF A SEQ SCAN prdb=# explain analyze select * from mark_temp where col1 = 1045; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Index Scan using mark_temp_idx on mark_temp (cost=0.00..3.14 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1) Index Cond: (col1 = 1045) Total runtime: 0.031 ms (3 rows) prdb=# select staattnum,stadistinct from pg_statistic where starelid = (select oid from pg_class where relname = 'mark_temp'); staattnum | stadistinct -----------+------------- 1 | 9671 2 | 1 (2 rows) prdb=# insert into mark_temp (col1,col2) values (1,100); INSERT 0 1 prdb=# analyze mark_temp; NOTE: AFTER INSERT, THERE ARE NEW STATISTICS AND IT'S DOING A SEQ SCAN NOW ANALYZE prdb=# select staattnum,stadistinct from pg_statistic where starelid = (select oid from pg_class where relname = 'mark_temp'); staattnum | stadistinct -----------+------------- 1 | -1 2 | -1 (2 rows) prdb=# explain analyze select * from mark_temp where col1 = 1045; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on mark_temp (cost=0.00..1.01 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=1) Filter: (col1 = 1045) Total runtime: 0.029 ms (3 rows) |
| ||||
| "Mark Liberman" <mliberman@mixedsignals.com> writes: > Now, the potentital bug: > It appears that after you truncate a table, the statistics for that = > table still remain in pg_statistics. That's intentional, on the theory that when the table is re-populated the new contents will probably resemble the old. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |