This is a discussion on hashjoin chosen over 1000x faster plan within the Pgsql Performance forums, part of the PostgreSQL category; --> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > The point I'm trying to make is that at planning time the > pg_statistic ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > The point I'm trying to make is that at planning time the > pg_statistic row for this "Charge"."reopHistSeqNo" column showed > stanullfrac as 0.989; it doesn't seem to have taken this into account > when making its guess about how many rows would be joined when it was > compared to the primary key column of the "CaseHist" table. It certainly does take nulls into account, but the estimate of resulting rows was still nonzero; and even if it were zero, I'd be very hesitant to make it choose a plan that is fast only if there were exactly zero such rows and is slow otherwise. Most of the complaints we've had about issues of this sort involve the opposite problem, ie, the planner is choosing a plan that works well for few rows but falls down because reality involves many rows. "Fast-for-few-rows" plans are usually a lot more brittle than the alternatives in terms of the penalty you pay for too many rows, and so putting a thumb on the scales to push it towards a "fast" corner case sounds pretty unsafe to me. As Simon notes, the only technically sound way to handle this would involve run-time plan changeover, which is something we're not nearly ready to tackle. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| >>> On Wed, Oct 10, 2007 at 3:32 PM, in message <23650.1192048377@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I'd be very hesitant > to make it choose a plan that is fast only if there were exactly zero > such rows and is slow otherwise. I'm not sure why it looks at the slow option at all; it seems like a remaining weakness in the OUTER JOIN optimizations. If I change the query to usean inner join between the CaseHist table and the view, I get more of what I was expecting for the "slow" option. This ten times faster, and I can't see why it would not be usable with an outer join. bigbird=# explain analyze bigbird-# SELECT bigbird-# "CH"."caseNo", bigbird-# "CH"."countyNo", bigbird-# "CH"."chargeNo", bigbird-# "CH"."statuteCite", bigbird-# "CH"."sevClsCode", bigbird-# "CH"."modSevClsCode", bigbird-# "CH"."descr", bigbird-# "CH"."offenseDate", bigbird-# "CH"."pleaCode", bigbird-# "CH"."pleaDate", bigbird-# "CH"."chargeSeqNo", bigbird-# "CHST"."eventDate" AS "reopEventDate", bigbird-# "CTHE"."descr" AS "reopEventDescr" bigbird-# FROM "Charge" "CH" bigbird-# LEFT OUTER JOIN "CaseHist" "CHST" bigbird-# ON ( "CHST"."countyNo" = "CH"."countyNo" bigbird(# AND "CHST"."caseNo" = "CH"."caseNo" bigbird(# AND "CHST"."histSeqNo" = "CH"."reopHistSeqNo" bigbird(# ) bigbird-# JOIN "CaseTypeHistEvent" "CTHE" bigbird-# ON ( "CHST"."eventType" = "CTHE"."eventType" bigbird(# AND "CHST"."caseType" = "CTHE"."caseType" bigbird(# AND "CHST"."countyNo" = "CTHE"."countyNo" bigbird(# ) bigbird-# WHERE ( bigbird(# ("CH"."caseNo" = '2004CF002575') bigbird(# AND ("CH"."countyNo" = 13)) bigbird-# ORDER BY bigbird-# "chargeNo", bigbird-# "chargeSeqNo" bigbird-# ; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=126.69..126.70 rows=1 width=168) (actual time=36.854..36.855 rows=2 loops=1) Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo" -> Nested Loop Left Join (cost=0.00..126.68 rows=1 width=168) (actual time=36.465..36.623 rows=2 loops=1) Join Filter: ((d."countyNo")::smallint = (c."countyNo")::smallint) -> Nested Loop (cost=0.00..123.44 rows=1 width=185) (actual time=24.264..24.408 rows=2 loops=1) -> Index Scan using "ControlRecord_pkey" on "ControlRecord"c (cost=0.00..4.27 rows=1 width=2) (actual time=9.424..9.427 rows=1 loops=1) Index Cond: (13 = ("countyNo")::smallint) -> Nested Loop (cost=0.00..119.16 rows=1 width=185) (actual time=14.835..14.975 rows=2 loops=1) -> Nested Loop (cost=0.00..115.67 rows=1 width=131) (actual time=8.346..8.463 rows=2 loops=1) -> Index Scan using "Charge_pkey" on "Charge" "CH" (cost=0.00..10.69 rows=4 width=112) (actual time=5.723..8.228 rows=4 loops=1) Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar)) -> Index Scan using "CaseHist_pkey" on "CaseHist" "CHST" (cost=0.00..26.18 rows=5 width=41) (actual time=0.052..0..053 rows=0 loops=4) Index Cond: ((13 = ("CHST"."countyNo")::smallint) AND ('2004CF002575'::bpchar = ("CHST"."caseNo")::bpchar) AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint)) -> Index Scan using "CaseTypeHistEventB_pkey" on "CaseTypeHistEventB" b (cost=0.00..3.48 rows=1 width=69) (actual time=3.248..3.250 rows=1 loops=2) Index Cond: ((("CHST"."caseType")::bpchar = (b.."caseType")::bpchar) AND (("CHST"."eventType")::bpchar = (b."eventType")::bpchar)) -> Index Scan using "CaseTypeHistEventD_CaseType" on "CaseTypeHistEventD" d (cost=0.00..3.23 rows=1 width=17) (actual time=6.103..6..103 rows=0 loops=2) Index Cond: (((d."caseType")::bpchar = (b."caseType")::bpchar) AND ((d."eventType")::bpchar = (b."eventType")::bpchar)) Total runtime: 46.072 ms (18 rows) -Kevin ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| >>> On Wed, Oct 10, 2007 at 1:54 PM, in message <1192042492.4233.334.camel@ebony.site>, Simon Riggs <simon@2ndquadrant.com> wrote: > > But the planner doesn't work on probability. It works on a best-guess > selectivity, as known at planning time. The point I'm trying to make is that at planning time the pg_statistic row for this "Charge"."reopHistSeqNo" column showed stanullfrac as 0.989; it doesn't seem to have taken this into account when making its guess about how many rows would be joined when it was compared to the primary key column of the "CaseHist" table. I'm suggesting that it might be a good thing if it did. -Kevin ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| >>> On Wed, Oct 10, 2007 at 3:48 PM, in message <470CF450.EE98.0025.0@wicourts.gov>, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > > This ten times faster That understates it -- I forgot to get things cached, as I had done for all the other tests. When cached, this is sub-millisecond, although not quite the 1000-fold increase which I get when no matches are found. -Kevin Sort (cost=126.70..126.70 rows=1 width=168) (actual time=0.259..0..261 rows=2 loops=1) Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo" -> Nested Loop Left Join (cost=0.00..126.69 rows=1 width=168) (actual time=0.157..0.234 rows=2 loops=1) Join Filter: ((d."countyNo")::smallint = (c."countyNo")::smallint) -> Nested Loop (cost=0.00..123.44 rows=1 width=185) (actual time=0.139..0.203 rows=2 loops=1) -> Index Scan using "ControlRecord_pkey" on "ControlRecord"c (cost=0.00..4.27 rows=1 width=2) (actual time=0.024..0.026 rows=1 loops=1) Index Cond: (13 = ("countyNo")::smallint) -> Nested Loop (cost=0.00..119.17 rows=1 width=185) (actual time=0.109..0.169 rows=2 loops=1) -> Nested Loop (cost=0.00..115.67 rows=1 width=131) (actual time=0.087..0.127 rows=2 loops=1) -> Index Scan using "Charge_pkey" on "Charge" "CH" (cost=0.00..10.69 rows=4 width=112) (actual time=0.038..0.051 rows=4 loops=1) Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar)) -> Index Scan using "CaseHist_pkey" on "CaseHist" "CHST" (cost=0.00..26.18 rows=5 width=41) (actual time=0.014..0..015 rows=0 loops=4) Index Cond: ((13 = ("CHST"."countyNo")::smallint) AND ('2004CF002575'::bpchar = ("CHST"."caseNo")::bpchar) AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint)) -> Index Scan using "CaseTypeHistEventB_pkey" on "CaseTypeHistEventB" b (cost=0.00..3.48 rows=1 width=69) (actual time=0.015..0.017 rows=1 loops=2) Index Cond: ((("CHST"."caseType")::bpchar = (b.."caseType")::bpchar) AND (("CHST"."eventType")::bpchar = (b."eventType")::bpchar)) -> Index Scan using "CaseTypeHistEventD_CaseType" on "CaseTypeHistEventD" d (cost=0.00..3.23 rows=1 width=17) (actual time=0.011..0..011 rows=0 loops=2) Index Cond: (((d."caseType")::bpchar = (b."caseType")::bpchar) AND ((d."eventType")::bpchar = (b."eventType")::bpchar)) Total runtime: 0.605 ms (18 rows) ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > I'm not sure why it looks at the slow option at all; it seems like a remain= > ing weakness in the OUTER JOIN optimizations. I think that comes mostly from the fact that you've got non-nullable targetlist entries in the definition of the CaseTypeHistEvent view. Those prevent that view from being flattened into the upper query when it's underneath an outer join, because the current variable-evaluation rules provide no other way to ensure that the values are forced NULL when they need to be. This is something we should fix someday but don't hold your breath waiting --- it's likely to take some pretty fundamental rejiggering of the planner's handling of Vars. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| ||||
| >>> On Wed, Oct 10, 2007 at 3:48 PM, in message <470CF450.EE98.0025.0@wicourts.gov>, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > I'm not sure why it looks at the slow option at all; it seems like a > remaining weakness in the OUTER JOIN optimizations. If I change the query to > use an inner join between the CaseHist table and the view, I get more of what > I was expecting for the "slow" option. Just to wrap this up (from my perspective), it looks like we're headed to a workaround of using the underlying "base" table instead of the view. We ignore any county override of our description, but performance is good, and they were reluctant to change it to an inner join. -Kevin SELECT "CH"."caseNo", "CH"."countyNo", "CH"."chargeNo", "CH"."statuteCite", "CH"."sevClsCode", "CH"."modSevClsCode", "CH"."descr", "CH"."offenseDate", "CH"."pleaCode", "CH"."pleaDate", "CH"."chargeSeqNo", "CHST"."eventDate" AS "reopEventDate", "CTHE"."descr" AS "reopEventDescr" FROM "Charge" "CH" LEFT OUTER JOIN "CaseHist" "CHST" ON ( "CHST"."countyNo" = "CH"."countyNo" AND "CHST"."caseNo" = "CH"."caseNo" AND "CHST"."histSeqNo" = "CH"."reopHistSeqNo" ) LEFT OUTER JOIN "CaseTypeHistEventB" "CTHE" ON ( "CHST"."eventType" = "CTHE"."eventType" AND "CHST"."caseType" = "CTHE"."caseType" ) WHERE ( ("CH"."caseNo" = '2004CF002575') AND ("CH"."countyNo" = 13)) ORDER BY "chargeNo", "chargeSeqNo" ; Sort (cost=129.70..129.71 rows=4 width=168) (actual time=0.218..0..220 rows=4 loops=1) Sort Key: "CH"."chargeNo", "CH"."chargeSeqNo" -> Nested Loop Left Join (cost=0.00..129.66 rows=4 width=168) (actual time=0.059..0.190 rows=4 loops=1) -> Nested Loop Left Join (cost=0.00..115.67 rows=4 width=129) (actual time=0.055..0.139 rows=4 loops=1) -> Index Scan using "Charge_pkey" on "Charge" "CH" (cost=0.00..10.69 rows=4 width=112) (actual time=0.046..0.059 rows=4 loops=1) Index Cond: ((("countyNo")::smallint = 13) AND (("caseNo")::bpchar = '2004CF002575'::bpchar)) -> Index Scan using "CaseHist_pkey" on "CaseHist" "CHST" (cost=0.00..26.18 rows=5 width=41) (actual time=0.013..0.014 rows=0 loops=4) Index Cond: ((("CHST"."countyNo")::smallint = 13) AND (("CHST"."caseNo")::bpchar = '2004CF002575'::bpchar) AND (("CHST"."histSeqNo")::smallint = ("CH"."reopHistSeqNo")::smallint)) -> Index Scan using "CaseTypeHistEventB_pkey" on "CaseTypeHistEventB" "CTHE" (cost=0.00..3.48 rows=1 width=69) (actual time=0.008..0.009 rows=0 loops=4) Index Cond: ((("CHST"."caseType")::bpchar = ("CTHE"."caseType")::bpchar) AND (("CHST"."eventType")::bpchar = ("CTHE"."eventType")::bpchar)) Total runtime: 0.410 ms (11 rows) ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |