This is a discussion on very slow left join within the Pgsql Performance forums, part of the PostgreSQL category; --> I've inherited an Oracle database that I'm porting to Postgres, and this has been going quite well until now. ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I've inherited an Oracle database that I'm porting to Postgres, and this has been going quite well until now. Unfortunately, I've found one view (a largish left join) that runs several orders of magnitude slower on Postgres than it did on Oracle. => select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 8.2.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52) (1 row) After analyzing the database, the explain analyze output for the query is: Nested Loop Left Join (cost=133.51..15846.99 rows=1 width=312) (actual time=109.131..550711.374 rows=1248 loops=1) Join Filter: (log.logkey = ln.logkey) -> Nested Loop (cost=133.51..267.44 rows=1 width=306) (actual time=15.316..74.074 rows=1248 loops=1) -> Merge Join (cost=133.51..267.16 rows=1 width=325) (actual time=15.300..60.332 rows=1248 loops=1) Merge Cond: (log.eventkey = e.eventkey) Join Filter: ((e.clientkey = log.clientkey) AND (e.premiseskey = log.premiseskey)) -> Index Scan using log_eventkey_idx on log (cost=0.00..3732.14 rows=36547 width=167) (actual time=0.015..25.385 rows=36547 loops=1) Filter: (logicaldel = 'N'::bpchar) -> Sort (cost=133.51..135.00 rows=595 width=328) (actual time=15.185..16.379 rows=1248 loops=1) Sort Key: e.eventkey -> Hash Join (cost=1.30..106.09 rows=595 width=328) (actual time=0.073..2.033 rows=1248 loops=1) Hash Cond: ((e.clientkey = p.clientkey) AND (e.premiseskey = p.premiseskey)) -> Seq Scan on event e (cost=0.00..89.48 rows=1248 width=246) (actual time=0.005..0.481 rows=1248 loops=1) -> Hash (cost=1.14..1.14 rows=11 width=82) (actual time=0.059..0.059 rows=11 loops=1) -> Seq Scan on premises p (cost=0.00..1.14 rows=11 width=82) (actual time=0.004..0.020 rows=11 loops=1) Filter: (logicaldel = 'N'::bpchar) -> Index Scan using severity_pk on severity s (cost=0.00..0.27 rows=1 width=49) (actual time=0.007..0.009 rows=1 loops=1248) Index Cond: (e.severitykey = s.severitykey) -> Seq Scan on lognote ln1 (cost=0.00..15552.67 rows=1195 width=175) (actual time=1.173..440.695 rows=1244 loops=1248) Filter: ((logicaldel = 'N'::bpchar) AND (subplan)) SubPlan -> Limit (cost=4.30..8.58 rows=1 width=34) (actual time=0.171..0.171 rows=1 loops=2982720) InitPlan -> GroupAggregate (cost=0.00..4.30 rows=1 width=110) (actual time=0.089..0.089 rows=1 loops=2982720) -> Index Scan using lognote_pk on lognote (cost=0.00..4.28 rows=1 width=110) (actual time=0.086..0.087 rows=1 loops=2982720) Index Cond: ((clientkey = $0) AND (premiseskey = $1) AND (logkey = $2)) Filter: ((logicaldel = 'N'::bpchar) AND ((lognotetext ~~ '_%;%'::text) OR (lognotetext ~~ '_%has modified Respond Status to%'::text))) -> Index Scan using lognote_pk on lognote (cost=0.00..4.28 rows=1 width=34) (actual time=0.170..0.170 rows=1 loops=2982720) Index Cond: ((clientkey = $0) AND (premiseskey = $1) AND (logkey = $2)) Filter: ((logicaldel = 'N'::bpchar) AND (lognotetime = $3)) Total runtime: 550712.393 ms (31 rows) Either side of the left join runs quite fast independently. (The full query also runs well when made into an inner join, but that's not the logic I want.) The biggest difference between running each side indpendently and together in a left join is that this line in the plan for the right side of the left join: -> Index Scan using lognote_pk on lognote (cost=0.00..4.28 rows=1 width=110) (actual time=0.086..0.087 rows=1 loops=2982720) ....becomes this line when run independantly: -> Index Scan using lognote_pk on lognote (cost=0.00..4.28 rows=1 width=110) (actual time=0.086..0.087 rows=1 loops=2390) That's quite a few more loops in the left join. Am I right to think that it's looping so much because the analyzer is so far off when guessing the rows for the left side of the join (1 vs. 1248)? Or is there something else going on? I've tried bumping up analyze stats on a few columns, but I'm not too sure how to spot which columns it might help with and, sure enough, it didn't help. The actual query: select * from ( select * from event e, severity s, premises p, log where p.clientkey = e.clientkey and p.premiseskey = e.premiseskey and p.logicaldel = 'N' and log.logicaldel = 'N' and e.clientkey = log.clientkey and e.premiseskey = log.premiseskey and e.eventkey = log.eventkey and e.severitykey = s.severitykey ) lj left join ( select clientkey, premiseskey, logkey, lognotetime, logicaldel, case when (case when instr(lognotetext,';') = 0 then instr(lognotetext,' has modified') else instr(lognotetext,';') end) = 0 then NULL else substr(lognotetext,1, ( case when instr(lognotetext,';') = 0 then instr(lognotetext,' has modified') else instr(lognotetext,';') end ) - 1) end as responderid from lognote ln1 where logicaldel = 'N' and lognotekey in ( select lognotekey from lognote where logicaldel = 'N' and clientkey = ln1.clientkey and premiseskey = ln1.premiseskey and logkey = ln1.logkey and lognotetime = ( select min(lognotetime) from lognote where logicaldel = 'N' and ( lognotetext like '_%;%' or lognotetext like '_%has modified Respond Status to%' ) and clientkey = ln1.clientkey and premiseskey = ln1.premiseskey and logkey = ln1.logkey group by clientkey, premiseskey, logkey ) order by lognotekey limit 1 ) ) ln on ( lj.logkey = ln.logkey ) The instr() function calls are calling this version of instr: http://www.postgresql.org/docs/8.2/i...RTING-APPENDIX The relevent schema: Table "public.event" Column | Type | Modifiers ----------------+-----------------------------+------------------------ clientkey | character(30) | not null premiseskey | character(30) | not null eventkey | character(30) | not null severitykey | character(30) | Indexes: "event_pk" PRIMARY KEY, btree (clientkey, premiseskey, eventkey), tablespace "data" Foreign-key constraints: "premisesevent" FOREIGN KEY (clientkey, premiseskey) REFERENCES premises(clientkey, premiseskey) DEFERRABLE INITIALLY DEFERRED Table "public.severity" Column | Type | Modifiers --------------------+---------------+----------- severitykey | character(30) | not null severityname | text | Indexes: "severity_pk" PRIMARY KEY, btree (severitykey), tablespace "data" Table "public.premises" Column | Type | Modifiers -----------------+-----------------------------+--------------------- clientkey | character(30) | not null premiseskey | character(30) | not null logicaldel | character(1) | default 'N'::bpchar Indexes: "premises_pk" PRIMARY KEY, btree (clientkey, premiseskey), tablespace "data" Foreign-key constraints: "clientpremises" FOREIGN KEY (clientkey) REFERENCES client(clientkey) DEFERRABLE INITIALLY DEFERRED Table "public.log" Column | Type | Modifiers ----------------+-----------------------------+--------------------- clientkey | character(30) | not null premiseskey | character(30) | not null logkey | character(30) | not null logicaldel | character(1) | default 'N'::bpchar eventkey | character(30) | Indexes: "log_pk" PRIMARY KEY, btree (clientkey, premiseskey, logkey), tablespace "data" "log_ak1" btree (clientkey, premiseskey, logtime, logkey), tablespace "data" "log_eventkey_idx" btree (eventkey), tablespace "data" Foreign-key constraints: "premiseslog" FOREIGN KEY (clientkey, premiseskey) REFERENCES premises(clientkey, premiseskey) DEFERRABLE INITIALLY DEFERRED Table "public.lognote" Column | Type | Modifiers -------------+-----------------------------+--------------------- clientkey | character(30) | not null premiseskey | character(30) | not null logkey | character(30) | not null lognotekey | character(30) | not null logicaldel | character(1) | default 'N'::bpchar lognotetext | text | lognotetime | timestamp without time zone | Indexes: "lognote_pk" PRIMARY KEY, btree (clientkey, premiseskey, logkey, lognotekey), tablespace "data" "lognotekey_idx" UNIQUE, btree (lognotekey), tablespace "data" Foreign-key constraints: "log_lognote_fk1" FOREIGN KEY (clientkey, premiseskey, logkey) REFERENCES log(clientkey, premiseskey, logkey) DEFERRABLE INITIALLY DEFERRED Any help would be appreciated! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| On Fri, May 16, 2008 at 11:56 AM, Ben <bench@silentmedia.com> wrote: > I've inherited an Oracle database that I'm porting to Postgres, and this has > been going quite well until now. Unfortunately, I've found one view (a > largish left join) that runs several orders of magnitude slower on Postgres > than it did on Oracle. > > => select version(); > version > ---------------------------------------------------------------------------------------------------------- > PostgreSQL 8.2.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) > 4.1.1 20070105 (Red Hat 4.1.1-52) > (1 row) 1: Update to 8.2.7. It's pretty painless, and who knows what performance bugs you might be fighting that you don't really need to. > After analyzing the database, the explain analyze output for the query is: > > Nested Loop Left Join (cost=133.51..15846.99 rows=1 width=312) (actual > time=109.131..550711.374 rows=1248 loops=1) > Join Filter: (log.logkey = ln.logkey) > -> Nested Loop (cost=133.51..267.44 rows=1 width=306) (actual > time=15.316..74.074 rows=1248 loops=1) SNIP > Total runtime: 550712.393 ms Just for giggles, try running the query like so: set enable_nestloop = off; explain analyze ... and see what happens. I'm guessing that the nested loops are bad choices here. > (case when instr(lognotetext,';') = 0 then instr(lognotetext,' has > modified') > else instr(lognotetext,';') end) = 0 then NULL Try creating indexes on the functions above, and make sure you're running the db in the C local if you can. Note you may need to dump / initdb --locale=C / reload your data if you're not in the C locale already. text_pattern_ops may be applicable here, but I'm not sure how to use it in the above functions. > Table "public.event" > Column | Type | Modifiers > ----------------+-----------------------------+------------------------ > clientkey | character(30) | not null > premiseskey | character(30) | not null > eventkey | character(30) | not null > severitykey | character(30) | Do these really need to be character and not varchar? varchar / text are better optimized in pgsql, and character often need to be cast anyway, so you might as well start with varchar. Unless you REALLY need padding in your db, avoid char(x). Don't see anything else, but who knows what someone else might see. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| ||||
| On Fri, 16 May 2008, Scott Marlowe wrote: > Just for giggles, try running the query like so: > > set enable_nestloop = off; > explain analyze ... > > and see what happens. I'm guessing that the nested loops are bad choices here. You guess correctly, sir! Doing so shaves 3 orders of magnitude off the runtime. That's nice. thinks nested loops are the way to go? It would be handy if I could make it guess correctly to begin with and didn't have to turn nested loops off each time I run this. >> Table "public.event" >> Column | Type | Modifiers >> ----------------+-----------------------------+------------------------ >> clientkey | character(30) | not null >> premiseskey | character(30) | not null >> eventkey | character(30) | not null >> severitykey | character(30) | > > Do these really need to be character and not varchar? varchar / text > are better optimized in pgsql, and character often need to be cast > anyway, so you might as well start with varchar. Unless you REALLY > need padding in your db, avoid char(x). Unfortuantely, the people who created this database made all keys 30 character strings, and we're not near a place in our release cycle where we can fix that. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |