Unix Technical Forum

Query works when kludged, but would prefer "best practice" solution

This is a discussion on Query works when kludged, but would prefer "best practice" solution within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi all, Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan below. You can see it's pretty ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 11:34 AM
Carlo Stonebanks
 
Posts: n/a
Default Query works when kludged, but would prefer "best practice" solution

Hi all,

Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan below.
You can see it's pretty slow. Oddly enough, an index for facility_address_id
is available but not being used, but I suspect it's questionable whether it
would be an improvement.

I knew that the filter was best applied to the results of the join - my
attempts to restructure the query with subqueries, etc didn't fool the
planner - it always figured out a plan that had this problem SEQ SCAN +
FILTER in it.

Finally, I "hid" the condition from the planner with a coalesce function -
see "SOLUTION" in the "KLUDGED QUERY" plan below.

Sure enough, a new plan appeared with a remarkable performance improvement!

The purpose of this query is to find facilities within a geographical area
when the complete address data is missing (hence the facility_address_id is
NULL).

PG is 8.4.2 on RH linux server with 1GB ram, HDD is RAID 1.

I don't like kludging like this - so any and all help or advice is
appreciated!

Carlo

ORIGINAL QUERY
select
pp.provider_id,
pp.provider_practice_id,
nearby.distance
from mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join (select * from mdx_core.zips_in_mile_range('08820', 10)) as nearby
on f.default_country_code = 'US'
and f.default_postal_code = nearby.zip
where facility_address_id is null

Hash Join (cost=30258.99..107702.53 rows=9438 width=16) (actual
time=169.516..3064.188 rows=872 loops=1)
Hash Cond: (pp.facility_id = f.facility_id)
PROBLEM:
------------
-> Seq Scan on provider_practice pp (cost=0.00..74632.55 rows=724429
width=12) (actual time=0.039..1999.457 rows=728396 loops=1)
Filter: (facility_address_id IS NULL)
------------
-> Hash (cost=29954.15..29954.15 rows=24387 width=12) (actual
time=156.668..156.668 rows=907 loops=1)
-> Nested Loop (cost=0.00..29954.15 rows=24387 width=12) (actual
time=149.891..155.343 rows=907 loops=1)
-> Function Scan on zips_in_mile_range (cost=0.00..12.50
rows=1000 width=40) (actual time=149.850..149.920 rows=66 loops=1)
-> Index Scan using facility_country_postal_code_idx on
facility f (cost=0.00..29.64 rows=24 width=15) (actual time=0.015..0.048
rows=14 loops=66)
Index Cond: ((f.default_country_code = 'US'::bpchar) AND
((f.default_postal_code)::text = zips_in_mile_range.zip))
Total runtime: 3065.338 ms


KLUDGED QUERY

select
pp.provider_id,
pp.provider_practice_id,
nearby.distance
from mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join (select * from mdx_core.zips_in_mile_range('08820', 10)) as nearby
on f.default_country_code = 'US'
and f.default_postal_code = nearby.zip
and coalesce(pp.facility_address_id, -1) = -1

Nested Loop (cost=0.00..112618.87 rows=180 width=16) (actual
time=149.680..167.261 rows=872 loops=1)
-> Nested Loop (cost=0.00..29954.15 rows=24387 width=12) (actual
time=149.659..155.018 rows=907 loops=1)
-> Function Scan on zips_in_mile_range (cost=0.00..12.50 rows=1000
width=40) (actual time=149.620..149.698 rows=66 loops=1)
-> Index Scan using facility_country_postal_code_idx on facility f
(cost=0.00..29.64 rows=24 width=15) (actual time=0.015..0.045 rows=14
loops=66)
Index Cond: ((f.default_country_code = 'US'::bpchar) AND
((f.default_postal_code)::text = zips_in_mile_range.zip))
SOLUTION
-------------
-> Index Scan using provider_practice_facility_idx on provider_practice
pp (cost=0.00..3.38 rows=1 width=12) (actual time=0.007..0.009 rows=1
loops=907)
Index Cond: (f.facility_id = pp.facility_id)
Filter: (COALESCE(facility_address_id, -1) = -1)
-------------
Total runtime: 168.275 ms

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 11:34 AM
Merlin Moncure
 
Posts: n/a
Default Re: Query works when kludged, but would prefer "best practice" solution

On 9/17/07, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> Hi all,
>
> Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan below.
> You can see it's pretty slow. Oddly enough, an index for facility_address_id
> is available but not being used, but I suspect it's questionable whether it
> would be an improvement.


This looks like it might be the problem tom caught and rigged a solution to:
http://people.planetpostgresql.org/d...r-03-2007.html
(look fro band-aid).

If that's the case, the solution is to wait for 8.2.5 (coming soon).

merlin

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 11:34 AM
Carlo Stonebanks
 
Posts: n/a
Default Re: Query works when kludged, but would prefer "best practice" solution

Well, there goes my dream of getting a recommendation that will deliver a
blinding insight into how to speed up all of my queries a thousand-fold.

Thanks Merlin!

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: September 17, 2007 8:03 PM
To: Carlo Stonebanks
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query works when kludged, but would prefer "best
practice" solution

On 9/17/07, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> Hi all,
>
> Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan below.
> You can see it's pretty slow. Oddly enough, an index for

facility_address_id
> is available but not being used, but I suspect it's questionable whether

it
> would be an improvement.


This looks like it might be the problem tom caught and rigged a solution to:
http://people.planetpostgresql.org/d.../134-PostgreSQ
L-Weekly-News-September-03-2007.html
(look fro band-aid).

If that's the case, the solution is to wait for 8.2.5 (coming soon).

merlin



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 11:34 AM
Merlin Moncure
 
Posts: n/a
Default Re: Query works when kludged, but would prefer "best practice" solution

On 9/17/07, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> Well, there goes my dream of getting a recommendation that will deliver a
> blinding insight into how to speed up all of my queries a thousand-fold.


that's easy...delete your data! :-)

merlin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 11:34 AM
Carlo Stonebanks
 
Posts: n/a
Default Re: Query works when kludged, but would prefer "best practice" solution

Thanks, it worked. Client happy. Big bonus in the mail.

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: September 17, 2007 8:18 PM
To: Carlo Stonebanks
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query works when kludged, but would prefer "best
practice" solution

On 9/17/07, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> Well, there goes my dream of getting a recommendation that will deliver a
> blinding insight into how to speed up all of my queries a thousand-fold.


that's easy...delete your data! :-)

merlin



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 11:34 AM
Tom Lane
 
Posts: n/a
Default Re: Query works when kludged, but would prefer "best practice" solution

"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 9/17/07, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
>> Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan below.


> This looks like it might be the problem tom caught and rigged a solution to:
> http://people.planetpostgresql.org/d...r-03-2007.html
> (look fro band-aid).


No, fraid not, that was about misestimation of outer joins, and I see no
outer join here.

What I do see is misestimation of a set-returning-function's output:

-> Function Scan on zips_in_mile_range (cost=0.00..12.50 rows=1000 width=40) (actual time=149.850..149.920 rows=66 loops=1)

There's not any very nice way to improve that in existing releases :-(.
In 8.3 it will be possible to add a ROWS option to function definitions
to replace the default "1000 rows" estimate with some other number, but
that still helps little if the number of result rows is widely variable.

As far as kluges go: rather than kluging conditions affecting unrelated
tables, maybe you could put in a dummy constraint on the function's
output --- ie, a condition you know is always true, but the planner
won't know that, and will scale down its result-rows estimate accordingly.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 11:34 AM
Carlo Stonebanks
 
Posts: n/a
Default Re: Query works when kludged, but would prefer "best practice" solution

Hi Tom,

Thanks for the suggestion - this concept is pretty new to me. Can you expand
a bit on the idea of how to place such a "dummy" constraint on a function,
and the conditions on which it affects the planner? Would this require that
constraint_exclusion be set on?

(When I go to sleep, I have a dream -- and in this dream Tom writes a
brilliant three line code sample that makes it all clear to me, and I wake
up a PostgreSQL guru)

;-)

Carlo

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: September 17, 2007 11:30 PM
To: Merlin Moncure
Cc: Carlo Stonebanks; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query works when kludged, but would prefer "best
practice" solution

"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 9/17/07, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
>> Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan

below.

> This looks like it might be the problem tom caught and rigged a solution

to:
>

http://people.planetpostgresql.org/d.../134-PostgreSQ
L-Weekly-News-September-03-2007.html
> (look fro band-aid).


No, fraid not, that was about misestimation of outer joins, and I see no
outer join here.

What I do see is misestimation of a set-returning-function's output:

-> Function Scan on zips_in_mile_range (cost=0.00..12.50 rows=1000
width=40) (actual time=149.850..149.920 rows=66 loops=1)

There's not any very nice way to improve that in existing releases :-(.
In 8.3 it will be possible to add a ROWS option to function definitions
to replace the default "1000 rows" estimate with some other number, but
that still helps little if the number of result rows is widely variable.

As far as kluges go: rather than kluging conditions affecting unrelated
tables, maybe you could put in a dummy constraint on the function's
output --- ie, a condition you know is always true, but the planner
won't know that, and will scale down its result-rows estimate accordingly.

regards, tom lane



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 11:34 AM
Tom Lane
 
Posts: n/a
Default Re: Query works when kludged, but would prefer "best practice" solution

"Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
> Thanks for the suggestion - this concept is pretty new to me. Can you expand
> a bit on the idea of how to place such a "dummy" constraint on a function,
> and the conditions on which it affects the planner?


Let's say that you know that the function's result column "x" can only
range from 1 to 1000. The planner does not know that, and has no
statistics from which it could guess, so it's going to fall back on
default selectivity estimates for any WHERE clause involving x.
So for instance you could tack on something like

FROM ... (select * from myfunc() where x <= 1000) ...

which will change the actual query result not at all, but will cause the
planner to reduce its estimate of the number of rows out by whatever the
default selectivity estimate for an inequality is (from memory, 0.333,
but try it and see). If that's too much or not enough, you could try
some other clauses that will never really reject any rows, for instance

where x >= 1 and x <= 1000
where x <> -1
where x is not null

Of course this technique depends on knowing something that will always
be true about your data, but most people can think of something...

Now this is not going to affect the evaluation of the function itself at
all. What it will do is affect the shape of a join plan built atop that
function scan, since joins are pretty much all about minimizing the
number of intermediate rows.

> Would this require that
> constraint_exclusion be set on?


No.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-19-2008, 11:34 AM
Dave Dutcher
 
Posts: n/a
Default Re: Query works when kludged, but would prefer "best practice" solution

I think Tom is talking about something like this:

explain select * from foo();
QUERY PLAN
----------------------------------------------------------------------
Function Scan on foo (cost=0.00..12.50 rows=1000 width=50)

The planner is estimating the function will return 1000 rows.


explain select * from foo() where id > 0;
QUERY PLAN
---------------------------------------------------------------------
Function Scan on foo (cost=0.00..15.00 rows=333 width=50)
Filter: (id > 0)

In the second case I am asking for all ids greater than zero, but my ids are
all positive integers. The planner doesn't know that, so it assumes the
where clause will decrease the number of results.

I would still say this is a kludge, and since you already found a kludge
that works, this may not help you at all.

Dave


-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailtogsql-performance-owner@postgresql.org] On Behalf Of Carlo
Stonebanks
Sent: Tuesday, September 18, 2007 1:29 AM
To: 'Tom Lane'; 'Merlin Moncure'
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query works when kludged, but would prefer "best
practice" solution

Hi Tom,

Thanks for the suggestion - this concept is pretty new to me. Can you expand
a bit on the idea of how to place such a "dummy" constraint on a function,
and the conditions on which it affects the planner? Would this require that
constraint_exclusion be set on?

(When I go to sleep, I have a dream -- and in this dream Tom writes a
brilliant three line code sample that makes it all clear to me, and I wake
up a PostgreSQL guru)

;-)

Carlo

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: September 17, 2007 11:30 PM
To: Merlin Moncure
Cc: Carlo Stonebanks; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query works when kludged, but would prefer "best
practice" solution

"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 9/17/07, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
>> Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan

below.

> This looks like it might be the problem tom caught and rigged a
> solution

to:
>

http://people.planetpostgresql.org/d.../134-PostgreSQ
L-Weekly-News-September-03-2007.html
> (look fro band-aid).


No, fraid not, that was about misestimation of outer joins, and I see no
outer join here.

What I do see is misestimation of a set-returning-function's output:

-> Function Scan on zips_in_mile_range (cost=0.00..12.50 rows=1000
width=40) (actual time=149.850..149.920 rows=66 loops=1)

There's not any very nice way to improve that in existing releases :-(.
In 8.3 it will be possible to add a ROWS option to function definitions to
replace the default "1000 rows" estimate with some other number, but that
still helps little if the number of result rows is widely variable.

As far as kluges go: rather than kluging conditions affecting unrelated
tables, maybe you could put in a dummy constraint on the function's output
--- ie, a condition you know is always true, but the planner won't know
that, and will scale down its result-rows estimate accordingly.

regards, tom lane



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:00 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com