Unix Technical Forum

plan difference between set-returning function with ROWS within IN() and a plain join

This is a discussion on plan difference between set-returning function with ROWS within IN() and a plain join within the Pgsql Performance forums, part of the PostgreSQL category; --> L.S. I'm noticing a difference in planning between a join and an in() clause, before trying to create an ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 05-07-2008, 11:18 AM
Frank van Vugt
 
Posts: n/a
Default plan difference between set-returning function with ROWS within IN() and a plain join

L.S.

I'm noticing a difference in planning between a join and an in() clause,
before trying to create an independent test-case, I'd like to know if there's
an obvious reason why this would be happening:


=> the relatively simple PLPGSQL si_credit_tree() function has 'ROWS 5' in
it's definition


df=# select version();
version
------------------------------------------------------------------------
PostgreSQL 8.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
(1 row)



db=# explain analyse
select sum(si.base_total_val)
from sales_invoice si, si_credit_tree(80500007) foo(id)
where si.id = foo.id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=42.73..42.74 rows=1 width=8) (actual time=0.458..0.459
rows=1 loops=1)
-> Nested Loop (cost=0.00..42.71 rows=5 width=8) (actual
time=0.361..0.429 rows=5 loops=1)
-> Function Scan on si_credit_tree foo (cost=0.00..1.30 rows=5
width=4) (actual time=0.339..0.347 rows=5 loops=1)
-> Index Scan using sales_invoice_pkey on sales_invoice si
(cost=0.00..8.27 rows=1 width=12) (actual time=0.006..0.008 rows=1 loops=5)
Index Cond: (si.id = foo.id)

Total runtime: 0.562 ms




db=# explain analyse
select sum(base_total_val)
from sales_invoice
where id in (select id from si_credit_tree(80500007));
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=15338.31..15338.32 rows=1 width=8) (actual
time=3349.401..3349.402 rows=1 loops=1)
-> Seq Scan on sales_invoice (cost=0.00..15311.19 rows=10846 width=8)
(actual time=0.781..3279.046 rows=21703 loops=1)
Filter: (subplan)
SubPlan
-> Function Scan on si_credit_tree (cost=0.00..1.30 rows=5
width=0) (actual time=0.146..0.146 rows=1 loops=21703)

Total runtime: 3349.501 ms





I'd hoped the planner would use the ROWS=5 knowledge a bit better:


db=# explain analyse
select sum(base_total_val)
from sales_invoice
where id in (80500007,80500008,80500009,80500010,80500011);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=40.21..40.22 rows=1 width=8) (actual time=0.105..0.106
rows=1 loops=1)
-> Bitmap Heap Scan on sales_invoice (cost=21.29..40.19 rows=5 width=8)
(actual time=0.061..0.070 rows=5 loops=1)
Recheck Cond: (id = ANY
('{80500007,80500008,80500009,80500010,80500011}': :integer[]))
-> Bitmap Index Scan on sales_invoice_pkey (cost=0.00..21.29 rows=5
width=0) (actual time=0.049..0.049 rows=5 loops=1)
Index Cond: (id = ANY
('{80500007,80500008,80500009,80500010,80500011}': :integer[]))

Total runtime: 0.201 ms






--
Best,




Frank.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 05-07-2008, 11:18 AM
PFC
 
Posts: n/a
Default Re: plan difference between set-returning function with ROWS within IN() and a plain join

On Tue, 06 May 2008 10:21:43 +0200, Frank van Vugt <ftm.van.vugt@foxi.nl>
wrote:

> L.S.
>
> I'm noticing a difference in planning between a join and an in() clause,
> before trying to create an independent test-case, I'd like to know if
> there's
> an obvious reason why this would be happening:


Is the function STABLE ?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 05-07-2008, 11:18 AM
Frank van Vugt
 
Posts: n/a
Default Re: plan difference between set-returning function with ROWS within IN() and a plain join

> > I'm noticing a difference in planning between a join and an in() clause,
> > before trying to create an independent test-case, I'd like to know if
> > there's
> > an obvious reason why this would be happening:

>
> Is the function STABLE ?


Yep.

For the record, even changing it to immutable doesn't make a difference in
performance here.



--
Best,




Frank.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 05-07-2008, 11:18 AM
Tom Lane
 
Posts: n/a
Default Re: plan difference between set-returning function with ROWS within IN() and a plain join

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> db=# explain analyse
> select sum(base_total_val)
> from sales_invoice
> where id in (select id from si_credit_tree(80500007));


Did you check whether this query even gives the right answer? The
EXPLAIN output shows that 21703 rows of sales_invoice are being
selected, which is a whole lot different than the other behavior.

I think you forgot the alias foo(id) in the subselect and it's
actually reducing to "where id in (id)", ie, TRUE.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 05-07-2008, 11:19 AM
Frank van Vugt
 
Posts: n/a
Default Re: plan difference between set-returning function with ROWS within IN() and a plain join

> > db=# explain analyse
> > select sum(base_total_val)
> > from sales_invoice
> > where id in (select id from si_credit_tree(80500007));

>
> Did you check whether this query even gives the right answer?


You knew the right answer to that already

> I think you forgot the alias foo(id) in the subselect and it's
> actually reducing to "where id in (id)", ie, TRUE.


Tricky, but completely obvious once pointed out, that's _exactly_ what was
happening.


db=# explain analyse
select sum(base_total_val)
from sales_invoice
where id in (select id from si_credit_tree(80500007) foo(id));
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=42.79..42.80 rows=1 width=8) (actual time=0.440..0.441
rows=1 loops=1)
-> Nested Loop (cost=1.31..42.77 rows=5 width=8) (actual
time=0.346..0.413 rows=5 loops=1)
-> HashAggregate (cost=1.31..1.36 rows=5 width=4) (actual
time=0.327..0.335 rows=5 loops=1)
-> Function Scan on si_credit_tree foo (cost=0.00..1.30
rows=5 width=4) (actual time=0.300..0.306 rows=5 loops=1)
-> Index Scan using sales_invoice_pkey on sales_invoice
(cost=0.00..8.27 rows=1 width=12) (actual time=0.006..0.008 rows=1 loops=5)
Index Cond: (sales_invoice.id = foo.id)

Total runtime: 0.559 ms




Thanks for the replies!


--
Best,




Frank.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 05-10-2008, 03:06 PM
Merlin Moncure
 
Posts: n/a
Default Re: plan difference between set-returning function with ROWS within IN() and a plain join

On Tue, May 6, 2008 at 11:27 AM, Frank van Vugt <ftm.van.vugt@foxi.nl> wrote:
>> > db=# explain analyse
>> > select sum(base_total_val)
>> > from sales_invoice
>> > where id in (select id from si_credit_tree(80500007));

>>
>> Did you check whether this query even gives the right answer?

>
> You knew the right answer to that already
>
>> I think you forgot the alias foo(id) in the subselect and it's
>> actually reducing to "where id in (id)", ie, TRUE.

>
> Tricky, but completely obvious once pointed out, that's _exactly_ what was
> happening.


This is one of the reasons why, for a table named 'foo', I name the
columns 'foo_id', not 'id'. Also, if you prefix the id column with
the table name, you can usually use JOIN USING which is a little bit
tighter and easier than JOIN ON.

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:58 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com