Unix Technical Forum

stable function optimizations, revisited

This is a discussion on stable function optimizations, revisited within the Pgsql Performance forums, part of the PostgreSQL category; --> I have recently been encountering a number of significant performance problems related to stable functions being called multiple times ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 08:48 AM
Phil Frost
 
Posts: n/a
Default stable function optimizations, revisited

I have recently been encountering a number of significant performance
problems related to stable functions being called multiple times when I
believe they could be called just once. Searching the ML archives, I see
I'm not the first:

<http://archives.postgresql.org/pgsql-hackers/2003-04/msg00890.php>
<http://archives.postgresql.org/pgsql-performance/2006-01/msg00140.php>

and so on. None of them seemed to resolve to a plan of action or elegant
workaround. It is mentioned that "stable" was added to allow such
functions to be used for index scans, but I could not tell if other
optimizations I would like are impossible, or possible and if so, might
or will never be implemented.

I have several examples of queries in which eliminating extra calls to a
stable function would result in very significant performance gains. All
of these cases were found while developing a real application, and
although I've simplified them to be more readable, they are not
contrived.

Problem 1: creating a view with multiple columns calculated from a
function.

create table sale(saleid serial, total numeric);
create function cost_of_sale(sale.saleid%type) returns numeric stable as $$
-- calculates the cost of purchasing the things sold in a sale
-- takes considerable time to calculate
$$;
create view convenient_view_on_sale as
select *,
cost_of_sale(saleid) as cost,
total - cost_of_sale(saleid) as profit,
case when total != 0 then (total-cost_of_sale(saleid)) / total * 100 end as margin;

Executing "select * from convenient_view_on_sale limit 1" will execute
cost_of_sale thrice. However, from the definition of stable, we know it
could have been called just once. As cost_of_sale takes hundreds of ms
to execute while the rest of the query is extremely simple, additional
calls in effect multiply the total execution time.

Nonsolution 1a: moving function to a subselect:

create view convenient_view_on_sale as
select *,
total - cost as profit,
case when total != 0 then (total-cost) / total * 100 end as margin
from (select *, cost_of_sale(saleid) as cost from sale) as subq;

The query planner will eliminate the subselect, and cost_of_sale will
still be executed thrice. I can observe no change in behaviour
whatsoever with this view definition.

PS: I wonder what the behaviour would be if I explicitly inlined
cost_of_sale here?

Nonsolution 1b: preventing optimization of the subselect with "offset 0"

create view convenient_view_on_sale as
select *,
total - cost as profit,
case when total != 0 then (total-cost) / total * 100 end as margin
from (select *, cost_of_sale(saleid) as cost from sale offset 0) as subq;

This helps in the case of a "select *"; the subquery will not be
eliminated due to the "offset 0", and cost_of_sale will be executed only
once. However, it will always be executed, even if none of the cost
related columns are selected. For exaple,
"select saleid from convenient_view_on_sale limit 1" will execute
cost_of_sale once, although it could have not been executed at all.

Problem 1 has a workaround: perform the dependant calculations (profit
and margin in this case) on the client, or in a stored procedure. This
is often inconvienent, but it works.

Problem 2: expensive functions returning composite types.

Consider that the purchases for a sale might have not yet been made, so
the exact cost can not be known, but a guess can be made based on the
current prices. cost_of_sale might be updated to reflect this:

create function cost_of_sale(sale.saleid%type, out cost numeric, out estimated bool)
stable as $$ ... $$;

create view convenient_view_on_sale as
select *, cost_of_sale(saleid) from sale;

Note that in many cases, calculating "cost" and "estimated" together
takes just as long as calculating either one. This is why both are
returned from the same function.

Now, I use python as a client, in particular the psycopg2 module. When I
do something such as "select cost from convenient_view_on_sale", the
values returned for the cost column (a composite type (numeric, bool))
are strings. Perhaps this is an issue with psycopg2, but as a user, this
is very annoying since I can not really get at the components of the
composite type without reimplementing pg's parser. Granted I could
probably do it simply in a way that work work most the time, but I feel
it would be error prone, and I'd rather not.

Thus, I seek a way to get the components of the cost column in top-level
columns. For example I try, "select (cost).cost, (cost).estimated", but
this now executes cost_of_sale twice, doubling the time of my query.

Since stable functions are the most common in my experience, and I have
quite a number of them that perform complex, slow queries, I'd really
like to see optimizations in this area. Until such a time, I would very
much appreciate any workaround suggestions.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:08 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com