Unix Technical Forum

Performance of the Materialize operator in a query plan

This is a discussion on Performance of the Materialize operator in a query plan within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi, I'm having trouble understanding the cost of the Materialize operator. Consider the following plan: Nested Loop (cost=2783.91..33217.37 rows=78634 ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-24-2008, 07:16 PM
Viktor Rosenfeld
 
Posts: n/a
Default Performance of the Materialize operator in a query plan

Hi,

I'm having trouble understanding the cost of the Materialize
operator. Consider the following plan:

Nested Loop (cost=2783.91..33217.37 rows=78634 width=44) (actual
time=77.164..2478.973 rows=309 loops=1)
Join Filter: ((rank2.pre <= rank5.pre) AND (rank5.pre <=
rank2.post))
-> Nested Loop (cost=0.00..12752.06 rows=1786 width=33)
(actual time=0.392..249.255 rows=9250 loops=1)
.....
-> Materialize (cost=2783.91..2787.87 rows=396 width=22)
(actual time=0.001..0.072 rows=587 loops=9250)
-> Nested Loop (cost=730.78..2783.51 rows=396
width=22) (actual time=7.637..27.030 rows=587 loops=1)
....

The cost of the inner-most Nested Loop is 27 ms, but the total cost of
the Materialize operator is 666 ms (9250 loops * 0.072 ms per
iteration). So, Materialize introduces more than 10x overhead. Is
this the cost of writing the table to temporary storage or am I
misreading the query plan output?

Furthermore, the outer table is almost 20x as big as the inner table.
Wouldn't the query be much faster by switching the inner with the
outer table? I have switched off GEQO, so I Postgres should find the
optimal query plan.

Cheers,
Viktor

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-24-2008, 07:16 PM
Tom Lane
 
Posts: n/a
Default Re: Performance of the Materialize operator in a query plan

Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
> I'm having trouble understanding the cost of the Materialize
> operator. Consider the following plan:


> Nested Loop (cost=2783.91..33217.37 rows=78634 width=44) (actual
> time=77.164..2478.973 rows=309 loops=1)
> Join Filter: ((rank2.pre <= rank5.pre) AND (rank5.pre <=
> rank2.post))
> -> Nested Loop (cost=0.00..12752.06 rows=1786 width=33)
> (actual time=0.392..249.255 rows=9250 loops=1)
> .....
> -> Materialize (cost=2783.91..2787.87 rows=396 width=22)
> (actual time=0.001..0.072 rows=587 loops=9250)
> -> Nested Loop (cost=730.78..2783.51 rows=396
> width=22) (actual time=7.637..27.030 rows=587 loops=1)
> ....


> The cost of the inner-most Nested Loop is 27 ms, but the total cost of
> the Materialize operator is 666 ms (9250 loops * 0.072 ms per
> iteration). So, Materialize introduces more than 10x overhead.


Not hardly. Had the Materialize not been there, we'd have executed
the inner nestloop 9250 times, for a total cost of 9250 * 27ms.
(Actually it might have been less due to cache effects, but still
a whole lot more than 0.072 per iteration.)

These numbers say that it's taking the Materialize about 120 microsec
per row returned, which seems a bit high to me considering that the
data is just sitting in a tuplestore. I surmise that you are using
a machine with slow gettimeofday() and that's causing the measurement
overhead to be high.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-24-2008, 07:16 PM
Viktor Rosenfeld
 
Posts: n/a
Default Re: Performance of the Materialize operator in a query plan

Hi Tom,

>> The cost of the inner-most Nested Loop is 27 ms, but the total cost
>> of
>> the Materialize operator is 666 ms (9250 loops * 0.072 ms per
>> iteration). So, Materialize introduces more than 10x overhead.

>
> Not hardly. Had the Materialize not been there, we'd have executed
> the inner nestloop 9250 times, for a total cost of 9250 * 27ms.
> (Actually it might have been less due to cache effects, but still
> a whole lot more than 0.072 per iteration.)


I realize that Materialize saves a big amount of time in the grand
scheme, but I'm still wondering about the descrepancy between the
total cost of Materialize and the contained Nested Loop.

> These numbers say that it's taking the Materialize about 120 microsec
> per row returned, which seems a bit high to me considering that the
> data is just sitting in a tuplestore. I surmise that you are using
> a machine with slow gettimeofday() and that's causing the measurement
> overhead to be high.


Do you mean, that the overhead is an artefact of timing the query? In
that case, the query should run faster than its evaluation with
EXPLAIN ANALYZE, correct?

Is there a way to test this assumption regarding the speed of
gettimeofday? I'm on a Macbook and have no idea about the performance
of its implementation.

Cheers,
Viktor

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-24-2008, 07:16 PM
PFC
 
Posts: n/a
Default Re: Performance of the Materialize operator in a query plan

> Do you mean, that the overhead is an artefact of timing the query? In
> that case, the query should run faster than its evaluation with EXPLAIN
> ANALYZE, correct?
>
> Is there a way to test this assumption regarding the speed of
> gettimeofday? I'm on a Macbook and have no idea about the performance
> of its implementation.


Run EXPLAIN ANALYZE query
Type \timing
Run SELECT count(*) FROM (query) AS foo

\timing gives timings as seen by the client. If you're local, and the
result set is one single integer, client timings are not very different
from server timings. If the client must retrieve lots of rows, this will
be different, hence the fake count(*) above to prevent this. You might
want to explain the count(*) also to be sure the same plan is used...

And yes EXPLAIN ANALYZE has overhead, sometimes significant. Think
Heisenberg... You will measure it easily with this dumb method


Here a very dumb query :

SELECT count(*) FROM test;
count
-------
99999
(1 ligne)

Temps : 26,924 ms


test=> EXPLAIN ANALYZE SELECT count(*) FROM test;
QUERY PLAN
--------------------------------------------------------------------------------
--------------------------------
Aggregate (cost=1692.99..1693.00 rows=1 width=0) (actual
time=66.314..66.314
r
ows=1 loops=1)
-> Seq Scan on test (cost=0.00..1442.99 rows=99999 width=0) (actual
time=0.
013..34.888 rows=99999 loops=1)
Total runtime: 66.356 ms
(3 lignes)

Temps : 66,789 ms

Apparently measuring the time it takes to get a row from the table takes
2x as long as actually getting the row from the table. Which is
reassuring, in a way, since grabbing rows out of tables isn't such an
unusual operation.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-29-2008, 09:32 PM
Viktor Rosenfeld
 
Posts: n/a
Default Re: Performance of the Materialize operator in a query plan

Hi,

using this strategy to study the overhead of EXPLAIN ANALYZE was very
insightful. Apparently, measuring the performance of the query plan
introduced a overhead of more than 10 seconds in the query I was
looking at.

Thanks,
Viktor

Am 24.04.2008 um 19:05 schrieb PFC:
>> Do you mean, that the overhead is an artefact of timing the query?
>> In that case, the query should run faster than its evaluation with
>> EXPLAIN ANALYZE, correct?
>>
>> Is there a way to test this assumption regarding the speed of
>> gettimeofday? I'm on a Macbook and have no idea about the
>> performance of its implementation.

>
> Run EXPLAIN ANALYZE query
> Type \timing
> Run SELECT count(*) FROM (query) AS foo
>
> \timing gives timings as seen by the client. If you're local, and
> the result set is one single integer, client timings are not very
> different from server timings. If the client must retrieve lots of
> rows, this will be different, hence the fake count(*) above to
> prevent this. You might want to explain the count(*) also to be sure
> the same plan is used...
>
> And yes EXPLAIN ANALYZE has overhead, sometimes significant. Think
> Heisenberg... You will measure it easily with this dumb method
>
>
> Here a very dumb query :
>
> SELECT count(*) FROM test;
> count
> -------
> 99999
> (1 ligne)
>
> Temps : 26,924 ms
>
>
> test=> EXPLAIN ANALYZE SELECT count(*) FROM test;
> QUERY PLAN
> -------------------------------------------------------------------------------- --------------------------------
> Aggregate (cost=1692.99..1693.00 rows=1 width=0) (actual
> time=66.314..66.314
> r
> ows
> =1 loops=1)
> -> Seq Scan on test (cost=0.00..1442.99 rows=99999 width=0)
> (actual
> time
> =
> 0
> . 013
> ..34.888 rows=99999 loops=1)
> Total runtime: 66.356 ms
> (3 lignes)
>
> Temps : 66,789 ms
>
> Apparently measuring the time it takes to get a row from the table
> takes 2x as long as actually getting the row from the table. Which
> is reassuring, in a way, since grabbing rows out of tables isn't
> such an unusual operation.
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:39 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com