Unix Technical Forum

weird behaviour on DISTINCT ON

This is a discussion on weird behaviour on DISTINCT ON within the pgsql Hackers forums, part of the PostgreSQL category; --> Hi all, I have a query that is something like this: SELECT DISTINCT ON ( x ) x, foo(x) ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-11-2008, 03:33 AM
Gaetano Mendola
 
Posts: n/a
Default weird behaviour on DISTINCT ON

Hi all,
I have a query that is something like this:


SELECT DISTINCT ON ( x ) x, foo(x)
FROM ...


now what do I see is that for each different x value
the foo is executed more than once, I guess this is because
the distinct filter out the rows after executing the query.

Is this behaviour the normal one? Shall be not documented ?



Regards
Gaetano Mendola




---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-11-2008, 03:34 AM
Gaetano Mendola
 
Posts: n/a
Default Re: weird behaviour on DISTINCT ON

Greg Stark wrote:
> Gaetano Mendola <mendola@bigfoot.com> writes:
>
>
>>now what do I see is that for each different x value
>>the foo is executed more than once, I guess this is because
>>the distinct filter out the rows after executing the query.
>>
>>Is this behaviour the normal one? Shall be not documented ?

>
>
> Usually DISTINCT ON is only really useful if you're sorting on something.
> Otherwise the choice of which record is output is completely arbitrary.
>
> So the above would typically be something like:
>
> SELECT DISTINCT ON (x), y, foo(x)
> ...
> ORDER BY x, y
>
> Now you can see why every record does need to be looked at to handle that.
> In fact the ORDER BY kicks in before output columns are generated so you can
> do things like:
>
> SELECT DISTINCT ON (x), y, foo(x)
> ...
> ORDER BY x, y, foo(x)
>
> And of course obviously foo() has to be executed for every record to do this.
>
> Postgres doesn't try to detect cases where it's safe to change the regular
> order in which things are done and delay execution of functions whose results
> aren't needed right away.
>
> You could just use
>
> SELECT x, foo(x) from (SELECT x ... GROUP BY x)
>


I totally agree and this was clear after having seen what was happening,
my warning was due the fact that in the docs is written nowhere this
drawback. A novice, like I was 4 years ago ( I discover it in the code
only today ), can burn it self.


Regards
Gaetano Mendola














---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-11-2008, 03:34 AM
Greg Stark
 
Posts: n/a
Default Re: weird behaviour on DISTINCT ON


Gaetano Mendola <mendola@bigfoot.com> writes:

> now what do I see is that for each different x value
> the foo is executed more than once, I guess this is because
> the distinct filter out the rows after executing the query.
>
> Is this behaviour the normal one? Shall be not documented ?


Usually DISTINCT ON is only really useful if you're sorting on something.
Otherwise the choice of which record is output is completely arbitrary.

So the above would typically be something like:

SELECT DISTINCT ON (x), y, foo(x)
...
ORDER BY x, y

Now you can see why every record does need to be looked at to handle that.
In fact the ORDER BY kicks in before output columns are generated so you can
do things like:

SELECT DISTINCT ON (x), y, foo(x)
...
ORDER BY x, y, foo(x)

And of course obviously foo() has to be executed for every record to do this.

Postgres doesn't try to detect cases where it's safe to change the regular
order in which things are done and delay execution of functions whose results
aren't needed right away.

You could just use

SELECT x, foo(x) from (SELECT x ... GROUP BY x)

--
greg


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:28 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com