Unix Technical Forum

optimize query with a maximum(date) extraction

This is a discussion on optimize query with a maximum(date) extraction within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi all, I need to improve a query like : SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 11:28 AM
JS Ubei
 
Posts: n/a
Default optimize query with a maximum(date) extraction

Hi all,

I need to improve a query like :

SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;

Stupidly, I create a B-tree index on my_table(the_date), witch is logically not used in my query, because it's not with a constant ? isn't it ?

I know that I can't create a function index with an aggregative function.

How I can do ?

thanks,

jsubei




__________________________________________________ ___________________________
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 11:28 AM
Gregory Williamson
 
Posts: n/a
Default FW: optimize query with a maximum(date) extraction

bad address kep his from going to the list on my first try ... apologies to the moderators.

-----Original Message-----
From: Gregory Williamson
Sent: Wed 9/5/2007 4:59 AM
To: JS Ubei; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] optimize query with a maximum(date) extraction

In order to help others help you, you might provide the following:

table description (columns, types, indexes) (\d tablename from psql does nicely)

the same query run as "EXPLAIN ANALYZE <your query here>;"

Greg Williamson
Senior DBA
GlobeXplorer LLC, a DigitalGlobe company

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)



-----Original Message-----
From: pgsql-performance-owner@postgresql.org on behalf of JS Ubei
Sent: Wed 9/5/2007 3:53 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] optimize query with a maximum(date) extraction

Hi all,

I need to improve a query like :

SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;

Stupidly, I create a B-tree index on my_table(the_date), witch is logically not used in my query, because it's not with a constant ? isn't it ?

I know that I can't create a function index with an aggregative function.

How I can do ?

thanks,

jsubei




__________________________________________________ ___________________________
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 11:28 AM
Gregory Stark
 
Posts: n/a
Default Re: optimize query with a maximum(date) extraction

"JS Ubei" <jsubei@yahoo.fr> writes:

> Hi all,
>
> I need to improve a query like :
>
> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;
>
> Stupidly, I create a B-tree index on my_table(the_date), witch is logically
> not used in my query, because it's not with a constant ? isn't it ?


That's not so stupid, it would be possible for a database to make use of such
an index for this query. But it's not one of the plans Postgres knows how to
execute.

I don't think you'll find anything much faster for this particular query. You
could profile running these two (non-standard) queries:

SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC
SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC

I think the first of these can actually use your index but the latter can't
unless you create one for it specifically (which is not so easy -- it'll be
easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the
query you already have.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 11:28 AM
Gregory Stark
 
Posts: n/a
Default Re: optimize query with a maximum(date) extraction

"Gregory Stark" <stark@enterprisedb.com> writes:

> "JS Ubei" <jsubei@yahoo.fr> writes:
>
>> I need to improve a query like :
>>
>> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;

>...
> I don't think you'll find anything much faster for this particular query. You
> could profile running these two (non-standard) queries:
>
> SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC
> SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC


Something else you might try:

select id,
(select min(the_date) from my_table where id=x.id) as min_date,
(select max(the_date) from my_table where id=x.id) as max_date
from (select distinct id from my_table)

Recent versions of Postgres do know how to use the index for a simple
ungrouped min() or max() like these subqueries.

This would be even better if you have a better source for the list of distinct
ids you're interested in than my_table. If you have a source that just has one
record for each id then you won't need an extra step to eliminate duplicates.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 11:28 AM
Peter Childs
 
Posts: n/a
Default Re: optimize query with a maximum(date) extraction

On 05/09/07, Gregory Stark <stark@enterprisedb.com> wrote:
>
> "Gregory Stark" <stark@enterprisedb.com> writes:
>
> > "JS Ubei" <jsubei@yahoo.fr> writes:
> >
> >> I need to improve a query like :
> >>
> >> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id;

> >...
> > I don't think you'll find anything much faster for this particular

> query. You
> > could profile running these two (non-standard) queries:
> >
> > SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY

> id, the_date ASC
> > SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY

> id, the_date DESC
>
> Something else you might try:
>
> select id,
> (select min(the_date) from my_table where id=x.id) as min_date,
> (select max(the_date) from my_table where id=x.id) as max_date
> from (select distinct id from my_table)
>
> Recent versions of Postgres do know how to use the index for a simple
> ungrouped min() or max() like these subqueries.
>
> This would be even better if you have a better source for the list of
> distinct
> ids you're interested in than my_table. If you have a source that just has
> one
> record for each id then you won't need an extra step to eliminate
> duplicates.
>
>

My personal reaction is why are you using distinct at all?

why not

select id,
min(the_date) as min_date,
max(the_date) as max_date
from my_table group by id;

Since 8.0 or was it earlier this will use an index should a reasonable one
exist.

Peter.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 11:28 AM
Pavel Stehule
 
Posts: n/a
Default Re: optimize query with a maximum(date) extraction

>
> why not
>
> select id,
> min(the_date) as min_date,
> max(the_date) as max_date
> from my_table group by id;
>
> Since 8.0 or was it earlier this will use an index should a reasonable one
> exist.


without any limits, seq scan is optimal.

Regards
Pavel Stehule

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 11:28 AM
hubert depesz lubaczewski
 
Posts: n/a
Default Re: optimize query with a maximum(date) extraction

On Wed, Sep 05, 2007 at 12:30:21PM +0100, Gregory Stark wrote:
> SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC
> SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC
> I think the first of these can actually use your index but the latter can't
> unless you create one for it specifically (which is not so easy -- it'll be
> easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the
> query you already have.


it's easy to fix the second query (fix to use index) - just change order
by to:
order by id desc, the_date desc.

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA. here's my CV!"
http://www.depesz.com/ - blog dla ciebie (i moje CV)

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 11:29 AM
Gregory Stark
 
Posts: n/a
Default Re: optimize query with a maximum(date) extraction

"hubert depesz lubaczewski" <depesz@depesz.com> writes:

> On Wed, Sep 05, 2007 at 12:30:21PM +0100, Gregory Stark wrote:
>> SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC
>> SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC
>> I think the first of these can actually use your index but the latter can't
>> unless you create one for it specifically (which is not so easy -- it'll be
>> easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the
>> query you already have.

>
> it's easy to fix the second query (fix to use index) - just change order
> by to:
> order by id desc, the_date desc.


Cute. I didn't think of that, thanks

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-19-2008, 11:29 AM
Gregory Stark
 
Posts: n/a
Default Re: optimize query with a maximum(date) extraction

"Peter Childs" <peterachilds@gmail.com> writes:

> My personal reaction is why are you using distinct at all?
>
> why not
>
> select id,
> min(the_date) as min_date,
> max(the_date) as max_date
> from my_table group by id;
>
> Since 8.0 or was it earlier this will use an index should a reasonable one
> exist.


That's not true for this query. In fact that was precisely the original query
he as looking to optimize.


--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-19-2008, 11:29 AM
Gregory Stark
 
Posts: n/a
Default Re: optimize query with a maximum(date) extraction

"Pavel Stehule" <pavel.stehule@gmail.com> writes:

>>
>> why not
>>
>> select id,
>> min(the_date) as min_date,
>> max(the_date) as max_date
>> from my_table group by id;
>>
>> Since 8.0 or was it earlier this will use an index should a reasonable one
>> exist.


As I mentioned in the other post that's not true for this query.

> without any limits, seq scan is optimal.


That's not necessarily true either. You could have ten distinct ids and
millions of dates for them. In that case a scan of the index which jumped
around to scan from the beginning and end of each distinct id value would be
faster. There's just no such plan type in Postgres currently.

You can simulate such a plan with the subqueries I described but there's a bit
more overhead than necessary and you need a reasonably efficient source of the
distinct ids. Also it may or may not be faster than simply scanning the whole
table like above and simulating it with subqueries makes it impossible to
choose the best plan.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:19 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com