This is a discussion on optimize query with a maximum(date) extraction within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi all, I need to improve a query like : SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi all, I need to improve a query like : SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; Stupidly, I create a B-tree index on my_table(the_date), witch is logically not used in my query, because it's not with a constant ? isn't it ? I know that I can't create a function index with an aggregative function. How I can do ? thanks, jsubei __________________________________________________ ___________________________ Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| bad address kep his from going to the list on my first try ... apologies to the moderators. -----Original Message----- From: Gregory Williamson Sent: Wed 9/5/2007 4:59 AM To: JS Ubei; pgsql-performance@postgresql.org Subject: RE: [PERFORM] optimize query with a maximum(date) extraction In order to help others help you, you might provide the following: table description (columns, types, indexes) (\d tablename from psql does nicely) the same query run as "EXPLAIN ANALYZE <your query here>;" Greg Williamson Senior DBA GlobeXplorer LLC, a DigitalGlobe company Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. (My corporate masters made me say this.) -----Original Message----- From: pgsql-performance-owner@postgresql.org on behalf of JS Ubei Sent: Wed 9/5/2007 3:53 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] optimize query with a maximum(date) extraction Hi all, I need to improve a query like : SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; Stupidly, I create a B-tree index on my_table(the_date), witch is logically not used in my query, because it's not with a constant ? isn't it ? I know that I can't create a function index with an aggregative function. How I can do ? thanks, jsubei __________________________________________________ ___________________________ Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| "JS Ubei" <jsubei@yahoo.fr> writes: > Hi all, > > I need to improve a query like : > > SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; > > Stupidly, I create a B-tree index on my_table(the_date), witch is logically > not used in my query, because it's not with a constant ? isn't it ? That's not so stupid, it would be possible for a database to make use of such an index for this query. But it's not one of the plans Postgres knows how to execute. I don't think you'll find anything much faster for this particular query. You could profile running these two (non-standard) queries: SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC I think the first of these can actually use your index but the latter can't unless you create one for it specifically (which is not so easy -- it'll be easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the query you already have. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| "Gregory Stark" <stark@enterprisedb.com> writes: > "JS Ubei" <jsubei@yahoo.fr> writes: > >> I need to improve a query like : >> >> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; >... > I don't think you'll find anything much faster for this particular query. You > could profile running these two (non-standard) queries: > > SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC > SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC Something else you might try: select id, (select min(the_date) from my_table where id=x.id) as min_date, (select max(the_date) from my_table where id=x.id) as max_date from (select distinct id from my_table) Recent versions of Postgres do know how to use the index for a simple ungrouped min() or max() like these subqueries. This would be even better if you have a better source for the list of distinct ids you're interested in than my_table. If you have a source that just has one record for each id then you won't need an extra step to eliminate duplicates. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On 05/09/07, Gregory Stark <stark@enterprisedb.com> wrote: > > "Gregory Stark" <stark@enterprisedb.com> writes: > > > "JS Ubei" <jsubei@yahoo.fr> writes: > > > >> I need to improve a query like : > >> > >> SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; > >... > > I don't think you'll find anything much faster for this particular > query. You > > could profile running these two (non-standard) queries: > > > > SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY > id, the_date ASC > > SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY > id, the_date DESC > > Something else you might try: > > select id, > (select min(the_date) from my_table where id=x.id) as min_date, > (select max(the_date) from my_table where id=x.id) as max_date > from (select distinct id from my_table) > > Recent versions of Postgres do know how to use the index for a simple > ungrouped min() or max() like these subqueries. > > This would be even better if you have a better source for the list of > distinct > ids you're interested in than my_table. If you have a source that just has > one > record for each id then you won't need an extra step to eliminate > duplicates. > > My personal reaction is why are you using distinct at all? why not select id, min(the_date) as min_date, max(the_date) as max_date from my_table group by id; Since 8.0 or was it earlier this will use an index should a reasonable one exist. Peter. |
| |||
| > > why not > > select id, > min(the_date) as min_date, > max(the_date) as max_date > from my_table group by id; > > Since 8.0 or was it earlier this will use an index should a reasonable one > exist. without any limits, seq scan is optimal. Regards Pavel Stehule ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Wed, Sep 05, 2007 at 12:30:21PM +0100, Gregory Stark wrote: > SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC > SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC > I think the first of these can actually use your index but the latter can't > unless you create one for it specifically (which is not so easy -- it'll be > easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the > query you already have. it's easy to fix the second query (fix to use index) - just change order by to: order by id desc, the_date desc. depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" http://www.depesz.com/ - blog dla ciebie (i moje CV) ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| "hubert depesz lubaczewski" <depesz@depesz.com> writes: > On Wed, Sep 05, 2007 at 12:30:21PM +0100, Gregory Stark wrote: >> SELECT DISTINCT ON (id) id, the_date AS min_date FROM my_table ORDER BY id, the_date ASC >> SELECT DISTINCT ON (id) id, the_date AS max_date FROM my_table ORDER BY id, the_date DESC >> I think the first of these can actually use your index but the latter can't >> unless you create one for it specifically (which is not so easy -- it'll be >> easier in 8.3 though). Worse, I'm not really sure it'll be any faster than the >> query you already have. > > it's easy to fix the second query (fix to use index) - just change order > by to: > order by id desc, the_date desc. Cute. I didn't think of that, thanks -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| "Peter Childs" <peterachilds@gmail.com> writes: > My personal reaction is why are you using distinct at all? > > why not > > select id, > min(the_date) as min_date, > max(the_date) as max_date > from my_table group by id; > > Since 8.0 or was it earlier this will use an index should a reasonable one > exist. That's not true for this query. In fact that was precisely the original query he as looking to optimize. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| "Pavel Stehule" <pavel.stehule@gmail.com> writes: >> >> why not >> >> select id, >> min(the_date) as min_date, >> max(the_date) as max_date >> from my_table group by id; >> >> Since 8.0 or was it earlier this will use an index should a reasonable one >> exist. As I mentioned in the other post that's not true for this query. > without any limits, seq scan is optimal. That's not necessarily true either. You could have ten distinct ids and millions of dates for them. In that case a scan of the index which jumped around to scan from the beginning and end of each distinct id value would be faster. There's just no such plan type in Postgres currently. You can simulate such a plan with the subqueries I described but there's a bit more overhead than necessary and you need a reasonably efficient source of the distinct ids. Also it may or may not be faster than simply scanning the whole table like above and simulating it with subqueries makes it impossible to choose the best plan. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |