Unix Technical Forum

Re: Optimizing a huge_table/tiny_table join

This is a discussion on Re: Optimizing a huge_table/tiny_table join within the Pgsql Performance forums, part of the PostgreSQL category; --> On 5/24/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > <kynn@panix.com> writes: > > Limit (cost=19676.75..21327.99 rows=6000 width=84) > > -> ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 08:51 AM
kynn@panix.com
 
Posts: n/a
Default Re: Optimizing a huge_table/tiny_table join


On 5/24/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> <kynn@panix.com> writes:
> > Limit (cost=19676.75..21327.99 rows=6000 width=84)
> > -> Hash Join (cost=19676.75..1062244.81 rows=3788315 width=84)
> > Hash Cond: (upper(("outer".id)::text) upper(("inner".id)::text))
> > -> Seq Scan on huge_table h (cost= 0.00..51292.43 rows=2525543 width=46)
> > -> Hash (cost=19676.00..19676.00 rows=300 width=38)
> > -> Seq Scan on tiny_table t (cost=0.00..19676.00 rows=300 width=38)

>
> Um, if huge_table is so much bigger than tiny_table, why are the cost
> estimates for seqscanning them only about 2.5x different? There's
> something wacko about your statistics, methinks.




Well, they're not my statistics; they're explain's. You mean there's
a bug in explain? I agree that it makes no sense that the costs don't
differ as much as one would expect, but you can see right there the
numbers of rows for the two tables. At any rate, how would one go
about finding an explanation for these strange stats?

More bewildering still (and infuriating as hell--because it means that
all of my work for yesterday has been wasted) is that I can no longer
reproduce the best query plan I posted earlier, even though the tables
have not changed at all. (Hence I can't post the explain analyze for
the best query plan, which Josh Drake asked for.) No matter what
value I use for LIMIT, the query planner now insists on sequentially
scanning huge_table and ignoring the available index. (If I turn off
enable_seqscan, I get the second worst query plan I posted yesterday.)

Anyway, I take it that there is no way to bypass the optimizer and
instruct PostgreSQL exactly how one wants the search performed?

Thanks!

kj

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 08:51 AM
Andrew Sullivan
 
Posts: n/a
Default Re: Optimizing a huge_table/tiny_table join

On Thu, May 25, 2006 at 12:31:04PM -0400, kynn@panix.com wrote:
> Well, they're not my statistics; they're explain's. You mean there's


Explain doesn't get them from nowhere. How often is the table being
ANALYSEd?

> More bewildering still (and infuriating as hell--because it means that
> all of my work for yesterday has been wasted) is that I can no longer
> reproduce the best query plan I posted earlier, even though the tables
> have not changed at all. (Hence I can't post the explain analyze for


I find that very hard to believe. Didn't change _at all_? Are you
sure no VACUUMs or anything are happening automatically?

> Anyway, I take it that there is no way to bypass the optimizer and
> instruct PostgreSQL exactly how one wants the search performed?


No, there isn't.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
The fact that technology doesn't work is no bar to success in the marketplace.
--Philip Greenspun

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 08:51 AM
Dawid Kuroczko
 
Posts: n/a
Default Re: Optimizing a huge_table/tiny_table join

On 5/25/06, kynn@panix.com <kynn@panix.com> wrote:
> Well, they're not my statistics; they're explain's. You mean there's
> a bug in explain? I agree that it makes no sense that the costs don't
> differ as much as one would expect, but you can see right there the
> numbers of rows for the two tables. At any rate, how would one go
> about finding an explanation for these strange stats?


Well, the query planner uses statistics to deduce the best plan
possible. Explain includes this statistical data in its output.
See:
http://www.postgresql.org/docs/8.1/i...ner-stats.html
....for information about what it is all about.

The idea is that your statistics are probably not detailed enough
to help the planner. See ALTER TABLE SET STATISTICS to change
that.

> More bewildering still (and infuriating as hell--because it means that
> all of my work for yesterday has been wasted) is that I can no longer
> reproduce the best query plan I posted earlier, even though the tables
> have not changed at all. (Hence I can't post the explain analyze for
> the best query plan, which Josh Drake asked for.) No matter what
> value I use for LIMIT, the query planner now insists on sequentially
> scanning huge_table and ignoring the available index. (If I turn off
> enable_seqscan, I get the second worst query plan I posted yesterday.)
>
> Anyway, I take it that there is no way to bypass the optimizer and
> instruct PostgreSQL exactly how one wants the search performed?


There is no way to bypass. But there are many ways to tune it.



Hmm, there is a probability (though statistics are more probable
go) that you're using some older version of PostgreSQL, and you're
hitting same problem as I did:

http://archives.postgresql.org/pgsql...7/msg00345.php

Tom has provided back then a patch, which fixed it:

http://archives.postgresql.org/pgsql...7/msg00352.php

....but I don't remember when it made into release.

Regfa

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 08:52 AM
Jim Nasby
 
Posts: n/a
Default Re: Optimizing a huge_table/tiny_table join

On May 25, 2006, at 12:07 PM, Dawid Kuroczko wrote:
> On 5/25/06, kynn@panix.com <kynn@panix.com> wrote:
>> Well, they're not my statistics; they're explain's. You mean there's
>> a bug in explain? I agree that it makes no sense that the costs
>> don't
>> differ as much as one would expect, but you can see right there the
>> numbers of rows for the two tables. At any rate, how would one go
>> about finding an explanation for these strange stats?

>
> Well, the query planner uses statistics to deduce the best plan
> possible. Explain includes this statistical data in its output.
> See:
> http://www.postgresql.org/docs/8.1/i...ner-stats.html
> ...for information about what it is all about.
>
> The idea is that your statistics are probably not detailed enough
> to help the planner. See ALTER TABLE SET STATISTICS to change
> that.


http://www.pervasive-postgres.com/lp/newsletters/2006/
Insights_postgres_Mar.asp#4 might also be worth your time to read.

> Hmm, there is a probability (though statistics are more probable
> go) that you're using some older version of PostgreSQL, and you're
> hitting same problem as I did:
>
> http://archives.postgresql.org/pgsql...7/msg00345.php
>
> Tom has provided back then a patch, which fixed it:
>
> http://archives.postgresql.org/pgsql...7/msg00352.php
>
> ...but I don't remember when it made into release.


According to cvs, it's been in since 8.1 and 8.0.4:

Revision 1.111.4.2: download - view: text, markup, annotated - select
for diffs
Fri Jul 22 19:12:33 2005 UTC (10 months ago) by tgl
Branches: REL8_0_STABLE
CVS tags: REL8_0_8, REL8_0_7, REL8_0_6, REL8_0_5, REL8_0_4
Diff to: previous 1.111.4.1: preferred, colored; branchpoint 1.111:
preferred, colored; next MAIN 1.112: preferred, colored
Changes since revision 1.111.4.1: +18 -37 lines

Fix compare_fuzzy_path_costs() to behave a bit more sanely. The
original
coding would ignore startup cost differences of less than 1% of the
estimated total cost; which was OK for normal planning but highly not OK
if a very small LIMIT was applied afterwards, so that startup cost
becomes
the name of the game. Instead, compare startup and total costs fuzzily
but independently. This changes the plan selected for two queries in
the
regression tests; adjust expected-output files for resulting changes in
row order. Per reports from Dawid Kuroczko and Sam Mason.

Revision 1.124: download - view: text, markup, annotated - select for
diffs
Fri Jul 22 19:12:01 2005 UTC (10 months ago) by tgl
Branches: MAIN
CVS tags: REL8_1_0BETA3, REL8_1_0BETA2, REL8_1_0BETA1
Diff to: previous 1.123: preferred, colored
Changes since revision 1.123: +18 -37 lines

Fix compare_fuzzy_path_costs() to behave a bit more sanely. The
original
coding would ignore startup cost differences of less than 1% of the
estimated total cost; which was OK for normal planning but highly not OK
if a very small LIMIT was applied afterwards, so that startup cost
becomes
the name of the game. Instead, compare startup and total costs fuzzily
but independently. This changes the plan selected for two queries in
the
regression tests; adjust expected-output files for resulting changes in
row order. Per reports from Dawid Kuroczko and Sam Mason.

--
Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:25 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com