Unix Technical Forum

TB-sized databases

This is a discussion on TB-sized databases within the Pgsql Performance forums, part of the PostgreSQL category; --> Tom Lane wrote: > Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: >> Tom Lane wrote: >>> There's something fishy about this --- ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #51 (permalink)  
Old 04-19-2008, 11:43 AM
Ron Mayer
 
Posts: n/a
Default Re: TB-sized databases

Tom Lane wrote:
> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>> Tom Lane wrote:
>>> There's something fishy about this --- given that that plan has a lower
>>> cost estimate, it should've picked it without any artificial
>>> constraints.


One final thing I find curious about this is that the estimated
number of rows is much closer in the "offset 0" form of the query.

Since the logic itself is identical, I would have expected the
estimated total number of rows for both forms of this query to
be identical.

Any reason the two plans estimate a different total number of rows?



(explain statements for the two forms of the same query
from earlier in the thread here:
http://archives.postgresql.org/pgsql...2/msg00088.php )
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #52 (permalink)  
Old 04-19-2008, 11:43 AM
Simon Riggs
 
Posts: n/a
Default Re: TB-sized databases

On Fri, 2007-12-07 at 12:45 -0500, Robert Treat wrote:
> On Thursday 06 December 2007 04:38, Simon Riggs wrote:


> > > I think you're completly overlooking the effect of disk latency has on
> > > query times. We run queries all the time that can vary from 4 hours to
> > > 12 hours in time based solely on the amount of concurrent load on the
> > > system, even though they always plan with the same cost.

> >
> > Not at all. If we had statement_cost_limit then it would be applied
> > after planning and before execution begins. The limit would be based
> > upon the planner's estimate, not the likely actual execution time.
> >

>
> This is nice, but it doesnt prevent "slow queries" reliably (which seemed to
> be in the original complaints), since query time cannot be directly traced
> back to statement cost.


Hmm, well it can be directly traced, just not with the accuracy you
desire.

We can improve the accuracy, but then we would need to run the query
first in order to find out it was killing us.

> > So yes a query may vary in execution time by a large factor as you
> > suggest, and it would be difficult to set the proposed parameter
> > accurately. However, the same is also true of statement_timeout, which
> > we currently support, so I don't see this point as an blocker.
> >
> > Which leaves us at the burning question: Would you use such a facility,
> > or would the difficulty in setting it exactly prevent you from using it
> > for real?

>
> I'm not sure. My personal instincts are that the solution is too fuzzy for me
> to rely on, and if it isnt reliable, it's not a good solution. If you look at
> all of the things people seem to think this will solve, I think I can raise
> an alternative option that would be a more definitive solution:
>
> "prevent queries from taking longer than x" -> statement_timeout.
>
> "prevent planner from switching to bad plan" -> hint system
>
> "prevent query from consuming too many resources" -> true resource
> restrictions at the database level


I like and agree with your list, as an overview. I differ slightly on
specifics.

> I'm not so much against the idea of a statement cost limit, but I think we
> need to realize that it does not really solve as many problems as people
> think, in cases where it will help it often will do so poorly, and that there
> are probably better solutions available to those problems. Of course if you
> back me into a corner I'll agree a poor solution is better than no solution,
> so...


statement_cost_limit isn't a panacea for all performance ills, its just
one weapon in the armoury. I'm caught somewhat in that whatever I
propose as a concrete next step, somebody says I should have picked
another. Oh well.

On specific points:

With hints I prefer a declarative approach, will discuss later in
release cycle.

The true resource restrictions sound good, but its still magic numbers.
How many I/Os are allowed before you kill the query? How much CPU? Those
are still going to be guessed at. How do we tell the difference between
a random I/O and a sequential I/O - there's no difference as far as
Postgres is concerned in the buffer manager, but it can cause a huge
performance difference. Whether you use real resource limits or
statement cost limits you still need to work out the size of your table
and then guess at appropriate limits.

Every other system I've seen uses resource limits, but the big problem
is that they are applied after something has been running for a long
time. It's kinda like saying I'll treat the gangrene when it reaches my
knee. I prefer to avoid the problem before it starts to hurt at all, so
I advocate learning the lessons from other systems, not simply follow
them. But having said that, I'm not against having them; its up to the
administrator how they want to manage their database, not me.

What resource limit parameters would you choose? (temp disk space etc..)

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:12 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com