This is a discussion on TB-sized databases within the Pgsql Performance forums, part of the PostgreSQL category; --> On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: > In fact an even more useful option would be ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: > In fact an even more useful option would be to ask the planner to throw > error if the expected cost exceeds a certain threshold... Well, I've suggested it before: statement_cost_limit on pgsql-hackers, 1 March 2006 Would people like me to re-write and resubmit this patch for 8.4? Tom's previous concerns were along the lines of "How would know what to set it to?", given that the planner costs are mostly arbitrary numbers. Any bright ideas, or is it we want it and we don't care about the possible difficulties? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Tue, 2007-11-27 at 18:06 -0500, Pablo Alcaraz wrote: > Simon Riggs wrote: > > All of those responses have cooked up quite a few topics into one. Large > > databases might mean text warehouses, XML message stores, relational > > archives and fact-based business data warehouses. > > > > The main thing is that TB-sized databases are performance critical. So > > it all depends upon your workload really as to how well PostgreSQL, or > > another other RDBMS vendor can handle them. > > > > > > Anyway, my reason for replying to this thread is that I'm planning > > changes for PostgreSQL 8.4+ that will make allow us to get bigger and > > faster databases. If anybody has specific concerns then I'd like to hear > > them so I can consider those things in the planning stages > it would be nice to do something with selects so we can recover a rowset > on huge tables using a criteria with indexes without fall running a full > scan. > > In my opinion, by definition, a huge database sooner or later will have > tables far bigger than RAM available (same for their indexes). I think > the queries need to be solved using indexes enough smart to be fast on disk. OK, I agree with this one. I'd thought that index-only plans were only for OLTP, but now I see they can also make a big difference with DW queries. So I'm very interested in this area now. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Wed, 28 Nov 2007, Simon Riggs wrote: > statement_cost_limit on pgsql-hackers, 1 March 2006 > > Would people like me to re-write and resubmit this patch for 8.4? Yes please. The more options, the better. > Tom's previous concerns were along the lines of "How would know what to > set it to?", given that the planner costs are mostly arbitrary numbers. > > Any bright ideas, or is it we want it and we don't care about the > possible difficulties? I think this is something that the average person should just knuckle down and work out. At the moment on my work's system, we call EXPLAIN before queries to find out if it will take too long. This would improve performance by stopping us having to pass the query into the query planner twice. Matthew -- An ant doesn't have a lot of processing power available to it. I'm not trying to be speciesist - I wouldn't want to detract you from such a wonderful creature, but, well, there isn't a lot there, is there? -- Computer Science Lecturer ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Wed, 28 Nov 2007, Simon Riggs wrote: > On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: > >> In fact an even more useful option would be to ask the planner to throw >> error if the expected cost exceeds a certain threshold... > > Well, I've suggested it before: > > statement_cost_limit on pgsql-hackers, 1 March 2006 > > Would people like me to re-write and resubmit this patch for 8.4? > > Tom's previous concerns were along the lines of "How would know what to > set it to?", given that the planner costs are mostly arbitrary numbers. arbitrary numbers are fine if they are relativly consistant with each other. will a plan with a estimated cost of 1,000,000 take approximatly 100 times as long as one with a cost of 10,000? or more importantly, will a plan with an estimated cost of 2000 reliably take longer then one with an estimated cost of 1000? David Lang > Any bright ideas, or is it we want it and we don't care about the > possible difficulties? > > ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| "Simon Riggs" <simon@2ndquadrant.com> writes: > On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: > >> In fact an even more useful option would be to ask the planner to throw >> error if the expected cost exceeds a certain threshold... > > Well, I've suggested it before: > > statement_cost_limit on pgsql-hackers, 1 March 2006 > > Would people like me to re-write and resubmit this patch for 8.4? > > Tom's previous concerns were along the lines of "How would know what to > set it to?", given that the planner costs are mostly arbitrary numbers. Hm, that's only kind of true. Since 8.mumble seq_page_cost is itself configurable meaning you can adjust the base unit and calibrate all the parameters to be time in whatever unit you choose. But even assuming you haven't so adjusted seq_page_cost and all the other parameters to match the numbers aren't entirely arbitrary. They represent time in units of "however long a single sequential page read takes". Obviously few people know how long such a page read takes but surely you would just run a few sequential reads of large tables and set the limit to some multiple of whatever you find. This isn't going to precise to the level of being able to avoid executing any query which will take over 1000ms. But it is going to be able to catch unconstrained cross joins or large sequential scans or such. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Gregory Stark <stark@enterprisedb.com> writes: > "Simon Riggs" <simon@2ndquadrant.com> writes: >> Tom's previous concerns were along the lines of "How would know what to >> set it to?", given that the planner costs are mostly arbitrary numbers. > Hm, that's only kind of true. The units are not the problem. The problem is that you are staking non-failure of your application on the planner's estimates being pretty well in line with reality. Not merely in line enough that it picks a reasonably cheap plan, but in line enough that if it thinks plan A is 10x more expensive than plan B, then the actual ratio is indeed somewhere near 10. Given that this list spends all day every day discussing cases where the planner is wrong, I'd have to think that that's a bet I wouldn't take. You could probably avoid this risk by setting the cutoff at something like 100 or 1000 times what you really want to tolerate, but how useful is it then? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: > > "Simon Riggs" <simon@2ndquadrant.com> writes: > >> Tom's previous concerns were along the lines of "How would know what to > >> set it to?", given that the planner costs are mostly arbitrary numbers. > > > Hm, that's only kind of true. > > The units are not the problem. The problem is that you are staking > non-failure of your application on the planner's estimates being > pretty well in line with reality. Not merely in line enough that > it picks a reasonably cheap plan, but in line enough that if it > thinks plan A is 10x more expensive than plan B, then the actual > ratio is indeed somewhere near 10. > > Given that this list spends all day every day discussing cases where the > planner is wrong, I'd have to think that that's a bet I wouldn't take. I think you have a point, but the alternative is often much worse. If an SQL statement fails because of too high cost, we can investigate the problem and re-submit. If a website slows down because somebody allowed a very large query to execute then everybody is affected, not just the person who ran the bad query. Either way the guy that ran the query loses, but without constraints in place one guy can kill everybody else also. > You could probably avoid this risk by setting the cutoff at something > like 100 or 1000 times what you really want to tolerate, but how > useful is it then? Still fairly useful, as long as we understand its a blunt instrument. If the whole performance of your system depends upon indexed access then rogue queries can have disastrous, unpredictable consequences. Many sites construct their SQL dynamically, so a mistake in a seldom used code path can allow killer queries through. Even the best DBAs have been known to make mistakes. e.g. An 80GB table has 8 million blocks in it. - So putting a statement_cost limit = 1 million would allow some fairly large queries but prevent anything that did a SeqScan (or worse). - Setting it 10 million is going to prevent things like sorting the whole table without a LIMIT - Setting it at 100 million is going to prevent unconstrained product joins etc.. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| * Simon Riggs (simon@2ndquadrant.com) wrote: > On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: > > Given that this list spends all day every day discussing cases where the > > planner is wrong, I'd have to think that that's a bet I wouldn't take. > > I think you have a point, but the alternative is often much worse. I'm not convinced you've outlined the consequences of implementing a plan cost limit sufficiently. > If an SQL statement fails because of too high cost, we can investigate > the problem and re-submit. If a website slows down because somebody > allowed a very large query to execute then everybody is affected, not > just the person who ran the bad query. Either way the guy that ran the > query loses, but without constraints in place one guy can kill everybody > else also. It's entirely possible (likely even) that most of the users accessing a webpage are using the same queries and the same tables. If the estimates for those tables ends up changing enough that PG adjusts the plan cost to be above the plan cost limit then *all* of the users would be affected. The plan cost isn't going to change for just one user if it's the same query that a bunch of users are using. I'm not sure if handling the true 'rougue query' case with this limit would actually be a net improvment overall in a website-based situation. I could see it being useful to set a 'notice_on_high_cost_query' variable where someone working in a data warehouse situation would get a notice if the query he's hand-crafting has a very high cost (in which case he could ctrl-c it if he thinks something is wrong, rather than waiting 5 hours before realizing he forgot a join clause), but the website with the one rougue query run by one user seems a stretch. Thanks, Stephen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFHTuwNrzgMPqB3kigRAvCTAJ9pOdRQYBqzRM53vjwpXi MOf5yBfwCghM/d ur95/WD8krLq4sXiaJg+/rc= =1LMH -----END PGP SIGNATURE----- |
| |||
| On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote: > Given that this list spends all day every day discussing cases where the > planner is wrong, I'd have to think that that's a bet I wouldn't take. > > You could probably avoid this risk by setting the cutoff at something > like 100 or 1000 times what you really want to tolerate, but how > useful is it then? It would still be useful in the sense that if the planner is taking wrong estimates you must correct it somehow... raise statistics target, rewrite query or other tweaking, you should do something. An error is sometimes better than gradually decreasing performance because of too low statistics target for example. So if the error is thrown because of wrong estimate, it is still a valid error raising a signal that the DBA has to do something about it. It's still true that if the planner estimates too low, it will raise no error and will take the resources. But that's just what we have now, so it wouldn't be a regression of any kind... Cheers, Csaba. ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| Simon Riggs wrote: > On Wed, 2007-11-28 at 14:48 +0100, Csaba Nagy wrote: > > >> In fact an even more useful option would be to ask the planner to throw >> error if the expected cost exceeds a certain threshold... >> > > Well, I've suggested it before: > > statement_cost_limit on pgsql-hackers, 1 March 2006 > > Would people like me to re-write and resubmit this patch for 8.4? > > Tom's previous concerns were along the lines of "How would know what to > set it to?", given that the planner costs are mostly arbitrary numbers. > > Any bright ideas, or is it we want it and we don't care about the > possible difficulties? > > Knowing how to set it is a problem - but a possibly bigger one is that monster query crippling your DW system, so I'd say lets have it. Cheers Mark ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |