This is a discussion on Help me recovering data within the pgsql Hackers forums, part of the PostgreSQL category.
| ||||
>> in the foot. We've seen several instances of people blowing away
>> pg_xlog and pg_clog, for example, because they "don't need log files".
>> Or how about failing to keep adequate backups? That's a sure way for an
>> ignorant admin to lose data too.
>
> There is a difference between actively doing something stupid and failing
> to realize a maintenance task is required.
>
> PostgreSQL should stop working. When the admin tries to understand why,
> they can read a troubleshooting FAQ and say "oops, I gotta run this vacuum
> thingy." That is a whole lot better than falling off a cliff you didn't
> even know was there.

There is another way to look at this that lends itself to mohawksoft's argument.

More often than not, DBAs and sysadmins are neither one. They are people who get shoved into the job because they happened to mention around the water cooler that they "once" installed linux/freebsd -- whatever.

Maybe it is an executive who has some of his brains left after sitting behind a desk all day for the last 10 years. One day he/she gets a thought in his head to create a new project named "foo". He does not want to waste his internal resources, so said executive decides he will do it himself as a hobby. For some reason, the project actually succeeds (I have seen this many times) and the company starts using it.

Well guess what... it uses PostgreSQL. The guy isn't a DBA; heck, he isn't even really a programmer. He had no idea about this "vacuum" thing. He had never heard of other databases having to do it.

So they run for a year, and then all of a sudden **BOOM** the world ends.

Do you think they are going to care that we "documented" the issue? Uhmmm, no they won't. Chances are they will drop kick PostgreSQL and bad talk it to all their other executive friends.

In short, this whole argument has the mark of irresponsibility on both parties, but it is the PostgreSQL project's responsibility to make a reasonable effort to produce a piece of software that doesn't break.

We are not talking about a user who ran a query: delete from foo;

At this point we have a known critical bug. Usually the PostgreSQL community is all over critical bugs. Why is this any different?

It sounds to me that people are just annoyed that users don't RTFM. Get over it. Most won't. If users RTFM more often, it would put most support companies out of business.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
| |||
Christopher Kings-Lynne wrote:

>> At this point we have a known critical bug. Usually the PostgreSQL
>> community is all over critical bugs. Why is this any different?
>>
>> It sounds to me that people are just annoyed that users don't RTFM.
>> Get over it. Most won't. If users RTFM more often, it would put most
>> support companies out of business.
>
> I wonder if I should point out that we just had 3 people suffering XID
> wraparound failure in 2 days in the IRC channel...
>
> Chris

I have had half a dozen new customers in the last six months that have had the same problem. Nothing like the phone call:

"Uhmmm, I am a new customer. Help, I can't see my databases."

Sincerely,

Joshua D. Drake
| |||
> At this point we have a known critical bug. Usually the PostgreSQL
> community is all over critical bugs. Why is this any different?
>
> It sounds to me that people are just annoyed that users don't RTFM. Get
> over it. Most won't. If users RTFM more often, it would put most support
> companies out of business.

I wonder if I should point out that we just had 3 people suffering XID wraparound failure in 2 days in the IRC channel...

Chris
| |||
> On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:
>
>>> Once autovacuum gets to the point where it's used by default, this
>>> particular failure mode should be a thing of the past, but in the
>>> meantime I'm not going to panic about it.
>>
>> I don't know how to say this without sounding like a jerk (I guess
>> that's my role sometimes), but would you go back and re-read this
>> sentence?
>>
>> To paraphrase: "I know this causes a catastrophic data loss, and we have
>> plans to fix it in the future, but for now, I'm not going to panic about
>> it."
>
> Do you have a useful suggestion about how to fix it? "Stop working" is
> handwaving, and merely saying "one of you people should do something
> about this" is not a solution to the problem; it's not even an
> approach towards a solution to the problem.

Actually, it is not a solution to the problem of losing data. It is a drop-dead, last-ditch failsafe that EVERY PRODUCT should have before losing data.
| |||
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>> Right, but since the how to resolve it currently involves executing a
>> query, simply stopping dead won't allow you to resolve it. Also, if we
>> stop at the exact wraparound point, can we run into problems actually
>> trying to do the vacuum if that's still the resolution technique?
>
> We'd have to do something with a fair amount of slop. The idea I was
> toying with just now involved a forcible shutdown once we get within
> say 100,000 transactions of a wrap failure; but apply this check only
> when in interactive operation. This would allow the DBA to perform
> the needed VACUUMing manually in a standalone backend.
>
> The real question here is exactly how large a cluestick do you want to
> hit the DBA with. I don't think we can "guarantee" no data loss with
> anything less than forced shutdown, but that's not so much a cluestick
> as a clue howitzer.

I think a DBA or accidental DBA would prefer stating in a meeting:

"Yeah, the database shut down because I didn't perform normal maintenance. It's fixed now and we have a script in place so it won't happen again."

over

"Yeah, the database lost all its data and we have to restore from our last backup because I didn't perform normal maintenance."

One gets a "boy, are you lucky" rather than a "you're fired."

> Maybe
>
> (a) within 200,000 transactions of wrap, every transaction start
> delivers a WARNING message;
>
> (b) within 100,000 transactions, forced shutdown as above.

I agree.
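For readers who want to see the two-threshold proposal spelled out, here is a minimal sketch of the decision logic. It is only an illustration, not how the server implements anything: the names `check_xid_age`, `Action`, and `WRAP_LIMIT` are hypothetical, the wrap point is assumed to be 2**31 transactions, and the choice to downgrade the shutdown to a warning in a standalone backend is my reading of "apply this check only when in interactive operation".

```python
from enum import Enum

# Assumption: the wrap point is taken to be 2**31 transactions of age.
# The thread itself only speaks loosely of "about 1G".
WRAP_LIMIT = 2**31
WARN_MARGIN = 200_000      # proposal (a): warn on every transaction start
SHUTDOWN_MARGIN = 100_000  # proposal (b): forced shutdown


class Action(Enum):
    OK = "ok"
    WARN = "warn"
    SHUTDOWN = "shutdown"


def check_xid_age(xid_age: int, interactive: bool = True) -> Action:
    """Return what the proposed policy would do at a given transaction age.

    `interactive=False` stands in for a standalone backend, where the
    shutdown check is skipped so the DBA can still run VACUUM (assumption).
    """
    remaining = WRAP_LIMIT - xid_age
    if remaining <= SHUTDOWN_MARGIN:
        return Action.SHUTDOWN if interactive else Action.WARN
    if remaining <= WARN_MARGIN:
        return Action.WARN
    return Action.OK
```

The margins are plain constants here so that the later question in the thread, whether 100K is too small, can be explored by changing two numbers.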
| |||
Bruno Wolff III <bruno@wolff.to> writes:
> I don't think there is much point in making it configurable. If they knew
> to do that they would most likely know to vacuum as well.

Agreed.

> However, 100K out of 1G seems too small. Just to get wrap around there
> must be a pretty high transaction rate, so 100K may not give much warning.
> 1M or 10M seem to be better.

Good point. Even 10M is less than 1% of the ID space. Dunno about you, but the last couple cars I've owned start flashing warnings when the gas tank is about 20% full, not 1% full...

regards, tom lane
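The percentages are easy to check with a few lines of arithmetic. This sketch assumes "1G" means 2**30 transaction IDs, which is the round figure used in the thread and not a number taken from the PostgreSQL source; `percent_of_space` is a hypothetical helper name.

```python
# Assumption: "1G" is read as 2**30 transaction IDs, as in the thread.
ID_SPACE = 2**30


def percent_of_space(margin: int, space: int = ID_SPACE) -> float:
    """Express a safety margin as a percentage of the ID space."""
    return 100.0 * margin / space


# The three margins discussed above: 100K, 1M and 10M transactions.
for margin in (100_000, 1_000_000, 10_000_000):
    print(f"{margin:>10,} transactions = {percent_of_space(margin):.3f}% of the space")

# Margin that a fuel-gauge style warning at 20% would correspond to.
twenty_percent_margin = ID_SPACE // 5
print(f"20% of the space = {twenty_percent_margin:,} transactions")
```

All three proposed margins come out below 1%, which is the point of the fuel-gauge comparison.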
| |||
> pgsql@mohawksoft.com writes:
>> Maybe I'm missing something, but shouldn't the prospect of data loss
>> (even in the presence of admin ignorance) be something that should be
>> unacceptable? Certainly within the realm of "normal PostgreSQL" operation.
>
> [ shrug... ] The DBA will always be able to find a way to shoot himself
> in the foot. We've seen several instances of people blowing away
> pg_xlog and pg_clog, for example, because they "don't need log files".
> Or how about failing to keep adequate backups? That's a sure way for an
> ignorant admin to lose data too.

There is a difference between actively doing something stupid and failing to realize a maintenance task is required.

PostgreSQL should stop working. When the admin tries to understand why, they can read a troubleshooting FAQ and say "oops, I gotta run this vacuum thingy." That is a whole lot better than falling off a cliff you didn't even know was there.

> Once autovacuum gets to the point where it's used by default, this
> particular failure mode should be a thing of the past, but in the
> meantime I'm not going to panic about it.

I don't know how to say this without sounding like a jerk (I guess that's my role sometimes), but would you go back and re-read this sentence?

To paraphrase: "I know this causes a catastrophic data loss, and we have plans to fix it in the future, but for now, I'm not going to panic about it."

What would you do if the FreeBSD group or Linux kernel group said this about a file system? If you failed to run fsck after 100 mounts, you lose your data?

I thought PostgreSQL was about "protecting your data." How many times have we smugly said, "yeah, you can use MySQL if you don't care about your data." Any data loss caused by PostgreSQL should be seen as unacceptable.

It's funny: I've known about this for a while, and it has always seemed a sort of distant edge condition that is easily avoided. However, with today's faster machines and disks, it is easier to reach this limitation than ever before. All PostgreSQL needs is one or two VERY UPSET mainstream users who lose data to completely reverse the momentum that it is gaining.

No amount of engineering discussion about it not being the fault of PostgreSQL will matter to them, and rightfully so, IMHO.

Sorry.
| |||
On Wed, 16 Feb 2005 pgsql@mohawksoft.com wrote:

>> Once autovacuum gets to the point where it's used by default, this
>> particular failure mode should be a thing of the past, but in the
>> meantime I'm not going to panic about it.
>
> I don't know how to say this without sounding like a jerk (I guess that's
> my role sometimes), but would you go back and re-read this sentence?
>
> To paraphrase: "I know this causes a catastrophic data loss, and we have
> plans to fix it in the future, but for now, I'm not going to panic about it."

Do you have a useful suggestion about how to fix it? "Stop working" is handwaving, and merely saying "one of you people should do something about this" is not a solution to the problem; it's not even an approach towards a solution to the problem.
| |||
> Do you have a useful suggestion about how to fix it? "Stop working" is
> handwaving, and merely saying "one of you people should do
> something about this" is not a solution to the problem; it's not even an
> approach towards a solution to the problem.

I believe that the ability for PostgreSQL to stop accepting queries and to log to the file or STDERR why it stopped working and how to resolve it is appropriate.

Also, it is probably appropriate to warn ahead of time...

WARNING: Only 50,000 transactions left before lock out

or something like that.

J
| ||||
>> The checkpointer is entirely incapable of either detecting the problem
>> (it doesn't have enough infrastructure to examine pg_database in a
>> reasonable way) or preventing backends from doing anything if it did
>> know there was a problem.
>
> Well, I guess I meant 'some regularly running process'...
>
>>> I think people'd rather their db just stopped accepting new transactions
>>> rather than just losing data...
>>
>> Not being able to issue new transactions *is* data loss --- how are you
>> going to get the system out of that state?
>
> Not allowing any transactions except a vacuum...
>
>> autovacuum is the correct long-term solution to this, not some kind of
>> automatic hara-kiri.
>
> Yeah, seems like it should really happen soon...
>
> Chris

Maybe I'm missing something, but shouldn't the prospect of data loss (even in the presence of admin ignorance) be something that should be unacceptable? Certainly within the realm of "normal PostgreSQL" operation.
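Until something in the server does this, the "regularly running process" idea can be approximated from outside, for example from cron. The sketch below is an assumption-laden illustration, not anything proposed in the thread: it takes rows already fetched with the `age(datfrozenxid)` query on `pg_database` and flags databases past a chosen fraction of the wrap point. The function name, the 2**31 wrap point, and the 50% default are all hypothetical choices, and fetching the rows with a database driver is left out so the example stays self-contained.

```python
from typing import Iterable, List, Tuple

# Query whose result rows this sketch expects: (database name, XID age).
AGE_QUERY = "SELECT datname, age(datfrozenxid) FROM pg_database;"

# Assumptions: wrap point of 2**31 transactions, alert at 50% of it.
WRAP_LIMIT = 2**31
WARN_FRACTION = 0.5


def databases_needing_vacuum(
    rows: Iterable[Tuple[str, int]],
    warn_fraction: float = WARN_FRACTION,
) -> List[Tuple[str, int]]:
    """Return (datname, age) pairs at or past the threshold, oldest first."""
    threshold = int(WRAP_LIMIT * warn_fraction)
    flagged = [(name, age) for name, age in rows if age >= threshold]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# Usage with made-up rows standing in for the query result.
sample_rows = [("template1", 1_500), ("sales", 2**31 - 90_000), ("wiki", 2**30)]
for name, age in databases_needing_vacuum(sample_rows):
    print(f"{name}: age {age:,}, needs a database-wide VACUUM")
```

A check like this only tells the admin about the problem; it does nothing for the accidental DBA who never set it up, which is the case this thread is arguing about.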