Do I have a corrupted database? (Pgsql General forum, PostgreSQL category)
I fear I have a corrupted database, and I'm not sure what to do.

Environment:
Windows Server 2003
8GB RAM
Dual processor, quad core 2.6GHz
Postgres 8.2.3 (The IT dept wants to upgrade to 8.2.9, but they are asking me what to do about this corrupt database before they proceed)
The database files and logs are stored on a SAN drive

2008-08-23 06:57:06 FATAL: could not create sigchld waiter thread: error code 1816
*** ack! 13 hour hole! What the...?
2008-08-23 20:00:27 ERROR: xlog flush request E0/293CF278 is not satisfied --- flushed only to E0/21B1B7F0
2008-08-23 20:00:27 CONTEXT: writing block 94218 of relation 16712/16713/16725
2008-08-23 20:04:36 DETAIL: Multiple failures --- write error may be permanent.
2008-08-23 20:04:36 ERROR: xlog flush request E0/4FC5BEB8 is not satisfied --- flushed only to E0/21B9E270
2008-08-23 20:04:36 CONTEXT: writing block 81033 of relation 16712/16713/16725
2008-08-23 20:04:36 STATEMENT: BEGIN TRANSACTION; ... just a normal SQL stored proc...
2008-08-23 20:04:36 DETAIL: Multiple failures --- write error may be permanent.
2008-08-23 20:04:36 ERROR: xlog flush request E0/314D8248 is not satisfied --- flushed only to E0/21B9E358
2008-08-23 20:04:36 CONTEXT: writing block 371418 of relation 16712/16713/16719
2008-08-23 20:04:36 STATEMENT: BEGIN TRANSACTION;... just a normal SQL stored proc...

This repeats for quite a while.

A few days later, after a restart, we are seeing these showing up quite often:

2008-08-26 11:59:42 FATAL: the database system is starting up
2008-08-26 11:59:42 FATAL: the database system is starting up
2008-08-26 11:59:43 FATAL: the database system is starting up
2008-08-26 11:59:43 FATAL: the database system is starting up
2008-08-26 11:59:43 FATAL: the database system is starting up
2008-08-26 11:59:43 LOG: database system is ready
2008-08-26 11:59:55 PANIC: right sibling's left-link doesn't match
2008-08-26 11:59:55 STATEMENT: BEGIN TRANSACTION;INSERT INTO ...SQL scrubbed...
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
2008-08-26 11:59:55 LOG: server process (PID 2228) exited with exit code 3
2008-08-26 11:59:55 LOG: terminating any other active server processes
2008-08-26 11:59:55 LOG: all server processes terminated; reinitializing
2008-08-26 11:59:55 LOG: database system was interrupted at 2008-08-26 11:59:43 Pacific Daylight Time
2008-08-26 11:59:55 LOG: checkpoint record is at E2/F88B6C0
2008-08-26 11:59:55 LOG: redo record is at E2/F88B6C0; undo record is at 0/0; shutdown TRUE
2008-08-26 11:59:55 LOG: next transaction ID: 0/396816257; next OID: 58100
2008-08-26 11:59:55 LOG: next MultiXactId: 3; next MultiXactOffset: 5
2008-08-26 11:59:55 LOG: database system was not properly shut down; automatic recovery in progress
2008-08-26 11:59:55 LOG: redo starts at E2/F88B710
2008-08-26 11:59:55 LOG: record with zero length at E2/F984928
2008-08-26 11:59:55 LOG: redo done at E2/F9848F8
2008-08-26 11:59:55 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 FATAL: the database system is starting up
2008-08-26 11:59:56 LOG: database system is ready

That section is repeating over and over. Oddly enough, the system actually seems to be mostly running. I need to do some diagnostics of our app to see what is going on at that layer and what is and isn't working.

I found an article online with a similar problem, but no resolution:
http://www.mydatabasesupport.com/for...-shutdown.html

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
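For anyone triaging a log like the one above, it can help to pull the "xlog flush request" errors out and see how far each requested position lies beyond what was actually flushed. The following is only a rough sketch: the function names are made up for illustration, the regular expression is written against the message text exactly as quoted in this post, and the byte gap is only computed when both positions share the same first (log id) component, since I am not assuming anything about how offsets wrap between log ids on this version.

```python
import re

# Pattern written against the message as quoted in the post, e.g.
# "xlog flush request E0/293CF278 is not satisfied --- flushed only to E0/21B1B7F0"
FLUSH_RE = re.compile(
    r"xlog flush request ([0-9A-F]+)/([0-9A-F]+) is not satisfied"
    r" --- flushed only to ([0-9A-F]+)/([0-9A-F]+)"
)


def parse_flush_errors(log_text):
    """Return a list of (requested, flushed) pairs.

    Each position is an (id, offset) tuple of ints parsed from hex.
    """
    results = []
    for m in FLUSH_RE.finditer(log_text):
        requested = (int(m.group(1), 16), int(m.group(2), 16))
        flushed = (int(m.group(3), 16), int(m.group(4), 16))
        results.append((requested, flushed))
    return results


def offset_gap(requested, flushed):
    """Byte gap between two positions with the same log id, else None."""
    if requested[0] != flushed[0]:
        return None
    return requested[1] - flushed[1]


# The three ERROR lines from the post, used as sample input.
sample_log = """\
2008-08-23 20:00:27 ERROR: xlog flush request E0/293CF278 is not satisfied --- flushed only to E0/21B1B7F0
2008-08-23 20:04:36 ERROR: xlog flush request E0/4FC5BEB8 is not satisfied --- flushed only to E0/21B9E270
2008-08-23 20:04:36 ERROR: xlog flush request E0/314D8248 is not satisfied --- flushed only to E0/21B9E358
"""

parsed = parse_flush_errors(sample_log)
for requested, flushed in parsed:
    gap = offset_gap(requested, flushed)
    print("requested %X/%X, flushed %X/%X, gap %s bytes"
          % (requested[0], requested[1], flushed[0], flushed[1], gap))
```

A large gap suggests the data page claims a log position well past the end of the WAL the server has written, which is consistent with either lost WAL or a damaged page header; the script alone cannot tell which.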
William Garrison wrote:
> I fear I have a corrupted database, and I'm not sure what to do.

First, make sure you have a recent backup. If your backups rotate, stop the rotation so that all currently available historical copies of the database are preserved from now on - just in case you need them.

Now, if possible, dump your database with pg_dump. Restore the dump to a test database instance and make sure that it all goes OK.

Once that's done, you know you have a decent recovery point to work from in case you make a mistake during your recovery efforts.

After that I don't have all that much to offer, especially as you're using an operating system I don't have much experience with Pg on and you're using an (unspecified) SAN.

Normally I'd ask if you'd verified your RAID array / tested your disks. In this case, I'm wondering if there's any chance there was a service interruption on the SAN that might've caused some sort of intermittent or partial writes.

> 2008-08-23 20:00:27 ERROR: xlog flush request E0/293CF278 is not
> satisfied --- flushed only to E0/21B1B7F0
> 2008-08-23 20:00:27 CONTEXT: writing block 94218 of relation
> 16712/16713/16725
> 2008-08-23 20:04:36 DETAIL: Multiple failures --- write error may be
> permanent.

Yeah, I'm really wondering about the SAN and SAN connection. What sort of SAN is it? How is the host connected? Does it have any sort of logging and monitoring that might let you see if there was a problem around the time Pg was complaining?

Have you checked the Windows error logs?

--
Craig Ringer
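The dump-and-verify step described above could be scripted roughly as follows. This is a sketch, not a tested recovery procedure: the database names and dump path are placeholders, the helper names are invented for illustration, and connection options (host, port, user) are left out even though the test restore should really target a separate instance. The commands themselves (`pg_dump -Fc -f`, `createdb`, `pg_restore -d`) are the standard client tools.

```python
import subprocess


def build_dump_and_restore_commands(source_db, test_db, dump_path):
    """Return the argv lists for dump, create test database, restore.

    All three names are supplied by the caller; nothing here is specific
    to the poster's system.
    """
    return [
        # Custom-format dump of the suspect database to a file.
        ["pg_dump", "-Fc", "-f", dump_path, source_db],
        # Fresh, empty database to restore into.
        ["createdb", test_db],
        # Restore the dump into the test database.
        ["pg_restore", "-d", test_db, dump_path],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder names; print the plan rather than executing it.
    plan = build_dump_and_restore_commands("proddb", "restore_test", "proddb.dump")
    for argv in plan:
        print(" ".join(argv))
```

Printing the plan first and calling `run_all` only after reviewing it keeps the script from touching anything by accident. A clean restore into the test database is the evidence that the dump is usable as a recovery point.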
Craig Ringer wrote:
> William Garrison wrote:
>
>> I fear I have a corrupted database, and I'm not sure what to do.
>
> First, make sure you have a recent backup. If your backups rotate, stop
> the rotation so that all currently available historical copies of the
> database are preserved from now on - just in case you need them.

Since I made my post, we found that we can't do a pg_dump. Every time this error appears in the logs, postgres forcibly closes any connections (including any running instances of pgadmin or pg_dump) when it runs this little recovery process. We have backups from some days ago plus transaction logs. We also have a snapshot of the file system, and I'm hoping to find a way to attach that onto another system. I've had trouble with that in the past.

As for the SAN and the Windows event log: Our IT guy says the SAN reported no failures at the time. I don't know much about the SAN itself, I just know it uses dual fiber-channels and all the drives are in some RAID array. I think it also is hardened against nuclear strikes and has a built-in laser defense system. At the time of the problem, the Windows event log indicates no problems writing to the drives, or any other failures of any kind really. No other apps crashed, no unusual memory usage, plenty of disk space. So the cause is a complete mystery.

We tried to REINDEX each table, and we are getting duplicate key errors, so the reindex fails. I can fix those records manually, but I was hoping to dump the database, find the duplicates using another system, then delete/repair the bad records and restore onto the production machine. But since the backup/restore isn't working, that isn't looking like a viable option.

Are there any kind of repair tools for a postgres database? Any sort of routine where I can take it offline and run something like pg_fsck --all and it will come back with a report or a repair procedure?
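If the table data can be exported in some form (which, per the post, is not yet working through pg_dump), finding the duplicate keys on another system is straightforward. Below is a minimal sketch that assumes the rows have somehow been exported as CSV with a header line; the column names, the sample data, and the function name are all hypothetical.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return {key_tuple: count} for every key that occurs more than once.

    rows is an iterable of dicts (for example from csv.DictReader) and
    key_columns lists the columns that the unique index is supposed to cover.
    """
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical export: "id" is meant to be unique but appears twice for 2.
sample_csv = """\
id,name
1,alpha
2,beta
2,beta-copy
3,gamma
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
duplicates = find_duplicate_keys(rows, ["id"])
print(duplicates)
```

The resulting keys identify which records need to be repaired or deleted before a REINDEX can succeed; deciding which copy of each duplicate is the correct one still has to be done by hand.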
On Wed, Aug 27, 2008 at 01:45:43PM -0400, William Garrison wrote:
> Since I made my post, we found that we can't do a pg_dump. Every
> time this error appears in the logs, postgres forcibly closes any
> connections (including any running instances of pgadmin or pg_dump) when
> it runs this little recovery process. We have backups from some days
> ago plus transaction logs. We also have a snapshot of the file system,
> and I'm hoping to find a way to attach that onto another system. I've
> had trouble with that in the past.

You're going to have to be more specific. What do you mean by "this error"? It is possible to start up postgresql such that it will not use any system indexes.

> Are there any kind of repair tools for a postgres database? Any sort of
> routine where I can take it offline and run like pg_fsck --all and it
> will come back with a report or a repair procedure?

There are no tools that do fixing, only the DB server itself. If you can't get it to work within postgresql, then pgfsck can attempt to do a raw data dump. It doesn't guarantee the integrity of the data, but it may be able to get your data out.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.