This is a discussion on URGENT: Out of disk space pg_xlog within the Pgsql Performance forums, part of the PostgreSQL category; --> I created a 10GB partition for pg_xlog and ran out of disk space today during a long running update. ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I created a 10GB partition for pg_xlog and ran out of disk space today during a long running update. My checkpoint_segments is set to 12, but there are 622 files in pg_xlog. What size should the pg_xlog partition be? Postmaster is currently not starting up (critical for my organization) and reports "FATAL: The database system is starting up" . The log reports: 2006-12-22 10:50:09 LOG: checkpoint record is at 2E/87A323C8 2006-12-22 10:50:09 LOG: redo record is at 2E/8729A6E8; undo record is at 0/0; shutdown FALSE 2006-12-22 10:50:09 LOG: next transaction ID: 0/25144015; next OID: 140986 2006-12-22 10:50:09 LOG: next MultiXactId: 12149; next MultiXactOffset: 24306 2006-12-22 10:50:09 LOG: database system was not properly shut down; automatic recovery in progress 2006-12-22 10:50:09 LOG: redo starts at 2E/8729A6E8 This has been running for 20 minutes. What can I do? Please help! ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Sorry for my rushed posting, as I was in a bit of a panic. We moved the pg_xlog directory over to a 70GB partition, and after 15-20 minutes the automatic recovery finished. Everything is working fine now. I would still appreciate a PG guru explaining how to estimate size for a pg_xlog partition. It seems like it can vary considerably depending on how intensive your current transactions are. Is there a way to determine a maximum? On Fri, 22 Dec 2006 11:06:46 -0500, "Jeremy Haile" <jhaile@fastmail.fm> said: > I created a 10GB partition for pg_xlog and ran out of disk space today > during a long running update. My checkpoint_segments is set to 12, but > there are 622 files in pg_xlog. What size should the pg_xlog partition > be? > > Postmaster is currently not starting up (critical for my organization) > and reports "FATAL: The database system is starting up" . > > The log reports: > 2006-12-22 10:50:09 LOG: checkpoint record is at 2E/87A323C8 > 2006-12-22 10:50:09 LOG: redo record is at 2E/8729A6E8; undo record is > at 0/0; shutdown FALSE > 2006-12-22 10:50:09 LOG: next transaction ID: 0/25144015; next OID: > 140986 > 2006-12-22 10:50:09 LOG: next MultiXactId: 12149; next MultiXactOffset: > 24306 > 2006-12-22 10:50:09 LOG: database system was not properly shut down; > automatic recovery in progress > 2006-12-22 10:50:09 LOG: redo starts at 2E/8729A6E8 > > > This has been running for 20 minutes. What can I do? Please help! > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| > 2006-12-22 10:50:09 LOG: database system was not properly shut down; > automatic recovery in progress > 2006-12-22 10:50:09 LOG: redo starts at 2E/8729A6E8 > > > This has been running for 20 minutes. What can I do? Please help! 1. Turn off postgresql. 2. Make tar backup of entire thing 3. Move pg_xlog somehwere that has space 4. ln postgresql to new pg_xlog directory 5. Start postgresql 6. Look for errors 7. Report back Sincerely. Joshua D. Drake > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Fri, 2006-12-22 at 11:52 -0500, Jeremy Haile wrote: > I would still appreciate ... explaining how to estimate size for a > pg_xlog partition. It seems like it can vary considerably depending on > how intensive your current transactions are. Is there a way to > determine a maximum? There should be at most 2*checkpoint_segments+1 files in pg_xlog, which are 16MB each. So you shouldn't be having a problem. If there are more than this, it could be because you have currently/previously had archive_command set and the archive command failed to execute correctly, or the database was shutdown/crashed prior to the archive commands being executed. IIRC there was a bug that allowed this to happen, but that was some time ago. Perhaps you could show us the dir listing, so we can check that there is not a new problem emerging? Can you also show us the contents of the pg_xlog/archive_status directory? Thanks. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| As I understand it, the log space accumulates for the oldest transaction which is still running, and all transactions which started after it. I don't think there is any particular limit besides available disk space. Long running transactions can cause various problems, including table and index bloat which can degrade performance. You should probably look at whether the long running transaction could be broken down into a number of smaller ones. -Kevin >>> On Fri, Dec 22, 2006 at 10:52 AM, in message <1166806378.10592.281708161@webmail.messagingengin e.com>, "Jeremy Haile" <jhaile@fastmail.fm> wrote: > Sorry for my rushed posting, as I was in a bit of a panic. > > We moved the pg_xlog directory over to a 70GB partition, and after 15- 20 > minutes the automatic recovery finished. Everything is working fine > now. > > I would still appreciate a PG guru explaining how to estimate size for a > pg_xlog partition. It seems like it can vary considerably depending on > how intensive your current transactions are. Is there a way to > determine a maximum? > > On Fri, 22 Dec 2006 11:06:46 - 0500, "Jeremy Haile" <jhaile@fastmail.fm> > said: >> I created a 10GB partition for pg_xlog and ran out of disk space today >> during a long running update. My checkpoint_segments is set to 12, but >> there are 622 files in pg_xlog. What size should the pg_xlog partition >> be? >> >> Postmaster is currently not starting up (critical for my organization) >> and reports "FATAL: The database system is starting up" . >> >> The log reports: >> 2006- 12- 22 10:50:09 LOG: checkpoint record is at 2E/87A323C8 >> 2006- 12- 22 10:50:09 LOG: redo record is at 2E/8729A6E8; undo record is >> at 0/0; shutdown FALSE >> 2006- 12- 22 10:50:09 LOG: next transaction ID: 0/25144015; next OID: >> 140986 >> 2006- 12- 22 10:50:09 LOG: next MultiXactId: 12149; next MultiXactOffset: >> 24306 >> 2006- 12- 22 10:50:09 LOG: database system was not properly shut down; >> automatic recovery in progress >> 2006- 12- 22 10:50:09 LOG: redo starts at 2E/8729A6E8 >> >> >> This has been running for 20 minutes. What can I do? Please help! >> >> --------------------------- (end of broadcast)--------------------------- >> TIP 4: Have you searched our list archives? >> >> http://archives.postgresql.org > > --------------------------- (end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| The archive_status directory is empty. I've never seen any files in there and I've never set archive_command. Well, the problem has since resolved, but here is what is in the directory now. Previously there were hundreds of files, but these disappeared after Postgres performed the automatic recovery. 12/22/2006 11:16 AM 16,777,216 0000000100000030000000D2 12/22/2006 11:17 AM 16,777,216 0000000100000030000000D3 12/22/2006 11:17 AM 16,777,216 0000000100000030000000D4 12/22/2006 11:17 AM 16,777,216 0000000100000030000000D5 12/22/2006 11:18 AM 16,777,216 0000000100000030000000D6 12/22/2006 11:19 AM 16,777,216 0000000100000030000000D7 12/22/2006 11:19 AM 16,777,216 0000000100000030000000D8 12/22/2006 11:19 AM 16,777,216 0000000100000030000000D9 12/22/2006 11:19 AM 16,777,216 0000000100000030000000DA 12/22/2006 11:21 AM 16,777,216 0000000100000030000000DB 12/22/2006 10:07 AM 16,777,216 0000000100000030000000DC 12/22/2006 10:07 AM 16,777,216 0000000100000030000000DD 12/22/2006 10:07 AM 16,777,216 0000000100000030000000DE 12/22/2006 10:33 AM 16,777,216 0000000100000030000000DF 12/22/2006 10:08 AM 16,777,216 0000000100000030000000E0 12/22/2006 10:32 AM 16,777,216 0000000100000030000000E1 12/22/2006 10:08 AM 16,777,216 0000000100000030000000E2 12/22/2006 10:08 AM 16,777,216 0000000100000030000000E3 12/22/2006 10:17 AM 16,777,216 0000000100000030000000E4 12/22/2006 10:11 AM 16,777,216 0000000100000030000000E5 12/22/2006 11:10 AM 16,777,216 0000000100000030000000E6 12/22/2006 11:11 AM 16,777,216 0000000100000030000000E7 12/22/2006 11:15 AM 16,777,216 0000000100000030000000E8 12/22/2006 11:15 AM 16,777,216 0000000100000030000000E9 12/22/2006 11:15 AM 16,777,216 0000000100000030000000EA 12/22/2006 11:16 AM 16,777,216 0000000100000030000000EB 12/22/2006 11:16 AM 16,777,216 0000000100000030000000EC 12/22/2006 11:16 AM 16,777,216 0000000100000030000000ED 12/18/2006 08:52 PM <DIR> archive_status 28 File(s) 469,762,048 bytes 3 Dir(s) 10,206,756,864 bytes free On Fri, 22 Dec 2006 17:02:43 +0000, "Simon Riggs" <simon@2ndquadrant.com> said: > On Fri, 2006-12-22 at 11:52 -0500, Jeremy Haile wrote: > > > I would still appreciate ... explaining how to estimate size for a > > pg_xlog partition. It seems like it can vary considerably depending on > > how intensive your current transactions are. Is there a way to > > determine a maximum? > > There should be at most 2*checkpoint_segments+1 files in pg_xlog, which > are 16MB each. So you shouldn't be having a problem. > > If there are more than this, it could be because you have > currently/previously had archive_command set and the archive command > failed to execute correctly, or the database was shutdown/crashed prior > to the archive commands being executed. > > IIRC there was a bug that allowed this to happen, but that was some time > ago. > > Perhaps you could show us the dir listing, so we can check that there is > not a new problem emerging? Can you also show us the contents of the > pg_xlog/archive_status directory? Thanks. > > -- > Simon Riggs > EnterpriseDB http://www.enterprisedb.com > > ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| On Fri, 2006-12-22 at 12:30 -0500, Jeremy Haile wrote: > The archive_status directory is empty. I've never seen any files in > there and I've never set archive_command. > > Well, the problem has since resolved, but here is what is in the > directory now. Previously there were hundreds of files, but these > disappeared after Postgres performed the automatic recovery. What were you doing before the server crashed? Did you previously have checkpoint_segments set higher? When/how was it reduced? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| checkpoint_segments has been set at 12 for a while and was never set higher than that. (before that it was set to the PG default - 3 I think) Before the server crashed I was running an update that updates a boolean flag on two large tables (10 million rows each) for transactions older than today (roughly 80% of the rows) The transaction ran for a long time and I assume is what caused the pg_xlog to fill up. On Fri, 22 Dec 2006 17:36:39 +0000, "Simon Riggs" <simon@2ndquadrant.com> said: > On Fri, 2006-12-22 at 12:30 -0500, Jeremy Haile wrote: > > The archive_status directory is empty. I've never seen any files in > > there and I've never set archive_command. > > > > Well, the problem has since resolved, but here is what is in the > > directory now. Previously there were hundreds of files, but these > > disappeared after Postgres performed the automatic recovery. > > What were you doing before the server crashed? > > Did you previously have checkpoint_segments set higher? When/how was it > reduced? > > -- > Simon Riggs > EnterpriseDB http://www.enterprisedb.com > > ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > As I understand it, the log space accumulates for the oldest transaction > which is still running, and all transactions which started after it. No, pg_xlog can be truncated as soon as a checkpoint occurs. If Jeremy wasn't using archive_command then the only possible explanation for bloated pg_xlog is that checkpoints were failing. Which is not unlikely if the *data* partition runs out of space. Were there gripes in the log before the system crash? The scenario we've seen in the past is * data partition out of space, so writes fail * each time Postgres attempts a checkpoint, writes fail, so the checkpoint fails. No data loss at this point, the dirty buffers just stay in memory. * pg_xlog bloats because we can't truncate away old segments * eventually pg_xlog runs out of space, at which point we PANIC and can't continue running the database Once you free some space on the data partition and restart, you should be good to go --- there will be no loss of committed transactions, since all the operations are in pg_xlog. Might take a little while to replay all that log though :-( regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| ||||
| >>> On Fri, Dec 22, 2006 at 12:14 PM, in message <26238.1166811258@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> As I understand it, the log space accumulates for the oldest transaction >> which is still running, and all transactions which started after it. > > No, pg_xlog can be truncated as soon as a checkpoint occurs. Thanks. Good to know. I had missed that. > If Jeremy > wasn't using archive_command then the only possible explanation for > bloated pg_xlog is that checkpoints were failing. Which is not unlikely > if the *data* partition runs out of space. Were there gripes in the log > before the system crash? The scenario we've seen in the past is > > * data partition out of space, so writes fail > * each time Postgres attempts a checkpoint, writes fail, so the > checkpoint fails. No data loss at this point, the dirty buffers > just stay in memory. > * pg_xlog bloats because we can't truncate away old segments So, at this point, if space is freed on the data partition somehow, Postgres recovers with no problems? (i.e.,, the database is still running and no requests have been terminated abnormally due to the space problems?) > * eventually pg_xlog runs out of space, at which point we PANIC > and can't continue running the database > > Once you free some space on the data partition and restart, you should > be good to go --- there will be no loss of committed transactions, since > all the operations are in pg_xlog. Might take a little while to replay > all that log though :- ( Just to confirm what I would assume at this point -- non-committed transactions should roll back cleanly; it is reasonable to assume no corruption at this point? Thanks, -Kevin ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |