This is a discussion on PITR Backups within the Pgsql Performance forums, part of the PostgreSQL category; --> On Mon, 2007-06-25 at 08:28 -0700, Dan Gorman wrote: > I took several snapshots. In all cases the FS ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| On Mon, 2007-06-25 at 08:28 -0700, Dan Gorman wrote: > I took several snapshots. In all cases the FS was fine. In one case > the db looked like on recovery it thought there were outstanding > pages to be written to disk as seen below and the db wouldn't start. > > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21 > 00:39:43 PDTLOG: redo done at 71/99870670 > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21 > 00:39:43 PDTWARNING: page 28905 of relation 1663/16384/76718 was > uninitialized OK, please put log_min_messages = DEBUG2 and re-run the recovery please. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Dan Gorman <dgorman@hi5.com> writes: > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21 > 00:39:43 PDTLOG: redo done at 71/99870670 > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21 > 00:39:43 PDTWARNING: page 28905 of relation 1663/16384/76718 was > uninitialized > ... lots of these ... > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21 > 00:39:43 PDTPANIC: WAL contains references to invalid pages (BTW, you'll find putting a space at the end of log_line_prefix does wonders for log readability.) Reformatting and sorting, we have WARNING: page 3535207 of relation 1663/16384/33190 was uninitialized WARNING: page 3535208 of relation 1663/16384/33190 did not exist WARNING: page 3535209 of relation 1663/16384/33190 did not exist WARNING: page 800226 of relation 1663/16384/33204 did not exist WARNING: page 17306 of relation 1663/16384/76710 did not exist WARNING: page 13626 of relation 1663/16384/76716 did not exist WARNING: page 28900 of relation 1663/16384/76718 was uninitialized WARNING: page 28902 of relation 1663/16384/76718 was uninitialized WARNING: page 28903 of relation 1663/16384/76718 was uninitialized WARNING: page 28904 of relation 1663/16384/76718 was uninitialized WARNING: page 28905 of relation 1663/16384/76718 was uninitialized WARNING: page 28906 of relation 1663/16384/76718 was uninitialized WARNING: page 28907 of relation 1663/16384/76718 was uninitialized WARNING: page 28908 of relation 1663/16384/76718 was uninitialized WARNING: page 28909 of relation 1663/16384/76718 was uninitialized WARNING: page 28910 of relation 1663/16384/76718 was uninitialized WARNING: page 28911 of relation 1663/16384/76718 was uninitialized WARNING: page 28912 of relation 1663/16384/76718 was uninitialized WARNING: page 28913 of relation 1663/16384/76718 was uninitialized WARNING: page 28914 of relation 1663/16384/76718 was uninitialized WARNING: page 28915 of relation 1663/16384/76718 was uninitialized WARNING: page 28916 of relation 1663/16384/76718 was uninitialized WARNING: page 28917 of relation 1663/16384/76718 was uninitialized WARNING: page 28918 of relation 1663/16384/76718 was uninitialized WARNING: page 26706 of relation 1663/16384/76719 was uninitialized WARNING: page 26708 of relation 1663/16384/76719 was uninitialized WARNING: page 26710 of relation 1663/16384/76719 was uninitialized WARNING: page 26711 of relation 1663/16384/76719 was uninitialized WARNING: page 26712 of relation 1663/16384/76719 was uninitialized WARNING: page 26713 of relation 1663/16384/76719 was uninitialized So the problems were pretty localized, probably at the ends of these files. Can you go back to the source database and check which tables these are --- match the last number cited in each line against pg_class.relfilenode? Are they tables or indexes, and about how big are they? A possible explanation is we stopped scanning WAL before reaching records that truncated or dropped these tables. But it's not clear why. Could we see the last few log lines before the "redo done" one? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Mon, 2007-06-25 at 12:34 -0400, Tom Lane wrote: > Dan Gorman <dgorman@hi5.com> writes: > > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21 > > 00:39:43 PDTLOG: redo done at 71/99870670 This is mid-way through an xlog file. > > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21 > > 00:39:43 PDTWARNING: page 28905 of relation 1663/16384/76718 was > > uninitialized > > ... lots of these ... > > Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21 > > 00:39:43 PDTPANIC: WAL contains references to invalid pages > > (BTW, you'll find putting a space at the end of log_line_prefix > does wonders for log readability.) > > Reformatting and sorting, we have > > WARNING: page 28900 of relation 1663/16384/76718 was uninitialized > WARNING: page 28902 of relation 1663/16384/76718 was uninitialized > WARNING: page 26706 of relation 1663/16384/76719 was uninitialized > WARNING: page 26708 of relation 1663/16384/76719 was uninitialized Those two are interesting because we appear to have two valid pages in the middle of some uninitialized ones. That implies were not looking at an unapplied truncation. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Thanks for the pointers to a) make it readable and b) log min messages I didn't however keep the snapshots around. I could try and re-set this scenario up. I was in the middle of doing some data migration with Netapp and wanted to just 'test' it to make sure it was sane. If you guys would like me to try to 'break' it again and keep the db around for further testing let me know. Regards, Dan Gorman On Jun 25, 2007, at 9:34 AM, Tom Lane wrote: > Dan Gorman <dgorman@hi5.com> writes: >> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21 >> 00:39:43 PDTLOG: redo done at 71/99870670 >> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21 >> 00:39:43 PDTWARNING: page 28905 of relation 1663/16384/76718 was >> uninitialized >> ... lots of these ... >> Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21 >> 00:39:43 PDTPANIC: WAL contains references to invalid pages > > (BTW, you'll find putting a space at the end of log_line_prefix > does wonders for log readability.) > > Reformatting and sorting, we have > > WARNING: page 3535207 of relation 1663/16384/33190 was uninitialized > WARNING: page 3535208 of relation 1663/16384/33190 did not exist > WARNING: page 3535209 of relation 1663/16384/33190 did not exist > > WARNING: page 800226 of relation 1663/16384/33204 did not exist > > WARNING: page 17306 of relation 1663/16384/76710 did not exist > > WARNING: page 13626 of relation 1663/16384/76716 did not exist > > WARNING: page 28900 of relation 1663/16384/76718 was uninitialized > WARNING: page 28902 of relation 1663/16384/76718 was uninitialized > WARNING: page 28903 of relation 1663/16384/76718 was uninitialized > WARNING: page 28904 of relation 1663/16384/76718 was uninitialized > WARNING: page 28905 of relation 1663/16384/76718 was uninitialized > WARNING: page 28906 of relation 1663/16384/76718 was uninitialized > WARNING: page 28907 of relation 1663/16384/76718 was uninitialized > WARNING: page 28908 of relation 1663/16384/76718 was uninitialized > WARNING: page 28909 of relation 1663/16384/76718 was uninitialized > WARNING: page 28910 of relation 1663/16384/76718 was uninitialized > WARNING: page 28911 of relation 1663/16384/76718 was uninitialized > WARNING: page 28912 of relation 1663/16384/76718 was uninitialized > WARNING: page 28913 of relation 1663/16384/76718 was uninitialized > WARNING: page 28914 of relation 1663/16384/76718 was uninitialized > WARNING: page 28915 of relation 1663/16384/76718 was uninitialized > WARNING: page 28916 of relation 1663/16384/76718 was uninitialized > WARNING: page 28917 of relation 1663/16384/76718 was uninitialized > WARNING: page 28918 of relation 1663/16384/76718 was uninitialized > > WARNING: page 26706 of relation 1663/16384/76719 was uninitialized > WARNING: page 26708 of relation 1663/16384/76719 was uninitialized > WARNING: page 26710 of relation 1663/16384/76719 was uninitialized > WARNING: page 26711 of relation 1663/16384/76719 was uninitialized > WARNING: page 26712 of relation 1663/16384/76719 was uninitialized > WARNING: page 26713 of relation 1663/16384/76719 was uninitialized > > So the problems were pretty localized, probably at the ends of these > files. Can you go back to the source database and check which > tables these are --- match the last number cited in each line > against pg_class.relfilenode? Are they tables or indexes, and > about how big are they? > > A possible explanation is we stopped scanning WAL before reaching > records that truncated or dropped these tables. But it's not clear > why. > Could we see the last few log lines before the "redo done" one? > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| "Simon Riggs" <simon@2ndquadrant.com> writes: >> WARNING: page 28900 of relation 1663/16384/76718 was uninitialized >> WARNING: page 28902 of relation 1663/16384/76718 was uninitialized > >> WARNING: page 26706 of relation 1663/16384/76719 was uninitialized >> WARNING: page 26708 of relation 1663/16384/76719 was uninitialized > > Those two are interesting because we appear to have two valid pages in > the middle of some uninitialized ones. That implies were not looking at > an unapplied truncation. You don't have fsync off do you? That could explain missing pages at the end of a file like this too. And it would explain how you could have two written in the midst of others that are missing. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| "Simon Riggs" <simon@2ndquadrant.com> writes: >> Reformatting and sorting, we have >> >> WARNING: page 28900 of relation 1663/16384/76718 was uninitialized >> WARNING: page 28902 of relation 1663/16384/76718 was uninitialized >> WARNING: page 26706 of relation 1663/16384/76719 was uninitialized >> WARNING: page 26708 of relation 1663/16384/76719 was uninitialized > Those two are interesting because we appear to have two valid pages in > the middle of some uninitialized ones. That implies were not looking at > an unapplied truncation. Not necessarily --- it's possible the WAL sequence simply didn't touch those pages. Your suggestion to rerun the recovery with higher log_min_messages is a good one, because that way we'd get some detail about what the WAL records that touched the pages were. I think DEBUG1 would be sufficient for that, though, and DEBUG2 might be pretty durn verbose. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| No, however, I will attach the postgreql.conf so everyone can look at other settings just in case. Regards, Dan Gorman On Jun 25, 2007, at 10:07 AM, Gregory Stark wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > >>> WARNING: page 28900 of relation 1663/16384/76718 was uninitialized >>> WARNING: page 28902 of relation 1663/16384/76718 was uninitialized >> >>> WARNING: page 26706 of relation 1663/16384/76719 was uninitialized >>> WARNING: page 26708 of relation 1663/16384/76719 was uninitialized >> >> Those two are interesting because we appear to have two valid >> pages in >> the middle of some uninitialized ones. That implies were not >> looking at >> an unapplied truncation. > > You don't have fsync off do you? That could explain missing pages > at the end > of a file like this too. And it would explain how you could have > two written > in the midst of others that are missing. > > -- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com > ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| Dan Gorman <dgorman@hi5.com> writes: > I didn't however keep the snapshots around. I could try and re-set > this scenario up. I was in the middle of doing some data migration > with Netapp and wanted to just 'test' it to make sure it was sane. > If you guys would like me to try to 'break' it again and keep the db > around for further testing let me know. Yeah, please do. It's not entirely clear whether you've found a bug or not, and it'd be good to determine that ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |