This is a discussion on Re: Postgresql Performance on an HP DL385 within the Pgsql Performance forums, part of the PostgreSQL category.
| ||||
| mark@mark.mielke.cc writes: > WAL file is never appended - only re-written? > If so, then I'm wrong, and ext2 is fine. The requirement is that no > file system structures change as a result of any writes that > PostgreSQL does. If no file system structures change, then I take > everything back as uninformed. That risk certainly exists in the general data directory, but AFAIK it's not a problem for pg_xlog. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Tue, Aug 15, 2006 at 02:15:05PM -0500, Jim C. Nasby wrote: >Now, if >fsync'ing a file also ensures that all the metadata is written, then >we're probably fine... ....and it does. Unclean shutdowns cause problems in general because filesystems operate asynchronously. postgres (and other similar programs) go to great lengths to make sure that critical operations are performed synchronously. If the program *doesn't* do that, metadata journaling isn't a magic wand which will guarantee data integrity--it won't. If the program *does* do that, all the metadata journaling adds is the ability to skip fsck and start up faster. Mike Stone |
| |||
| On Tue, Aug 15, 2006 at 03:39:51PM -0400, mark@mark.mielke.cc wrote: >No. This is not true. Updating the file system structure (inodes, indirect >blocks) touches a separate part of the disk than the actual data. If >the file system structure is modified, say, to extend a file to allow >it to contain more data, but the data itself is not written, then upon >a restore, with a system such as ext2, or ext3 with writeback, or xfs, >it is possible that the end of the file, even the postgres log file, >will contain a random block of data from the disk. If this random block >of data happens to look like a valid xlog block, it may be played back, >and the database corrupted. you're conflating a whole lot of different issues here. You're ignoring the fact that postgres preallocates the xlog segment, you're ignoring the fact that you can sync a directory entry, you're ignoring the fact that syncing some metadata (such as atime) doesn't matter (only the block allocation is important in this case, and the blocks are pre-allocated). >This is also wrong. fsck is needed because the file system is broken. nope, the file system *may* be broken. the dirty flag simply indicates that the filesystem needs to be checked to find out whether or not it is broken. >I don't mean to be offensive, but I won't accept what you say, as it does >not make sense with my understanding of how file systems work. :-) <shrug> I'm not getting paid to convince you of anything. Mike Stone |
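The pre-allocation argument in the post above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL's actual code: the function names `preallocate_segment` and `overwrite_in_place`, the 16 MB `SEGMENT_SIZE`, and the 8 KB chunk size are all assumptions made for the example. The idea is to zero-fill the whole file once, fsync the file and its directory, and afterwards only overwrite blocks that already exist, so that later writes never change block allocation.

```python
import os

SEGMENT_SIZE = 16 * 1024 * 1024  # assumed segment size, for illustration only


def preallocate_segment(path, size=SEGMENT_SIZE, chunk=8192):
    """Create a zero-filled file of `size` bytes and force it to disk."""
    zeros = bytes(chunk)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        remaining = size
        while remaining > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, zeros[:min(chunk, remaining)])
            remaining -= written
        os.fsync(fd)  # flush the data and the file's own metadata
    finally:
        os.close(fd)
    # Sync the containing directory so the new directory entry is durable too.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def overwrite_in_place(path, offset, payload):
    """Write into already-allocated space; refuse anything that would grow the file."""
    fd = os.open(path, os.O_WRONLY)
    try:
        if offset + len(payload) > os.fstat(fd).st_size:
            raise ValueError("write would extend the pre-allocated file")
        os.pwrite(fd, payload, offset)
        os.fsync(fd)
    finally:
        os.close(fd)
```

With this shape, a crash between `overwrite_in_place` calls can leave old or zeroed contents in a block, but it should not leave newly allocated blocks holding unrelated data from elsewhere on the disk, which is the failure mode the quoted objection describes. Whether the directory fsync works this way depends on the platform; the sketch assumes a Linux-like system.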
| |||
| On Tue, Aug 15, 2006 at 04:58:59PM -0400, Michael Stone wrote: > On Tue, Aug 15, 2006 at 03:39:51PM -0400, mark@mark.mielke.cc wrote: > >No. This is not true. Updating the file system structure (inodes, indirect > >blocks) touches a separate part of the disk than the actual data. If > >the file system structure is modified, say, to extend a file to allow > >it to contain more data, but the data itself is not written, then upon > >a restore, with a system such as ext2, or ext3 with writeback, or xfs, > >it is possible that the end of the file, even the postgres log file, > >will contain a random block of data from the disk. If this random block > >of data happens to look like a valid xlog block, it may be played back, > >and the database corrupted. > you're conflating a whole lot of different issues here. You're ignoring > the fact that postgres preallocates the xlog segment, you're ignoring > the fact that you can sync a directory entry, you're ignoring the fact > that syncing some metadata (such as atime) doesn't matter (only the > block allocation is important in this case, and the blocks are > pre-allocated). Yes, no, no, no. :-) I didn't know that the xlog segment only uses pre-allocated space. I ignore mtime/atime as they don't count as file system structure changes to me. It's updating a field in place. No change to the structure. With the pre-allocation knowledge, I agree with you. Not sure how I missed that in my reviewing of the archives... I did know it pre-allocated once upon a time... Hmm.... > >This is also wrong. fsck is needed because the file system is broken. > nope, the file system *may* be broken. the dirty flag simply indicates > that the filesystem needs to be checked to find out whether or not it is > broken. Ah, but if we knew it wasn't broken, then fsck wouldn't be needed, now would it? So we assume that it is broken. A little bit of a game, but it is important to me. If I assumed the file system was not broken, I wouldn't run fsck. 
I run fsck, because I assume it may be broken. If broken, it indicates potential corruption. The difference for me, is that if you are correct, that the xlog is safe, then for a disk that only uses xlog, fsck is not ever necessary, even after a system crash. If fsck is necessary, then there is potential for a problem. With the pre-allocation knowledge, I'm tempted to agree with you that fsck is not ever necessary for partitions that only hold a properly pre-allocated xlog. > >I don't mean to be offensive, but I won't accept what you say, as it does > >not make sense with my understanding of how file systems work. :-) > <shrug> I'm not getting paid to convince you of anything. Just getting you to back up your claim a bit... As I said, no intent to offend. I learned from it. Thanks, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com, Neighbourhood Coder, Ottawa, Ontario, Canada, http://mark.mielke.cc/ |
| |||
| On Tue, Aug 15, 2006 at 05:38:43PM -0400, mark@mark.mielke.cc wrote: > I didn't know that the xlog segment only uses pre-allocated space. I > ignore mtime/atime as they don't count as file system structure > changes to me. It's updating a field in place. No change to the structure. > > With the pre-allocation knowledge, I agree with you. Not sure how I > missed that in my reviewing of the archives... I did know it > pre-allocated once upon a time... Hmm.... This is only valid if the pre-allocation is also fsync'd *and* fsync ensures that both the metadata and file data are on disk. Anyone actually checked that? BTW, I did see some anecdotal evidence on one of the lists a while ago. A PostgreSQL DBA had suggested doing a 'pull the power cord' test to the other DBAs (all of whom were responsible for different RDBMSes, including a bunch of well known names). They all thought he was off his rocker. Not too long after that, an unplanned power outage did occur, and PostgreSQL was the only RDBMS that recovered every single database without intervention. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 |
| |||
| On Tue, Aug 15, 2006 at 05:20:25PM -0500, Jim C. Nasby wrote: > This is only valid if the pre-allocation is also fsync'd *and* fsync > ensures that both the metadata and file data are on disk. Anyone > actually checked that? fsync() does that, yes. fdatasync() (if it exists), OTOH, doesn't sync the metadata. /* Steinar */ -- Homepage: http://www.sesse.net/ |
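The fsync()/fdatasync() distinction above suggests a conservative rule of thumb, sketched here in Python. The helper name `durable_write` is hypothetical, and the policy is an assumption for illustration, not something the thread or PostgreSQL prescribes: use fdatasync() only when the write stays inside the existing file, and fall back to fsync() when the file grows or when fdatasync() is not available on the platform.

```python
import os


def durable_write(fd, data, offset):
    """Write `data` at `offset`, flush it, and return the name of the sync call used."""
    size_before = os.fstat(fd).st_size
    os.pwrite(fd, data, offset)
    grew = offset + len(data) > size_before
    if not grew and hasattr(os, "fdatasync"):
        # File size and block allocation are unchanged: syncing the data is enough.
        os.fdatasync(fd)
        return "fdatasync"
    # The file was extended (or fdatasync is missing): sync the metadata as well.
    os.fsync(fd)
    return "fsync"
```

Combined with a pre-allocated file, every write after the initial fill would take the cheaper path on systems that provide fdatasync(). As noted later in the thread, neither call helps if the drive itself reports completion before the data is on stable storage.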
| |||
| On Tue, 15 Aug 2006 mark@mark.mielke.cc wrote: >>> This is also wrong. fsck is needed because the file system is broken. >> nope, the file system *may* be broken. the dirty flag simply indicates >> that the filesystem needs to be checked to find out whether or not it is >> broken. > > Ah, but if we knew it wasn't broken, then fsck wouldn't be needed, now > would it? So we assume that it is broken. A little bit of a game, but > it is important to me. If I assumed the file system was not broken, I > wouldn't run fsck. I run fsck, because I assume it may be broken. If > broken, it indicates potential corruption. note that the ext3, reiserfs, jfs, and xfs developers (at least) consider fsck necessary even for journaling filesystems. they just let you get away without it being mandatory after an unclean shutdown. David Lang |
| |||
| "Steinar H. Gunderson" <sgunderson@bigfoot.com> writes: > On Tue, Aug 15, 2006 at 05:20:25PM -0500, Jim C. Nasby wrote: >> This is only valid if the pre-allocation is also fsync'd *and* fsync >> ensures that both the metadata and file data are on disk. Anyone >> actually checked that? > fsync() does that, yes. fdatasync() (if it exists), OTOH, doesn't sync the > metadata. Well, the POSIX spec says that fsync should do that ;-) My guess is that most/all kernel filesystem layers do indeed try to sync everything that the spec says they should. The Achilles' heel of the whole business is disk drives that lie about write completion. The kernel is just as vulnerable to that as any application ... regards, tom lane |
| ||||
| Hi, Jim, Jim C. Nasby wrote: > Well, if the controller is caching with a BBU, I'm not sure that order > matters anymore, because the controller should be able to re-order at > will. Theoretically. > somewhere would be great. Well, actually, the controller should not reorder over write barriers. Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org |