Hi, Alex,
Comments below.
Axel Schwenke wrote:
> Jerry Stuckle <jstucklex@attglobal.net> wrote:
>
>>Axel Schwenke wrote:
>>
>>>Jerry Stuckle <jstucklex@attglobal.net> wrote:
>>>
>>>
>>>>So? If the file itself is corrupted, all it will do is recover a
>>>>corrupted file. What's the gain there?
>>>
>>>The gain is, that you have a chance to recover at all. With no files,
>>>there is *no* way to recover.
>>
>>What you don't get it that it's not the presence or absence of the files
>>- it's the CONTENTS of the files that matters.
>
>
> Agreed. But Alf worried he could lose whole tables aka files.
>
>
>>There is very little
>>chance you will lose the files completely in the case of a crash. There
>>is a much bigger (although admittedly still small) that the files will
>>be corrupted. And a huge chance if you have more than one table your
>>database will be inconsistent.
>>
>>
>>>However, thats not a real problem. MySQL never touches the datafile
>>>itself once it is created. Only exception: REPAIR TABLE. This will
>>>recreate the datafile (as new file with extension .TMD) and then
>>>rename files.
>>
>>Excuse me? MySQL ALWAYS touches the data file.
>
>
> Sorry, I didn't express myself clear here: MyISAM never touches the
> metadata for a data file. The file itself is created with CREATE TABLE.
> Later on there is data appended to the file or some block inside the
> file is modified. But the file itself stays there and there is
> virtually no chance to lose it. So indeed there is no gain from using
> a filesystem with metadata journaling (in fact most "journaling"
> filesystems use the journal only for metadata).
>
>
>>And it is constantly rewriting the files to disk.
>
> ...
>
>>Yes, I know exactly how MySQL works. Yep, and it has to rewrite a
>>portion of the file to do all of this.
>
>
> What do you call "rewrite"?
>
> Of cource MySQL writes modified data. MySQL never reads an otherwise
> unmodified record and rewrites it somewhere else.
>
Just what you are calling it. It reads in a block of data and writes it
back out to disk.
Even in variable length rows where the new row is longer than the old
one and MySQL appends it to the end of the file, MySQL has to go back
and rewrite the original row to mark it as invalid.
>
>>>Most file operations on MyISAM tables are easier, faster and less
>>>risky, if the table uses fixed length records. Then there is no need to
>>>collapse adjacent unused records into one, UPDATE can be done in place,
>>>there will be no fragmentation and such.
>>
>>... what happens if the row spans a disk and the
>>system crashes between writes, for instance? Depending on exactly where
>>the block was split, you could completely screw up that row, bug be very
>>difficult to detect. Sure, it's only one row. But data corruption like
>>this can be much worse than just losing a row. The latter is easier to
>>determine.
>
>
> Agreed. But then again I don't know how *exactly* MyISAM does those
> nonatomic writes. One could imagine that the record is first written
> with a "this record is invalid" flag set. As soon as the complete
> record was written successfully, this flag is cleared in an atomic
> write. I know Monty is very fond of atomic operations.
>
Part of it is MyISAM. But part of it is the OS, also. For instance,
what happens if the row spans two physical blocks of data which are not
contiguous? In that case the OS has to write the first block, seek to
the next one and write that one.
There isn't anything Monty can do about that, unfortunately.
> But still there is no difference to what I said: If mysqld crashes,
> there is a good chance that all records that mysqld was writing to
> are damaged. Either incomplete or lost or such.
>
That is true.
> However, there is only very little chance to lose data that was not
> written to at the time of the crash.
>
Actually, you would lose all data which wasn't written to the disk.
> Dynamic vs. fixed format: Dynamic row format is susceptible to the
> following problem: imagine there is a hole between two records that
> will be filled by INSERT. The new record contains information about
> its used and unused length. While writing the record, mysqld crashes
> and garbles the length information. Now this record could look longer
> than the original hole and shadow one or more of the following
> (otherwise untouched) records. This would be hard to spot. Similar
> problems exist with merging holes.
>
Yep, a serious problem.
> Fixed length records don't have this problem and are therefore more
> robust.
>
I agree there. But there can be other problems as I noted before. And
a single corrupted row may be worse than a completely crashed dataset
because it's so difficult to find that row. For instance - let's say we
have a bank account number which is a string and spans two blocks.
Someone makes a $10M deposit to your account. In the middle MySQL
crashes. The account number is now incorrect - the first 1/2 has been
written to one block but the 2nd 1/2 never made it out. So it credited
the deposit to my account.
Wait a sec - I LIKE that idea! :-)
>
>>>The MyISAM engine is quite simple. Data and index are held in separate
>>>files. Data is structured in records. Whenever a record is modified,
>>>it's written to disk immediately (however the operation system might
>>>cache this). MyISAM never touches records without need. So if mysqld
>>>goes down while in normal operation, only those records can be damaged
>>>that were in use by active UPDATE, DELETE or INSERT operations.
>>
>>But the caching is all too important. It's not unusual to have hundreds
>>of MB of disk cache in a busy system. That's a lot of data which can be
>>lost.
>
>
> Sure. But this problem was out of scope. We didn't talk about what
> happens if the whole machine goes down, only what happens if mysqld
> crashes.
>
> Having the whole system crashing is also hard for "real" database
> engines. I remember several passages in the InnoDB manual about
> certain operating systems ignoring O_DIRECT for the tx log. Also
> there may be "hidden" caches in disk controllers and in the disks.
>
Agreed it's a problem. Most databases handle this with a log/journal
which writes directly to the file system and doesn't return until the
record is written. Once that is done, the real data is written
asynchronously to the tables.
In that way a crash loses at most the last record written (in the case
of an incomplete journal entry). But it still needs a consistent point
(i.e. a backup) to roll forward the log from.
But, as you pointed out, not all OS's support this. They should,
however, for critical data.
And BTW - some even have an option to have their own file system which
is not dependent on the OS at all. They are just provided with a space
on the disk (i.e. a partition) and handle their own I/O completely.
This, obviously, is the most secure because the RDB can handle corrupted
files - they know both the external and internal format for the data.
It's also the most efficient. But it's the hardest to implement.
>
> XL
> --
> Axel Schwenke, Senior Software Developer, MySQL AB
>
> Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
> MySQL User Forums: http://forums.mysql.com/
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================