Grant wrote:
> Hi Karl,
> On Thu, 22 Sep 2005 11:29:21 +1000, a <a@a> wrote:
>
>>Lol fair enough, I think this is the human interaction Usenet became
>>famous for? Forgive me, I'm new to newsgroups. Anyway, I'd still be
>>curious whether any non-trolling readers have thoughts on the power
>>shortage contributing to the start of the HD's death.
>
>
> Of course, weak power is the first suspect. When the obvious answers
> fail, it's time to _really_ think about creating test cases. I had a
> nasty one a couple of months back: data corruption, but was it the NIC,
> the SATA HDD, the disk controller or something else?
>
> MD5 was not as useful to me as diffing the good and bad data (not that
> I knew which was which). A checksum only tells you that two copies
> differ, while a diff shows where and how. A pattern emerged: bit 0x20
> was very occasionally failing to be a one.
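For anyone who wants to repeat the diffing step, here is a minimal Python sketch. Grant doesn't say which tool he used, so this is an assumption. It compares a good and a bad copy byte by byte. For each bit mask it tallies how often a bit was cleared (1 in the good copy, 0 in the bad one) or gained. The helper name `bit_differences` is hypothetical. A failure like the one above would show up as a count under 0x20 in the "cleared" tally.

```python
def bit_differences(good: bytes, bad: bytes):
    """Compare two equal-length byte strings.

    Returns (offsets, cleared, gained):
      offsets - byte offsets where the copies differ
      cleared - per bit mask, how often a bit was 1 in good but 0 in bad
      gained  - per bit mask, how often a bit was 0 in good but 1 in bad
    """
    if len(good) != len(bad):
        raise ValueError("inputs must be the same length")
    cleared = {1 << b: 0 for b in range(8)}  # masks 0x01 .. 0x80
    gained = {1 << b: 0 for b in range(8)}
    offsets = []
    for offset, (g, x) in enumerate(zip(good, bad)):
        diff = g ^ x  # set bits mark the positions that changed
        if diff == 0:
            continue
        offsets.append(offset)
        for mask in cleared:
            if diff & mask:
                # If the good byte had this bit set, the bad copy lost it.
                if g & mask:
                    cleared[mask] += 1
                else:
                    gained[mask] += 1
    return offsets, cleared, gained
```

For real files, read both copies in binary mode, e.g. `open("good.iso", "rb").read()`, and pass the results in. Loading a whole .iso into memory is fine for a one-off check on a machine with enough RAM.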
>
> Define a test: the Linux kernel compile/diff took too long, and I
> needed a test that was OS independent, Linux or Windows. What worked
> was copying and comparing a .iso image file. I would get a single-byte
> error in roughly one of five tries.
>
> linux: cp, cmp
> winxp: copy, fc /b
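The same copy/compare test can also be written once and run unchanged on either OS. Below is a minimal sketch, assuming Python 3 is installed on both machines. `copy_compare` is a hypothetical helper: `shutil.copyfile` stands in for cp/copy, and `filecmp.cmp(..., shallow=False)` stands in for the byte-for-byte cmp or fc /b.

```python
import filecmp
import os
import shutil
import tempfile
from typing import List, Optional


def copy_compare(src: str, tries: int = 5, workdir: Optional[str] = None) -> List[int]:
    """Copy src `tries` times and byte-compare each copy with the original.

    Returns the 1-based numbers of the tries whose copy did not match.
    workdir chooses which disk the copies land on (default: system temp dir).
    """
    failures = []
    with tempfile.TemporaryDirectory(dir=workdir) as tmp:
        for n in range(1, tries + 1):
            # A fresh name each try, so filecmp's internal cache never reuses a result.
            dst = os.path.join(tmp, f"copy{n}.bin")
            shutil.copyfile(src, dst)
            # shallow=False forces a full content comparison, like cmp or fc /b.
            if not filecmp.cmp(src, dst, shallow=False):
                failures.append(n)
            os.remove(dst)  # keep only one copy on disk at a time
    return failures
```

For example, `copy_compare("image.iso", tries=5)` returns an empty list when every copy matched. The read-back may be served partly from the OS file cache. Like the cp/cmp version, it therefore exercises RAM and the copy path as much as the disk itself.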
>
> Eliminate the suspects: I swapped the power supply, OS and disk, which
> left memory. The shop swapped the memory stick for a new one and the
> problem was gone.
>
> This box passed memtest86, and was built new in April.
>
> In hindsight, the clues were occasional kernel source tree corruption
> and a very rare segfault. I use 'cp -al' a lot, which makes hard-linked
> copies and so a fragile file system, and I was blaming 'finger trouble'
> when source trees went bad.
>
>
>
> However, the fault you describe reminds me of the last time I used ext3
> and got a lockup with nothing in the logs.
>
> What kernel are you using? Some subtle filesystem interaction bugs were
> ironed out in recent months, so I assume ext3 is reliable again now.
>
>
> Power supplies do wear out: ripple current takes its toll on the main
> filter capacitors, as does overloading or running at full load. Open
> the thing and check whether the big filter caps have domed tops; they
> are supposed to be flat. If the power supply is too small, poor load
> regulation will cause things to go wrong. When I've had that, though,
> it was an obvious failure after things warmed up, and removing a drive
> resolved the issue.
>
> Cheers,
> Grant.
>
Hmmm, yeah, I skipped over most of the personal interaction in some
posts, sorry, but I've got to study for a Cisco Linux test. Anyway, the
latest HD to start clicking is a Western Digital that was heating up a
fair bit. It isn't anymore, but that's because I've taken most of the
data off it. Unfortunately or fortunately (depending on how you look at
it), I've moved house, so the power supply isn't the same as the one I
was using before. This HD has taken longer to show signs of clicking,
though: about one year.
And to be honest, I didn't follow most of the methods you used to test
it under Windows and Linux. Hmmm, maybe with a slightly more
user-friendly explanation :P
Haha, nice work to the guy who quit smoking. Yeah, I'm on that wagon too
atm.
Cheers and best wishes,
Karl