Unix Technical Forum

An observation

#1
02-20-2008, 10:59 AM
WIdgeteye
An observation


I have been using Slackware for over 10 years now and I know this has
nothing to do with Slack but I thought I would throw that in there for the
hell of it.

Anyway over the last ten years I have noticed something that's a little
disturbing as far as the filesystem/hard drives are concerned.

Over the years I have been downloading, testing and then rming very large
files. The drives I have used to do this work on have always inevitably
failed. I have been through 5 drives in the last ten years, and now the
sixth is starting to fail.

The symptoms are always the same, I will be working with a large file and
there will be a lockup, not a hard lockup but the more I mess with the
computer trying to kill the offending app the harder the lockup gets until
I finally have to just do a hard reset. As the 'puter reboots, it of
course, runs e2fsck and finds a few bad inodes and fixes things up and
we're back to normal again. But as time goes by, over several weeks maybe,
the problem gets worse and worse as far as working with large files is
concerned.

The computer may run fine as long as I just use the computer normally,
such as surfing the net and doing email and that sort of thing. But if I
do anything with large files there will be a crash to contend with and I
finally have to replace the drive.

I have tried reformatting to see if a fresh filesystem would help but to
no avail.

I really don't think it is the fault of the drives but I could be wrong
there too, it's been known to happen.
And before you ask, I use Western Digital drives. I use Western Digital
for the root filesystem and the data drives. I have never had a failure
on the root filesystem drive. Just the work drive.
BTW, my computers run 24/7; the uptime on this one was 206 days until this
last problem began.


Here are some of my thoughts on the situation:

The constant writing and erasing of the drive just wears the media itself
out.

But why??? Aren't the drives still magnetic media? Ya know, it's been so
long since I studied hardware I don't even know what they have been doing
in the advancement of hard drives in the past few years.

I'm just writing this to get some others' thoughts on this situation.
Your thoughts and suggestions are all welcome.

Thanks,
Widgeteye
#2
02-20-2008, 10:59 AM
Sylvain Robitaille
Re: An observation

WIdgeteye wrote:

> Over the years I have been downloading, testing and then rming very large
> files. The drives I have used to do this work on have always inevitably
> failed. I have been through 5 drives in the last ten years, and now the
> sixth is starting to fail.


Are you sure the problem is with the disk hardware? By your description
of the problem, I would tend to suspect either a shortage of memory, or a
failure in physical memory, not the disk. When you bring the system back
up after a hard reset, I would expect e2fsck to find errors on the file
system containing the file you were working on, almost "by definition".

The more times you do this, the more errors I would expect e2fsck to
find. The file was not unlikely to have become corrupted after the
first hard-reset.

> The computer may run fine as long as I just use the computer normally,
> such as surfing the net and doing email and that sort of thing.


Generally low memory requirement types of things, yes. What sorts of
work are you doing on the large files? How large are they? Can you
split them up (see the split manual page) and work on them in sections,
to determine whether or not the same operations done on smaller files
exhibit similar behaviour?
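
For anyone who wants to try the split-and-compare idea with checksums attached, here is a minimal Python sketch. It is only an illustration: the names split_with_checksums and verify_pieces, the ".partNNN" naming and the chunk size are made up here and are not part of split(1). It cuts a file into pieces, records a SHA-256 for each piece as it is written, and can later re-read the pieces to see whether any of them changed on disk.

```python
import hashlib


def split_with_checksums(path, chunk_size=64 * 1024 * 1024):
    """Split `path` into numbered pieces; return [(piece_path, sha256_hex), ...].

    Hypothetical helper: roughly what `split -b` does, plus a checksum
    per piece so a later re-read can be compared against it.
    """
    pieces = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            piece_path = f"{path}.part{index:03d}"
            with open(piece_path, "wb") as dst:
                dst.write(chunk)
            pieces.append((piece_path, hashlib.sha256(chunk).hexdigest()))
            index += 1
    return pieces


def verify_pieces(pieces):
    """Re-read every piece; return the paths whose checksum no longer matches."""
    bad = []
    for piece_path, expected in pieces:
        with open(piece_path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != expected:
            bad.append(piece_path)
    return bad
```

If the same operations succeed on every piece but fail on the whole file, that points more toward memory pressure than toward the disk surface; if one particular piece keeps failing verification, the disk region holding it becomes the suspect.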

> But if I do anything with large files there will be a crash to contend
> with and I finally have to replace the drive.


Please define "do anything". There are people who use Linux for
processing large audio files, for example, and do not have to replace
disks every couple of years. In fact, the newest disk on any of my
computers is _in_ my audio workstation, and it's about 3 to 5 years
old. (mind you that system has taken to failing to boot, but that looks
like a BIOS problem, not a disk failure or file system problem ... I
need to spend some time working on it before I can use it again ...)

> I have tried reformatting to see if a fresh filesystem would help but
> to no avail.


Assuming you have a problem elsewhere, that isn't surprising.

> I really don't think it is the fault of the drives but I could be wrong
> there too, it's been known to happen.


I don't think the drives are at fault either. One drive failing I would
believe, maybe even two, but this many drives "failing" this consistently
suggests to me that they never failed in the first place and that the
problem is elsewhere.
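
One rough way to check whether data is being corrupted somewhere between memory and the disk is to write a known pattern, read it back, and compare. The Python sketch below does that; the function name write_and_verify, the seed and the block size are all made up for illustration. Two caveats: the read-back may be served from the page cache rather than from the platters, so a clean pass proves little, and a mismatch does not say which component is at fault (RAM, cable, controller or disk).

```python
import hashlib
import os


def write_and_verify(path, total_bytes, block_size=1024 * 1024, seed=b"pattern"):
    """Write a deterministic pattern to `path`, read it back, compare hashes.

    Returns True when the data read back matches what was written.
    Hypothetical helper; not a replacement for real diagnostics.
    """
    def blocks():
        # Derive each block from a counter so the pattern is reproducible
        # without holding the whole file in memory.
        written = 0
        counter = 0
        while written < total_bytes:
            digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
            size = min(block_size, total_bytes - written)
            yield (digest * (size // len(digest) + 1))[:size]
            written += size
            counter += 1

    expected = hashlib.sha256()
    with open(path, "wb") as f:
        for block in blocks():
            f.write(block)
            expected.update(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the device

    actual = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            actual.update(chunk)

    return actual.digest() == expected.digest()
```

Running it with a size close to the files that trigger the lockups (for example, write_and_verify("/path/on/work/drive/test.bin", 1024**3)) on both the work drive and the root drive would at least show whether the behaviour follows the drive.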

> The constant writing and erasing of the drive just wears the media
> itself out.


No.

I manage a news server (which writes and erases files constantly) that
has been running for years without any interruption, let alone failure.
The only interruptions this system has seen in the last 5 years or so have
been when we needed to move the physical system to another location in
the machine room, or more recently when we upgraded it from a commercial
Unix system to Slackware Linux on newer hardware.

I also manage mail servers which spend all their time writing and erasing
files (approximately 200K messages per day among four mail servers).
Disks can (and sometimes do) fail, but they don't "wear out" just because
the system is writing and erasing files all the time. If they did,
I'd have to recommend changing to a different brand of disks.

> But why??? Aren't the drives still magnetic media?


Yes, unless you're using CompactFlash cards, USB keychain drives, or
other drives of that sort.

> Ya know, it's been so long since I studied hardware I don't even
> know what they have been doing in the advancement of hard drives in
> the past few years.


If we're talking about "regular" hard disks, they've been miniaturizing
them, increasing the data density, increasing rotational speeds, and
reducing manufacturing costs. There have been some advances in the
interfaces from hard drives to the rest of the system (serial ATA, for
example), but the physical drive itself is basically and conceptually
the same as it's been for years. (smaller, faster, cheaper, perhaps,
but it's still some number of rotating platters, each with a magnetic
head floating just beyond the surface ...)

--
----------------------------------------------------------------------
Sylvain Robitaille syl@alcor.concordia.ca

Systems analyst Concordia University
Instructional & Information Technology Montreal, Quebec, Canada
----------------------------------------------------------------------
#3
02-20-2008, 10:59 AM
Miguel De Anda
Re: An observation

WIdgeteye wrote:

>
> Over the years I have been downloading, testing and then rming very large
> files. The drives I have used to do this work on have always inevitably
> failed. I have been through 5 drives in the last ten years, and now the
> sixth is starting to fail.
>


Funny you should mention this... just last night I was playing with VMware
and it kept crashing on me. I was trying to set up Slackware 10.2 on it and
it seemed that whenever I created partitions on the virtual drive it would
lock up the system. It was a hard lock and I couldn't do anything except
hit the reset button (I couldn't even ssh into the machine). Anyway, this
happened 3 times in a row, then I decided to do it differently. I created the
new virtual machine again and had it allocate the disk space immediately
and this caused it to lock up again. I was pretty sure it was a disk
failure.

I just so happened to have an extra drive on the computer (empty) so I
proceeded to create the virtual machine on this drive. It worked fine on
the first try. The failing disk is only about 4 months old and since it's a
SATA drive, I can't seem to get smartd stats on it.

> The symptoms are always the same, I will be working with a large file and
> there will be a lockup, not a hard lockup but the more I mess with the
> computer trying to kill the offending app the harder the lockup gets until
> I finally have to just do a hard reset. As the 'puter reboots, it of
> course, runs e2fsck and finds a few bad inodes and fixes things up and
> we're back to normal again. But as time goes by, over several weeks maybe,
> the problem gets worse and worse as far as working with large files is
> concerned.
>


I've also seen this happen on my last few drives. I've denied the possibility
of it being "linux" breaking my hardware but possibly I was just fooling
myself. Are different filesystems more/less prone to hardware damage? Would
reiserfs be better? I've stuck with ext3 for a while since people seem to
suggest it for its maturity.

> I really don't think it is the fault of the drives but I could be wrong
> there too, it's been known to happen.
> And before you ask, I use Western Digital drives. I use western digital
> for the root filesystem and the data drives. I have never had a failure
> on the root filesystem drive. Just the work drive.
> BTW, my computers run 24/7 the uptime on this one was 206 days until this
> last problem began.


Mine was only 12 days (booted to windows to play a game a few weeks ago).

-Miguel

#4
02-20-2008, 10:59 AM
No_One
Re: An observation

On 2005-09-21, WIdgeteye <None@none.none> wrote:
>
> I have been using Slackware for over 10 years now and I know this has
> nothing to do with Slack but I thought I would throw that in there for the
> hell of it.
>
> I really don't think it is the fault of the drives but I could be wrong
> there too, it's been known to happen.
> And before you ask, I use Western Digital drives. I use western digital
> for the root filesystem and the data drives. I have never had a failure
> on the root filesystem drive. Just the work drive.
> BTW, my computers run 24/7 the uptime on this one was 206 days until this
> last problem began.


My systems run 24/7 and all use Western Digital. Up until last year I was
still using a Western Digital 40meg that ran 24/7 for misc or old compressed
data storage..that drive was almost 18 years old.

Like you the bulk of the files I edit or work with are large, by my standards
10-15 megs, and I've never had the problems you describe with Western Digital.

I've had, and continue to have constant problems with laptop hard
drives...they always fail after several years...but never a hard drive
failure like you describe on any of the desktops and never with such frequency.

I can only hazard a wild guess and say your problems might not relate to
your hard drive....have you considered using a different file system like
Reiser, or trying any number of hard drive utilities to test your drives?
I've heard, and I stress no first hand knowledge here, of hard drives
failing due to excessive heat in the box or poor ambient conditions, too much
moisture in the air, etc....just a thought.


ken



#5
02-20-2008, 10:59 AM
WIdgeteye
Re: An observation

On Wed, 21 Sep 2005 17:07:49 +0000, No_One wrote:


> Like you the bulk of the files I edit or work with are large, by my standards
> 10-15 megs, and I've never had the problems you describe with Western Digital.


The files I work with are in the hundreds of meg over into the gig sizes.


#6
02-20-2008, 10:59 AM
WIdgeteye
Re: An observation

On Wed, 21 Sep 2005 10:04:33 -0700, Miguel De Anda wrote:

>
> I've also seen this happen on my last few drives. I've denied the possibility
> of it being "linux" breaking my hardware but possibly I was just fooling
> myself. Are different filesystems more/less prone to hardware damage? Would
> reiserfs be better? I've stuck with ext3 for a while since people seem to
> suggest it for its maturity.


I too have been using ext3 for the last couple of years. The problem
survived the change from ext2 to ext3 too.

I have never tried reiserfs. I don't know dude, I'm in listening mode
right now.




#7
02-20-2008, 11:00 AM
Ib Højme
Re: An observation

On Wed, 21 Sep 2005 18:14:38 +0200, WIdgeteye <None@none.none> wrote:

>
> I have been using Slackware for over 10 years now and I know this has
> nothing to do with Slack but I thought I would throw that in there for
> the hell of it.
>
> Anyway over the last ten years I have noticed something that's a little
> disturbing as far as the filesystem/hard drives are concerned.
>
> Over the years I have been downloading, testing and then rming very large
> files. The drives I have used to do this work on have always inevitably
> failed. I have been through 5 drives in the last ten years, and now the
> sixth is starting to fail.
>


I have experienced the same issues/problems over the years.

My 'problem' is that I insist on re-building my machine, so sometimes I
end up with a strange mix of hw :-) But I have always been able to pinpoint
disk failures to faulty discs by running a manufacturer's diagnostic
utility. This has the added bonus of being able to get a replacement disc
if the faulty one is still under warranty.

I have also come to the conclusion that if a disc fails (with ensuing fsck
repairs) there is only one thing you can do: salvage as much data as
possible, and reformat the disc. This will detect and remap sector defects
and you will probably save yourself the trouble of recovering from a
totally trashed filesystem. I think that the discs are subject to wear 'n
tear due to heat, vibration, and so on. So if you only repair the
filesystem, the underlying problem, a bad sector, will show up again.
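
Short of the manufacturer's utility, a crude way to see where a disc is hurting is to read a file (or, as root, a device node) block by block and note the offsets that fail. The Python sketch below assumes that an unreadable block surfaces as an OSError from the read call; the name scan_for_read_errors and the block size are made up for illustration, and it is no substitute for a proper surface scan.

```python
import os


def scan_for_read_errors(path, block_size=1024 * 1024):
    """Read `path` block by block.

    Returns (bytes_read, offsets_that_raised_OSError).
    Hypothetical helper for illustration only.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            os.lseek(fd, offset, os.SEEK_SET)
            try:
                data = os.read(fd, min(block_size, size - offset))
                bytes_read += len(data)
            except OSError:
                # The kernel could not read this block; note it and move on.
                bad_offsets.append(offset)
            offset += block_size
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

An empty list of bad offsets only means every block could be read this time; offsets that keep turning up across runs are the ones worth reporting when asking for a warranty replacement.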

I have been using ext2, ext3 and reiserfs and it makes no difference: once
a disc has problems it will only get worse over time. And I have not been
working with very large files.

The above is just my experience and gut feelings accumulated over
10+ years, nothing scientific :-)


Best regards

Ib
#8
02-20-2008, 11:00 AM
No_One
Re: An observation

On 2005-09-21, WIdgeteye <None@none.none> wrote:
> On Wed, 21 Sep 2005 17:07:49 +0000, No_One wrote:
>
>
>> Like you the bulk of the files I edit or work with are large, by my standards
>> 10-15 megs, and I've never had the problems you describe with Western Digital.

>
> The files I work with are in the hundreds of meg over into the gig sizes.
>
>


ahh..well, a gigzilla user if ever I've seen one. Yours is bigger than
mine, no question about it.

ken
#9
02-20-2008, 11:00 AM
WIdgeteye
Re: An observation

On Wed, 21 Sep 2005 16:48:52 +0000, Sylvain Robitaille wrote:


> Are you sure the problem is with the disk hardware? By your description
> of the problem, I would tend to suspect either a shortage of memory, or a
> failure in physical memory, not the disk. When you bring the system back
> up after a hard reset, I would expect e2fsck to find errors on the file
> system containing the file you were working on, almost "by definition".



It isn't hardware. I have 1 gig of memory and have been through 4
motherboard and processor upgrades in just the last few years. The problem
remains.



> The more times you do this, the more errors I would expect e2fsck to
> find. The file was not unlikely to have become corrupted after the
> first hard-reset.
>
>> The computer may run fine as long as I just use the computer normally,
>> such as surfing the net and doing email and that sort of thing.

>
> Generally low memory requirement types of things, yes. What sorts of
> work are you doing on the large files?


par2 r file.par
rar e file.rar
burn to dvd
erase.

Get the picture??

> How large are they?


From 700 meg to 1.4 gig for most.

> can you split them up (see the split manual page) and work on them in
> sections, to determine whether or not the same operations done on
> smaller files exhibit similar behaviour?


no


>> But if I do anything with large files there will be a crash to contend
>> with and I finally have to replace the drive.

>
> Please define "do anything".


See above


>> I have tried reformatting to see if a fresh filesystem would help but
>> to no avail.

>
> Assuming you have a problem elsewhere, that isn't surprising.


There's no problem elsewhere.







>> The constant writing and erasing of the drive just wears the media
>> itself out.

>
> No.
>
> I manage a news server (which writes and erases files constantly) that


Hundreds of megs and gigs at a time?



> I also manage mail servers which spend all their time writing and
> erasing files (approximately 200K messages per day among four mail
> servers). Disks can (and sometimes do) fail, but they don't "wear out"
> just because the system is writing and erasing files all the time. If
> they did, I'd have to recommend changing to a different brand of disks.


200k per day != 700M - 1.5G per hour
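
To put the two workloads on the same scale, here is a back-of-the-envelope sketch in Python. The mail figure in the thread is a message count, not a byte count, so the average message size used here (10 KB) is purely an assumption, as is treating the 700 MB to 1.5 GB per hour figure as sustained around the clock. The function names are made up for illustration.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def daily_bytes_mail(messages_per_day=200_000, avg_message_bytes=10 * 1024):
    """Bytes written per day by the mail servers; avg_message_bytes is assumed."""
    return messages_per_day * avg_message_bytes


def daily_bytes_large_files(bytes_per_hour, hours_per_day=24):
    """Bytes written per day at a sustained hourly rate; hours_per_day is assumed."""
    return bytes_per_hour * hours_per_day


mail = daily_bytes_mail()
low = daily_bytes_large_files(700 * MB)
high = daily_bytes_large_files(int(1.5 * GB))

print(f"mail servers (all four): {mail / GB:.1f} GB/day")
print(f"large-file work:         {low / GB:.1f} to {high / GB:.1f} GB/day")
```

Under those assumptions the mail servers write about 1.9 GB a day between the four of them, against roughly 16 to 36 GB a day for the large-file work on a single drive. Change the assumed message size or duty cycle and the gap moves accordingly.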


Thanks

#10
02-20-2008, 11:00 AM
Keith Keller
Re: An observation

On 2005-09-21, WIdgeteye <None@none.none> wrote:
>
> I'm just writing this to get some others' thoughts on this situation.
> Your thoughts and suggestions are all welcome.


Could there be environmental issues that are plaguing your hardware?
Maybe it's particularly hot, dusty, or some other factor that's causing
you more problems than normal?

Just a wild guess, but certainly not outside the realm of possibility.

--keith


All times are GMT.

