Date:      Fri, 25 Mar 2011 07:41:24 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: deadlock or bad disk ?  RELENG_8
Message-ID:  <20110325144124.GA29033@icarus.home.lan>
In-Reply-To: <716F58E43EA845E3A2228F020D85DFBB@multiplay.co.uk>
References:  <20100718211415.GA84127@icarus.home.lan> <201007182142.o6ILgDQW044046@lava.sentex.ca> <20100719023419.GA91006@icarus.home.lan> <201007190301.o6J31Hs1045607@lava.sentex.ca> <20100719033424.GA92607@icarus.home.lan> <201007191237.o6JCbmj7049339@lava.sentex.ca> <20100719203320.GB21088@icarus.home.lan> <011E0838F0EC4E80885646C70B34F8AB@multiplay.co.uk> <4D8C9495.90103@sentex.net> <716F58E43EA845E3A2228F020D85DFBB@multiplay.co.uk>

On Fri, Mar 25, 2011 at 01:59:01PM -0000, Steven Hartland wrote:
> ----- Original Message ----- From: "Mike Tancsa" <mike@sentex.net>
> 
> >I would say probably the disk mostly. Perhaps a driver or firmware bug
> >on the Areca.  Hard to say.  The drive totally failed a month or so
> >later.  Also, moved to a later firmware on the Areca controller after
> >that and all has been quite stable on the box except for an odd em
> >driver bug. However, version 7.2.2 fixed that.
> 
> Thanks for that Mike, been having some problems on a core box here where
> it would just hang for periods of time during which disk IO would drop
> to nothing and then it would just suddenly recover. We suspect one of
> the disks is at fault but, as you said, the disk hasn't failed, so we
> were just going on the decreased SMART values.

I apologise in advance if I have already reviewed your situation, but if
you could please provide full "smartctl -a" output for the disk, I can
review the data to see if anything looks out of place.

An example: on some (not all) Western Digital "Green" disks, including
enterprise models, Attribute 193 (Load_Cycle_Count) showing an extremely
large RAW_VALUE (in the tens of thousands, if not more) indicates the
drive is constantly trying to park its actuator arm/heads.  The result:
abysmal performance.  One cannot key off of the firmware version as an
indicator (WD does not always increase/change the firmware string).
Some users have been able to get WD to admit the problem and provide
them with fixed firmware.
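
For illustration only, here is a rough Python sketch of the Attribute 193
check described above.  It assumes the usual ten-column attribute table
that "smartctl -a" prints for ATA disks.  The function names, the 10000
threshold, and the sample rows are all made up for the example; they are
not real output from any disk and not a recommended cutoff.

```python
import re

# Hypothetical cutoff: the text only says "tens of thousands, if not more".
LOAD_CYCLE_WARN = 10_000


def raw_value(smart_output, attr_id):
    """Return the integer RAW_VALUE for attr_id from smartctl text, or None."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows have at least ten columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == str(attr_id):
            # RAW_VALUE may carry trailing text; keep the leading digits only.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def excessive_parking(smart_output, threshold=LOAD_CYCLE_WARN):
    """True if Attribute 193 is present and its raw value exceeds threshold."""
    value = raw_value(smart_output, 193)
    return value is not None and value > threshold


# Invented sample rows in the smartctl attribute-table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5412
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       612345
"""

if __name__ == "__main__":
    print(raw_value(SAMPLE, 193))
    print(excessive_parking(SAMPLE))
```

A large number here is only a hint, not a diagnosis: it has to be read
against the drive model and its power-on hours, which is exactly the
model-specific knowledge a script like this does not have.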

My point: looking at SMART attributes doesn't help unless you know
exactly what to look for, are familiar with all the quirks of drive
models, and basically act as an information sponge (subscribe to all
sorts of mailing lists, talk to users, help generic non-technical
end users out, etc.).  It takes up a lot of my time, but I try my best.
Sometimes I feel like my brain needs checksumming...

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



