Date: Fri, 25 Mar 2011 07:41:24 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-stable@freebsd.org
Subject: Re: deadlock or bad disk ? RELENG_8
Message-ID: <20110325144124.GA29033@icarus.home.lan>
In-Reply-To: <716F58E43EA845E3A2228F020D85DFBB@multiplay.co.uk>
References: <20100718211415.GA84127@icarus.home.lan> <201007182142.o6ILgDQW044046@lava.sentex.ca> <20100719023419.GA91006@icarus.home.lan> <201007190301.o6J31Hs1045607@lava.sentex.ca> <20100719033424.GA92607@icarus.home.lan> <201007191237.o6JCbmj7049339@lava.sentex.ca> <20100719203320.GB21088@icarus.home.lan> <011E0838F0EC4E80885646C70B34F8AB@multiplay.co.uk> <4D8C9495.90103@sentex.net> <716F58E43EA845E3A2228F020D85DFBB@multiplay.co.uk>
On Fri, Mar 25, 2011 at 01:59:01PM -0000, Steven Hartland wrote:
> ----- Original Message ----- From: "Mike Tancsa" <mike@sentex.net>
>
> >I would say probably the disk mostly. Perhaps a driver or firmware bug
> >on the Areca. Hard to say. The drive totally failed a month or so
> >later. Also, moved to a later firmware on the Areca controller after
> >that and all has been quite stable on the box except for an odd em
> >driver bug. However, version 7.2.2 fixed that.
>
> Thanks for that Mike, been having some problems on a core box here where
> it would just hang for periods of time during which disk IO would drop
> to nothing and then it would just suddenly recover. We suspect one of
> the disks is at fault but, as you said, the disk hasn't failed so we
> were just going on the decreased SMART values.

I apologise in advance if I have already reviewed your situation, but if
you could please provide full "smartctl -a" output for the disk, I can
review the data to see if anything looks out of place.

An example: on some (not all) Western Digital "Green" disks, including
enterprise models, Attribute 193 showing an extremely large RAW_VALUE
(in the tens of thousands, if not more) indicates the drive is trying to
park its actuator arm/heads constantly. The result: abysmal performance.
One cannot key off of the firmware version as an indicator (WD does not
always increase/change the firmware string). Some users have been able
to get WD to admit the problem + provide them a fixed firmware.

My point: looking at SMART attributes doesn't help unless you know
exactly what to look for, are familiar with all the quirks of drive
models, and basically act as an information sponge (subscribe to all
sorts of mailing lists, talk to users, help generic non-technical end
users out, etc.). It takes up a lot of my time, but I try my best.
Sometimes I feel like my brain needs checksumming...
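For anyone wanting to do a first pass on the Attribute 193 check themselves, here is a rough sketch in Python. It only parses text that "smartctl -a" has already printed. The function names, the sample table, and the 10000 threshold (my reading of "tens of thousands") are all made up for illustration, and a large number on its own is a hint to look closer, not a diagnosis.

```python
import re

def parse_smart_attributes(text):
    """Return {attribute_id: (name, raw_value)} from smartctl attribute rows."""
    attrs = {}
    for line in text.splitlines():
        # Attribute rows have 10 columns; RAW_VALUE is last and may contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if not fields[2].startswith("0x"):  # FLAG column, e.g. 0x0032
            continue
        raw = re.match(r"\d+", fields[9])  # leading integer of RAW_VALUE
        if raw is None:
            continue
        attrs[int(fields[0])] = (fields[1], int(raw.group()))
    return attrs

def load_cycle_count(text):
    """RAW_VALUE of Attribute 193, or None if the drive does not report it."""
    attr = parse_smart_attributes(text).get(193)
    return None if attr is None else attr[1]

def looks_like_constant_parking(text, threshold=10000):
    """True if Attribute 193 is at or above the (assumed) threshold."""
    count = load_cycle_count(text)
    return count is not None and count >= threshold

# Hypothetical sample, not taken from any real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5400
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       612345
194 Temperature_Celsius     0x0022   118   105   000    Old_age   Always       -       32 (Min/Max 20/45)
"""

if __name__ == "__main__":
    print(load_cycle_count(SAMPLE), looks_like_constant_parking(SAMPLE))
```

To use it on a real disk, save the "smartctl -a" output to a file and pass the file contents to looks_like_constant_parking(). Comparing the count against Power_On_Hours (Attribute 9) gives a better feel for how often the heads are parking than the raw number alone.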
--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |