Date:      Sun, 14 Apr 2013 14:58:15 -0400
From:      Zaphod Beeblebrox <zbeeble@gmail.com>
To:        Jeremy Chadwick <jdc@koitsu.org>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>, =?UTF-8?B?UmFkaW8gbcS5P29keWNoIGJhbmR5dMSCxYJ3?= <radiomlodychbandytow@o2.pl>, support@lists.pcbsd.org
Subject:   Re: A failed drive causes system to hang
Message-ID:  <CACpH0Mebufi5=bEsu6MF03NCn6gDmKkx-OP3sP14t3Xe3CXdpw@mail.gmail.com>
In-Reply-To: <20130414185117.GA38259@icarus.home.lan>
References:  <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> <20130414185117.GA38259@icarus.home.lan>

I'd like to throw in my two cents here.  I've seen this (drives in RAID-1
configuration) hanging whole systems.  Back in the IDE days, two drives
were connected with one cable --- I largely wrote it off as a deficiency of
IDE hardware and resolved to buy SCSI hardware for more important systems.
Of late, the physical hardware for SCSI (SAS) and SATA drives has
converged.  I'm willing to accept that SAS hardware may be built to a
different standard, but I'm suspicious of the fact that a bad SATA drive on
an ACH* controller can hang the whole system.

... the hang isn't complete, however.  Often pulling the drive's cable will
unfreeze things.  It's also not entirely consistent.  Drives I have behind
4:1 port multipliers haven't (so far) hung the system that they're on
(which uses ACH10).  Right now, I have a remote ACH10 system that's hung
hard a couple of times --- and it passes both its short and long SMART
tests on both drives.

Is there no global timeout we can depend on here?


