Date: Sun, 14 Apr 2013 14:58:15 -0400
From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: Jeremy Chadwick <jdc@koitsu.org>
Cc: freebsd-fs <freebsd-fs@freebsd.org>, =?UTF-8?B?UmFkaW8gbcS5P29keWNoIGJhbmR5dMSCxYJ3?= <radiomlodychbandytow@o2.pl>, support@lists.pcbsd.org
Subject: Re: A failed drive causes system to hang
Message-ID: <CACpH0Mebufi5=bEsu6MF03NCn6gDmKkx-OP3sP14t3Xe3CXdpw@mail.gmail.com>
In-Reply-To: <20130414185117.GA38259@icarus.home.lan>
References: <516A8092.2080002@o2.pl> <9C59759CB64B4BE282C1D1345DD0C78E@multiplay.co.uk> <516AF61B.7060204@o2.pl> <20130414185117.GA38259@icarus.home.lan>
I'd like to throw in my two cents here. I've seen this (drives in a RAID-1 configuration hanging whole systems) before. Back in the IDE days, two drives were connected with one cable; I largely wrote it off as a deficiency of IDE hardware and resolved to buy SCSI hardware for more important systems.

Of late, the physical hardware for SCSI (SAS) and SATA drives has converged. I'm willing to accept that SAS hardware may be built to a different standard, but I'm suspicious of the fact that a bad SATA drive on an ACH* controller can hang the whole system.

... it's not a complete hang, however. Often pulling the drive's cable will unfreeze things. It's also not entirely consistent: drives I have behind 4:1 port multipliers haven't (so far) hung the system that they're on (which uses ACH10).

Right now, I have a remote ACH10 system that's hung hard a couple of times, and it passes both its short and long SMART tests on both drives. Is there no global timeout we can depend on here?