Date: Tue, 24 Apr 2012 12:55:06 -0500
From: Dustin Wenz <dustinwenz@xtechllc.com>
To: freebsd-stable@freebsd.org
Subject: Can MPS discard a misbehaving disk?
Message-ID: <17DD4C39-6905-4A5B-AE86-87F149CBD5BC@xtechllc.com>
I am having trouble with MPS becoming unresponsive in certain disk failure conditions. So far, I've experienced this with 3TB Hitachi disks (0S03208) and 3TB Seagate Barracuda disks (ST3000DM001, firmware CC9D) while using the MPS driver with an LSI SAS2116 controller on FreeBSD 8.2-STABLE.

In these particular instances, the disks are part of a zpool of mirrors. When a disk fails, I generally see a message like "kernel: (da5:mps0:0:5:0): SCSI command timeout on device handle 0x0017 SMID 148", followed by an indefinite number of "mps0: (0:5:0) terminated ioc 804b scsi 0 state c xfer 65536" messages.

What I want to happen in this case is for the disk to simply go offline in the zpool, so that the pool can continue functioning. However, the pool status still shows the disk as online. Any attempt to disable the disk (such as with zpool offline, remove, or detach) will hang and never complete, as will attempting a rescan with camcontrol. Of course, any attempt to access data in the pool will hang as well.

Rebooting the system in this state is also bad; when the disk is first discovered, it will begin a cycle of mps SCSI errors during startup that never seem to stop. The only way to recover, at least that I know of, is to physically remove the disk from the chassis. Once I do that, the system continues running perfectly.

Basically my question is this: How can I get MPS to ignore a failed disk and never attempt to access it again? I don't care if it does so automatically, or if I need to perform some administrative operation to drop the device reference. I've seen a number of people on the list having problems that appear similar to this, but those seem to have more to do with firmware or compatibility issues. In my case, these disks are definitely dead... they no longer work in any other systems, and often make sad clicking noises.

I suppose this is also something that ZFS could do, independent of the driver. If a device is unresponsive, shouldn't ZFS take it offline on its own?

- .Dustin