From: Dustin Wenz
To: freebsd-fs@freebsd.org
Date: Mon, 4 Jun 2012 17:54:03 -0500
Message-Id: <5532CFFB-F943-4D9E-9722-7FB9C8A9F82A@ebureau.com>
Subject: Can mps drop a failing device from bus?

I asked this question back in April on the stable list with no response
(http://lists.freebsd.org/pipermail/freebsd-stable/2012-April/067305.html).
I've now been seeing the same behavior on 9.0-release, and I thought it
would be good to ask again here.

There is a failure mode for SATA disks (Seagate Barracuda ST3000DM001
disks, in this case) that the mps driver doesn't handle very well. If a
disk is slow to respond, or is unresponsive altogether, I'd like it to
be removed from the bus and degrade the zpool that it's a part of.

The way things are now, mps will just report a lot of "SCSI command
timeout on device" messages. Any I/O on the affected zpools will hang
for an excessive amount of time (sometimes forever). We typically
configure our storage volumes as a pool of mirrors, with the expectation
that availability will be maintained if any redundant disk(s) should
fail. Unfortunately, availability is actually made *worse* on
highly-redundant mirrors when mps won't give up on an unresponsive
device.

It's possible that I'm overlooking an obvious solution, or some relevant
configuration options for the driver. Can anyone offer some insight on
this?

Thanks,

- .Dustin