From owner-freebsd-fs@FreeBSD.ORG  Mon Jun  4 22:59:40 2012
From: Dustin Wenz <dustinwenz@ebureau.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Date: Mon, 4 Jun 2012 17:54:03 -0500
Message-Id: <5532CFFB-F943-4D9E-9722-7FB9C8A9F82A@ebureau.com>
To: freebsd-fs@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1257)
X-Mailer: Apple Mail (2.1257)
Subject: Can mps drop a failing device from bus?

I asked this question back in April on the stable list and got no
response
(http://lists.freebsd.org/pipermail/freebsd-stable/2012-April/067305.html).
I'm now seeing the same behavior on 9.0-RELEASE, so I thought it would
be good to ask again here.

There is a failure mode for SATA disks (Seagate Barracuda ST3000DM001
disks, in this case) that the mps driver doesn't handle very well. If a
disk is slow to respond, or is unresponsive altogether, I'd like it to
be removed from the bus so that the zpool it's a part of becomes
degraded.

The way things are now, mps will just report a lot of "SCSI command
timeout on device" messages. Any I/O on the affected zpools will hang
for an excessive amount of time (sometimes forever). We typically
configure our storage volumes as a pool of mirrors, with the expectation
that availability will be maintained if any redundant disk(s) should
fail. Unfortunately, availability is actually made *worse* on
highly redundant mirrors when mps won't give up on an unresponsive
device.

It's possible that I'm overlooking an obvious solution, or some relevant
configuration options for the driver. Can anyone offer some insight into
this?

Thanks,

	- .Dustin