From owner-freebsd-geom@FreeBSD.ORG Tue Apr 10 18:14:27 2007
Date: Tue, 10 Apr 2007 13:14:26 -0500
From: "Rick C. Petty" <rick@kiwi-computer.com>
To: freebsd-geom@freebsd.org
Subject: Re: volume management
Reply-To: rick-freebsd@kiwi-computer.com
In-Reply-To: <461BCF8A.3030307@freebsd.org>

On Tue, Apr 10, 2007 at 12:55:22PM -0500, Eric Anderson wrote:
>
> Personally, what I would want to prevent, is having a server go down due
> to one file system having an issue, when it is serving (or using) many
> more file systems.

Exactly my point.  The whole machine isn't hosed, just the N file systems
that use that GEOM provider.

> What I want is a blast to my logs, the erroneous file system to be
> evicted from further damage (mount read-only and marked as dirty) and
> trickle an i/o error to any processes trying to write to it.  Even
> unmounting it would be ok, but that gets nasty with NFS servers and
> other things.

This is why I suggested propagating the failure down to the GEOM consumers
of the bad provider, either disallowing writes (which I don't think GEOM
offers as an option) or removing the device completely, so that the
affected file systems get unmounted, etc.  I pointed out that this already
seems to be the case with gvinum when a disk is dropped: gvinum noticed
the device failure and marked all of its dependencies stale, and the only
problem I had was with a mounted stripe.  Ninety percent of the time I was
able to kill all user processes that were reading from or writing to the
bad stripe, bring the disk back up, and force a remount of the filesystem.
Both the UFS and GEOM code are buggy here: sometimes I would get a panic
and have to fsck (and resync) terabytes of disks, but often, if I waited
long enough after killing the user processes, everything else timed out
and I was able to remount the filesystem and continue.

I was never claiming that the UFS subsystem is robust enough to handle all
of these failures, only that the GEOM layer should do what it can to keep
the box up, and that we should teach UFS and the other filesystems to
handle these failures better.  But a panic is just not a pretty option.
One GEOM provider should not be arrogant enough to declare that the box is
no longer usable at all.
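To make that concrete, here is a rough userland sketch in plain C.  It has
nothing to do with the actual GEOM KPI; the structures and names are made
up purely for illustration.  The behavior I'm arguing for: when one
provider fails, only the filesystems consuming it get forced read-only and
see EIO on further writes, and everything else keeps running.

/*
 * Toy userland sketch, NOT the GEOM KPI: when one provider fails,
 * push the failure to its consumers only (force their filesystems
 * read-only and fail further writes with EIO) instead of taking
 * the whole box down.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct toy_fs {
        const char      *mountpoint;
        int              forced_ro;     /* forced read-only after a failure */
};

struct toy_consumer {
        struct toy_fs   *fs;            /* filesystem sitting on top */
};

struct toy_provider {
        const char              *name;
        struct toy_consumer     *consumers[8];
        int                      nconsumers;
};

/* Propagate a provider failure to its own consumers, and nothing else. */
static void
provider_fail(struct toy_provider *pp)
{
        int i;

        for (i = 0; i < pp->nconsumers; i++) {
                struct toy_fs *fs = pp->consumers[i]->fs;

                fs->forced_ro = 1;
                printf("%s: provider %s failed, forcing read-only\n",
                    fs->mountpoint, pp->name);
        }
}

/* A write attempt; filesystems on healthy providers are unaffected. */
static int
fs_write(struct toy_fs *fs)
{
        if (fs->forced_ro) {
                errno = EIO;
                return (-1);
        }
        return (0);
}

int
main(void)
{
        struct toy_fs fs_a = { "/vol/a", 0 };
        struct toy_fs fs_b = { "/vol/b", 0 };
        struct toy_consumer ca = { &fs_a };
        struct toy_provider bad = { "da1", { &ca }, 1 };

        provider_fail(&bad);            /* only /vol/a is affected */

        if (fs_write(&fs_a) == -1)
                printf("%s: write error: %s\n", fs_a.mountpoint,
                    strerror(errno));
        if (fs_write(&fs_b) == 0)
                printf("%s: still writable\n", fs_b.mountpoint);
        return (0);
}

In the real stack this obviously has to happen in the kernel, through
GEOM's own error handling and the filesystems' mount code; the sketch is
only meant to show the damage staying scoped to one provider's consumers.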
Panicking the whole box is like engineering an automobile that locks up
the steering wheel and locks the doors and windows the moment it notices
one tire has gone flat, preventing the driver from attempting to pull over
safely and preventing any passengers from exiting the vehicle.

-- 
Rick C. Petty