From owner-freebsd-geom@FreeBSD.ORG  Fri Sep 10 03:45:33 2010
Date: Thu, 9 Sep 2010 22:18:51 -0500
From: "Rick C. Petty" <rick-freebsd2009@kiwi-computer.com>
To: freebsd-geom@freebsd.org
Message-ID: <20100910031851.GA7066@rix.kiwi-computer.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
Subject: it's a race between gmirror and UFS labels
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: rick-freebsd2009@kiwi-computer.com

I've been struggling with GEOM_MIRROR dropping drives too aggressively,
and it's causing some issues.  I'm running 8.1-STABLE.

Essentially I have gmirror'd disks, gpart partitions inside the mirror,
and UFS labels on all my gpartitioned filesystems.  Everything is fine
as long as the mirror probes the disks first (which it does).

The problem is that after kernel panics, gmirror often drops a disk:

	GEOM_MIRROR: Component ada0 (device fs0) broken, skipping.

If it drops ada1, I have no problems.  But if it drops ada0, then ada0
is subsequently probed for GPT partitions.  Its secondary table looks
corrupt (probably because the probe reads the last sector of the raw
disk, which holds the GEOM_MIRROR metadata).  Unfortunately, the probe
proceeds anyway, and the system then mounts all the UFS filesystems by
label from the partitions on /dev/ada0 instead of those on
/dev/mirror/fs0, which is bad.
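
As a rough illustration of why the secondary table probably looks
corrupt when ada0 is probed directly: gmirror keeps its metadata in the
last sector of each component, so the mirror is one sector shorter than
the disk, and a GPT created inside the mirror puts its backup header in
the mirror's last sector.  The sketch below works through that
arithmetic.  The sector count and the function name are made up for
illustration; this is a model of the layout, not the actual GEOM code.

```python
def mirror_layout(disk_sectors):
    """Return metadata locations as LBAs on the raw disk (hypothetical helper)."""
    if disk_sectors < 4:
        raise ValueError("disk too small for this sketch")
    # gmirror stores its metadata in the last sector of the component.
    gmirror_meta_lba = disk_sectors - 1
    # The mirror provider therefore exposes one sector less than the disk.
    mirror_sectors = disk_sectors - 1
    # A GPT created inside the mirror puts its backup header in the
    # mirror's last sector, which is the disk's second-to-last sector.
    backup_gpt_lba = mirror_sectors - 1
    # A GPT probe of the raw disk looks for the backup header in the
    # disk's own last sector.
    raw_probe_lba = disk_sectors - 1
    return {
        "gmirror_meta_lba": gmirror_meta_lba,
        "backup_gpt_lba": backup_gpt_lba,
        "raw_probe_lba": raw_probe_lba,
    }


layout = mirror_layout(1000)  # assumed tiny disk, for readability
for name, lba in sorted(layout.items()):
    print(name, lba)
```

In this model the raw-disk probe lands on the gmirror metadata sector,
one sector past the real backup header, while the primary table near
the start of the disk sits at the same offset either way.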

This has happened a number of times and it's getting quite annoying.
Not only is the rest of my mirror now invalid (it's outdated with
respect to /dev/ada0), but I also have to bring the system down to
single-user mode to re-insert ada0 into that mirror.

I don't mean to start another discussion about why mixing start-of-disk
metadata magic with end-of-disk metadata magic is a bad thing.  I just
want some suggestions about how to prevent this from happening nearly
every time GEOM_MIRROR drops a disk due to a panic.  GEOM_MIRROR seems
very aggressive about dropping disks, and it never says why a disk is
"broken"; both should probably be fixed.  Also, GPT probes should
probably be prioritized to choose other GEOM providers first (or maybe
providers with the same name but correct secondary GPT tables should
win over those with mismatched tables, but given how GEOM probes
providers, I can't see that happening).

Probably what *should* happen is that even though ada0 is rejected as
a GEOM_MIRROR provider, it should be marked as belonging to that
mirror regardless of whether it's broken.  That way no other GEOMs will
probe this provider.  To me, it seems that gmirror's handling of
brokenness is itself broken.  Can someone please tell me why gmirror
doesn't keep ownership of broken providers?  One can always remove
the broken providers manually.  To me, the current behavior seems way
too aggressive.  It's one thing if the G_MIRROR_MAGIC doesn't match,
but quite another thing to have it marked as broken and thus released
into the provider pool for other GEOMs to sample it at will.
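
To make the proposal concrete, here is a toy model of the two
policies.  It is only a sketch: the function, its parameters, and the
class names it returns are invented for illustration and do not
correspond to anything in the GEOM sources.  It assumes the mirror
class gets the first look at a provider and the partition class gets
whatever the mirror class does not claim.

```python
def claimant(has_mirror_magic, broken, keep_broken):
    """Return which class ends up owning a provider (hypothetical model).

    has_mirror_magic: the provider carries matching mirror metadata.
    broken:           the mirror class considers the component broken.
    keep_broken:      proposed policy; the mirror keeps broken components.
    """
    if has_mirror_magic and (not broken or keep_broken):
        # The mirror owns the provider, so no other class probes it.
        return "MIRROR"
    # The provider is released and the partition class may claim it.
    return "PART"


# Behavior as described above: a broken component is released.
print(claimant(has_mirror_magic=True, broken=True, keep_broken=False))
# Proposed behavior: a broken component stays with its mirror.
print(claimant(has_mirror_magic=True, broken=True, keep_broken=True))
```

Under the proposed policy only a provider whose magic does not match
would be released for other classes to probe.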

-- Rick C. Petty