Date:      Thu, 9 Sep 2010 22:18:51 -0500
From:      "Rick C. Petty" <rick-freebsd2009@kiwi-computer.com>
To:        freebsd-geom@freebsd.org
Subject:   it's a race between gmirror and UFS labels
Message-ID:  <20100910031851.GA7066@rix.kiwi-computer.com>

I've been struggling with GEOM_MIRROR over-aggressively dropping drives
and it's causing some issues.  I'm running 8.1-stable.

Essentially, I have gmirror'd disks, gpart partitions inside the mirror,
and UFS labels on all my gpartitioned filesystems.  Everything is fine
as long as the mirror probes the disks first (which it normally does).

The problem is that after kernel panics, gmirror often drops a disk:

	GEOM_MIRROR: Component ada0 (device fs0) broken, skipping.

If it drops ada1, I have no problems.  But if it drops ada0, then ada0
is subsequently probed for GPT partitions, and its secondary table
appears corrupt (probably because the probe is looking at the
GEOM_MIRROR metadata).  Unfortunately, the probe proceeds anyway, and
the system then mounts all the UFS filesystems by label from /dev/ada0
instead of from /dev/mirror/fs0, which is bad.
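
To make the suspected overlap concrete, here is a minimal sketch of the
end-of-disk arithmetic.  It assumes that gmirror keeps its metadata in the
last sector of each component and that the GPT probe looks for the backup
header in the last sector of whatever provider it is tasting.  The disk
size and the function names are hypothetical, chosen only for illustration.

```python
def last_lba(total_sectors):
    """LBA of the final sector of a provider with the given sector count."""
    return total_sectors - 1


def layout(disk_sectors):
    """Return where end-of-disk metadata lives, as LBAs on the raw disk."""
    # Assumption: gmirror uses the last sector of each component, so the
    # mirror provider is one sector shorter than the raw disk.
    mirror_sectors = disk_sectors - 1
    return {
        "gmirror_metadata": last_lba(disk_sectors),
        # A GPT created inside the mirror puts its backup header at the
        # mirror's last LBA, i.e. the raw disk's second-to-last sector.
        "gpt_backup_header": last_lba(mirror_sectors),
        # Assumption: a GPT probe of the raw disk looks here instead.
        "gpt_probe_on_raw_disk": last_lba(disk_sectors),
    }


if __name__ == "__main__":
    info = layout(disk_sectors=1_000_000)  # hypothetical disk size
    for name, lba in info.items():
        print(f"{name:>22}: LBA {lba}")
```

Under those assumptions, a probe of raw ada0 lands on the gmirror metadata
sector rather than on the real backup header one sector earlier, which
would explain the "corrupt" secondary table.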

This has happened a number of times and it's getting quite annoying.
Not only is the rest of my mirror now invalid (as it's outdated with
respect to /dev/ada0), but I have to bring the system down to single
user mode to re-insert ada0 into that mirror.

I don't mean to start another discussion about why mixing start-of-disk
metadata magic with end-of-disk metadata magic is a bad thing.  I just
want some suggestions about how to prevent this from happening nearly
every time GEOM_MIRROR drops a disk due to a panic.  GEOM_MIRROR seems
very aggressive when it comes to dropping disks, and it doesn't say why
the disk is "broken", which should probably be fixed.  Also, GPT probes
should probably be prioritized to choose other GEOM providers first (or
maybe providers with the same name but correct secondary GPT tables
should win over those with mismatched tables, but due to how GEOM
probes I can't see that happening).

Probably what *should* happen is that even though ada0 is rejected as
a GEOM_MIRROR provider, it should be marked as belonging to that
mirror regardless of whether it's broken.  That way no other GEOMs will
probe this provider.  To me, it seems that gmirror's handling of
brokenness is itself broken.  Can someone please tell me why gmirror
doesn't keep ownership of the broken providers?  One can always remove
the broken providers manually.  To me the current behavior seems way
too aggressive.  It's one thing if the G_MIRROR_MAGIC doesn't match,
but quite another thing to have it marked as broken and thus released
into the provider pool for other GEOMs to sample it at will.
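
The behavior proposed above can be sketched as a small decision function.
This is only an illustration of the suggestion, not how GEOM_MIRROR is
implemented: the Provider type, the `broken` flag, the MAGIC value, and
the three outcomes are all hypothetical stand-ins.

```python
from dataclasses import dataclass

# Placeholder standing in for G_MIRROR_MAGIC; the real value is not assumed.
MAGIC = "MIRROR-MAGIC"


@dataclass
class Provider:
    name: str
    magic: str    # magic string found in the component's metadata
    broken: bool  # hypothetical: component failed the mirror's checks


def taste(provider):
    """Decide what the mirror class does with a provider it is offered."""
    if provider.magic != MAGIC:
        # Not a mirror component at all: let other GEOM classes taste it.
        return "release"
    if provider.broken:
        # Proposed: keep ownership so nothing else probes the provider,
        # but leave it out of the running mirror until an admin acts.
        return "hold"
    return "attach"


if __name__ == "__main__":
    for p in (Provider("ada0", MAGIC, True),
              Provider("ada1", MAGIC, False),
              Provider("ada2", "something-else", False)):
        print(p.name, "->", taste(p))
```

The point of the "hold" outcome is that a broken component would never
reach the GPT or label probes, so filesystems could not be mounted from it
by accident.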

-- Rick C. Petty


