Date: Thu, 9 Sep 2010 22:18:51 -0500
From: "Rick C. Petty" <rick-freebsd2009@kiwi-computer.com>
To: freebsd-geom@freebsd.org
Subject: it's a race between gmirror and UFS labels
Message-ID: <20100910031851.GA7066@rix.kiwi-computer.com>

I've been struggling with GEOM_MIRROR over-aggressively dropping drives and it's causing some issues. I'm running 8.1-stable. Essentially I have gmirror'd disks, gpart partitions inside the mirror, and UFS labels on all my gpartitioned filesystems. Everything is fine as long as the mirror probes the disks first (which it does). The problem is that after kernel panics, gmirror often drops a disk:

    GEOM_MIRROR: Component ada0 (device fs0) broken, skipping.

If it drops ada1, I have no problems, but if it drops ada0, then ada0 is subsequently probed for its GPT partitions, of which the secondary table is corrupt (probably because it's looking at the GEOM_MIRROR metadata). Unfortunately, it proceeds and then mounts all the UFS filesystems by label from /dev/ada0 instead of those in /dev/mirror/fs0, which is bad.

This has happened a number of times and it's getting quite annoying. Not only is the rest of my mirror now invalid (as it's outdated with respect to /dev/ada0), but I have to bring the system down to single user mode to re-insert ada0 into that mirror.

I don't mean to start another discussion about why mixing start-of-disk metadata magic with end-of-disk metadata magic is a bad thing. I just want some suggestions about how to prevent this from happening nearly every time GEOM_MIRROR drops a disk due to a panic.

GEOM_MIRROR seems very aggressive when it comes to dropping disks and not describing why the disk is "broken", which should probably be fixed. Also GPT probes should probably be prioritized to choose other geom providers first (or maybe providers with the same name but correct secondary GPT tables should win over those with mismatched tables, but due to how GEOM probes I can't see that happening).

Probably what *should* happen is that even though ada0 is rejected as a GEOM_MIRROR provider, it should be marked as belonging to that mirror regardless of whether it's broken. That way no other GEOMs will probe this provider.
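To make the difference between the current and the proposed behavior concrete, here is a toy model in Python. It is only a sketch and not GEOM code: the `Provider` fields, the `taste` function, the `keep_broken` switch and the "MIRROR"/"PART" owner names are all made up for illustration, and the real tasting logic is considerably more involved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Provider:
    name: str
    mirror_magic: bool            # hypothetical: mirror metadata was found on the disk
    broken: bool                  # hypothetical: the mirror considers this component broken
    owner: Optional[str] = None   # which (toy) class has claimed the provider


def make_disks() -> List[Provider]:
    # The scenario from this message: ada0 is flagged broken after a panic.
    return [
        Provider("ada0", mirror_magic=True, broken=True),
        Provider("ada1", mirror_magic=True, broken=False),
    ]


def taste(providers: List[Provider], keep_broken: bool) -> Dict[str, str]:
    """Toy tasting pass: the mirror class goes first, the partition class second."""
    # Mirror pass. With keep_broken=False a broken component is released
    # (the behavior described above); with keep_broken=True it stays owned
    # by the mirror even though it is not used as an active component.
    for p in providers:
        if p.mirror_magic and (keep_broken or not p.broken):
            p.owner = "MIRROR"
    # Partition pass: anything still unowned gets claimed here, which is
    # how the raw disk ends up exposing partitions and labels.
    for p in providers:
        if p.owner is None:
            p.owner = "PART"
    return {p.name: p.owner for p in providers}


if __name__ == "__main__":
    print(taste(make_disks(), keep_broken=False))
    print(taste(make_disks(), keep_broken=True))
```

With `keep_broken=False` the toy ada0 falls through to the partition pass, which mirrors the situation where labels get picked up from the raw disk. With `keep_broken=True` both disks stay with the mirror and nothing else gets a chance to claim ada0.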

To me, it seems that gmirror's handling of brokenness is itself broken. So someone please tell me why gmirror doesn't keep ownership of the broken providers? One can always remove the broken providers manually. To me the current behavior seems way too aggressive. It's one thing if the G_MIRROR_MAGIC doesn't match, but quite another thing to have it marked as broken and thus released into the provider pool for other GEOMs to sample it at will.

--
Rick C. Petty