From: "Rick C. Petty"
To: freebsd-geom@freebsd.org
Date: Thu, 9 Sep 2010 22:18:51 -0500
Subject: it's a race between gmirror and UFS labels
Message-ID: <20100910031851.GA7066@rix.kiwi-computer.com>

I've been struggling with GEOM_MIRROR over-aggressively dropping drives and
it's causing some issues. I'm running 8.1-stable. Essentially I have
gmirror'd disks, gpart partitions inside the mirror, and UFS labels on all
my gpartitioned filesystems. Everything is fine as long as the mirror
probes the disks first (which it does).

The problem is that after kernel panics, gmirror often drops a disk:

GEOM_MIRROR: Component ada0 (device fs0) broken, skipping.

If it drops ada1, I have no problems, but if it drops ada0, then ada0 is
subsequently probed for its GPT partitions, of which the secondary table is
corrupt (probably because it's looking at the GEOM_MIRROR metadata).
Unfortunately, it proceeds and then mounts all the UFS filesystems by label
from /dev/ada0 instead of those in /dev/mirror/fs0, which is bad.

This has happened a number of times and it's getting quite annoying. Not
only is the rest of my mirror now invalid (as it's outdated with respect to
/dev/ada0), but I have to bring the system down to single user mode to
re-insert ada0 into that mirror.

I don't mean to start another discussion about why mixing start-of-disk
metadata magic with end-of-disk metadata magic is a bad thing. I just want
some suggestions about how to prevent this from happening nearly every time
GEOM_MIRROR drops a disk due to a panic.

GEOM_MIRROR seems very aggressive about dropping disks, and it doesn't say
why a disk is "broken"; both should probably be fixed. Also, GPT probes
should probably be prioritized to choose other geom providers first (or
maybe providers with the same name but correct secondary GPT tables should
win over those with mismatched tables, but given how GEOM probes I can't
see that happening).

Probably what *should* happen is that even though ada0 is rejected as a
GEOM_MIRROR provider, it should be marked as belonging to that mirror
regardless of whether it's broken. That way no other GEOMs will probe this
provider. To me, it seems that gmirror's handling of brokenness is itself
broken.

So someone please tell me: why doesn't gmirror keep ownership of the broken
providers? One can always remove the broken providers manually. To me the
current behavior seems way too aggressive. It's one thing if the
G_MIRROR_MAGIC doesn't match, but quite another to have a provider marked
as broken and thus released into the provider pool for other GEOMs to
sample at will.

-- 
Rick C. Petty
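
A toy sketch of the ownership idea above, in Python rather than kernel C.
It is not GEOM code and makes no claim about how GEOM actually tastes
providers. Provider, taste, mirror_magic, broken and keep_broken are all
hypothetical names invented for illustration. It only contrasts the two
policies: releasing a broken component to other classes, versus keeping it
claimed by the mirror.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Provider:
    name: str
    mirror_magic: bool            # hypothetical: mirror metadata found on disk
    broken: bool = False          # hypothetical: mirror rejected this component
    owner: Optional[str] = None   # which toy class has claimed the provider


def taste(providers: List[Provider], keep_broken: bool) -> Dict[str, str]:
    """Offer every provider to a toy MIRROR class, then to a toy PART class.

    keep_broken=False models the behavior described in this message: a
    broken component is released, so the partition class can claim it.
    keep_broken=True models the proposal: the mirror keeps ownership of
    any provider carrying its metadata, broken or not.
    """
    for p in providers:
        if p.mirror_magic and (keep_broken or not p.broken):
            p.owner = "MIRROR"
    for p in providers:
        if p.owner is None:
            p.owner = "PART"      # unclaimed, so its partitions get probed
    return {p.name: p.owner for p in providers}


def disks() -> List[Provider]:
    # ada0 carries mirror metadata but was marked broken after a panic.
    return [Provider("ada0", mirror_magic=True, broken=True),
            Provider("ada1", mirror_magic=True)]


print(taste(disks(), keep_broken=False))  # ada0 falls through to PART
print(taste(disks(), keep_broken=True))   # ada0 stays with MIRROR
```

Under the second policy ada0 is never offered to the partition or label
classes, so nothing can be mounted from it by accident. It would still need
to be rebuilt or removed from the mirror by hand.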