Date: Tue, 31 Oct 2006 15:52:41 -0600
From: "Rick C. Petty"
Cc: freebsd-geom@freebsd.org
Reply-To: rick-freebsd@kiwi-computer.com
Subject: Re: burnt again by gmirror
Message-ID: <20061031215241.GA57997@keira.kiwi-computer.com>
In-Reply-To: <20061031205857.GA15861@garage.freebsd.pl>

On Tue, Oct 31, 2006 at 09:58:57PM +0100, Pawel Jakub Dawidek wrote:
> On Tue, Oct 31, 2006 at 02:46:59PM -0600, Rick C. Petty wrote:
> >
> > Still, I'm curious why/how ad8's metadata could have been clobbered.
> > gmirror is the only one who would write to it, the filesystem is mounted
> > from gm0* -- kinda scary.  I guess the lesson here is to use simple gmirror
> > configurations in case the metadata gets clobbered.
>
> gmirror told you that it think ad8 is broken and skipped it.

If it's marked as broken/disconnected/whatever, why is it removed from the
list ("gmirror list")?  Surely it would be useful to state which pieces are
broken.  I'm thinking that printing "State: BROKEN" or something similar
would do the trick.  At least some of this data is available to gmirror, as
stored on the other providers.  Perhaps the provider name isn't, because
gmirror couldn't find the provider (for whatever reason); is that why it's
not listed?  If so, I still think it should be, perhaps with
"Name: UNKNOWN".  As it is, there is no way of knowing what's missing, or
even of getting clues about what could be missing.

> If an error is discovered on mirror's component it is marked as broken
> and disconnected so it doesn't case further problems. For example disk
> problem is that it doesn't complete I/O requests and gmirror need to
> wait for ATA timeouts, which will make the whole system unresponsive.
> If component was disconnected it means something was wrong with it and
> it needs manual intervention and investigation.

Or something just went horribly wrong with gmirror?  The disks are fine,
relatively new (5600 hours runtime), tested under load pretty thoroughly,
and smartctl is showing no errors or other anomalies.

> I'm sure your logs would tell you.

Perhaps you didn't read the part of my original post where I stated that
/var/run/dmesg.boot was empty:

# ls -la /var/run/dmesg.boot
-rw-r--r--  1 root  wheel  0 Oct 25 14:42 /var/run/dmesg.boot

and that the message buffer had overflowed (quite visible from
/var/log/messages) because of the overwhelming number of fsck errors.  I
would love to know what wonderful error message was printed, to give me an
idea why gmirror dropped the disk for seemingly no good reason.  I wish
either that fsck were less noisy or that the kernel would take a snapshot of
the msgbuf right before starting init, so that only the kernel messages
would get copied into /var/run/ when the rc scripts do their thing...

--
Rick C. Petty