From owner-freebsd-geom@FreeBSD.ORG Wed Aug 2 21:07:11 2006
Return-Path:
X-Original-To: freebsd-geom@freebsd.org
Delivered-To: freebsd-geom@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id F2BB416A4E0
	for ; Wed, 2 Aug 2006 21:07:10 +0000 (UTC)
	(envelope-from rick@kiwi-computer.com)
Received: from kiwi-computer.com (megan.kiwi-computer.com [63.224.10.3])
	by mx1.FreeBSD.org (Postfix) with SMTP id 592B743D49
	for ; Wed, 2 Aug 2006 21:07:10 +0000 (GMT)
	(envelope-from rick@kiwi-computer.com)
Received: (qmail 15481 invoked by uid 2001); 2 Aug 2006 21:07:09 -0000
Date: Wed, 2 Aug 2006 16:07:09 -0500
From: "Rick C. Petty"
To: Miroslav Lachman <000.fbsd@quip.cz>
Message-ID: <20060802210709.GA15310@megan.kiwi-computer.com>
References: <44D06650.1030803@quip.cz>
	<20060802183001.GA14279@megan.kiwi-computer.com>
	<44D10D1D.9040700@quip.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <44D10D1D.9040700@quip.cz>
User-Agent: Mutt/1.4.2.1i
Cc: freebsd-geom@freebsd.org
Subject: Re: gmirror Cannot add disk ad5 to gm0 (error=22)
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: rick-freebsd@kiwi-computer.com
List-Id: GEOM-specific discussions and implementations
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 02 Aug 2006 21:07:11 -0000

On Wed, Aug 02, 2006 at 10:37:49PM +0200, Miroslav Lachman wrote:
>
> >Did you have SMART enabled in the BIOS?
>
> Yes, (as I remember - I have only remote access now) and have

Then I doubt the disk itself had any errors..  Likely a bad cable or
controller, which I've typically seen manifested under heavier disk
activity.

> >It's already activated, so you can't add it again (as the message states).
>
> But how can I force gmirror to re-use this disk? I don't know, what
> "broken, skipping" or "error=22" really means.

There's no forcing, unless you specifically deactivated a provider.  The
mirror should auto-sync at startup.

> >That shouldn't be a surprise-- the disks themselves didn't fail, only
> >writing to them (possibly under heavy load?) failed-- and gmirror dropped
> >the disks.  The first disk drop was ok-- the mirror should still work in
> >DEGRADED state.  The second drop was critical which is why your system
> >broke.  Mounting the disks individually will work of course.
>
> This error occurred after 5 days of periodical copying /usr/ports to
> another partition. (I used this to test disk/filesystem before deploying
> to production) Before this test, the server has another problems with
> disks and whole server was replaced with new one, only first drive (ad4)
> is from original machine. (originally discussed on freebsd-stable@ - disk
> disappeared from ATA channel - not listed by atacontrol list command)

Yup, disks disappear when they stop responding to "bus reset" commands.
This seems to happen on various controllers after an unpredictable number
of READ_DMA or WRITE_DMA timeout errors.  Theoretically, you could reinit
the channel and see if the disk pops back up.  One thing to note: I
recommend putting the disks on separate channels so if a reinit fails, you
don't lose both disks.  I hate it when manufacturers put two SATA ports on
the same ATA channel..  Cheap for them, problematic for you.

> >>Can anybody tell me, where is the problem / how can I found what is wrong?
> >
> >
> >What's the output of "gmirror status" ??  I suspect on a reboot, gmirror
> >will try to synchronize ad4 to ad5 (since ad5 was the first to drop).  Once
> >that is complete, gmirror won't be DEGRADED anymore.
>
> # gmirror status
>       Name    Status  Components
> mirror/gm0  DEGRADED  ad4

Hmm, and is ad5 detected?  (rhetorical question, because I see that it was)

> Gmirror is not synchronized after reboot:
>
> Aug 1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad5
> detected.
> Aug 1 09:14:50 track kernel: GEOM_MIRROR: Component ad5 (device gm0)
> broken, skipping.

Looks like the disk was marked with bad metadata.

> So disk is OK, but gmirror refused to use it?

Yes.  I would first suggest trying "gmirror deactivate -v gm0 ad5" then
trying to reactivate it.  Maybe that will flush out the wrong metadata.  If
that doesn't work, try booting in verbose mode and attaching the dmesg (in
particular, when the mirror is being attached).  Last resort (although not
a horrible option), you can try removing ad5 from the mirror and
relabelling (gmirror label, not bsdlabel) it.  If the remove fails, use a
combination of forget and clear.

> If disks are OK, what is wrong? What caused READ / WRITE timeouts?
> Broken SATA controller? FreeBSD ATA driver?

Try replacing the cables or trying a different SATA controller.  I've seen
these timeouts *a lot* and usually my gmirror/gvinum partitions all survive
(after reboot at least).  There are a lot of threads on this and other
mailing lists describing the timeout problems.

-- 
Rick C. Petty