Date:      Wed, 2 Aug 2006 16:07:09 -0500
From:      "Rick C. Petty" <rick-freebsd@kiwi-computer.com>
To:        Miroslav Lachman <000.fbsd@quip.cz>
Cc:        freebsd-geom@freebsd.org
Subject:   Re: gmirror Cannot add disk ad5 to gm0 (error=22)
Message-ID:  <20060802210709.GA15310@megan.kiwi-computer.com>
In-Reply-To: <44D10D1D.9040700@quip.cz>
References:  <44D06650.1030803@quip.cz> <20060802183001.GA14279@megan.kiwi-computer.com> <44D10D1D.9040700@quip.cz>

On Wed, Aug 02, 2006 at 10:37:49PM +0200, Miroslav Lachman wrote:
> 
> >Did you have SMART enabled in the BIOS?
> 
> Yes, (as I remember - I have only remote access now) and have 

Then I doubt the disk itself had any errors.  More likely it's a bad cable
or controller; in my experience those problems typically show up under
heavier disk activity.

> >It's already activated, so you can't add it again (as the message states).
> 
> But how can I force gmirror to re-use this disk? I don't know, what 
> "broken, skipping" or "error=22" really means.

There's no forcing, unless you specifically deactivated a provider.  The
mirror should auto-sync at startup.

> >That shouldn't be a surprise-- the disks themselves didn't fail, only
> >writing to them (possibly under heavy load?) failed-- and gmirror dropped
> >the disks.  The first disk drop was ok-- the mirror should still work in
> >DEGRADED state.  The second drop was critical which is why your system
> >broke.  Mounting the disks individually will work of course.
> 
> This error occured after 5 days of periodical copying /usr/ports to 
> another partition. (I used this to test disk/filesystem before deploying 
> to production) Before this test, the server has another problems with 
> disks and whole server was replaced with newone, only first drive (ad4) 
> is from original machine. (originaly discussed on freebsd-stable@ - disk 
> disappeared from ATA channel - not listed by atacontrol list command)

Yup, disks disappear when they stop responding to "bus reset" commands.
This seems to happen on various controllers after an unpredictable number
of READ_DMA or WRITE_DMA timeout errors.  Theoretically, you could reinit
the channel and see if the disk pops back up.  One thing to note:  I
recommend putting the disks on separate channels so if a reinit fails, you
don't lose both disks.  I hate it when manufacturers put two SATA ports on
the same ATA channel.  Cheap for them, problematic for you.
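
A minimal sketch of that reinit, assuming the disks hang off channel ata2
(a guess based on the ad4/ad5 numbering; check "atacontrol list" for the
real channel).  The run helper and the DO_IT variable are made up for
illustration: by default the script only prints the commands, and it runs
them only when DO_IT=1.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless DO_IT=1 is set.
run() {
    if [ "${DO_IT:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Reinitialize one ATA channel and list devices before and after.
# The channel name is an assumption; pass the one your disk is on.
reinit_channel() {
    channel=$1
    run atacontrol list
    run atacontrol reinit "$channel"
    run atacontrol list
}
```

Calling "reinit_channel ata2" without DO_IT=1 just prints the three
atacontrol commands so you can check them first.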

> >>Can anybody tell me, where is the problem / how can I found what is wrong?
> >
> >
> >What's the output of "gmirror status" ??  I suspect on a reboot, gmirror
> >will try to synchronize ad4 to ad5 (since ad5 was the first to drop).  Once
> >that is complete, gmirror won't be DEGRADED anymore.
> 
> # gmirror status
>       Name    Status  Components
> mirror/gm0  DEGRADED  ad4

Hmm, and is ad5 detected?  (rhetorical question, because I see that it was)

> Gmirror is not synchronized after reboot:
> 
> Aug  1 09:14:50 track kernel: GEOM_MIRROR: Device gm0: provider ad5 
> detected.
> Aug  1 09:14:50 track kernel: GEOM_MIRROR: Component ad5 (device gm0) 
> broken, skipping.

Looks like the disk was marked with bad metadata.
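
One way to check that guess is to look at the on-disk metadata with
"gmirror dump" and compare the two components.  A small sketch, assuming
the providers are ad4 and ad5 as in the log above; the run helper and
DO_IT are hypothetical, and by default the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless DO_IT=1 is set.
run() {
    if [ "${DO_IT:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Dump the gmirror metadata stored on each provider given as argument.
dump_metadata() {
    for disk in "$@"; do
        run gmirror dump "$disk"
    done
}
```

For example, "DO_IT=1 dump_metadata ad4 ad5" would show both sets of
metadata for a side-by-side comparison.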

> So disk is OK, but gmirror refused to use it?

Yes.  I would first suggest trying "gmirror deactivate -v gm0 ad5" and then
reactivating it with "gmirror activate -v gm0 ad5".  Maybe that will flush
out the wrong metadata.  If that doesn't work, try booting in verbose mode
and posting the dmesg output (in particular, the part where the mirror is
being attached).
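
The deactivate/reactivate cycle could be sketched like this, assuming the
mirror is gm0 and the component is ad5.  The run helper and DO_IT are
hypothetical names; without DO_IT=1 the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless DO_IT=1 is set.
run() {
    if [ "${DO_IT:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Mark a component inactive, activate it again, then show mirror state.
reactivate_component() {
    mirror=$1
    disk=$2
    run gmirror deactivate -v "$mirror" "$disk"
    run gmirror activate -v "$mirror" "$disk"
    run gmirror status "$mirror"
}
```

"reactivate_component gm0 ad5" prints the three commands in order; whether
the cycle actually clears the bad metadata is not guaranteed.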

As a last resort (although not a horrible option), you can try removing ad5
from the mirror and relabelling it (gmirror label, not bsdlabel).  If the
remove fails, use a combination of "gmirror forget" and "gmirror clear".
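
A rough sketch of that last-resort path, assuming mirror gm0 and component
ad5.  Re-adding the disk with "gmirror insert" is an assumption about what
the relabel step amounts to here (insert writes fresh gmirror metadata to
the provider).  The run helper and DO_IT are hypothetical; without DO_IT=1
the commands are only printed, so the fallback branch is not exercised.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless DO_IT=1 is set.
run() {
    if [ "${DO_IT:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Drop a component from the mirror and add it back with fresh metadata.
readd_component() {
    mirror=$1
    disk=$2
    # Preferred path: remove the component cleanly.
    if ! run gmirror remove -v "$mirror" "$disk"; then
        # Fallback: forget components that are not connected, then
        # wipe the stale gmirror metadata from the disk.
        run gmirror forget -v "$mirror"
        run gmirror clear -v "$disk"
    fi
    # Assumption: insert stands in for the relabel step.
    run gmirror insert -v "$mirror" "$disk"
    run gmirror status "$mirror"
}
```

After a real run, "gmirror status" should show the mirror synchronizing
rather than sitting in DEGRADED with a single component.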

> If disks are OK, what is wrong? What caused READ / WRITE timeouts? 
> Broken SATA controler? FreeBSD ATA driver?

Try replacing the cables, or try a different SATA controller.  I've seen
these timeouts *a lot* and usually my gmirror/gvinum partitions all
survive (after a reboot at least).  There are a lot of threads on this and
other mailing lists describing the timeout problems.

-- Rick C. Petty


