Date:      Tue, 5 Jun 2007 13:48:56 -0400
From:      "Time Catalyst" <timecatalyst@gmail.com>
To:        freebsd-geom@freebsd.org
Subject:   Intermittent mirror rebuilds upon reboot.
Message-ID:  <659d8eb70706051048p35888759ubd3a8ca574d4b57d@mail.gmail.com>

Hello,

I'm experiencing an odd problem with gmirror (and possibly gstripe), and I
was referred to this mailing list by an acquaintance on bsdforums.org.

I have a raid0+1 setup through GEOM consisting of two mirrored, five-disk
stripes (all identical SCSI hard drives).  I've been stress testing a
database on it and everything seems to be working fine.  However, when I try
to reboot the system, roughly 20% of the time gmirror will report as
'DEGRADED' and rebuild one of the five-disk stripes.  Oddly enough, this
happens on the same device (stripe1) every time a degraded mirror occurs.
Here's a dump of the GEOM section of my dmesg and the procedure that I used
to set up the array:

GEOM_STRIPE: Device stripe1 created (id=127607522).
GEOM_STRIPE: Disk da0 attached to stripe1.
GEOM_STRIPE: Disk da1 attached to stripe1.
GEOM_STRIPE: Disk da2 attached to stripe1.
GEOM_STRIPE: Disk da3 attached to stripe1.
GEOM_STRIPE: Disk da4 attached to stripe1.
GEOM_STRIPE: Device stripe1 activated.
GEOM_STRIPE: Device stripe2 created (id=2058765235).
GEOM_STRIPE: Disk da5 attached to stripe2.
GEOM_STRIPE: Disk da6 attached to stripe2.
GEOM_STRIPE: Disk da7 attached to stripe2.
GEOM_STRIPE: Disk da8 attached to stripe2.
GEOM_STRIPE: Disk da9 attached to stripe2.
GEOM_STRIPE: Device stripe2 activated.
GEOM_MIRROR: Device raid1 created (id=702664467).
GEOM_MIRROR: Device raid1: provider stripe/stripe1 detected.
GEOM_MIRROR: Device raid1: provider stripe/stripe2 detected.
GEOM_MIRROR: Device raid1: provider stripe/stripe2 activated.
GEOM_MIRROR: Device raid1: provider mirror/raid1 launched.
GEOM_MIRROR: Device raid1: rebuilding provider stripe/stripe1.

%gstripe label -v -s 1024 stripe1 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4
%gstripe label -v -s 1024 stripe2 /dev/da5 /dev/da6 /dev/da7 /dev/da8 /dev/da9
%gmirror label -v -b load raid1 /dev/stripe/stripe1 /dev/stripe/stripe2
%gmirror load
%newfs /dev/mirror/raid1
%mkdir /mirror
%mount /dev/mirror/raid1 /mirror
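For anyone reproducing this, the state of each GEOM layer can be checked before rebooting with the standard gstripe(8) and gmirror(8) status commands. This is a sketch; the commented output is illustrative only, not captured from the system described above:

```shell
# Confirm both stripes are UP and the mirror is COMPLETE before rebooting.
gstripe status
#            Name  Status  Components
#  stripe/stripe1      UP  da0 da1 da2 da3 da4
#  stripe/stripe2      UP  da5 da6 da7 da8 da9

gmirror status
#          Name    Status  Components
#  mirror/raid1  COMPLETE  stripe/stripe1
#                          stripe/stripe2

# 'gmirror list' shows per-component detail, including which component
# gmirror considers in need of synchronization.
gmirror list raid1
```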

I added the following to /boot/loader.conf:
geom_stripe_load="YES"
geom_mirror_load="YES"

And the mount to fstab:
/dev/mirror/raid1 /mirror ufs rw 2 2
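One way to get more detail on why gmirror decides a component needs rebuilding is to raise its debug level at boot. A minimal loader.conf fragment, assuming you are willing to tolerate verbose console/dmesg output while diagnosing (the value shown is just a suggestion):

```shell
# /boot/loader.conf addition: verbose gmirror logging during boot and
# shutdown, which should show when/why a component is marked for rebuild.
kern.geom.mirror.debug=1
```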

Since the symptoms were consistent (as in, always appearing on stripe1) I
tried reconfiguring the array a couple different ways.  First I tried to
swap the hard drives on stripe1 and stripe2:

%gstripe label -v -s 1024 stripe2 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4
%gstripe label -v -s 1024 stripe1 /dev/da5 /dev/da6 /dev/da7 /dev/da8 /dev/da9

But despite the different drives, stripe1 still caused a degraded mirror.
I think this rules out a hardware failure.  Next I tried changing the
order in which I created the mirror:

%gmirror label -v -b load raid1 /dev/stripe/stripe2 /dev/stripe/stripe1

This time stripe2 consistently causes the degraded mirror, which leads me
to believe that there could be some sort of race condition occurring here.

Given the intermittent nature of the problem, I'm wondering a few things:
1) Could it be that gmirror is trying to assemble the mirror before one of
the stripes has finished setting itself up?
2) Is it possible that my SCSI drives are not finished settling before
gstripe and gmirror start to do their thing?
3) Am I doing something wrong in my setup procedure that's causing my mirror
to be unstable?
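Regarding question 2, one knob worth trying (a hedged suggestion, not a confirmed fix) is the SCSI settle delay, which controls how long CAM waits after a SCSI bus reset before probing devices:

```shell
# /boot/loader.conf: kern.cam.scsi_delay is the post-bus-reset settle
# delay in milliseconds. The value below is deliberately conservative,
# chosen only to test whether drive settling is a factor.
kern.cam.scsi_delay="15000"
```

If a large delay makes the degraded-on-boot symptom disappear, that would point at the drives; if it changes nothing, a race between the GEOM classes becomes more likely.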

Here's a link to the forum topic in case anyone wants to read the
'back-and-forth':
http://www.bsdforums.org/forums/showthread.php?p=260944#post260944

Thanks in advance (sorry for the 'novel' of text).
-Andy


