Date:      Sun, 12 Mar 2006 12:19:04 +0100
From:      Paul Schenkeveld <fb-geom@psconsult.nl>
To:        freebsd-geom@freebsd.org
Subject:   gvinum losing state about failed drives
Message-ID:  <20060312111904.GA52139@psconsult.nl>

Hi,

My hardware:

    Intel L440GX+ serverboard, 2x 700MHz P3, 1GB ECC RAM
    2x Seagate SCSI 73GB off mainboard SCSI controller
    2x add-in Promise ATA133 controllers
    4x Hitachi 500GB ATA133 disks off the Promise controllers
    add-in Intel gigabit ethernet controller

My gvinum config:

    12 volumes mirrored across da0 and da1
    1 volume 500GB mirrored across ad4 and ad8
    1 volume 500GB mirrored across ad6 and ad10

After upgrading the first server from 4-STABLE to 6-STABLE, I twice
had two ATA disks become unavailable because the controller stopped
responding.  The first time I lost ad8 and ad10, which hold vol12.p1
and vol13.p1; the second time (after everything had been manually
repaired) I lost the drives holding vol12.p0 and vol13.p0.

When the ATA controller stops, two gvinum drives go down, and the
plexes and subdisks on them go down as well.  After a reboot, however,
all drives, plexes and subdisks are up again.  Comparing the plexes
by hand (using an optimized cmp, which still takes 5.5 hours for
500GB) shows that they are not equal, understandably, because some
data was updated while one plex was down.
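
For anyone who wants to repeat the by-hand comparison, it can be
sketched roughly as below.  This is only a minimal Python sketch, not
the optimized cmp mentioned above; the function name diff_ranges and
the 1 MiB chunk size are assumptions chosen for illustration.  It reads
both plexes in step and reports the byte ranges where they differ.

```python
CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def diff_ranges(path_a, path_b, chunk=CHUNK):
    """Return a list of (offset, length) ranges where the two files differ.

    Comparison is done chunk by chunk, so a reported range is rounded
    to chunk boundaries; adjacent differing chunks are merged.
    """
    ranges = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if not a and not b:
                break  # both inputs exhausted
            length = max(len(a), len(b))
            if a != b:
                if ranges and ranges[-1][0] + ranges[-1][1] == offset:
                    # Extend the previous range instead of starting a new one.
                    ranges[-1] = (ranges[-1][0], ranges[-1][1] + length)
                else:
                    ranges.append((offset, length))
            offset += length
    return ranges
```

It would be called with the two plex device nodes as arguments, for
example diff_ranges("/dev/gvinum/plex/vol12.p0",
"/dev/gvinum/plex/vol12.p1"); those paths are an assumption about where
the plexes show up on a given system.  An empty list means the plexes
are identical.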

It seems that the failure of a drive and its subdisks is not recorded
in the metadata of the other drives.

I'm now contemplating a rollback of the upgrade, as this server has
been down too long already, but I'll try to put together a similar
setup here to do more testing.

Regards,

Paul Schenkeveld
