Date: Sun, 12 Mar 2006 12:19:04 +0100
From: Paul Schenkeveld <fb-geom@psconsult.nl>
To: freebsd-geom@freebsd.org
Message-ID: <20060312111904.GA52139@psconsult.nl>
Mail-Followup-To: freebsd-geom@freebsd.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.6i
Subject: gvinum losing state about failed drives

Hi,

My hardware:

    Intel L440GX+ serverboard, 2x 700MHz P3, 1GB ECC RAM
    2x Seagate SCSI 73GB off mainboard SCSI controller
    2x add-in Promise ATA133 controllers
    4x Hitachi 500GB ATA133 disks off the Promise controllers
    add-in Intel gigabit ethernet controller

My gvinum config:

    12 volumes mirrored across da0 and da1
    1 volume 500GB mirrored across ad4 and ad8
    1 volume 500GB mirrored across ad6 and ad10

After upgrading the first server from 4-STABLE to 6-STABLE, I twice had
two ATA disks become unavailable because the controller stopped
responding.  The first time I lost ad8 and ad10, which hold vol12.p1
and vol13.p1; the second time (after everything had been repaired
manually) I lost vol12.p0 and vol13.p0.

When the ATA controller stops, two gvinum drives go down, and the
plexes and subdisks on them go down as well.  After a reboot, however,
all drives, plexes and subdisks are up again.  Comparing the plexes by
hand (using an optimized cmp, which still takes 5.5 hours for 500GB)
shows that they are not equal, understandably, because some data was
updated while one plex was down.
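
For anyone who wants to repeat the comparison, here is a minimal sketch
of a chunked compare that reports which regions differ instead of
stopping at the first mismatch.  It is not the optimized cmp mentioned
above; the function name, the 1 MiB chunk size and the paths in the
usage note are my own assumptions, and it assumes each read returns a
full chunk except at end of file.

```python
CHUNK = 1024 * 1024  # 1 MiB per read; an arbitrary choice, not a tuned value


def differing_ranges(path_a, path_b, chunk=CHUNK):
    """Return (offset, length) ranges, at chunk granularity, where the
    two files or devices differ.  Adjacent differing chunks are merged."""
    ranges = []
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            buf_a = a.read(chunk)
            buf_b = b.read(chunk)
            if not buf_a and not buf_b:
                break  # both inputs exhausted
            length = max(len(buf_a), len(buf_b))
            if buf_a != buf_b:
                if ranges and ranges[-1][0] + ranges[-1][1] == offset:
                    # Extend the previous range rather than starting a new one.
                    ranges[-1] = (ranges[-1][0], ranges[-1][1] + length)
                else:
                    ranges.append((offset, length))
            offset += length
    return ranges
```

It would be called with the two plex device nodes, for example
differing_ranges("/path/to/vol12.p0", "/path/to/vol12.p1"), where both
paths are placeholders for wherever the plexes appear on your system.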

It seems that the failure of a drive and its subdisks is not recorded in
the metadata of the other drives.
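
To make the expectation concrete, here is a toy model of what I would
expect to happen.  It does not describe gvinum's actual on-disk format
or code; the generation counter, the dictionary layout and the subdisk
names are all assumptions made up for illustration.  The idea is only
that a surviving drive records the failure in its own metadata copy, so
that after a reboot the newest copy wins and the stale subdisk is not
brought up silently.

```python
def stale_subdisks(metadata_copies):
    """Return the subdisks that the newest metadata copy marks as not up.

    metadata_copies maps a drive name to a dict with a hypothetical
    "generation" counter and a "states" mapping of subdisk name to state.
    """
    newest = max(metadata_copies.values(), key=lambda m: m["generation"])
    return sorted(sd for sd, state in newest["states"].items() if state != "up")


# ad8 failed at generation 7; ad4 survived and recorded the failure
# in generation 8 of its own metadata copy.
copies = {
    "ad4": {"generation": 8,
            "states": {"vol12.p0.s0": "up", "vol12.p1.s0": "down"}},
    "ad8": {"generation": 7,
            "states": {"vol12.p0.s0": "up", "vol12.p1.s0": "up"}},
}
print(stale_subdisks(copies))
```

What I observe instead looks as if every drive came back reporting
"up", with nothing on the surviving drive to contradict it.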

I'm now contemplating rolling back the upgrade, as this server has
been down too long already, but I'll try to set up a similar
configuration here to do more testing.

Regards,

Paul Schenkeveld