From: Paul Schenkeveld
To: freebsd-geom@freebsd.org
Date: Sun, 12 Mar 2006 12:19:04 +0100
Subject: gvinum losing state about failed drives

Hi,

My hardware:

  Intel L440GX+ server board, 2x 700MHz P3, 1GB ECC RAM
  2x Seagate 73GB SCSI disks on the onboard SCSI controller
  2x add-in Promise ATA133 controllers
  4x Hitachi 500GB ATA133 disks on the Promise controllers
  add-in Intel gigabit Ethernet controller

My gvinum config:

  12 volumes mirrored across da0 and da1
  1 volume of 500GB mirrored across ad4 and ad8
  1 volume of 500GB mirrored across ad6 and ad10

After my 4-STABLE to 6-STABLE upgrade of the first server,
I had two occasions on which two ATA disks became unavailable because the controller stopped responding. The first time I lost ad8 and ad10, containing vol12.p1 and vol13.p1; the second time (after everything had been repaired manually) I lost vol12.p0 and vol13.p0.

When the ATA controller stops, two gvinum drives go down, and the plexes and subdisks on them go down as well. After a reboot, however, all drives, plexes and subdisks are up again. Comparing the plexes by hand (using an optimized cmp, which still takes 5.5 hours for 500GB), I find that they are not identical, which is understandable because some data was updated while one plex was down. It seems that the failure of a drive and its subdisks is not recorded in the metadata on the other drives.

I'm now contemplating rolling back the upgrade, as this server has already been down too long, but I'll try to build a similar setup here to do more testing.

Regards,
Paul Schenkeveld
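The optimized cmp used above is not shown. As a rough sketch of the same check, and not the poster's tool, a script can read both plexes in 1 MiB chunks and print the byte ranges where they differ, so that only those regions need attention rather than the whole 500GB. It assumes gvinum exposes each plex as a readable device node (the /dev/gvinum/plex/ paths in the usage line are an assumption), and the script name plexcmp.py is hypothetical. For scale, 500GB in 5.5 hours works out to roughly 25MB/s read from each plex.

```python
"""Compare two mirror plexes chunk by chunk and report the regions that differ.

Hypothetical helper, not the poster's "optimized cmp". Device paths such as
/dev/gvinum/plex/vol12.p0 are assumptions about where gvinum exposes plexes.
"""
import sys
from typing import BinaryIO, Iterator, List, Optional, Tuple

CHUNK = 1 << 20  # 1 MiB per read, a multiple of the 512-byte sector size


def diff_regions(a: BinaryIO, b: BinaryIO,
                 chunk: int = CHUNK) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) for each contiguous run of differing chunks.

    Granularity is one chunk: a single differing byte reports the whole chunk.
    Comparison stops at the end of the shorter input.
    """
    offset = 0
    run_start: Optional[int] = None
    while True:
        buf_a = a.read(chunk)
        buf_b = b.read(chunk)
        n = min(len(buf_a), len(buf_b))
        if n == 0:
            break
        if buf_a[:n] != buf_b[:n]:
            if run_start is None:
                run_start = offset  # start of a new differing run
        elif run_start is not None:
            yield run_start, offset - run_start  # run ended at this chunk
            run_start = None
        offset += n
    if run_start is not None:
        yield run_start, offset - run_start  # run extends to the end


def main(argv: List[str]) -> int:
    """Exit 0 if the plexes match, 1 if they differ, 2 on error (as cmp(1))."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} PLEX_A PLEX_B", file=sys.stderr)
        return 2
    found = False
    try:
        with open(argv[1], "rb") as a, open(argv[2], "rb") as b:
            for off, length in diff_regions(a, b):
                found = True
                print(f"differ: offset {off} length {length}")
    except OSError as exc:
        print(f"{argv[0]}: {exc}", file=sys.stderr)
        return 2
    return 1 if found else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Usage (paths assumed): `python3 plexcmp.py /dev/gvinum/plex/vol12.p0 /dev/gvinum/plex/vol12.p1`, ideally while the volume is not being written. Reporting whole chunks keeps the output short for a 500GB comparison at the cost of byte-level precision.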