From: Gary Newcombe
To: freebsd-questions@freebsd.org
Date: Fri, 18 Apr 2008 11:33:05 +1000
Subject: gmirror disk fail questions...
Message-Id: <20080418113305.53b72c64.gary@pattersonsoftware.com>

Hi all,

Yesterday, after users complaining of strange things happening in their accounting package, I rebooted the server only to find that it never came back up. gmirror was complaining about ad6 in the raid and the server had hung bringing the mirror up (this has happened twice now).

uname -a
FreeBSD mesh.lhshoses.com.au 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Thu Jan 18 22:55:39 EST 2007 gary@mesh.lhshoses.com.au:/usr/obj/usr/src/sys/MESH i386

After a hard reboot, provider ad4 was available, ad6 timed out and the server booted.

dmesg
ad4: 76324MB at ata2-master SATA150
ad6: 76324MB at ata3-master SATA150
GEOM_MIRROR: Device gm0 created (id=3803006992).
GEOM_MIRROR: Device gm0: provider ad4 detected.
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
GEOM_MIRROR: Force device gm0 start due to timeout.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
Trying to mount root from ufs:/dev/mirror/gm0s1a

[mesh:/var/log]# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad4

Looking in /dev/, however, we have:

crw-r-----  1 root  operator  0,  83 17 Apr 13:58 ad4
crw-r-----  1 root  operator  0,  91 17 Apr 13:58 ad4s1
crw-r-----  1 root  operator  0,  84 17 Apr 13:58 ad6
crw-r-----  1 root  operator  0,  92 17 Apr 13:58 ad6a
crw-r-----  1 root  operator  0,  99 17 Apr 13:58 ad6as1
crw-r-----  1 root  operator  0,  93 17 Apr 13:58 ad6b
crw-r-----  1 root  operator  0,  94 17 Apr 13:58 ad6c
crw-r-----  1 root  operator  0, 100 17 Apr 13:58 ad6cs1
crw-r-----  1 root  operator  0,  95 17 Apr 13:58 ad6d
crw-r-----  1 root  operator  0,  96 17 Apr 13:58 ad6e
crw-r-----  1 root  operator  0,  97 17 Apr 13:58 ad6f
crw-r-----  1 root  operator  0,  98 17 Apr 13:58 ad6s1
crw-r-----  1 root  operator  0, 101 17 Apr 13:58 ad6s1a
crw-r-----  1 root  operator  0, 102 17 Apr 13:58 ad6s1b
crw-r-----  1 root  operator  0, 103 17 Apr 13:58 ad6s1c
crw-r-----  1 root  operator  0, 104 17 Apr 13:58 ad6s1d
crw-r-----  1 root  operator  0, 105 17 Apr 13:58 ad6s1e
crw-r-----  1 root  operator  0, 106 17 Apr 13:58 ad6s1f

I am guessing that a failing disk is responsible for the data corruption, but I have no errors in /var/log/messages or console.log. On every boot, the mirror is marked clean and there are no warnings about a disk failing anywhere. Where should I be looking, or what should I be doing, to get any warnings?

Also, how come, if ad4 is the working disk, ad4's slices seem to be labelled as ad6? What's going on here? To me, ad6 appears to have the correct labelling for the mirror, from ad6s1a-f.

How can I test for sure whether the disk is damaged or dying, or whether this is just a temporary glitch in the mirror? This is the first time I've had a gmirror raid give me problems.

Assuming ad6 has been deactivated/disconnected, I was thinking of trying:

gmirror activate gm0 ad6
gmirror rebuild gm0 ad6

Is this safe?
I haven't tried pulling either disk from the server as I am remote from the site.

Cheers,
Gary.
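
On the question of getting warnings: one option is to watch the mirror state itself rather than the system logs. The following is only a rough sketch, not a tested monitoring setup. It assumes that `gmirror status` prints the three columns quoted in this message (Name, Status, Components) and that a healthy mirror reports COMPLETE. The function name `degraded_mirrors` is made up for illustration, and the sample text is copied from the status output above.

```shell
#!/bin/sh
# Hypothetical helper: read `gmirror status`-style text on stdin and print
# the name and status of every mirror whose status is not COMPLETE.
# Assumption: mirror rows start with "mirror/" in the first column; the
# header row and any extra component rows do not, so they are skipped.
degraded_mirrors() {
    awk '$1 ~ /^mirror\// && $2 != "COMPLETE" { print $1, $2 }'
}

# Sample input copied from the status output quoted in this message.
sample='      Name    Status  Components
mirror/gm0  DEGRADED  ad4'

printf '%s\n' "$sample" | degraded_mirrors
```

On the live machine the same function could be fed from `gmirror status | degraded_mirrors` in a cron job. Cron mails any output to the job's owner, so a healthy mirror would stay silent and a degraded one would generate a message.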