Date: Fri, 18 Apr 2008 11:33:05 +1000
From: Gary Newcombe <gary@pattersonsoftware.com>
To: freebsd-questions@freebsd.org
Subject: gmirror disk fail questions...
Message-ID: <20080418113305.53b72c64.gary@pattersonsoftware.com>

Hi all,

Yesterday, after users complained of strange things happening in their
accounting package, I rebooted the server, only to find that it never came
back up. gmirror was complaining about ad6 in the raid, and the server had
hung while bringing the mirror up (this has happened twice now).

uname -a

    FreeBSD mesh.lhshoses.com.au 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Thu Jan 18 22:55:39 EST 2007 gary@mesh.lhshoses.com.au:/usr/obj/usr/src/sys/MESH i386

After a hard reboot, provider ad4 was available, ad6 timed out, and the
server booted.

dmesg

    ad4: 76324MB <WDC WD800JD-23LSA0 07.01D07> at ata2-master SATA150
    ad6: 76324MB <WDC WD800JD-23LSA0 07.01D07> at ata3-master SATA150
    GEOM_MIRROR: Device gm0 created (id=3803006992).
    GEOM_MIRROR: Device gm0: provider ad4 detected.
    Root mount waiting for: GMIRROR
    Root mount waiting for: GMIRROR
    Root mount waiting for: GMIRROR
    Root mount waiting for: GMIRROR
    GEOM_MIRROR: Force device gm0 start due to timeout.
    GEOM_MIRROR: Device gm0: provider ad4 activated.
    GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
    Trying to mount root from ufs:/dev/mirror/gm0s1a

# gmirror status

    [mesh:/var/log]# gmirror status
          Name    Status  Components
    mirror/gm0  DEGRADED  ad4

Looking in /dev/, however, we have:

    crw-r-----  1 root  operator    0,  83 17 Apr 13:58 ad4
    crw-r-----  1 root  operator    0,  91 17 Apr 13:58 ad4s1
    crw-r-----  1 root  operator    0,  84 17 Apr 13:58 ad6
    crw-r-----  1 root  operator    0,  92 17 Apr 13:58 ad6a
    crw-r-----  1 root  operator    0,  99 17 Apr 13:58 ad6as1
    crw-r-----  1 root  operator    0,  93 17 Apr 13:58 ad6b
    crw-r-----  1 root  operator    0,  94 17 Apr 13:58 ad6c
    crw-r-----  1 root  operator    0, 100 17 Apr 13:58 ad6cs1
    crw-r-----  1 root  operator    0,  95 17 Apr 13:58 ad6d
    crw-r-----  1 root  operator    0,  96 17 Apr 13:58 ad6e
    crw-r-----  1 root  operator    0,  97 17 Apr 13:58 ad6f
    crw-r-----  1 root  operator    0,  98 17 Apr 13:58 ad6s1
    crw-r-----  1 root  operator    0, 101 17 Apr 13:58 ad6s1a
    crw-r-----  1 root  operator    0, 102 17 Apr 13:58 ad6s1b
    crw-r-----  1 root  operator    0, 103 17 Apr 13:58 ad6s1c
    crw-r-----  1 root  operator    0, 104 17 Apr 13:58 ad6s1d
    crw-r-----  1 root  operator    0, 105 17 Apr 13:58 ad6s1e
    crw-r-----  1 root  operator    0, 106 17 Apr 13:58 ad6s1f

I am guessing that a failing disk is responsible for the data corruption,
but I have no errors in /var/log/messages or console.log. On every boot, the
mirror is marked clean and there are no warnings about a disk failing
anywhere. Where should I be looking, or what should I be doing, to get any
warnings?

Also, how come, if ad4 is the working disk, ad4's slices seem to be labelled
as ad6? What's going on here? To me, ad6 appears to have the correct
labelling for the mirror, from ad6s1a-f.

How can I test for sure whether the disk is damaged or dying, or whether
this is just a temporary glitch in the mirror? This is the first time I've
had a gmirror raid give me problems.

Assuming ad6 has been deactivated/disconnected, I was thinking of trying:

    gmirror activate gm0 ad6
    gmirror rebuild gm0 ad6

Is this safe?

I haven't tried pulling either disk from the server, as I am remote from the
site.

Cheers,
Gary.
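
On the question of getting a warning when the mirror degrades: one possible approach, sketched here rather than taken from this message, is to have a periodic job inspect the Status column of `gmirror status` and report any mirror that is not COMPLETE. The function names `parse_gmirror_status` and `degraded_mirrors` are hypothetical, and the parser assumes the column layout shown in the output above (a header row, then one row per mirror, with any further components on indented continuation lines).

```python
def parse_gmirror_status(text):
    """Parse `gmirror status` text into {mirror_name: (status, [components])}.

    Assumes the layout quoted in this message: a "Name Status Components"
    header, one row per mirror, and extra components on indented lines.
    """
    mirrors = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        if fields[:2] == ["Name", "Status"]:
            continue  # header row
        if line[0].isspace():
            # Indented continuation line: another component of the last mirror.
            if current is not None:
                mirrors[current][1].append(" ".join(fields))
            continue
        if len(fields) < 2:
            continue  # not a recognisable mirror row; skip it
        current, status = fields[0], fields[1]
        components = [" ".join(fields[2:])] if fields[2:] else []
        mirrors[current] = (status, components)
    return mirrors


def degraded_mirrors(mirrors):
    """Return the sorted names of mirrors whose status is not COMPLETE."""
    return sorted(name for name, (status, _) in mirrors.items()
                  if status != "COMPLETE")


if __name__ == "__main__":
    # Sample taken from the output quoted in this message.
    sample = (
        "      Name    Status  Components\n"
        "mirror/gm0  DEGRADED  ad4\n"
    )
    for name in degraded_mirrors(parse_gmirror_status(sample)):
        print("WARNING: %s is not COMPLETE" % name)
```

In practice the text would come from running `gmirror status` on the server (for example from cron) instead of the hard-coded sample, with the warning mailed to root or written to syslog.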