Date:      Fri, 18 Apr 2008 10:40:04 -0700
From:      Christopher Cowart <ccowart@rescomp.berkeley.edu>
To:        Gary Newcombe <gary@pattersonsoftware.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: gmirror disk fail questions...
Message-ID:  <20080418174004.GE27135@hal.rescomp.berkeley.edu>
In-Reply-To: <20080418113305.53b72c64.gary@pattersonsoftware.com>
References:  <20080418113305.53b72c64.gary@pattersonsoftware.com>

Gary Newcombe wrote:
[...]
> # gmirror status
>
> [mesh:/var/log]# gmirror status
>       Name    Status  Components
> mirror/gm0  DEGRADED  ad4
>
> looking in /dev/ however, we have
>=20
> crw-r-----  1 root  operator    0,  83 17 Apr 13:58 ad4
> crw-r-----  1 root  operator    0,  91 17 Apr 13:58 ad4s1
> crw-r-----  1 root  operator    0,  84 17 Apr 13:58 ad6
> crw-r-----  1 root  operator    0,  92 17 Apr 13:58 ad6a
> crw-r-----  1 root  operator    0,  99 17 Apr 13:58 ad6as1
> crw-r-----  1 root  operator    0,  93 17 Apr 13:58 ad6b
> crw-r-----  1 root  operator    0,  94 17 Apr 13:58 ad6c
> crw-r-----  1 root  operator    0, 100 17 Apr 13:58 ad6cs1
> crw-r-----  1 root  operator    0,  95 17 Apr 13:58 ad6d
> crw-r-----  1 root  operator    0,  96 17 Apr 13:58 ad6e
> crw-r-----  1 root  operator    0,  97 17 Apr 13:58 ad6f
> crw-r-----  1 root  operator    0,  98 17 Apr 13:58 ad6s1
> crw-r-----  1 root  operator    0, 101 17 Apr 13:58 ad6s1a
> crw-r-----  1 root  operator    0, 102 17 Apr 13:58 ad6s1b
> crw-r-----  1 root  operator    0, 103 17 Apr 13:58 ad6s1c
> crw-r-----  1 root  operator    0, 104 17 Apr 13:58 ad6s1d
> crw-r-----  1 root  operator    0, 105 17 Apr 13:58 ad6s1e
> crw-r-----  1 root  operator    0, 106 17 Apr 13:58 ad6s1f
>
> I am guessing that a failing disk is responsible for the data
> corruption, but I have no errors in /var/log/messages or console.log.
> On every boot, the mirror is marked clean and there are no warnings
> about a disk failing anywhere. Where should I be looking, or what
> should I be doing, to get any warnings?
>
> Also, how come, if ad4 is the working disk, ad4's slices seem to be
> labelled as ad6? What's going on here? To me, ad6 appears to have the
> correct labelling for the mirror, from ad6s1a-f.

I believe the kernel hides the individual labels of a disk while it is a
member of a gmirror volume. The labels on ad4 should be visible under
/dev/mirror/. Because gmirror really just mirrors the data block by block
(with a little metadata in the last sector of the drive), once a drive is
no longer a member of an array, the kernel treats it as an individual
drive and makes all of its labels visible.

> How can I test for sure whether the disk is damaged or dying, or
> whether this is just a temporary glitch in the mirror? This is the
> first time I've had a gmirror raid give me problems.

The first time a drive gets kicked out, I typically try to re-insert it.
We have monitoring, so we receive notifications if it fails again. After
that, I get the vendor to replace it.

> Assuming ad6 has been deactivated/disconnected, I was thinking of
> trying:
>
> gmirror activate gm0 ad6
> gmirror rebuild gm0 ad6
>
> Is this safe?

You have to kick ad6 out and re-insert it:
# gmirror forget gm0
# gmirror insert gm0 /dev/ad6
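
If you want to preview the sequence before touching the array, here is a
minimal sketch of the same steps wrapped in a shell function. The function
name reinsert_component and the GMIRROR override are hypothetical names I
made up for illustration; the only real commands involved are gmirror's
forget, insert and status subcommands. I am assuming the mirror was created
with the default autosynchronization setting, in which case the newly
inserted disk should be rebuilt automatically.

```sh
#!/bin/sh
# GMIRROR is a hypothetical override, not a gmirror feature. Setting
# GMIRROR="echo gmirror" prints the commands instead of running them.
GMIRROR=${GMIRROR:-gmirror}

# reinsert_component MIRROR DISK   (e.g. reinsert_component gm0 /dev/ad6)
reinsert_component() {
    mirror=$1
    disk=$2
    # Drop components that are no longer connected to the mirror.
    $GMIRROR forget "$mirror" || return 1
    # Add the disk back as a new component.
    $GMIRROR insert "$mirror" "$disk" || return 1
    # Report the mirror state so the rebuild can be followed.
    $GMIRROR status "$mirror"
}
```

Run it once with GMIRROR="echo gmirror" set to check the commands, then
again as root without the override to actually re-insert the disk.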

After doing that, I would watch closely for a while in case your drive
is actually failing. I've written a small nagios check for gmirror; let
me know if you'd like me to send it (it could easily be adapted to a
cron job). You can also get `gmirror status' output in your dailies by
adding daily_status_gmirror_enable="YES" to /etc/periodic.conf.
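
To give an idea of what a cron-style check could look like, here is a
minimal sketch. It is not the nagios check mentioned above; the function
name degraded_mirrors is hypothetical, and the sketch assumes that
`gmirror status' prints one line per mirror starting with mirror/NAME,
followed by a status word such as COMPLETE or DEGRADED, as in the output
you posted.

```sh
#!/bin/sh
# Hypothetical helper: reads `gmirror status` output on stdin and prints
# the name of every mirror whose Status column is not COMPLETE.
# Header and continuation lines (extra components) do not start with
# "mirror/" in the first field, so they are ignored.
degraded_mirrors() {
    awk '$1 ~ /^mirror\// && $2 != "COMPLETE" { print $1 }'
}
```

In a script that ends with "gmirror status | degraded_mirrors", nothing is
printed while every mirror is COMPLETE, so cron only sends mail when a
mirror needs attention.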

But, given it's timing out on boot, I would personally bag the drive and
replace it. You'll still need to run the same two commands above.

-- 
Chris Cowart
Network Technical Lead
Network & Infrastructure Services, RSSP-IT
UC Berkeley



