Date:      Wed, 31 Jan 2007 23:00:04 +0100
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        "Simon L. Nielsen" <simon@FreeBSD.org>
Cc:        sos@FreeBSD.org, Oliver Fromme <olli@lurza.secnetix.de>, freebsd-geom@FreeBSD.ORG
Subject:   Re: gmirror or ata problem
Message-ID:  <20070131220004.GC487@garage.freebsd.pl>
In-Reply-To: <20070131201201.GB973@zaphod.nitro.dk>
References:  <200701300851.l0U8pEkO005250@lurza.secnetix.de> <20070131201201.GB973@zaphod.nitro.dk>


On Wed, Jan 31, 2007 at 09:12:02PM +0100, Simon L. Nielsen wrote:
> On 2007.01.30 09:51:14 +0100, Oliver Fromme wrote:
>
> > This is strange.  gmirror just detached one of its disks
> > for no apparent reason.  I've built a mirror consisting of
> > the components ad0 and ad1 (both SATA drives).  It has
> > been running fine.  This is RELENG_6 from 2006-12-20.
> >
> > Yesterday evening ad1 was detached.  There is no other
> > error message logged on console or in the logs (i.e. no
> > I/O error such as a bad sector or anything).  There was
> > no particularly high load at that time.  In fact, the
> > machine had been under much higher load before, without
> > anything bad happening.
> >
> > This is from the logs:
> >
> > Jan 29 19:10:13 pluto -- MARK --
> > Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
> > Jan 29 19:20:26 pluto kernel: subdisk1: detached
> > Jan 29 19:20:26 pluto kernel: ad1: detached
> > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
> > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
> > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
> > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
> > Jan 29 19:50:13 pluto -- MARK --
>
> I have seen similar problems on my graid3.  I think it's simply the
> disk which stops responding to commands, or at least ata(4) can't talk
> to the disk anymore...
>
> I see it on:
>
> ad10: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata5-master SATA150
> ad12: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata6-master SATA150
> ad14: 305245MB <WDC WD3200YS-01PGB0 21.00M21> at ata7-master SATA150
>
> After a reboot everything seems fine again and my RAID is rebuilt.
>
> I don't know why it happens, but it sucks :-/.  I'm running 7-CURRENT
> BTW.

It seems that when gmirror/graid3 writes to more than one disk at a
time, this puts too much load on the ata channel or something, and ata
disconnects the disk. I don't really know how it works exactly, but
maybe some timeout should be increased in the ata code?

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!



