Date:      Fri, 2 Feb 2007 21:19:51 +0100 (CET)
From:      Oliver Fromme <olli@lurza.secnetix.de>
To:        etc@fluffles.net (Fluffles)
Cc:        freebsd-geom@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>, "Simon L. Nielsen" <simon@FreeBSD.org>, sos@FreeBSD.org
Subject:   Re: gmirror or ata problem
Message-ID:  <200702022019.l12KJpcD018232@lurza.secnetix.de>
In-Reply-To: <45C12274.7030404@fluffles.net>

Fluffles wrote:
 > Pawel Jakub Dawidek wrote:
 > > Simon L. Nielsen wrote:
 > > > Oliver Fromme wrote:
 > > > > This is strange.  gmirror just detached one of its disks
 > > > > for no apparent reason.  I've built a mirror consisting of
 > > > > the components ad0 and ad1 (both SATA drives).  It has
 > > > > been running fine.  This is RELENG_6 from 2006-12-20.
 > > > > 
 > > > > Yesterday evening ad1 was detached.  There is no other
 > > > > error message logged on console or in the logs (i.e. no
 > > > > I/O error such as a bad sector or anything).  There was
 > > > > no particularly high load at that time.  In fact, the
 > > > > machine had been under much higher load before, without
 > > > > anything bad happening.
 > > > > 
 > > > > This is from the logs:
 > > > > 
 > > > > Jan 29 19:10:13 pluto -- MARK --
 > > > > Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
 > > > > Jan 29 19:20:26 pluto kernel: subdisk1: detached
 > > > > Jan 29 19:20:26 pluto kernel: ad1: detached
 > > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
 > > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
 > > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
 > > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
 > > > > Jan 29 19:50:13 pluto -- MARK --
 > > > >       
 > > > I have seen similar problems on my graid3.  I think it's simply the
 > > > disk which stops responding to commands, or at least ata(4) can't talk
 > > > to the disk anymore...
 > > > 
 > > > I see it on:
 > > > 
 > > > ad10: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata5-master SATA150
 > > > ad12: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata6-master SATA150
 > > > ad14: 305245MB <WDC WD3200YS-01PGB0 21.00M21> at ata7-master SATA150
 > > > 
 > > > After a reboot everything seems fine again and my RAID is rebuilt.
 > > > 
 > > > I don't know why it happens, but it sucks :-/.  I'm running 7-CURRENT
 > > > BTW.
 > > 
 > > It seems that when gmirror/graid3 writes to more than one disk at a
 > > time, this puts too much load on the ata channel or something, and ata
 > > disconnects the disk. I don't really know how it works exactly, but
 > > maybe some timeout should be increased in the ata code?
 > 
 > My experience is that even a single disk will time out; 5 seconds is
 > just not enough for the disk to spin up. Most disks need at least
 > 10 seconds.

In my case it has nothing to do with spin up / spin down.
I do not use ataidle, and the disks are running all the
time.  They don't have to spin up.

So it must be something else causing the problems.
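
For anyone who wants to check their own logs for the same pattern, here
is a rough sketch in Python. It only pulls out the detached device names
and translates the numeric GEOM_MIRROR error code into its symbolic errno
name (error=6 is ENXIO). The function name summarize() and the two
regular expressions are made up for illustration and are matched only
against the message format quoted above; other kernel versions may word
these messages differently.

```python
import errno
import re

# Sample lines copied from the log excerpt quoted above.
LOG = """\
Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
"""

# Hypothetical patterns, written against the messages shown in this thread.
DETACH_RE = re.compile(r"kernel: (\w+): FAILURE - device detached")
ERROR_RE = re.compile(r"GEOM_MIRROR: .*\berror=(\d+)\)")


def summarize(log_text):
    """Return (detached device names, {error code: symbolic errno name})."""
    detached = []
    codes = set()
    for line in log_text.splitlines():
        m = DETACH_RE.search(line)
        if m:
            detached.append(m.group(1))
        m = ERROR_RE.search(line)
        if m:
            codes.add(int(m.group(1)))
    # errno.errorcode maps numeric codes to names such as "ENXIO".
    names = {c: errno.errorcode.get(c, "unknown") for c in codes}
    return detached, names


if __name__ == "__main__":
    detached, names = summarize(LOG)
    print("detached:", detached)
    print("errors:", names)
```

ENXIO only says that the device was already gone when gmirror tried to
update its metadata, so it is a consequence of the detach and does not
explain why ata dropped the disk in the first place.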

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, USt-Id: DE204219783
Any opinions expressed in this message are personal to the author and may
not necessarily reflect the opinions of secnetix GmbH & Co KG in any way.
FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"C++ is over-complicated nonsense. And Bjorn Shoestrap's book
a danger to public health. I tried reading it once, I was in
recovery for months."
        -- Cliff Sarginson


