Date: Fri, 2 Feb 2007 21:19:51 +0100 (CET)
From: Oliver Fromme <olli@lurza.secnetix.de>
To: etc@fluffles.net (Fluffles)
Cc: freebsd-geom@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>, "Simon L. Nielsen" <simon@FreeBSD.org>, sos@FreeBSD.org
Subject: Re: gmirror or ata problem
Message-ID: <200702022019.l12KJpcD018232@lurza.secnetix.de>
In-Reply-To: <45C12274.7030404@fluffles.net>

Fluffles wrote:
> Pawel Jakub Dawidek wrote:
> > Simon L. Nielsen wrote:
> > > Oliver Fromme wrote:
> > > > This is strange.  gmirror just detached one of its disks
> > > > for no apparent reason.  I've built a mirror consisting of
> > > > the components ad0 and ad1 (both SATA drives).  It has
> > > > been running fine.  This is RELENG_6 from 2006-12-20.
> > > >
> > > > Yesterday evening ad1 was detached.  There is no other
> > > > error message logged on console or in the logs (i.e. no
> > > > I/O error such as a bad sector or anything).  There was
> > > > no particularly high load at that time.  In fact, the
> > > > machine had been under much higher load before, without
> > > > anything bad happening.
> > > >
> > > > This is from the logs:
> > > >
> > > > Jan 29 19:10:13 pluto -- MARK --
> > > > Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
> > > > Jan 29 19:20:26 pluto kernel: subdisk1: detached
> > > > Jan 29 19:20:26 pluto kernel: ad1: detached
> > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
> > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
> > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
> > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
> > > > Jan 29 19:50:13 pluto -- MARK --
> > >
> > > I have seen similar problems on my graid3.  I think it's simply the
> > > disk which stops responding to commands, or at least ata(4) can't talk
> > > to the disk anymore...
> > >
> > > I see it on:
> > >
> > > ad10: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata5-master SATA150
> > > ad12: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata6-master SATA150
> > > ad14: 305245MB <WDC WD3200YS-01PGB0 21.00M21> at ata7-master SATA150
> > >
> > > After a reboot everything seems fine again and my RAID is rebuilt.
> > >
> > > I don't know why it happens, but it sucks :-/.  I'm running 7-CURRENT
> > > BTW.
> >
> > It seems that when gmirror/graid3 writes to more than one disk at a
> > time, this puts too much load on ata channel or something and ata
> > disconnects the disk.  I don't really know how it works exactly, but
> > maybe some timeout should be increased in the ata code?
>
> My experiences are that even a single disk will timeout; 5 seconds is
> just not enough for the disk to spinup.  Most disks will need 10 seconds
> at least.

In my case it has nothing to do with spin up / spin down.
I do not use ataidle, and the disks are running all the
time.  They don't have to spin up.  So it must be something
else causing the problems.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, USt-Id: DE204219783
Any opinions expressed in this message are personal to the author and may
not necessarily reflect the opinions of secnetix GmbH & Co KG in any way.
FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"C++ is over-complicated nonsense.  And Bjorn Shoestrap's book
a danger to public health.  I tried reading it once, I was in
recovery for months."
        -- Cliff Sarginson