Date: Thu, 01 Feb 2007 07:55:08 -0600
From: Eric Anderson <anderson@freebsd.org>
To: Fluffles <etc@fluffles.net>
Cc: Oliver Fromme <olli@lurza.secnetix.de>, sos@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>, "Simon L. Nielsen" <simon@freebsd.org>, freebsd-geom@freebsd.org
Subject: Re: gmirror or ata problem
Message-ID: <45C1F13C.2020503@freebsd.org>
In-Reply-To: <45C12274.7030404@fluffles.net>
References: <200701300851.l0U8pEkO005250@lurza.secnetix.de> <20070131201201.GB973@zaphod.nitro.dk> <20070131220004.GC487@garage.freebsd.pl> <45C12274.7030404@fluffles.net>

On 01/31/07 17:12, Fluffles wrote:
> Pawel Jakub Dawidek wrote:
>> On Wed, Jan 31, 2007 at 09:12:02PM +0100, Simon L. Nielsen wrote:
>>
>>> On 2007.01.30 09:51:14 +0100, Oliver Fromme wrote:
>>>
>>>> This is strange. gmirror just detached one of its disks
>>>> for no apparent reason. I've built a mirror consisting of
>>>> the components ad0 and ad1 (both SATA drives). It has
>>>> been running fine. This is RELENG_6 from 2006-12-20.
>>>>
>>>> Yesterday evening ad1 was detached. There is no other
>>>> error message logged on console or in the logs (i.e. no
>>>> I/O error such as a bad sector or anything). There was
>>>> no particularly high load at that time. In fact, the
>>>> machine had been under much higher load before, without
>>>> anything bad happening.
>>>>
>>>> This is from the logs:
>>>>
>>>> Jan 29 19:10:13 pluto -- MARK --
>>>> Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
>>>> Jan 29 19:20:26 pluto kernel: subdisk1: detached
>>>> Jan 29 19:20:26 pluto kernel: ad1: detached
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
>>>> Jan 29 19:50:13 pluto -- MARK --
>>>>
>>> I have seen similar problems on my graid3. I think it's simply the
>>> disk which stops responding to commands, or at least ata(4) can't talk
>>> to the disk anymore...
>>>
>>> I see it on:
>>>
>>> ad10: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata5-master SATA150
>>> ad12: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata6-master SATA150
>>> ad14: 305245MB <WDC WD3200YS-01PGB0 21.00M21> at ata7-master SATA150
>>>
>>> After a reboot everything seems fine again and my RAID is rebuilt.
>>>
>>> I don't know why it happens, but it sucks :-/. I'm running 7-CURRENT
>>> BTW.
>>>
>> It seems that when gmirror/graid3 writes to more than one disk at a
>> time, this puts too much load on ata channel or something and ata
>> disconnects the disk. I don't really know how it works exactly, but
>> maybe some timeout should be increased in the ata code?
>>
>
> My experiences are that even a single disk will timeout; 5 seconds is
> just not enough for the disk to spinup. Most disks will need 10 seconds
> at least.
> In ata-disk.c the timeout is set at 5 seconds. When set at 15 seconds;
> the ataidle-sleep mode works perfectly. I think this should be patched.
> Right now ataidle is broken on FreeBSD i would say, without patching the
> sourcecode at least.
>
> For those not being able to wait for an official patch; try this:
> - edit /usr/src/sys/dev/ata/ata-disk.c
> - search for "timeout" case-insensitive
> - you will find: request->timeout = 5;
> - change the value 5 to 15
> - save and execute: cd /usr/src; make kernel KERNCONF=GENERIC
> - after reboot you can test ataidle and it should work perfectly; with
> any geom raid layer or as 'single disk'

Is there any reason the sleep and idle pieces of ataidle could not be
added to atacontrol?

Eric
