Date: Thu, 01 Feb 2007 00:12:52 +0100
From: Fluffles <etc@fluffles.net>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc: freebsd-geom@FreeBSD.ORG, Oliver Fromme <olli@lurza.secnetix.de>, "Simon L. Nielsen" <simon@FreeBSD.org>, sos@FreeBSD.org
Subject: Re: gmirror or ata problem
Message-ID: <45C12274.7030404@fluffles.net>
In-Reply-To: <20070131220004.GC487@garage.freebsd.pl>
References: <200701300851.l0U8pEkO005250@lurza.secnetix.de> <20070131201201.GB973@zaphod.nitro.dk> <20070131220004.GC487@garage.freebsd.pl>

Pawel Jakub Dawidek wrote:
> On Wed, Jan 31, 2007 at 09:12:02PM +0100, Simon L. Nielsen wrote:
>
>> On 2007.01.30 09:51:14 +0100, Oliver Fromme wrote:
>>
>>> This is strange. gmirror just detached one of its disks
>>> for no apparent reason. I've built a mirror consisting of
>>> the components ad0 and ad1 (both SATA drives). It has
>>> been running fine. This is RELENG_6 from 2006-12-20.
>>>
>>> Yesterday evening ad1 was detached. There is no other
>>> error message logged on console or in the logs (i.e. no
>>> I/O error such as a bad sector or anything). There was
>>> no particularly high load at that time. In fact, the
>>> machine had been under much higher load before, without
>>> anything bad happening.
>>>
>>> This is from the logs:
>>>
>>> Jan 29 19:10:13 pluto -- MARK --
>>> Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
>>> Jan 29 19:20:26 pluto kernel: subdisk1: detached
>>> Jan 29 19:20:26 pluto kernel: ad1: detached
>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
>>> Jan 29 19:50:13 pluto -- MARK --
>>>
>> I have seen similar problems on my graid3. I think it's simply the
>> disk which stops responding to commands, or at least ata(4) can't talk
>> to the disk anymore...
>>
>> I see it on:
>>
>> ad10: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata5-master SATA150
>> ad12: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata6-master SATA150
>> ad14: 305245MB <WDC WD3200YS-01PGB0 21.00M21> at ata7-master SATA150
>>
>> After a reboot everything seems fine again and my RAID is rebuilt.
>>
>> I don't know why it happens, but it sucks :-/. I'm running 7-CURRENT
>> BTW.
>>
>
> It seems that when gmirror/graid3 writes to more than one disk at a
> time, this puts too much load on ata channel or something and ata
> disconnects the disk. I don't really know how it works exactly, but
> maybe some timeout should be increased in the ata code?
>

In my experience even a single disk will time out: 5 seconds is just
not enough for the disk to spin up. Most disks need at least 10
seconds. In ata-disk.c the timeout is set to 5 seconds. With it set to
15 seconds, the ataidle sleep mode works perfectly. I think this should
be patched. Right now I would say ataidle is broken on FreeBSD, at
least without patching the source code.

For those who can't wait for an official patch, try this:

- edit /usr/src/sys/dev/ata/ata-disk.c
- search for "timeout" (case-insensitive)
- you will find: request->timeout = 5;
- change the value 5 to 15
- save and execute: cd /usr/src; make kernel KERNCONF=GENERIC
- after rebooting you can test ataidle and it should work perfectly,
  with any geom raid layer or as a 'single disk'

- Veronica
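
The manual edit described above could also be scripted. This is only a
rough sketch: patch_ata_timeout is a made-up helper name, not a FreeBSD
tool, and it assumes the line in ata-disk.c is spelled exactly as quoted
in the message (request->timeout = 5;). If your source tree uses
different spacing, or has several such lines you do not all want to
change, edit the file by hand instead.

```sh
#!/bin/sh
# Sketch only: raise the ATA request timeout from 5 to 15 seconds.
# patch_ata_timeout is a hypothetical helper, not part of FreeBSD.
patch_ata_timeout() {
    src="$1"    # e.g. /usr/src/sys/dev/ata/ata-disk.c

    # Keep a backup copy so the change is easy to revert.
    cp "$src" "$src.orig" || return 1

    # Rewrite every line that matches the exact text quoted in the
    # message; all other lines are copied through unchanged.
    sed 's/request->timeout = 5;/request->timeout = 15;/' \
        "$src.orig" > "$src"
}

# Usage (as root), followed by the rebuild from the message:
#   patch_ata_timeout /usr/src/sys/dev/ata/ata-disk.c
#   grep -n 'request->timeout' /usr/src/sys/dev/ata/ata-disk.c
#   cd /usr/src && make kernel KERNCONF=GENERIC
```

Checking the result with grep before rebuilding is worthwhile, since the
substitution silently does nothing if the pattern is not found. The
backup in ata-disk.c.orig can be copied back to undo the change.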