Date:      Thu, 01 Feb 2007 07:55:08 -0600
From:      Eric Anderson <anderson@freebsd.org>
To:        Fluffles <etc@fluffles.net>
Cc:        Oliver Fromme <olli@lurza.secnetix.de>, sos@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>, "Simon L. Nielsen" <simon@freebsd.org>, freebsd-geom@freebsd.org
Subject:   Re: gmirror or ata problem
Message-ID:  <45C1F13C.2020503@freebsd.org>
In-Reply-To: <45C12274.7030404@fluffles.net>
References:  <200701300851.l0U8pEkO005250@lurza.secnetix.de>	<20070131201201.GB973@zaphod.nitro.dk>	<20070131220004.GC487@garage.freebsd.pl> <45C12274.7030404@fluffles.net>

On 01/31/07 17:12, Fluffles wrote:
> Pawel Jakub Dawidek wrote:
>> On Wed, Jan 31, 2007 at 09:12:02PM +0100, Simon L. Nielsen wrote:
>>   
>>> On 2007.01.30 09:51:14 +0100, Oliver Fromme wrote:
>>>
>>>     
>>>> This is strange.  gmirror just detached one of its disks
>>>> for no apparent reason.  I've built a mirror consisting of
>>>> the components ad0 and ad1 (both SATA drives).  It has
>>>> been running fine.  This is RELENG_6 from 2006-12-20.
>>>>
>>>> Yesterday evening ad1 was detached.  There is no other
>>>> error message logged on console or in the logs (i.e. no
>>>> I/O error such as a bad sector or anything).  There was
>>>> no particularly high load at that time.  In fact, the
>>>> machine had been under much higher load before, without
>>>> anything bad happening.
>>>>
>>>> This is from the logs:
>>>>
>>>> Jan 29 19:10:13 pluto -- MARK --
>>>> Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
>>>> Jan 29 19:20:26 pluto kernel: subdisk1: detached
>>>> Jan 29 19:20:26 pluto kernel: ad1: detached
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
>>>> Jan 29 19:50:13 pluto -- MARK --
>>>>       
>>> I have seen similar problems on my graid3.  I think it's simply the
>>> disk which stops responding to commands, or at least ata(4) can't talk
>>> to the disk anymore...
>>>
>>> I see it on:
>>>
>>> ad10: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata5-master SATA150
>>> ad12: 305245MB <WDC WD3200SD-01KNB0 08.05J08> at ata6-master SATA150
>>> ad14: 305245MB <WDC WD3200YS-01PGB0 21.00M21> at ata7-master SATA150
>>>
>>> After a reboot everything seems fine again and my RAID is rebuilt.
>>>
>>> I don't know why it happens, but it sucks :-/.  I'm running 7-CURRENT
>>> BTW.
>>>     
>> It seems that when gmirror/graid3 writes to more than one disk at a
>> time, this puts too much load on the ata channel or something and ata
>> disconnects the disk. I don't really know how it works exactly, but
>> maybe some timeout should be increased in the ata code?
>>   
> 
> In my experience even a single disk will time out; 5 seconds is
> just not enough for the disk to spin up. Most disks need at least
> 10 seconds.
> In ata-disk.c the timeout is set to 5 seconds. When it is set to 15
> seconds, the ataidle sleep mode works perfectly. I think this should be
> patched. Right now I would say ataidle is broken on FreeBSD, at least
> without patching the source code.
> 
> For those who cannot wait for an official patch, try this:
> - edit /usr/src/sys/dev/ata/ata-disk.c
> - search case-insensitively for "timeout"
> - you will find:     request->timeout = 5;
> - change the value 5 to 15
> - save, then run: cd /usr/src; make kernel KERNCONF=GENERIC
> - after the reboot you can test ataidle and it should work perfectly,
> with any geom raid layer or as a 'single disk'
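
[Editorial sketch, not part of the original message.] The manual edit
described in the quoted steps could probably be scripted roughly as
follows. The helper name raise_ata_timeout is hypothetical, and the sketch
assumes the assignment appears in the file exactly as quoted
("request->timeout = 5;"). It keeps a backup copy next to the source file
and does not build the kernel itself.

```shell
# Hypothetical helper (not a FreeBSD tool): change the quoted 5-second
# timeout assignment to 15 seconds in the file given as $1.
raise_ata_timeout() {
    src="$1"
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }

    # Keep the unmodified file as <name>.orig so the change can be reverted.
    cp "$src" "$src.orig" || return 1

    # Rewrite only the exact assignment quoted in the message; every other
    # line, including other uses of "timeout", is copied through unchanged.
    sed 's/request->timeout = 5;/request->timeout = 15;/' "$src.orig" > "$src"
}
```

Usage would be along the lines of
"raise_ata_timeout /usr/src/sys/dev/ata/ata-disk.c", followed by the
kernel build command from the quoted steps. Comparing the file against its
.orig copy with diff -u before building is a cheap way to confirm that
only the intended line changed.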

Is there any reason the sleep and idle pieces of ataidle could not be 
added to atacontrol?


Eric



Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?45C1F13C.2020503>