From: Eric Anderson
Date: Thu, 01 Feb 2007 07:55:08 -0600
To: Fluffles
Cc: Oliver Fromme, sos@freebsd.org, Pawel Jakub Dawidek, "Simon L. Nielsen", freebsd-geom@freebsd.org
Subject: Re: gmirror or ata problem
Message-ID: <45C1F13C.2020503@freebsd.org>
List-Id: GEOM-specific discussions and implementations

On 01/31/07 17:12, Fluffles wrote:
> Pawel Jakub Dawidek wrote:
>> On Wed, Jan 31, 2007 at 09:12:02PM +0100, Simon L. Nielsen wrote:
>>
>>> On 2007.01.30 09:51:14 +0100, Oliver Fromme wrote:
>>>
>>>> This is strange. gmirror just detached one of its disks
>>>> for no apparent reason. I've built a mirror consisting of
>>>> the components ad0 and ad1 (both SATA drives). It has
>>>> been running fine. This is RELENG_6 from 2006-12-20.
>>>>
>>>> Yesterday evening ad1 was detached. There is no other
>>>> error message logged on console or in the logs (i.e. no
>>>> I/O error such as a bad sector or anything). There was
>>>> no particularly high load at that time. In fact, the
>>>> machine had been under much higher load before, without
>>>> anything bad happening.
>>>>
>>>> This is from the logs:
>>>>
>>>> Jan 29 19:10:13 pluto -- MARK --
>>>> Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
>>>> Jan 29 19:20:26 pluto kernel: subdisk1: detached
>>>> Jan 29 19:20:26 pluto kernel: ad1: detached
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
>>>> Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
>>>> Jan 29 19:50:13 pluto -- MARK --
>>>>
>>> I have seen similar problems on my graid3. I think it's simply the
>>> disk which stops responding to commands, or at least ata(4) can't talk
>>> to the disk anymore...
>>>
>>> I see it on:
>>>
>>> ad10: 305245MB at ata5-master SATA150
>>> ad12: 305245MB at ata6-master SATA150
>>> ad14: 305245MB at ata7-master SATA150
>>>
>>> After a reboot everything seems fine again and my RAID is rebuilt.
>>>
>>> I don't know why it happens, but it sucks :-/. I'm running 7-CURRENT
>>> BTW.
>>>
>> It seems that when gmirror/graid3 writes to more than one disk at a
>> time, this puts too much load on ata channel or something and ata
>> disconnects the disk. I don't really know how it works exactly, but
>> maybe some timeout should be increased in the ata code?
>>
>
> My experiences are that even a single disk will timeout; 5 seconds is
> just not enough for the disk to spinup. Most disks will need 10 seconds
> at least.
> In ata-disk.c the timeout is set at 5 seconds. When set at 15 seconds;
> the ataidle-sleep mode works perfectly. I think this should be patched.
> Right now ataidle is broken on FreeBSD i would say, without patching the
> sourcecode at least.
>
> For those not being able to wait for an official patch; try this:
> - edit /usr/src/sys/dev/ata/ata-disk.c
> - search for "timeout" case-insensitive
> - you will find: request->timeout = 5;
> - change the value 5 to 15
> - save and execute: cd /usr/src; make kernel KERNCONF=GENERIC
> - after reboot you can test ataidle and it should work perfectly; with
>   any geom raid layer or as 'single disk'

Is there any reason the sleep and idle pieces of ataidle could not be
added to atacontrol?

Eric
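
For anyone who would rather script the manual edit quoted above than do it by hand, here is a rough sketch of the search-and-replace step only. It assumes the assignment appears in the file in the form quoted in the thread (`request->timeout = 5;`), which may not hold for every source revision. The helper name `bump_timeout` and the sample text are made up for illustration, and the sketch works on a string rather than touching /usr/src.

```python
import re
from typing import Tuple

# Pattern for the assignment quoted in the thread: "request->timeout = 5;".
# Whitespace around "=" is matched loosely; this is an assumption about the
# real file, which may format the line differently.
TIMEOUT_RE = re.compile(r"(request->timeout\s*=\s*)(\d+)(\s*;)")


def bump_timeout(source: str, old: int = 5, new: int = 15) -> Tuple[str, int]:
    """Return (patched_source, number_of_assignments_changed).

    Hypothetical helper: only assignments whose current value equals `old`
    are rewritten, so other timeout values are left untouched.
    """
    changed = 0

    def repl(match):
        nonlocal changed
        if int(match.group(2)) != old:
            return match.group(0)  # leave other timeout values alone
        changed += 1
        return f"{match.group(1)}{new}{match.group(3)}"

    return TIMEOUT_RE.sub(repl, source), changed


# Made-up sample text standing in for a fragment of ata-disk.c.
sample = "    request->timeout = 5;\n    request->retries = 2;\n"
patched, count = bump_timeout(sample)
print(patched, end="")
print(f"{count} assignment(s) changed")
```

Reporting the number of changed assignments makes it easy to notice when the pattern matched nothing, or matched more places than expected, before rebuilding the kernel with the command quoted above.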