From owner-freebsd-geom@FreeBSD.ORG Fri Feb 2 20:20:00 2007
From: Oliver Fromme
Message-Id: <200702022019.l12KJpcD018232@lurza.secnetix.de>
To: etc@fluffles.net (Fluffles)
Date: Fri, 2 Feb 2007 21:19:51 +0100 (CET)
In-Reply-To: <45C12274.7030404@fluffles.net>
Cc: freebsd-geom@FreeBSD.org, Pawel Jakub Dawidek, "Simon L. Nielsen", sos@FreeBSD.org
Subject: Re: gmirror or ata problem
List-Id: GEOM-specific discussions and implementations

Fluffles wrote:
> Pawel Jakub Dawidek wrote:
> > Simon L. Nielsen wrote:
> > > Oliver Fromme wrote:
> > > > This is strange. gmirror just detached one of its disks
> > > > for no apparent reason.
> > > > I've built a mirror consisting of
> > > > the components ad0 and ad1 (both SATA drives). It has
> > > > been running fine. This is RELENG_6 from 2006-12-20.
> > > >
> > > > Yesterday evening ad1 was detached. There is no other
> > > > error message logged on console or in the logs (i.e. no
> > > > I/O error such as a bad sector or anything). There was
> > > > no particularly high load at that time. In fact, the
> > > > machine had been under much higher load before, without
> > > > anything bad happening.
> > > >
> > > > This is from the logs:
> > > >
> > > > Jan 29 19:10:13 pluto -- MARK --
> > > > Jan 29 19:20:26 pluto kernel: ad1: FAILURE - device detached
> > > > Jan 29 19:20:26 pluto kernel: subdisk1: detached
> > > > Jan 29 19:20:26 pluto kernel: ad1: detached
> > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot write metadata on ad1 (device=gm0, error=6).
> > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
> > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Cannot update metadata on disk ad1 (error=6).
> > > > Jan 29 19:20:26 pluto kernel: GEOM_MIRROR: Device gm0: provider ad1 disconnected.
> > > > Jan 29 19:50:13 pluto -- MARK --
> > > >
> > > I have seen similar problems on my graid3. I think it's simply the
> > > disk which stops responding to commands, or at least ata(4) can't talk
> > > to the disk anymore...
> > >
> > > I see it on:
> > >
> > > ad10: 305245MB at ata5-master SATA150
> > > ad12: 305245MB at ata6-master SATA150
> > > ad14: 305245MB at ata7-master SATA150
> > >
> > > After a reboot everything seems fine again and my RAID is rebuilt.
> > >
> > > I don't know why it happens, but it sucks :-/. I'm running 7-CURRENT
> > > BTW.
> >
> > It seems that when gmirror/graid3 writes to more than one disk at a
> > time, this puts too much load on ata channel or something and ata
> > disconnects the disk.
> > I don't really know how it works exactly, but
> > maybe some timeout should be increased in the ata code?
>
> My experiences are that even a single disk will timeout; 5 seconds is
> just not enough for the disk to spinup. Most disks will need 10 seconds
> at least.

In my case it has nothing to do with spin up / spin down.
I do not use ataidle, and the disks are running all the
time. They don't have to spin up. So it must be something
else causing the problems.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht Muenchen, HRA 74606, VAT ID: DE204219783
Any opinions expressed in this message are personal to the author and may
not necessarily reflect the opinions of secnetix GmbH & Co KG in any way.
FreeBSD services, products and more: http://www.secnetix.de/bsd

"C++ is over-complicated nonsense. And Bjorn Shoestrap's book
a danger to public health. I tried reading it once, I was in
recovery for months."
        -- Cliff Sarginson