From: Volker
Date: Tue, 07 Mar 2006 11:43:11 +0100
To: freebsd-stable@freebsd.org
Subject: SATA drive 1 disappears
Message-ID: <440D63BF.2070904@vwsoft.com>

Dear list,

I've seen GEOM mirror error messages on two nearly identical systems.
Both run on Asrock K7VT4xx (VIA chipset) boards with two SATA drives connected (Hitachi HDS728080PLA380/PF2OA60A). On both systems we're using gmirror RAID-1 per slice.

After some weeks of production use, the first disc (ad4) in the RAID set on both systems produced error messages like:

> +ad4: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=127199808
> +ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=10968959
> +ad4: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=10968959
> +ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=118404223
> +ad4: FAILURE - WRITE_DMA timed out LBA=10968959
> +GEOM_MIRROR: Request failed (error=5). ad4s1[WRITE(offset=5616074752, length=16384)]
> +GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
> +ad4: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=118404223
> +ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=122117983
> +ad4: FAILURE - WRITE_DMA timed out LBA=118404223
> ...
> +subdisk4: detached
> +ad4: detached
> +GEOM_MIRROR: Device gm0s2: provider ad4s2 disconnected.
> +GEOM_MIRROR: Request failed (error=5). ad4s2[READ(offset=8987662336, length=2048)]

After these messages the disc isn't seen by the system anymore:

> atacontrol list
> ATA channel 0:
>     Master: acd0 ATA/ATAPI revision 0
>     Slave: no device present
> ATA channel 1:
>     Master: no device present
>     Slave: no device present
> ATA channel 2:
>     Master: no device present
>     Slave: no device present
> ATA channel 3:
>     Master: ad6 Serial ATA v1.0
>     Slave: no device present

The (S)ATA controller and devices are detected at startup as:

> +atapci0: port
> +atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f
> +ad4: 78533MB at ata2-master SATA150
> +ad6: 78533MB at ata3-master SATA150
> +GEOM_MIRROR: Device gm0s1 created (id=613166686).
> +GEOM_MIRROR: Device gm0s1: provider ad4s1 detected.
> +GEOM_MIRROR: Device gm0s2 created (id=91558579).
> +GEOM_MIRROR: Device gm0s2: provider ad4s2 detected.
> +GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
> +GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
> +GEOM_MIRROR: Device gm0s1: provider ad4s1 activated.
> +GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
> +GEOM_MIRROR: Device gm0s2: provider ad6s2 detected.
> +GEOM_MIRROR: Device gm0s2: provider ad6s2 activated.
> +GEOM_MIRROR: Device gm0s2: provider ad4s2 activated.
> +GEOM_MIRROR: Device gm0s2: provider mirror/gm0s2 launched.

The RAID set is now running degraded. Both systems are running 6.0-RELEASE.

I know it's more like guesswork, but what might be the reason for these disc errors? Are the discs really dying? After a reboot, the first disc reappears for a few days and then disappears again. The HDD connectors have been checked.

Is there something wrong with gmirror, geom or the controller driver? What makes me scratch my head is that on both systems only the first disc is dying. I've found postings from one year ago and the conclusion was faulty hardware. Are there any signs of geom or driver problems?

`uname -a':
FreeBSD GwOsl 6.0-RELEASE FreeBSD 6.0-RELEASE #0: Wed Nov 30 02:41:47 UTC 2005 root@gwosl:/usr/obj/usr/src/sys/GwOsl i386

Greetings,

Volker
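
P.S. One cross-check on the log above, as a rough sketch only: the byte offset in the GEOM_MIRROR failure for ad4s1 can be converted to an absolute LBA and compared with the LBA in the ad4 WRITE_DMA timeout. Two assumptions are mine and not confirmed anywhere in this mail: 512-byte sectors, and ad4s1 starting at sector 63. The function name is made up for illustration.

```python
import re

SECTOR_SIZE = 512  # ASSUMPTION: 512-byte sectors

# Matches lines such as:
# GEOM_MIRROR: Request failed (error=5). ad4s1[WRITE(offset=..., length=...)]
GEOM_RE = re.compile(
    r"GEOM_MIRROR: Request failed \(error=(\d+)\)\. "
    r"(\w+)\[(READ|WRITE)\(offset=(\d+), length=(\d+)\)\]"
)


def geom_request_to_lba(line, slice_start_sector):
    """Return the absolute LBA of a failed GEOM_MIRROR request, or None.

    slice_start_sector is the first sector of the slice on the raw disc
    (hypothetical value here; check it with fdisk on the real system).
    """
    match = GEOM_RE.search(line)
    if match is None:
        return None
    offset = int(match.group(4))
    if offset % SECTOR_SIZE != 0:
        raise ValueError("offset is not sector-aligned")
    return slice_start_sector + offset // SECTOR_SIZE


line = ("+GEOM_MIRROR: Request failed (error=5). "
        "ad4s1[WRITE(offset=5616074752, length=16384)]")

# ASSUMPTION: ad4s1 begins at sector 63.
print(geom_request_to_lba(line, slice_start_sector=63))  # prints 10968959
```

Under those assumptions the result equals the LBA=10968959 reported in the WRITE_DMA timeout, which would mean the gmirror failure and the ATA timeout refer to the same write request rather than to two separate problems.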