From: Miroslav Lachman <000.fbsd@quip.cz>
Date: Wed, 04 Apr 2007 15:21:46 +0200
To: andrej@antiszoc.hu
Cc: freebsd-stable@freebsd.org
Subject: Re: Sun Fire X2100 SATA problem [was - sun x2100 gmirror problem]

andrej@antiszoc.hu wrote:
> Hi,
>
> We're using gmirror on our Sun Fire X2100 and FreeBSD 6.1-p10.
> Some days ago I found this in the logs:
>
> Apr 1 02:12:05 x2100 kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=612960533
> Apr 1 02:12:05 x2100 kernel: ad6: FAILURE - WRITE_DMA48 status=51 error=10 LBA=612960533
> Apr 1 02:12:05 x2100 kernel: GEOM_MIRROR: Request failed (error=5). ad6[WRITE(offset=313835792896, length=4096)]
> Apr 1 02:12:05 x2100 kernel: GEOM_MIRROR: Device gm0: provider ad6 disconnected.
>
> Normally this would look like a disk error, but I don't think our half-year-old disks (WD RE2) should fail after such a short time. Of course they have moving parts, so they MAY fail. :( Yesterday I tried to reinit the SATA channel and insert the disk back into the mirror. I got this:
>
> Apr 3 23:00:32 x2100 kernel: GEOM_MIRROR: Device gm0: provider ad6 detected.
> Apr 3 23:00:32 x2100 kernel: GEOM_MIRROR: Device gm0: rebuilding provider ad6.
> Apr 3 23:00:36 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=245760
> Apr 3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=392576
> Apr 3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=392960
> Apr 3 23:00:53 x2100 kernel: ad6: FAILURE - device detached
>
> After this, the disk disappeared from the SATA channel completely.
>
> The weird thing is that we previously used the onboard NVIDIA RAID and the very same error occurred, but there was no report from the kernel; the machine just did not boot an operating system. Later I found out that the disk had been "forgotten" by the RAID ~2 weeks before that reboot (the data on it was ~2 weeks old). Otherwise that "forgotten/failed" disk was also half a year old and worked without a problem.
>
> Has anybody experienced something similar with the Sun X2100 or any other server running FreeBSD 6 and SATA?
>
> Regards,
> Andras

Hi,

I can confirm your problem. I have the same problem on one X2100 but not on the others.
Currently I have 4 X2100 machines, but only one has this strange problem. The problem is not caused by the HDD itself: I replaced it with a brand new one and the same error appeared after a few days. Maybe there is some problem with the cables / connectors or something on the mainboard.

I am well known on this list and on the local (Czech) mailing list for reporting SATA disk drives disappearing. I had similar problems on ASUS boards with Intel chipsets... so from my point of view, there is something wrong with SATA in general. I never had a problem like this with good old ATA drives.

I have no solution for this problem. The disk is OK after a reboot for a few days or weeks... If somebody can help with investigating this kind of problem, I'll be happy to cooperate.

Output of dmesg, smartctl, gmirror etc.:
http://www.quip.cz/1/freebsd/sata-hdd-problems/2007-03-07_errors_ad6.txt

Miroslav Lachman
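The kernel messages quoted in this thread can be cross-checked with a short script. What follows is a minimal Python sketch, assuming the log lines have exactly the format shown above and that the drive uses 512-byte sectors; the names SAMPLE_LOG, lba_to_offset, summarize and geom_offsets are hypothetical helpers, not part of any FreeBSD tool. It counts UDMA ICRC retries per device and converts each reported LBA to a byte offset: 612960533 x 512 = 313835792896, which is exactly the offset in the GEOM_MIRROR "Request failed" line, so the ata(4) and gmirror messages appear to describe the same 4096-byte write.

```python
import re
from collections import Counter

# Assumption: 512-byte logical sectors (typical for disks of this era).
SECTOR_SIZE = 512

# Hypothetical sample input: the kernel lines quoted in this thread.
SAMPLE_LOG = [
    "Apr 1 02:12:05 x2100 kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=612960533",
    "Apr 1 02:12:05 x2100 kernel: ad6: FAILURE - WRITE_DMA48 status=51 error=10 LBA=612960533",
    "Apr 1 02:12:05 x2100 kernel: GEOM_MIRROR: Request failed (error=5). ad6[WRITE(offset=313835792896, length=4096)]",
    "Apr 3 23:00:36 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=245760",
    "Apr 3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=392576",
    "Apr 3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=392960",
    "Apr 3 23:00:53 x2100 kernel: ad6: FAILURE - device detached",
]

# Matches ata(4) messages of the form "adN: LEVEL - COMMAND ... LBA=n".
ATA_MSG = re.compile(
    r"(?P<dev>ad\d+): (?P<level>WARNING|FAILURE) - (?P<cmd>\S+)(?P<detail>.*?)LBA=(?P<lba>\d+)"
)
# Matches the byte offset of a failed GEOM_MIRROR request, e.g. "ad6[WRITE(offset=...".
GEOM_MSG = re.compile(r"(?P<dev>ad\d+)\[(?:READ|WRITE)\(offset=(?P<offset>\d+)")


def lba_to_offset(lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset on the disk of a logical block address."""
    return lba * sector_size


def summarize(lines):
    """Return (ICRC retries per device, list of (device, command, byte offset))."""
    icrc = Counter()
    events = []
    for line in lines:
        m = ATA_MSG.search(line)
        if m is None:
            continue  # e.g. "device detached" carries no LBA
        if "ICRC" in m.group("detail"):
            icrc[m.group("dev")] += 1
        events.append((m.group("dev"), m.group("cmd"), lba_to_offset(int(m.group("lba")))))
    return icrc, events


def geom_offsets(lines):
    """Return (device, byte offset) for each failed GEOM_MIRROR request."""
    matches = (GEOM_MSG.search(line) for line in lines)
    return [(m.group("dev"), int(m.group("offset"))) for m in matches if m]


if __name__ == "__main__":
    icrc, events = summarize(SAMPLE_LOG)
    for dev, count in icrc.items():
        print(f"{dev}: {count} UDMA ICRC retries")
    ata_offsets = {(dev, off) for dev, _cmd, off in events}
    for dev, off in geom_offsets(SAMPLE_LOG):
        print(f"{dev}: GEOM_MIRROR offset {off} matches an ata(4) LBA: {(dev, off) in ata_offsets}")
```

An ICRC error is a CRC mismatch on the transfer between drive and controller rather than a media error, which is consistent with (but does not prove) the cable/connector suspicion above; the drive's SMART UDMA CRC error counter, as reported by smartctl, is another place to look.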