From: "O. Hartmann"
Organization: Institut für Geophysik
Date: Tue, 09 Aug 2005 10:23:50 +0200
To: Mike Tancsa
Cc: freebsd-stable@freebsd.org, freebsd-questions@freebsd.org
Subject: Re: ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Message-ID: <42F86816.6070706@mail.uni-mainz.de>
In-Reply-To: <6.2.1.2.0.20050808232304.03deb4b8@64.7.153.2>

Mike Tancsa wrote:
> At 08:25 PM 08/08/2005, O. Hartmann wrote:
>
>> Hello.
>>
>> My box is a FreeBSD 6.0-BETA2 driven, ASUS A8N-SLI Deluxe based AMD64
>> box (see dmesg).
>> One of my SATA disks, the SAMSUNG SP2004C, seems to show errors during
>> operation (and also showed them under 5.4-RELEASE-p3).
>> Sometimes I get this error:
>> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>> while the machine still keeps working.
>> Other days the box crashes completely.
>>
>> Is this an operating system bug, or is this message evidence of
>> defective hardware?
>
> You can probably confirm a hardware issue with the smartmontools
> (/usr/ports/sysutils/smartmontools).
>
> It was quite handy the other day for us to narrow down a problem between
> a drive tray and the actual drive. We started to see
>
> Aug 3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423
> Aug 3 02:03:00 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2304319
> Aug 3 02:03:10 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2312927
> Aug 3 02:03:17 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2308639
> Aug 3 02:03:26 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2309855
> Aug 3 02:03:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2348359
> Aug 4 12:12:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1528639
> Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1530031
> Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1528639
> Aug 4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out
> Aug 4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure: (error=5) bp 0xd630b4fc vp 0xc2640d68
>
> Yet when we read the actual error info off the drive via smartctl -a
> ad0, it was clean. So it pointed to the drive tray, which we swapped, and
> all was well.
> In other situations, however, the SMART info will often
> tell you if the drive is starting to fail. It's not 100% reliable, but
> since we started using it, it has generally given us some sort of heads-up as
> to whether or not a drive is in trouble.
>
> ---Mike

Dear Mike,

Thanks a lot for this info. I will use this tool and report what I find. I also use trays for my drives (as I did with SCSI and SCA2 on our servers at the lab). Maybe this could be the issue.

Oliver
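
As a rough aid for telling a one-off retry from a recurring pattern, the kernel messages quoted in this thread can be tallied per device. The following is only a minimal sketch in Python: the regular expression covers just the message shapes shown above, and the names ATA_MSG and tally_ata_messages are made up for illustration. It does not replace checking the drive itself with smartctl.

```python
import re
from collections import Counter

# Covers only the message shapes quoted in this thread, for example:
#   ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
#   ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423
#   ad0: FAILURE - READ_DMA timed out
# Other ata(4) messages may look different; this pattern is an assumption.
ATA_MSG = re.compile(
    r"\b(?P<dev>ad\d+): (?P<level>WARNING|TIMEOUT|FAILURE) - "
    r"(?P<detail>.*?)(?: LBA=(?P<lba>\d+))?$"
)


def tally_ata_messages(lines):
    """Count matching kernel messages per (device, level) pair."""
    counts = Counter()
    for line in lines:
        match = ATA_MSG.search(line.strip())
        if match:
            counts[(match.group("dev"), match.group("level"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Reads log text from standard input, e.g. the contents of a messages file.
    for (dev, level), n in sorted(tally_ata_messages(sys.stdin).items()):
        print(f"{dev} {level} {n}")
```

Many TIMEOUT or WARNING entries on one device, combined with a clean SMART report, would match the tray or cabling case Mike describes; the tally alone cannot tell a bad drive from a bad connection.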