From owner-freebsd-stable@FreeBSD.ORG Tue Aug 9 03:28:25 2005
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0A92316A41F;
	Tue, 9 Aug 2005 03:28:25 +0000 (GMT)
	(envelope-from mike@sentex.net)
Received: from smarthost1.sentex.ca (smarthost1.sentex.ca [64.7.153.18])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 37E9843D46;
	Tue, 9 Aug 2005 03:28:24 +0000 (GMT)
	(envelope-from mike@sentex.net)
Received: from pumice3.sentex.ca (pumice3.sentex.ca [64.7.153.26])
	by smarthost1.sentex.ca (8.13.3/8.13.3) with ESMTP id j793RSP0097686;
	Mon, 8 Aug 2005 23:27:28 -0400 (EDT)
	(envelope-from mike@sentex.net)
Received: from lava.sentex.ca (pyroxene.sentex.ca [199.212.134.18])
	by pumice3.sentex.ca (8.13.3/8.13.3) with ESMTP id j793SMIF067229;
	Mon, 8 Aug 2005 23:28:22 -0400 (EDT)
	(envelope-from mike@sentex.net)
Received: from simian.sentex.net (simeon.sentex.ca [192.168.43.27])
	by lava.sentex.ca (8.13.3/8.13.3) with ESMTP id j793SKSX026113
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 8 Aug 2005 23:28:21 -0400 (EDT)
	(envelope-from mike@sentex.net)
Message-Id: <6.2.1.2.0.20050808232304.03deb4b8@64.7.153.2>
X-Mailer: QUALCOMM Windows Eudora Version 6.2.1.2
Date: Mon, 08 Aug 2005 23:30:28 -0400
To: "O. Hartmann", freebsd-stable@freebsd.org, freebsd-questions@freebsd.org
From: Mike Tancsa
In-Reply-To: <42F7F7E8.1020507@mail.uni-mainz.de>
References: <42F7F7E8.1020507@mail.uni-mainz.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
X-Virus-Scanned: by amavisd-new
X-Scanned-By: MIMEDefang 2.51 on 64.7.153.18
X-Scanned-By: MIMEDefang 2.51 on 64.7.153.26
Subject: Re: ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 09 Aug 2005 03:28:25 -0000

At 08:25 PM 08/08/2005, O. Hartmann wrote:
>Hello.
>
>My box is a FreeBSD 6.0-BETA2 driven ASUS A8N-SLI Deluxe based AMD64 boxed
>(see dmesg).
>One of my SATA disks, the SAMSUNG SP2004C seems to show errors during
>operation (and also showd under 5.4-RELEASE-p3).
>Sometimes I get this error:
>ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>while the machine still keeps working.
>Other days the box crashes completely.
>
>Is this a operating system bug or is this message an evidence of defective
>hardware?

You can probably confirm a hardware issue with the smartmontools port
(/usr/ports/sysutils/smartmontools). It was quite handy the other day for
us to narrow down a problem between a drive tray and the actual drive.
We started to see

Aug 3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423
Aug 3 02:03:00 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2304319
Aug 3 02:03:10 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2312927
Aug 3 02:03:17 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2308639
Aug 3 02:03:26 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2309855
Aug 3 02:03:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2348359
Aug 4 12:12:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1528639
Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1530031
Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1528639
Aug 4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out
Aug 4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure: (error=5) bp 0xd630b4fc vp 0xc2640d68

Yet when we read the actual error info off the drive via smartctl -a ad0,
it was clean. So it pointed to the drive tray, which we swapped, and all
was well.

In other situations, however, the SMART info will often tell you if the
drive is starting to fail. It's not 100% reliable, but since we started
using it, it has generally given us some sort of heads up as to whether
or not a drive is in trouble.

         ---Mike
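
For anyone wanting to tally messages like the ones quoted in this thread before comparing them against what the drive itself reports, here is a rough Python sketch. It is not part of smartmontools or FreeBSD; the names ATA_MSG, LBA, SAMPLE and summarize are made up for illustration, and the regular expression only assumes the message layout seen in the lines quoted above (device name, then TIMEOUT, WARNING or FAILURE, then the detail text).

```python
import re
from collections import Counter

# Assumed layout, taken only from the kernel lines quoted in this thread:
#   ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423
#   ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
#   ad0: FAILURE - READ_DMA timed out
ATA_MSG = re.compile(
    r"\b(?P<dev>ad\d+): (?P<level>TIMEOUT|WARNING|FAILURE) - (?P<detail>.*)"
)
LBA = re.compile(r"LBA=(\d+)")

# A few lines copied from the messages above, used as sample input.
SAMPLE = [
    "Aug 3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423",
    "Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1528639",
    "Aug 4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out",
    "Aug 4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure: (error=5)",
    "ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599",
]


def summarize(lines):
    """Count ATA messages per (device, level) and collect the LBAs seen per device."""
    counts = Counter()
    lbas = {}
    for line in lines:
        m = ATA_MSG.search(line)
        if m is None:
            continue  # not an ATA message in the assumed layout
        dev = m.group("dev")
        counts[(dev, m.group("level"))] += 1
        lba = LBA.search(m.group("detail"))
        if lba:
            lbas.setdefault(dev, set()).add(int(lba.group(1)))
    return counts, lbas


if __name__ == "__main__":
    counts, lbas = summarize(SAMPLE)
    for (dev, level), n in sorted(counts.items()):
        print(f"{dev} {level}: {n}")
    for dev, seen in sorted(lbas.items()):
        print(f"{dev} distinct LBAs: {len(seen)}")
```

To run it against a real log, pass an open file instead of SAMPLE, for example summarize(open("/var/log/messages")). Errors scattered over many unrelated LBAs while smartctl reports a clean drive would fit the tray or cabling case described above, but that reading is only a rule of thumb, not a diagnosis.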