Date: Tue, 09 Aug 2005 10:23:50 +0200
From: "O. Hartmann" <ohartman@mail.uni-mainz.de>
To: Mike Tancsa <mike@sentex.net>
Cc: freebsd-stable@freebsd.org, freebsd-questions@freebsd.org
Subject: Re: ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Message-ID: <42F86816.6070706@mail.uni-mainz.de>
In-Reply-To: <6.2.1.2.0.20050808232304.03deb4b8@64.7.153.2>
References: <42F7F7E8.1020507@mail.uni-mainz.de> <6.2.1.2.0.20050808232304.03deb4b8@64.7.153.2>
Mike Tancsa wrote:
> At 08:25 PM 08/08/2005, O. Hartmann wrote:
>
>> Hello.
>>
>> My box is a FreeBSD 6.0-BETA2 driven ASUS A8N-SLI Deluxe based AMD64
>> box (see dmesg).
>> One of my SATA disks, the SAMSUNG SP2004C, seems to show errors during
>> operation (and also showed them under 5.4-RELEASE-p3).
>> Sometimes I get this error:
>> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>> while the machine still keeps working.
>> Other days the box crashes completely.
>>
>> Is this an operating system bug or is this message evidence of
>> defective hardware?
>
> You can probably confirm a hardware issue with the smartmon tools
> (/usr/ports/sysutils/smartmontools).
>
> It was quite handy the other day for us to narrow down a problem between
> a drive tray and the actual drive. We started to see
>
> Aug 3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423
> Aug 3 02:03:00 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2304319
> Aug 3 02:03:10 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2312927
> Aug 3 02:03:17 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2308639
> Aug 3 02:03:26 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2309855
> Aug 3 02:03:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2348359
> Aug 4 12:12:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1528639
> Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1530031
> Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1528639
> Aug 4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out
> Aug 4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure: (error=5) bp 0xd630b4fc vp 0xc2640d68
>
> Yet when we read the actual error info off the drive via smartctl -a
> ad0, it was clean. So it pointed to the drive tray, which we swapped, and
> all was well. In other situations, however, the SMART info will often
> tell you if the drive is starting to fail. It's not 100% reliable, but
> since we started using it, it has generally given us some sort of
> heads-up as to whether or not a drive is in trouble.
>
> ---Mike

Dear Mike.

Thanks a lot for this info. I will use this tool and report what I find.
I also use trays for my drives (as I did with SCSI and SCA2 on our
servers at the lab). Maybe this could be the issue.

Oliver
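
For anyone following the thread, here is a rough sketch of how the kernel messages quoted above could be tallied per device before comparing them against what smartctl reports. It is only an illustration: the function name summarize, the SAMPLE list, and the regular expressions are my own assumptions, based solely on the message formats shown in this thread, and other ata(4) messages may look different.

```python
import re
from collections import Counter

# Assumed message shape, taken only from the lines quoted in this thread:
#   "<dev>: <LEVEL> - <COMMAND> ... [LBA=<n>]"
ATA_EVENT = re.compile(
    r"\b(?P<dev>ad\d+): (?P<level>WARNING|TIMEOUT|FAILURE) - (?P<cmd>\S+)"
)
LBA = re.compile(r"LBA=(\d+)")

# A few lines copied from the messages above, used as sample input.
SAMPLE = [
    "ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599",
    "Aug 3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423",
    "Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1528639",
    "Aug 4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out",
    "Aug 4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure: (error=5)",
]


def summarize(lines):
    """Count ATA events per (device, level) and collect the LBAs seen per device."""
    counts = Counter()
    lbas = {}
    for line in lines:
        event = ATA_EVENT.search(line)
        if event is None:
            continue  # not one of the message shapes handled here
        counts[(event["dev"], event["level"])] += 1
        lba = LBA.search(line)
        if lba is not None:
            lbas.setdefault(event["dev"], set()).add(int(lba.group(1)))
    return counts, lbas


if __name__ == "__main__":
    counts, lbas = summarize(SAMPLE)
    for (dev, level), n in sorted(counts.items()):
        print(f"{dev} {level}: {n}")
    for dev, seen in sorted(lbas.items()):
        print(f"{dev} distinct LBAs: {len(seen)}")
```

To run it over a real log, the lines of /var/log/messages could be passed to summarize instead of SAMPLE. Going by Mike's experience above, many errors combined with a clean smartctl report might point at the tray or cabling rather than the disk, though that is a hint and not a proof.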