Date: Mon, 20 Sep 2004 13:25:39 -0700
From: "Kevin Oberman" <oberman@es.net>
To: current@freebsd.org
Subject: Re: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=207594611
Message-ID: <20040920202539.10F925D0A@ptavv.es.net>
Well, I spent the weekend building systems and kernels and I can now be pretty sure that this is a timing-related issue. I had previously reported that I could create the problem by starting my xl Ethernet card. I have since learned that the two issues are not closely coupled; rather, the problem with the Ethernet is triggering the problem with the disks.

First, the problem I am seeing with the xl is long-standing. It appeared on about June 30 or July 1. (It will take a few more kernels to track it down further, and I'm on the road for a few days and can't play with the system.) But even prior to that date, if I disable ACPI, the same behavior shows up (dead Ethernet and continual "xl0: watchdog timeout" messages). I have no idea when the problem started when ACPI is disabled.

The xl0 problem causes the system to pause and, after some changes to the kernel in late July or early August, the problems with ATA joined the xl0 problem. If I turn off xl0 (and, probably, if the xl0 problem were fixed), the disk errors go away. Because of this, I suspect that the added delays caused by the xl0 timeouts are actually triggering the ATA timeouts. Since others are seeing the same error under heavy load, I imagine that other things can trigger the same DMA timeouts on ATA.

When I get home, I'll try to figure out exactly which patch is causing the problem with the xl and then go after the patch that caused the ATA error. I can say that it shows up earlier (with the xl to trigger it) than others have reported. I think it started in early August, but I am sure it was present in RELENG_5 by August 15 and was probably present when RELENG_5 was branched.

Sorry that I ran out of time before I could track this down better, but I hope this helps and I'll continue tracking the exact failures when I get home.

-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net Phone: +1 510 486-8634