Date: Thu, 2 Dec 2004 21:31:31 -0500
To: freebsd-current@freebsd.org
From: Garance A Drosihn
cc: Søren Schmidt
Subject: Another twist on WRITE_DMA issues...

In a different thread, I (garance) wrote:
>
>At 10:30 PM +0100 11/18/04, Søren Schmidt wrote:
>>Garance A Drosihn wrote:
>>
>>>I am trying to pin down problems "FAILURE - WRITE_DMA timed out"
>>>in a new PC that I have.  I had some local shop build this for me,
>>>and apparently there were "a few" miscommunications in what I
>>>thought I asked for, and what they actually built.
>>>
>>>The machine ended up with two SATA controllers:
>>>    atapci0: -- on the motherboard
>>>    atapci1: -- on a PCI card
>>
>>I think its the other way around, the VIA chip is part of the
>>motherboard chipset, the SiI is a "loose" PCI compatible chip.
>
>Ugh.
>You are correct.  Somewhere along the line I got the two
>mixed up.  So now have I removed the PCI-based SATA card, and
>connected the Western Digital hard disk to the on-board SATA.
>I have just done a complete buildworld/installworld cycle for
>5.3-STABLE.  I did not see a single WRITE_DMA time-out message.
>So far so good.
>
>But looking around the web for awhile, it looks like this model of
>Western Digital is not a native SATA drive.  So I think I will
>replace it just to avoid any further hassles, even though I did not
>get any errors with this drive once I was using the right controller.

I have now switched from that Western Digital drive to a Seagate
Barracuda 7200.7 120-gig (ST3120026AS).  The drive seems to be
working fairly well, but now I sometimes see some combination
like the following three lines:

Dec  2 20:29:50 kernel: Interrupt storm detected on "irq20: atapci0"; throttling interrupt source
Dec  2 20:29:54 kernel: ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=20627679
Dec  2 20:29:54 kernel: ad4: FAILURE - WRITE_DMA timed out

Where atapci0:
And   ad4: 114473MB [232581/16/63] at ata2-master SATA150

This does not come up often, and it usually doesn't cause any
noticeable problem.  As luck would have it, the one time it has
caused problems is during installworlds.  I just did 18 buildworlds
in a row without any problem.  I built and installed the new kernel,
rebooted into single-user, and the system panicked early in the
installworld.  I rebooted into single-user again, and this time it
was *almost* finished with installworld when the system simply hung
after a "ad4: FAILURE - WRITE_DMA timed out" message.

Now I'm back up in multi-user mode, and I just completed another
buildworld without any problem.  I did get the above set of messages,
but nothing after that.  (I did see several sets of WRITE_DMA error
messages during the installworlds).

This is on a recent snapshot of 5.3-stable.

Should I just switch back to the Western Digital?
Or is it that the new disk is fast enough that the kernel *thinks*
something is wrong with it, and starts throttling it?  Or maybe I
have a bad SATA cable?

If it wasn't for the panics/hangs during installworld, I would think
that everything was working quite well.  Of course, that is about
the worst time to be getting system panics!

I tried getting a core dump of the panic, but 'call doadump'
complained that no dump device had been set.  I'm now looking at
/etc/rc.d/dumpon so I should know how to set that up the next time
I'm in single-user mode.

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu