From: "Reid Linnemann"
To: freebsd-current@freebsd.org
Date: Fri, 18 Feb 2005 09:03:35 -0600 (CST)
Subject: ad WRITE_DMA timing out frequently
Message-Id: <20050218150335.7B4A9A063E@csa.cs.okstate.edu>

I've recently brought a machine up from 5.3-STABLE to 6-CURRENT. It usually just sits in the corner and runs services, but lately I've come home from work or woken up to find that it is completely unresponsive, and I have to hard reset the machine. It happens at least once a day, and it's becoming more and more frequent.

When I look at the console, I always have the same 4 messages before the failure:

    ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2085599
    ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=2085599
    kernel: ad0: FAILURE - WRITE_DMA timed out
    kernel: g_vfs_done():ad0s1d[WRITE(offset=52772864, length=16384)]error = 5

It seems to me that a sector on the disk might be dead in the ad0s1d slice (/var), but before I take further steps I want to be certain that the behavior I'm experiencing is unrelated to the migration to 6-CURRENT.

I started poking around /var to see if anything was amiss, and I found that mail messages are being stacked up in /var/spool/clientmqueue, even though nothing should be using the msp queue (I've redirected periodic outputs to logfiles). In the last daily run mailed to root in January, I found records in the submit queue that looked like this:

    j0EDINHh049826 2489 Fri Jan 14 07:18 MAILER-DAEMON (Deferred: Permission denied)

There were nearly 500 of them. Even after redirecting periodic output to logs and clearing out the client mail queue, this continues to happen, and I have a hunch that it may be related to the WRITE_DMA timeouts, as it's the only weird behavior I can see on /var.

If anyone can help me shed some light on this, I'd appreciate it. I've had 2 IDE drives die in this machine already, and I'm going to be severely depressed if I've killed a third.

-Reid