From: Tony Byrne
Date: Mon, 20 Jun 2005 11:09:53 +0100
To: freebsd-stable@freebsd.org
Subject: Re[2]: ATA_DMA errors (and fs corruption!)
Hello twesky,

t> atapci0: port
t> 0x1860-0x186f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on
t> pci0
t> ata0: channel #0 on atapci0
t> ata1: channel #1 on atapci0

t> The last known good stable version for me was aprox April 25, my next
t> cvsup was May 17, but I have problems with 5.4 Release so I assume
t> (probably incorrectly) that something changed between April 25 and
t> 5.4R.

t> I don't exactly recall my shutdown errors, but I did have to restore
t> my file systems to get my laptop back to a functioning state.

We've been seeing the same problem on a server equipped with an Intel ICH5 controller and a SATA hard disk. The problems seemed to start after an update in mid-May. We noticed that processes such as our IMAP server would stall for a few seconds and the console would report either a READ_DMA or a WRITE_DMA timeout. On two occasions the disk became detached, requiring a reboot. The timeouts were so frequent that we couldn't do any work with the server. We didn't have this problem prior to the update.

We are tracking RELENG_5, but have now reverted to a May 9th kernel, which doesn't seem to be quite so fussy and has reduced the problem to a handful of timeouts every day.

What's bugging me is that this list has been very quiet about this problem. The Intel ICH* controllers must be common in the field, and I'm surprised that this problem has gone unnoticed.

Of course, there can be hardware reasons for timeouts, such as a dying disk or cable, but I think we've eliminated these in our case. The disk works fine when transferred to another machine, and the SATA cable works fine when used with another disk (albeit one of smaller capacity) in the server.
So we've come to the conclusion that it's the combination of controller, disk and FreeBSD version that holds the key to this.

Jun 20 10:20:04 roo kernel: atapci0: port 0xffa0-0xffaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.2 on pci0
Jun 20 10:20:04 roo kernel: ata0: channel #0 on atapci0
Jun 20 10:20:04 roo kernel: ata1: channel #1 on atapci0
...
Jun 20 10:20:04 roo kernel: ad0: 190782MB [387621/16/63] at ata0-master SATA150
Jun 20 10:20:04 roo kernel: acd0: CDROM at ata1-master PIO4
...

Regards,

Tony.
-- 
Tony Byrne