Message-ID: <42DFCA1C.3050406@powersystemsdirect.com>
Date: Thu, 21 Jul 2005 11:15:24 -0500
From: Steve
To: Paul Mather
Cc: freebsd-stable@freebsd.org
Subject: Re: READ_DMA, WRITE_DMA errors
References: <42DF2A8F.30202@powersystemsdirect.com> <1121955306.7274.19.camel@zappa.Chelsea-Ct.Org>
In-Reply-To: <1121955306.7274.19.camel@zappa.Chelsea-Ct.Org>

Paul Mather wrote:

>One common thread in my case is that
>all ran some kind of software RAID (gvinum or gmirror), though not all
>of my software RAIDed machines exhibited the DMA problems leading me to
>think perhaps it was a hardware/load/disk combination problem.
>
>
I do not use RAID at all, so, not common for me.

>Anyway, as well as 5-STABLE, I also run a 6-CURRENT system that suffered
>the problem.
>Happily, after the ATA Mk.III merge, the situation
>improved a LOT. I occasionally still get the error reported, but it is
>not fatal, unlike before (where the drive would be detached, breaking my
>geom_mirror, necessitating a lengthy background rebuild).
>
>
Well, that's good news. I just hope it is a widespread fix. There seem
to be several different issues, and hopefully the rewrite resolves them
all, intentionally or not! It sounds like in your case it's almost 100%.

An occasional error (we get watchdog timeouts on the network) is not bad
as long as it doesn't destroy the FS. Obviously we want zero, but things
happen. It's quite conceivable that 1 error per day IS a hardware issue.
But in our case, with 4 machines and the corruption, that is not the case!

Steve