From: Paul Mather
To: Steve
Cc: freebsd-stable@freebsd.org
Subject: Re: READ_DMA, WRITE_DMA errors
Date: Thu, 21 Jul 2005 10:15:06 -0400
Message-Id: <1121955306.7274.19.camel@zappa.Chelsea-Ct.Org>
In-Reply-To: <42DF2A8F.30202@powersystemsdirect.com>

On Wed, 2005-07-20 at 23:54 -0500, Steve wrote:
> I've found tons of emails, news messages, listserv messages, and even
> some bug reports of this seemingly common error.
>
> So, I had been running 5.2 on a server, and, updated to 5.3. Got the
> READ_DMA and WRITE_DMA error and retries. So, figuring it might be a bad
> update, took a new drive. put it in, loaded 5.4 for grins, and, same
> issue, lots of these errors, eventually destroying the FS. Played around
> with various settings, no avail. So, took it back, got different box,
> everything new. Same problem, new install of 5.4
>
> So, took it back, got another with another MB (different model), but,
> same maker (ASUS). Didn't have endless time to spend on production
> machine. Sure enough, same problem. It's an ASUS A7V880. Controller is
> SATA VT8237. Played around with tons of settings, eventually, after
> reading various messages out there, discovered one that resolved the
> problem. Had to set hw.ata.ata_dma="0". Of course, there is the obvious
> downside to that! Speed!
>
> But it stinks to have "decent" hardware, yet, have to cripple the
> machine. The place I got the equipment at runs ASUS only and has
> thousands of them running under other OSes. Wished I had stayed with the
> old FreeBSD version and old hardware now. I have not seen anyone that
> has ever said the problem was being (or had been) solved though. I see
> the bug reports, I take it no one has actually pinpointed the problem
> though. BUT, I do hope it is understood that this is fairly widespread,
> for me, the likelihood of 3 pcs, 2 different MB models, and, *complete*
> new hardware for each of the 3 pcs kind of rules out hardware being
> broken, might be badly designed, but, certainly not defective hardware.
>
> I do hope someone can eventually figure this out, seems to be extremely
> common, and, definitely a problem for a stable release named 5.4.

I was one of the people who suffered from and reported this "seemingly
common error." None of the systems that encountered problems had
particularly obscure or cutting-edge hardware (e.g., Intel PIIX4 ATA
controller on the motherboard). One common thread in my case is that all
ran some kind of software RAID (gvinum or gmirror), though not all of my
software RAIDed machines exhibited the DMA problems, leading me to think
perhaps it was a hardware/load/disk combination problem. Quite
obviously, not all PIIX4 controller users were having this happen, and
so the "it doesn't happen to me" factor might have contributed to the
general notion that this was probably "operator error" or something like
that, and to its being dismissed.

Anyway, as well as 5-STABLE, I also run a 6-CURRENT system that suffered
the problem. Happily, after the ATA Mk.III merge, the situation improved
a LOT. I occasionally still get the error reported, but it is not fatal,
unlike before (where the drive would be detached, breaking my
geom_mirror, necessitating a lengthy background rebuild). So, I consider
the ATA Mk.III rewrite to have "fixed" the problem I had. It may be,
then, that those upgrading to the upcoming 6.0-RELEASE (when it appears)
might also find their ATA DMA problems solved.

As for 5.x, I track -STABLE, and have noticed slight improvements
regarding the DMA TIMEOUT problem. If you only run -RELEASE, you might
miss these ongoing improvements that crop up from time to time.

Cheers,

Paul.
-- 
e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa