From owner-freebsd-current@FreeBSD.ORG Fri Mar 17 21:54:20 2006
Delivered-To: freebsd-current@freebsd.org
Message-ID: <016801c64a0d$570d7d90$6b00000a@venti>
From: "Brian K. White"
Date: Fri, 17 Mar 2006 16:53:51 -0500
Organization: Aljex Software
Subject: athlon-xp + fakeraid regression
List-Id: Discussions about the use of FreeBSD-current

I don't know who might be interested in this, or which response is the more appropriate one: "that's a bug we should know about" or just "don't do that".

Sometime between 6.0-snap011 and 6.1beta4, something broke CPUTYPE=athlon-xp. I would normally be satisfied with simply "don't use athlon-xp", but since it used to work fine, I see it as a regression, and any regression should be reported.
Or forget the regression and consider it evidence for more strongly deprecating -mcpu/-march=athlon-xp.

One desktop machine has an athlon-xp-2200 and a HighPoint RocketRAID 133 pata raid card with two identical 60 gig ata100 drives in a raid0 array. 6.0-snap011 and several preceding versions all install directly to this card without a hitch, even though it's not a real hardware raid, even though the entire drive is a raid0 striped array with no boot partition outside the array, and without having to know how to even spell "gmirror" or "vinum". It's beautiful: just select "ar0" in sysinstall, easy as pie. For the record, Linux can't do this on these exact same cards. 6.1beta4 installs and runs just fine on it too.

The problem only comes when I put CPUTYPE=athlon-xp in make.conf and build a new kernel. The build completes fine, the kernel boots fine, and the machine seems fine as long as it remains quiescent. But type make anything, including the same "make buildkernel" that just succeeded a few minutes ago, and you get this:

# make buildworld
Interrupt storm detected on "irq10"; throttling interrupt source
ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=88099699
ad4: TIMEOUT - READ_DMA retrying (0 retries left) LBA=88099699
ad4: FAILURE - READ_DMA timed out LBA=88099699
ar0: FAILURE - RAID0 array broken
g_vfs_done():ar0s1a[WRITE(offset=6144000, length=10240)]error = 5
g_vfs_done():ar0s1a[WRITE(offset=65536, length=2048)]error = 5
g_vfs_done():ar0s1a[WRITE(offset=65536, length=2048)]error = 5
g_vfs_done():ar0s1a[WRITE(offset=6144000, length=10240)]error = 5
g_vfs_done():ar0s1a[WRITE(offset=18883952640, length=16384)]error = 5
g_vfs_done():ar0s1a[WRITE(offset=18884395008, length=16384)]error = 5
g_vfs_done():ar0s1a[WRITE(offset=18885148672, length=16384)]error = 5
g_vfs_done():ar0s1a[WRITE(offset=19848953856, length=16384)]error = 5
g_vfs_done():ar0s1a[WRITE(offset=60120072192, length=16384)]error = 5
ad4: WARNING - WRITE_DMA taskqueue timeout - completeing request directly
ad4: WARNING - WRITE_DMA freeing taskqueue zombie request

At this point the box is 99% locked. The console reacts to the keyboard (the screensaver comes up, spacebar makes it go away, ctrl-t, alt-fn etc. work) and it responds to pings. But no programs work: no new shell or login prompts from pressing enter, getty doesn't see keystrokes, and no network services respond. At the beginning, just after hitting enter on the make command, the disk light for ad4 goes on solid for several seconds.

Those messages look just like messages I've seen before and know a little about. There is a well-known problem where these cheap pata fakeraid cards will try to do ata133 if the drive says it can, when really, even if the drives are new ata133 drives and the cables are new and short and shielded, you still shouldn't try to do ata133, since the spec is too tight and you'll just get bit errors or other failures. I have several similar boxes with both ITE and HighPoint pata fakeraid chips and have seen in almost every case that if I don't disable dma entirely in /boot/loader.conf(1), then either immediately or eventually I get WRITE_DMA errors from the ata driver and the box reboots.

The fix is to use ata100 somehow: either disable dma entirely in loader.conf (since there is no more selective option there, and the raid card bios never has an option for controlling pio/dma mode like motherboard bioses have) and then use atacontrol in rc.early to set udma5, or use disks that can only do ata100 and only advertise ata100 to the controller.

That's just to show I know about that issue and it's not that. Not simply/only that, anyway :)

First off, this machine only has ata100 disks and already runs in udma5 without needing to be forced. Second, it ran 6.0-snap011 from the day it came out, with athlon-xp, and performed a lot of big makes and rsyncs etc. until yesterday without a problem.
So it's not a case of "well, maybe this machine should be forced even slower, to ata66?"

Reinstall 6.1beta4 on the above box without CPUTYPE=athlon-xp, and everything is fine: reliable builds and other heavy disk activity. I also just built a kernel with CPUTYPE=i686 and with I486_CPU & I586_CPU removed, and it's doing a buildworld just fine under that kernel right now.

Brian K. White -- brian@aljex.com -- http://www.aljex.com/bkw/
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
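
For reference, the settings discussed above might look roughly like the fragments below. This is only a sketch under assumptions: the exact file contents were not quoted in the message, the hw.ata.ata_dma tunable name is taken from ata(4), and the atacontrol line assumes the 6.x "mode <device> <mode>" syntax, so check atacontrol(8) on the release in question before relying on it.

```sh
# /etc/make.conf
# The setting that triggers the failure on 6.1beta4 with this hardware:
CPUTYPE=athlon-xp
# The variant reported as working (kernel built without I486_CPU/I586_CPU):
# CPUTYPE=i686

# /boot/loader.conf
# ata100 workaround, step 1: disable ATA DMA entirely at boot.
# (Assumed tunable name; there is no per-device option here.)
hw.ata.ata_dma="0"

# rc.early
# ata100 workaround, step 2: raise the transfer mode back to udma5 (ATA100).
# ad4 is the array member named in the log; repeat for the other member.
atacontrol mode ad4 udma5
```

Note that the last two fragments are the generic ata133 workaround, not a fix for the athlon-xp problem: the machine in question already runs at udma5 without being forced.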