From owner-freebsd-stable@FreeBSD.ORG Sat Jul 23 04:16:47 2005
Date: Fri, 22 Jul 2005 23:16:44 -0500
From: Karl Denninger
To: freebsd-stable@freebsd.org
Message-ID: <20050723041644.GA2607@FS.denninger.net>
References: <6.2.1.2.0.20050721153750.0851fab0@64.7.153.2>
 <20050721202234.GA62615@FS.denninger.net>
 <20050722004340.H16902@fledge.watson.org>
 <20050722001253.GA70277@FS.denninger.net>
 <20050722013605.U16902@fledge.watson.org>
 <20050722010611.GA72234@FS.denninger.net>
 <42E0F93E.7000108@commit.it>
 <20050722194009.GA95692@FS.denninger.net>
 <20050722195357.GB95692@FS.denninger.net>
 <20050723025300.GY24353@ratchet.nebcorp.com>
In-Reply-To: <20050723025300.GY24353@ratchet.nebcorp.com>
User-Agent: Mutt/1.4.2.1i
Organization: Karl's Sushi and Packet Smashers
X-Die-Spammers: Spammers cheerfully broiled for supper and served with ketchup!
Subject: Re: make -j as a stress test (was: Re: Quality of FreeBSD) [WARNING - 6.0-BETA1 still hosed!]

On Fri, Jul 22, 2005 at 07:53:00PM -0700, Danny Howard wrote:
> On Fri, Jul 22, 2005 at 02:53:57PM -0500, Karl Denninger wrote:
> [...]
> > Note carefully from this that there is NO ERROR INDICATION AS TO WHY THE
> > DISK DETACHED!
> >
> > At least with the 5.x problems you'd SEE an error before it went BOOM.
> >
> > This time around, nope - just death.
> >
> > What's worse, the complaints continue even through a shutdown ...
>
> While I agree with Karl that introducing instability is a very bad
> thing, I guess we now have an answer to Karl's vexation yesterday:
> [ http://lists.freebsd.org/pipermail/freebsd-stable/2005-July/017210.html ]
>
> "What I don't understand Robert is why Soren's code is "too
> sensitive" to commit, but the explosive reduction in stability
> that the changes made between 4.x and 5.3 caused weren't
> enough to back THAT out until it could be fixed."
>
> The answer would seem to be that when someone actually does test the
> untested code, it is even worse than the code we are already upset with.
> :)
>
> Love,
> -danny

Point taken.

Can we get a commitment from the development team that 6.x will not go
out the door until this problem is identified and FIXED (e.g. the PR I
submitted against this early in the year is closed)?

The problem is trivially easy to reproduce, as I've pointed out.

My hardware is hardly anything special - it's a Dell PowerEdge 400SC, a
rather pedestrian 2.4GHz P4/HT machine with 512MB of RAM and nothing
special in terms of boards in it. Indeed, on the sandbox machine the
ONLY cards in the machine are the Adaptec SATA card and a video board!

The ICH SATA onboard adapter works fine. No problems, even if you beat
the snot out of the disks. Ditto for the onboard PATA channels.

ANY PCI SII-chipset SATA card (nothing fancy here, no onboard RAID, just
a disk adapter) that I've tried thus far - Bustek or Adaptec - causes
trouble in an absolutely reproducible fashion when put under heavy load.

If both channels are in use the trouble is immediate and dramatic,
although you CAN provoke errors even with only one of the two channels
in operation if you can get the I/O load up high enough.

Gmirror is great for provoking this as it queues traffic to both
channels in a nicely balanced and heavily-utilized fashion, although I'm
willing to bet that Gmirror itself is not the actual cause of the
problem, since I had trouble once DURING install (before I had put a
gmirror'ed config on the disks).

Note that a MIX of reads and writes appears to be required - a REBUILD
of the disks by Gmirror (which is all writes to those two disks)
succeeds. As soon as you have all three subdisks in the array, however,
a "make buildworld" produces fireworks.

If necessary (or useful) I can give one or more developers a way to log
into the sandbox machine here via ssh. I do not have a way to get a
serial console on the box, however, so if it's blown up in an
unrecoverable fashion remotely, someone would have to call or IM me to
push the big red button.

If that's NOT necessary (or desired), then I want to move those two
disks back to the production machine, as they are how my offsite/offline
backups are done - I've no problem with leaving them on the sandbox IF
the problem is being actively worked, though.

--
--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net My home on the net - links to everything I do!
http://scubaforum.org Your UNCENSORED place to talk about DIVING!
http://homecuda.com Emerald Coast: Buy / sell homes, cars, boats!
http://genesis3.blogspot.com Musings Of A Sentient Mind