Date: Sat, 28 Aug 2004 20:05:05 -0400
From: Garance A Drosihn <drosih@rpi.edu>
To: re@freebsd.org, current@freebsd.org
Subject: Re: 5.3-RELEASE TODO - make/kqueue
Message-ID: <p06110427bd56b2ed7525@[128.113.24.47]>
In-Reply-To: <200408271337.i7RDbXgu052801@pooker.samsco.org>
References: <200408271337.i7RDbXgu052801@pooker.samsco.org>

At 7:37 AM -0600 8/27/04, Scott Long wrote:
>
>Testing focuses for 5.3-RELEASE

An update on this issue:

> |---------------------------------+
> | make -DUSE_KQUEUE causes lockup |
> | with buildworld -jBIGNUM        |
> |---------------------------------+

The description says:

> |-------------------+---------------+--------------+------------|
> | Attempts to use make(1) with KQueues appears to result in a   |
> | kernel hang under "heavy load". It would be desirable to fix  |
> | this both from the perspective of building FreeBSD quickly    |
> | as a developer, but also because it's an instability that     |
> | could show up under other high load and heavy use of          |
> | KQueues. See PR kern/57945 for a proposed patch and details.  |
> | This appear to be the product of a locking problem, and must  |
> | be fixed for 5.3.                                             |
> |-------------------+---------------+--------------+------------|

I have done many buildworlds using the WITH_KQUEUE make over the
past week.  I have done at least 50 buildworlds on my dual-processor
Athlon machine, with -j ranging from 3 to 15.  I have not seen any
lockups since the fix for IPI deadlocks went in.

I do still get the "*** Signal 6"s, even though I am now running
with v1.76 of src/sys/kern/kern_lock.c.  Actually, I had updated
that one source file expecting to get revision 1.75 (and thus
back out revision 1.74), as recently mentioned by Doug White.  I
just now realized that I ended up with 1.76...  I guess I should
try it one more time with 1.75 instead of 1.76.

One observation that is perhaps interesting: I also modified
sys/kern/kern_sig.c so that it prints a message to the console
whenever kill() or killpg1() is called with SIGABRT.  I tested
that change, and it seems to work correctly with programs calling
kill(SIGABRT), abort(), or raise(SIGABRT).  However, when my
buildworld dies with `make' claiming it saw a Signal 6, these
printf's in kern_sig.c are never triggered.
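
For reference, here is a minimal userland sketch of the kind of test program described above. It is only an illustration, not the actual test that was run, and it does not reproduce the kern_sig.c change. The helper names (via_kill, via_raise, via_abort, terminating_signal) are hypothetical. Each helper delivers SIGABRT to a forked child in a different way, and the parent reads the terminating signal from the wait status, which is roughly what a parent such as make(1) has to go on when it reports "Signal 6".

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Three userland ways for a process to send itself SIGABRT. */
static void via_kill(void)  { kill(getpid(), SIGABRT); }
static void via_raise(void) { raise(SIGABRT); }
static void via_abort(void) { abort(); }

/*
 * Fork a child, run `action' in it, and return the number of the
 * signal that terminated the child.  Returns 0 if the child exited
 * normally and -1 if fork() or waitpid() failed.
 */
static int
terminating_signal(void (*action)(void))
{
	struct rlimit no_core = { 0, 0 };
	int status;
	pid_t pid;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* Do not leave core files behind for each test child. */
		setrlimit(RLIMIT_CORE, &no_core);
		action();
		_exit(0);	/* reached only if the signal did not kill us */
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (WIFSIGNALED(status) ? WTERMSIG(status) : 0);
}
```

Calling terminating_signal(via_kill), terminating_signal(via_raise), and terminating_signal(via_abort) from a small main() should each return SIGABRT, assuming SIGABRT is neither blocked nor ignored in the calling process.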

This failure is "eventually repeatable" for me, in that I can
trigger it within 10 buildworlds.  And it *seems* that it only
happens if I am also running a "folding-at-home" client at the
same time.  That client program is a Linux ELF binary, so maybe
that is significant.  Or maybe it's a red herring.

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu