Date:      Sat, 28 Aug 2004 20:05:05 -0400
From:      Garance A Drosihn <drosih@rpi.edu>
To:        re@freebsd.org, current@freebsd.org
Subject:   Re: 5.3-RELEASE TODO - make/kqueue
Message-ID:  <p06110427bd56b2ed7525@[128.113.24.47]>
In-Reply-To: <200408271337.i7RDbXgu052801@pooker.samsco.org>
References:  <200408271337.i7RDbXgu052801@pooker.samsco.org>

At 7:37 AM -0600 8/27/04, Scott Long wrote:
>
>Testing focuses for 5.3-RELEASE

An update on the issue:

>  |---------------------------------+
>  | make -DUSE_KQUEUE causes lockup |
>  | with buildworld -jBIGNUM        |
>  |---------------------------------+

The description says:

>  |-------------------+---------------+--------------+------------|
>  |  Attempts to use make(1) with KQueues appears to result in a  |
>  |  kernel hang under "heavy load". It would be desirable to fix |
>  |  this both from the perspective of building FreeBSD quickly   |
>  |  as a developer, but also because it's an instability that    |
>  |  could show up under other high load and heavy use of         |
>  |  KQueues. See PR kern/57945 for a proposed patch and details. |
>  |  This appear to be the product of a locking problem, and must |
>  |  be fixed for 5.3.                                            |
>  |-------------------+---------------+--------------+------------|

I have done many buildworlds using the WITH_KQUEUE make over the
past week.  I have done at least 50 buildworlds on my dual-proc
Athlon machine, with -j ranging from 3 to 15.  I have not seen any
lockups since the fix for IPI deadlocks went in.

I do still get the "*** Signal 6"s, even though I am now running
with v1.76 of src/sys/kern/kern_lock.c.  Actually I had updated
that one source file, expecting to get revision 1.75 (and thus
back out revision 1.74), as recently mentioned by Doug White.  I
just now realized that I ended up with 1.76...  I guess I should
try it one more time with 1.75 instead of 1.76.

One observation which is perhaps interesting:  I also modified
sys/kern/kern_sig.c so that it prints out a message to the console
whenever kill() or killpg1() is called with a SIGABRT.  I tested
that change, and it seems to work correctly with programs calling
kill(SIGABRT), abort(), or raise(SIGABRT).  However, when my
buildworld dies with `make' claiming it saw a Signal 6, these
printf's in kern_sig.c are never triggered.

This failure is "eventually repeatable" for me, in that I can
trigger it within 10 buildworlds.  And it *seems* that it only
happens if I am also running a "folding-at-home" client at the
same time.  That client program is a Linux ELF binary, so maybe
that is significant.   Or maybe it's a red herring.

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu


