Date:      Sun, 12 Nov 2006 18:47:57 -0700
From:      Scott Long <scottl@samsco.org>
To:        Mike Tancsa <mike@sentex.net>
Cc:        freebsd-net <freebsd-net@freebsd.org>, freebsd-stable@freebsd.org, Jack Vogel <jfvogel@gmail.com>
Subject:   Re: Proposed 6.2 em RELEASE patch
Message-ID:  <4557CECD.2000609@samsco.org>
In-Reply-To: <200611130040.kAD0etbp040637@lava.sentex.ca>
References:  <2a41acea0611081719h31be096eu614d2f2325aff511@mail.gmail.com> <200611091536.kA9FaltD018819@lava.sentex.ca> <45534E76.6020906@samsco.org> <200611092200.kA9M0q1E020473@lava.sentex.ca> <200611102004.kAAK4iO9027778@lava.sentex.ca> <2a41acea0611101400w5b8cef40ob84ed6de181f3e2c@mail.gmail.com> <200611102221.kAAML6ol028630@lava.sentex.ca> <455570D8.6070000@samsco.org> <200611120412.kAC4CuIB035746@lava.sentex.ca> <45574ECA.4080207@samsco.org> <200611130040.kAD0etbp040637@lava.sentex.ca>

Mike Tancsa wrote:
> At 11:41 AM 11/12/2006, Scott Long wrote:
>> Mike Tancsa wrote:
>>> At 01:42 AM 11/11/2006, Scott Long wrote:
>>>> driver.  What will help me is if you can hook up a serial console to
>>>> your machine and see if it can be made to drop to the debugger while it
>>>> is under load and otherwise unresponsive.  If you can, getting a process
>>>> dump might help confirm where each CPU is spending its time.
>>>
>>> ./netblast 192.168.88.218 500 110 1000
>>> I compiled in the various debugging options, and on the serial
>>> console I get a few
>>> Expensive timeout(9) function: 0xc0601e48(0) 0.024135749 s
>>> and the serial console
>>
>> One CPU seems to be stuck in softclock.  My guess here is that there 
>> is significant lock contention.  Please try the following:
>>
>> 1. Use addr2line or gdb on the kernel to find out what function is at
>> 0xc0601e48 (the address that was printed above).  This address will
>> change every time you recompile the kernel, so you'll need to reacquire
>> it from the console each time.
> 
> # addr2line 0xc0601e48 -e kernel.debug -f
> nfsrv_timer
> /usr/src/sys/nfsserver/nfs_srvsock.c:809
> #
> 

Can you try removing NFS from your kernel?

> and
> 
> # addr2line 0xc0561444 -e kernel.debug -f
> sleepq_timeout
> /usr/src/sys/kern/subr_sleepqueue.c:724

This is sched_lock contention.
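
Since the addresses change with every kernel rebuild, the lookup can be scripted. The following is only a rough sketch, assuming the console output has been captured to a text file and that the matching debug kernel is at hand; the function names extract_timeout_addrs and resolve_timeout_addrs are made up for illustration, and kernel.debug is simply the file name used in the commands above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of FreeBSD.

# Read console text on stdin and print each distinct address that appears
# in an "Expensive timeout(9) function: 0x...(...)" line.
extract_timeout_addrs() {
    sed -n 's/^.*Expensive timeout(9) function: \(0x[0-9a-fA-F]*\)(.*$/\1/p' | sort -u
}

# Resolve each extracted address to a function name and source line.
# The kernel path is an assumption; it must be the debug image of the
# kernel that actually printed the messages.
resolve_timeout_addrs() {
    kernel="${1:-kernel.debug}"
    extract_timeout_addrs | while read -r addr; do
        addr2line -f -e "$kernel" "$addr"
    done
}
```

With the console log saved as, say, console.log, running `resolve_timeout_addrs kernel.debug < console.log` should print one function/file pair per distinct address, in the same form as the addr2line output quoted above.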

> 
> I don't have any NFS activity on the box
> 
>> 2. Try compiling in WITNESS and running the test as before, then break
>> into the debugger as before.  Run 'show locks'.  I'm not sure how
>> fruitful this will be; WITNESS might make it unbearably slow.
> 
> It was in that kernel already

So you're seeing the livelock under load while also having WITNESS 
enabled?  Have you tried your test without WITNESS?  What about INVARIANTS?

Scott


