Date:      Wed, 10 Jan 2007 09:10:12 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        Sergey Zaharchenko <doublef-ctm@yandex.ru>
Subject:   Re: nve related LOR triggered by lots of small packets, and a hard hang
Message-ID:  <200701100910.13167.jhb@freebsd.org>
In-Reply-To: <20070110120731.GA1515@shark.localdomain>
References:  <20070110120731.GA1515@shark.localdomain>

On Wednesday 10 January 2007 07:07, Sergey Zaharchenko wrote:
> Hello -current,
> 
> While chasing that smbfs recursive locking thing, I decided to try
> copying a large amount of small files (/usr/src actually) to an SMB
> share to which I am connected by an NVIDIA nForce MCP2 card. I have come
> across a lock order reversal which seems related to the card. First,
> some files are copied, then I see the following kernel messages, some
> more files are copied, and then the system hangs without responding to
> the keyboard or anything.
> 
> : lock order reversal:
> :  1st 0xc3629f00 inp (tcpinp) @ /src/usr.src/sys/netinet/tcp_usrreq.c:801
> :  2nd 0xc0a9feec tcp (tcp) @ /src/usr.src/sys/netinet/tcp_input.c:626
> : KDB: stack backtrace:
> : db_trace_self_wrapper(c0950c60) at db_trace_self_wrapper+0x25
> : kdb_backtrace(0,ffffffff,c0a612a8,c0a612d0,c09f8e84,...) at kdb_backtrace+0x29
> : witness_checkorder(c0a9feec,9,c095ec63,272) at witness_checkorder+0x586
> : _mtx_lock_flags(c0a9feec,0,c095ec63,272,0,...) at _mtx_lock_flags+0x84
> : tcp_input(c32df800,14,c3300800,100a8c0,0,...) at tcp_input+0x432
> : ip_input(c32df800) at ip_input+0x5a6
> : netisr_dispatch(2,c32df800,0,c32c5000,c3300800,...) at netisr_dispatch+0x58
> : ether_demux(c32c5000,c32df800,c32caed8,c32df800,dd1757d4,...) at ether_demux+0x28a
> : ether_input(c32c5000,c32df800,c32caed8,0,c0970133,...) at ether_input+0x202
> : nve_ospacketrx(c32cae00,dd175810,1,0,0,...) at nve_ospacketrx+0xd9
> : UpdateReceiveDescRingData(c08981a4,c08981c4,c0898260,c089828c,c08982a4,...) at UpdateReceiveDescRingData+0x2f8
> : nve_osalloc(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at nve_osalloc
> : _end(c33a5c00,c0a9e784,3065766e,0,0,...) at 0xc32aa600
> : _end(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at 0xc3327680
> : _end(c33a5c00,c0a9e784,3065766e,0,0,...) at 0xc32aa600
> : _end(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at 0xc3327680
> 
> The last 2 strings repeat themselves a lot of times (kdb seems to have a
> limit of 1024 stack trace strings, which came in very helpful). No info
> about the actual hang... The LOR looks like #009
> (http://sources.zabbadoz.net/freebsd/lor/009.html), but is different
> actually. Any ideas? BTW, what is _end?

_end may hint that the address lies in a kernel module, though ddb can
usually handle those fine.  I think your stack is busted somehow, though, as
nve_osalloc() doesn't call UpdateReceiveDescRingData(), and the first lock
is acquired in tcp_usr_send() (userland is sending data on a TCP socket).
Somehow the nve driver has decided to handle a received packet while that
lock is held and re-enter the stack, leading to the LOR.  Have you tried
using nfe(4)? :)
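
To illustrate why witness complains here, the following is a minimal
userland sketch of order checking, not the kernel's witness(4) code.  The
names toy_lock(), toy_unlock(), LC_TCP and LC_INP are hypothetical, and it
assumes the previously recorded order is tcp before inp, which is what a
reversal report of "1st inp, 2nd tcp" implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock classes for illustration; not the kernel's definitions. */
enum lock_class { LC_TCP, LC_INP, LC_COUNT };

#define MAX_HELD 8

/* seen_before[a][b] is true once a has been held while b was acquired. */
static bool seen_before[LC_COUNT][LC_COUNT];

/* Locks currently held by the single simulated thread, oldest first. */
static enum lock_class held[MAX_HELD];
static size_t nheld;

/*
 * Record an acquisition of class c.  Returns true if it reverses an
 * order observed earlier, i.e. c was previously held while one of the
 * currently held locks was acquired.
 */
bool
toy_lock(enum lock_class c)
{
	bool reversal = false;

	assert(nheld < MAX_HELD);
	for (size_t i = 0; i < nheld; i++) {
		if (seen_before[c][held[i]])
			reversal = true;
		else
			seen_before[held[i]][c] = true;
	}
	held[nheld++] = c;
	return (reversal);
}

/* Release the most recently acquired lock. */
void
toy_unlock(void)
{
	assert(nheld > 0);
	nheld--;
}
```

Calling toy_lock(LC_TCP) and then toy_lock(LC_INP) records the assumed
order.  After both are released, taking LC_INP first (as the send path
does) and then LC_TCP (as tcp_input() does when the driver re-enters the
stack) makes the second toy_lock() call report a reversal.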

-- 
John Baldwin


