Date: Wed, 10 Jan 2007 09:10:12 -0500
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Cc: Sergey Zaharchenko <doublef-ctm@yandex.ru>
Subject: Re: nve related LOR triggered by lots of small packets, and a hard hang
Message-ID: <200701100910.13167.jhb@freebsd.org>
In-Reply-To: <20070110120731.GA1515@shark.localdomain>
References: <20070110120731.GA1515@shark.localdomain>
On Wednesday 10 January 2007 07:07, Sergey Zaharchenko wrote:
> Hello -current,
>
> While chasing that smbfs recursive locking thing, I decided to try
> copying a large amount of small files (/usr/src actually) to an SMB
> share to which I am connected by an NVIDIA nForce MCP2 card. I have come
> across a lock order reversal which seems related to the card. First,
> some files are copied, then I see the following kernel messages, some
> more files are copied, and then the system hangs without responding to
> the keyboard or anything.
>
> : lock order reversal:
> : 1st 0xc3629f00 inp (tcpinp) @ /src/usr.src/sys/netinet/tcp_usrreq.c:801
> : 2nd 0xc0a9feec tcp (tcp) @ /src/usr.src/sys/netinet/tcp_input.c:626
> : KDB: stack backtrace:
> : db_trace_self_wrapper(c0950c60) at db_trace_self_wrapper+0x25
> : kdb_backtrace(0,ffffffff,c0a612a8,c0a612d0,c09f8e84,...) at kdb_backtrace+0x29
> : witness_checkorder(c0a9feec,9,c095ec63,272) at witness_checkorder+0x586
> : _mtx_lock_flags(c0a9feec,0,c095ec63,272,0,...) at _mtx_lock_flags+0x84
> : tcp_input(c32df800,14,c3300800,100a8c0,0,...) at tcp_input+0x432
> : ip_input(c32df800) at ip_input+0x5a6
> : netisr_dispatch(2,c32df800,0,c32c5000,c3300800,...) at netisr_dispatch+0x58
> : ether_demux(c32c5000,c32df800,c32caed8,c32df800,dd1757d4,...) at ether_demux+0x28a
> : ether_input(c32c5000,c32df800,c32caed8,0,c0970133,...) at ether_input+0x202
> : nve_ospacketrx(c32cae00,dd175810,1,0,0,...) at nve_ospacketrx+0xd9
> : UpdateReceiveDescRingData(c08981a4,c08981c4,c0898260,c089828c,c08982a4,...) at UpdateReceiveDescRingData+0x2f8
> : nve_osalloc(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at nve_osalloc
> : _end(c33a5c00,c0a9e784,3065766e,0,0,...) at 0xc32aa600
> : _end(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at 0xc3327680
> : _end(c33a5c00,c0a9e784,3065766e,0,0,...) at 0xc32aa600
> : _end(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at 0xc3327680
>
> The last 2 strings repeat themselves a lot of times (kdb seems to have a
> limit of 1024 stack trace strings, which came in very helpful). No info
> about the actual hang... The LOR looks like #009
> (http://sources.zabbadoz.net/freebsd/lor/009.html), but is different
> actually. Any ideas? BTW, what is _end?

_end may hint that the code is out in a kernel module, though ddb
usually handles those fine. I think your stack is busted somehow,
though, as nve_osalloc() doesn't call UpdateReceiveDescRingData(), and
the first lock is acquired in tcp_usr_send() (userland is sending data
on a TCP socket). Somehow the nve driver has ended up handling a
received packet and re-entering the network stack while that lock is
held, which leads to the LOR.

Have you tried using nfe(4)? :)

--
John Baldwin