Date: Sat, 30 Aug 2008 10:52:08 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: John Baldwin <jhb@FreeBSD.org>
Cc: julian@FreeBSD.org, current@FreeBSD.org
Subject: Re: rtentry panic with FIB
Message-ID: <alpine.BSF.1.10.0808301049420.59527@fledge.watson.org>
In-Reply-To: <200808291636.10656.jhb@FreeBSD.org>
References: <200808291636.10656.jhb@FreeBSD.org>

On Fri, 29 Aug 2008, John Baldwin wrote:

> Unfortunately it hung trying to dump, so all I have is the stack trace from
> DDB. This is recent HEAD running stress2
>
> panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry @ ../../1

Kip and I have theorized that increased parallelism at higher layers of the
network stack is exposing route locking and reference counting to more stress
than it had done previously, and that as such we're starting to trigger races
in the routing code more than we used to. While I wouldn't rule out a
FIB-related bug, it seems more likely to me that we've hit a general bug in
locking/references in the ethernet link layer / ARP, and we need to take a
careful look at what's going on throughout that layer.

Unfortunately, that's not something I have time to work on currently, so it
would be great if people with an existing interest in the routing code (Julian
and Qing have done the most work there recently?) could spend a few hours
looking really carefully at what is happening.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> cpuid = 1
> KDB: enter: panic
> [thread pid 14025 tid 100928 ]
> Stopped at      kdb_enter+0x3d: movq    $0,0x435054(%rip)
> db> tr
> Tracing pid 14025 tid 100928 td 0xffffff0003773360
> kdb_enter() at kdb_enter+0x3d
> panic() at panic+0x14b
> _mtx_lock_flags() at _mtx_lock_flags
> _mtx_lock_flags() at _mtx_lock_flags+0xc3
> rt_check_fib() at rt_check_fib+0x1ea
> arpresolve() at arpresolve+0x77
> ether_output() at ether_output+0x180
> ip_output() at ip_output+0xb4f
> udp_send() at udp_send+0x47d
> sosend_dgram() at sosend_dgram+0x1fa
> soo_write() at soo_write+0x30
> dofilewrite() at dofilewrite+0x7a
> kern_writev() at kern_writev+0x52
> write() at write+0x4d
> syscall() at syscall+0x1bf
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (4, FreeBSD ELF64, write), rip = 0x80071cb7c, rsp =
> 0x7fffffffe628,-
> db> c
> Uptime: 1h39m18s
> Physical memory: 2038 MB
> Dumping 263 MB:pid 14025 (udp), uid 26840, was killed: exceeded maximum CPU
> limt
> pid 14099 (udp), uid 26840, was killed: exceeded maximum CPU limit
> pid 14100 (udp), uid 26840, was killed: exceeded maximum CPU limit
>
> --
> John Baldwin
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>