Date: Sat, 30 Aug 2008 07:57:59 -0700
From: Julian Elischer <julian@elischer.org>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: julian@FreeBSD.org, current@FreeBSD.org, John Baldwin <jhb@FreeBSD.org>
Subject: Re: rtentry panic with FIB
Message-ID: <48B95FF7.6030003@elischer.org>
In-Reply-To: <alpine.BSF.1.10.0808301049420.59527@fledge.watson.org>
References: <200808291636.10656.jhb@FreeBSD.org> <alpine.BSF.1.10.0808301049420.59527@fledge.watson.org>
Robert Watson wrote:
>
> On Fri, 29 Aug 2008, John Baldwin wrote:
>
>> Unfortunately it hung trying to dump, so all I have is the stack trace
>> from DDB. This is recent HEAD running stress2
>>
>> panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry @ ../../1
>
> Kip and I have theorized that increased parallelism at higher layers of
> the network stack is exposing route locking and reference counting to
> more stress than it had done previously, and that as such we're starting
> to trigger races in the routing code more than we used to. While I
> wouldn't rule out a FIB-related bug, it seems more likely to me that
> we've hit a general bug in locking/references in the ethernet link layer
> / ARP, and we need to take a careful look at what's going on throughout
> that layer.
>
> Unfortunately, that's not something I have time to work on currently, so
> it would be great if people with an existing interest in the routing
> code (Julian and Qing have done the most work there recently?) could
> spend a few hours looking really carefully at what is happening.

I'm planning on spending a few hours looking at this this weekend.

>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>
>>
>> cpuid = 1
>> KDB: enter: panic
>> [thread pid 14025 tid 100928 ]
>> Stopped at kdb_enter+0x3d: movq $0,0x435054(%rip)
>> db> tr
>> Tracing pid 14025 tid 100928 td 0xffffff0003773360
>> kdb_enter() at kdb_enter+0x3d
>> panic() at panic+0x14b
>> _mtx_lock_flags() at _mtx_lock_flags
>> _mtx_lock_flags() at _mtx_lock_flags+0xc3
>> rt_check_fib() at rt_check_fib+0x1ea
>> arpresolve() at arpresolve+0x77
>> ether_output() at ether_output+0x180
>> ip_output() at ip_output+0xb4f
>> udp_send() at udp_send+0x47d
>> sosend_dgram() at sosend_dgram+0x1fa
>> soo_write() at soo_write+0x30
>> dofilewrite() at dofilewrite+0x7a
>> kern_writev() at kern_writev+0x52
>> write() at write+0x4d
>> syscall() at syscall+0x1bf
>> Xfast_syscall() at Xfast_syscall+0xab
>> --- syscall (4, FreeBSD ELF64, write), rip = 0x80071cb7c, rsp =
>> 0x7fffffffe628,-
>> db> c
>> Uptime: 1h39m18s
>> Physical memory: 2038 MB
>> Dumping 263 MB:pid 14025 (udp), uid 26840, was killed: exceeded
>> maximum CPU
>> limt
>> pid 14099 (udp), uid 26840, was killed: exceeded maximum CPU limit
>> pid 14100 (udp), uid 26840, was killed: exceeded maximum CPU limit
>>
>> --
>> John Baldwin
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to
>> "freebsd-current-unsubscribe@freebsd.org"
>>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"