From: Robert Watson
Date: Sat, 30 Aug 2008 10:52:08 +0100 (BST)
To: John Baldwin
Cc: julian@FreeBSD.org, current@FreeBSD.org
Subject: Re: rtentry panic with FIB
List-Id: Discussions about the use of FreeBSD-current

On Fri, 29 Aug 2008, John Baldwin wrote:

> Unfortunately it hung trying to dump, so all I have is the stack trace from
> DDB. This is recent HEAD running stress2
>
> panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry @ ../../1

Kip and I have theorized that increased parallelism at higher layers of the
network stack is putting route locking and reference counting under more
stress than they were under previously, and that, as a result, we're
triggering races in the routing code more often than we used to.
While I wouldn't rule out a FIB-related bug, it seems more likely to me that
we've hit a general bug in locking/references in the ethernet link layer / ARP,
and we need to take a careful look at what's going on throughout that layer.
Unfortunately, that's not something I have time to work on currently, so it
would be great if people with an existing interest in the routing code (Julian
and Qing have done the most work there recently?) could spend a few hours
looking really carefully at what is happening.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> cpuid = 1
> KDB: enter: panic
> [thread pid 14025 tid 100928 ]
> Stopped at kdb_enter+0x3d: movq $0,0x435054(%rip)
> db> tr
> Tracing pid 14025 tid 100928 td 0xffffff0003773360
> kdb_enter() at kdb_enter+0x3d
> panic() at panic+0x14b
> _mtx_lock_flags() at _mtx_lock_flags
> _mtx_lock_flags() at _mtx_lock_flags+0xc3
> rt_check_fib() at rt_check_fib+0x1ea
> arpresolve() at arpresolve+0x77
> ether_output() at ether_output+0x180
> ip_output() at ip_output+0xb4f
> udp_send() at udp_send+0x47d
> sosend_dgram() at sosend_dgram+0x1fa
> soo_write() at soo_write+0x30
> dofilewrite() at dofilewrite+0x7a
> kern_writev() at kern_writev+0x52
> write() at write+0x4d
> syscall() at syscall+0x1bf
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (4, FreeBSD ELF64, write), rip = 0x80071cb7c, rsp =
> 0x7fffffffe628,-
> db> c
> Uptime: 1h39m18s
> Physical memory: 2038 MB
> Dumping 263 MB:pid 14025 (udp), uid 26840, was killed: exceeded maximum CPU
> limt
> pid 14099 (udp), uid 26840, was killed: exceeded maximum CPU limit
> pid 14100 (udp), uid 26840, was killed: exceeded maximum CPU limit
>
> --
> John Baldwin