From owner-freebsd-current@FreeBSD.ORG  Sat Aug 30 09:52:09 2008
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: current@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F335B1065670;
	Sat, 30 Aug 2008 09:52:08 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id C6BA18FC1D;
	Sat, 30 Aug 2008 09:52:08 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 3D78346B8F;
	Sat, 30 Aug 2008 05:52:08 -0400 (EDT)
Date: Sat, 30 Aug 2008 10:52:08 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <200808291636.10656.jhb@FreeBSD.org>
Message-ID: <alpine.BSF.1.10.0808301049420.59527@fledge.watson.org>
References: <200808291636.10656.jhb@FreeBSD.org>
User-Agent: Alpine 1.10 (BSF 962 2008-03-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: julian@FreeBSD.org, current@FreeBSD.org
Subject: Re: rtentry panic with FIB
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 30 Aug 2008 09:52:09 -0000


On Fri, 29 Aug 2008, John Baldwin wrote:

> Unfortunately it hung trying to dump, so all I have is the stack trace from 
> DDB.  This is recent HEAD running stress2.
>
> panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry @ ../../1

Kip and I have theorized that increased parallelism at higher layers of the 
network stack is putting route locking and reference counting under more 
stress than before, and that as a result we're triggering races in the 
routing code more often than we used to.  While I wouldn't rule out a 
FIB-related bug, it seems more likely to me that we've hit a general bug in 
locking/references in the Ethernet link layer / ARP, and we need to take a 
careful look at what's going on throughout that layer.

Unfortunately, that's not something I have time to work on currently, so it 
would be great if people with an existing interest in the routing code (Julian 
and Qing have done the most work there recently?) could spend a few hours 
looking really carefully at what is happening.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> cpuid = 1
> KDB: enter: panic
> [thread pid 14025 tid 100928 ]
> Stopped at      kdb_enter+0x3d: movq    $0,0x435054(%rip)
> db> tr
> Tracing pid 14025 tid 100928 td 0xffffff0003773360
> kdb_enter() at kdb_enter+0x3d
> panic() at panic+0x14b
> _mtx_lock_flags() at _mtx_lock_flags
> _mtx_lock_flags() at _mtx_lock_flags+0xc3
> rt_check_fib() at rt_check_fib+0x1ea
> arpresolve() at arpresolve+0x77
> ether_output() at ether_output+0x180
> ip_output() at ip_output+0xb4f
> udp_send() at udp_send+0x47d
> sosend_dgram() at sosend_dgram+0x1fa
> soo_write() at soo_write+0x30
> dofilewrite() at dofilewrite+0x7a
> kern_writev() at kern_writev+0x52
> write() at write+0x4d
> syscall() at syscall+0x1bf
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (4, FreeBSD ELF64, write), rip = 0x80071cb7c, rsp = 0x7fffffffe628,-
> db> c
> Uptime: 1h39m18s
> Physical memory: 2038 MB
> Dumping 263 MB:pid 14025 (udp), uid 26840, was killed: exceeded maximum CPU limit
> pid 14099 (udp), uid 26840, was killed: exceeded maximum CPU limit
> pid 14100 (udp), uid 26840, was killed: exceeded maximum CPU limit
>
> -- 
> John Baldwin