From: Matthew West
To: freebsd-current@freebsd.org
Date: Mon, 23 Mar 2009 14:08:20 +0000
Subject: panic: Bad link elm, nfsd related?
Message-ID: <20090323140820.GA37093@zeeb.org>

FreeBSD 8-CURRENT, built from sources around 27/02/2009:

  FreeBSD foo.internal 8.0-CURRENT FreeBSD 8.0-CURRENT #0:
  Fri Feb 27 12:43:45 GMT 2009
  mwest@foo.internal:/usr/obj/usr/src/sys/DEBUGLOCK amd64

The system is AMD64, with 16GB of RAM, serving a few clients via NFS (v2
and v3) and Samba, from an 800GB ZFS pool; using hardware RAID (aac
controller), not RAID-Z. Running a GENERIC kernel, but with the standard
deadlock debugging options enabled.

After 1-2 weeks, the system will panic with the following:

----------
panic: Bad link elm 0xffffff0011febc00 next->prev != elm
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
xprt_unregister_locked() at xprt_unregister_locked+0xbe
xprt_unregister() at xprt_unregister+0x2c
svc_run_internal() at svc_run_internal+0x42f
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x800695c4c, rsp = 0x7fffffffe8e8, rbp = 0 ---
KDB: enter: panic
[thread pid 920 tid 100272 ]
Stopped at kdb_enter+0x3d: movq $0,0x65ba38(%rip)
db> bt
Tracing pid 920 tid 100272 td 0xffffff000649a000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
xprt_unregister_locked() at xprt_unregister_locked+0xbe
xprt_unregister() at xprt_unregister+0x2c
svc_run_internal() at svc_run_internal+0x42f
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x800695c4c, rsp = 0x7fffffffe8e8, rbp = 0 ---
db> ps
  pid  ppid  pgrp  uid  state  wmesg  wchan  cmd
[ ... ]
  920  919  919  0  R (threaded)  nfsd
[ ... ]
db> panic
< machine hangs hard and needs to be power cycled >
----------

Unfortunately, whenever I attempt to get the system to do a kernel core
dump, it simply hangs... Even if I panic the machine by sending a break
it doesn't work:

----------
db> cont
Uptime: 10m22s
Physical memory: 3056 MB
Dumping 252 MB: 237 221 205 189 173 157 141Error dumping block 0x0
** DUMP FAILED (ERROR 5) **
aac0: shutting down controller...FAILED.
----------

I've done some searching through the archives, but can't find anything
useful. Does anyone have any clues for me on:

1) How to get a kernel crash dump out of KDB in 8-CURRENT at the moment?
2) What the problem with nfsd is?

Thanks,

Matthew