From owner-freebsd-current@FreeBSD.ORG  Mon Mar 23 14:56:37 2009
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 59EA01065675
	for <freebsd-current@freebsd.org>; Mon, 23 Mar 2009 14:56:37 +0000 (UTC)
	(envelope-from mwest@zeeb.org)
Received: from zeeb.org (zeeb.org [88.198.32.244])
	by mx1.freebsd.org (Postfix) with ESMTP id 206C38FC16
	for <freebsd-current@freebsd.org>; Mon, 23 Mar 2009 14:56:36 +0000 (UTC)
	(envelope-from mwest@zeeb.org)
Received: from mwest by zeeb.org with local (Exim 4.69 (FreeBSD))
	(envelope-from <mwest@zeeb.org>) id 1LlkpA-0009jX-VB
	for freebsd-current@freebsd.org; Mon, 23 Mar 2009 14:08:20 +0000
Date: Mon, 23 Mar 2009 14:08:20 +0000
From: Matthew West <mwest@l.zeeb.org>
To: freebsd-current@freebsd.org
Message-ID: <20090323140820.GA37093@zeeb.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.16 (2007-06-09)
Sender: Matthew West <mwest@zeeb.org>
Subject: panic: Bad link elm, nfsd related?
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 23 Mar 2009 14:56:38 -0000

FreeBSD 8-CURRENT, built from sources dated around 27/02/2009:

FreeBSD foo.internal 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Fri Feb 27 12:43:45 GMT 2009 mwest@foo.internal:/usr/obj/usr/src/sys/DEBUGLOCK amd64

The system is AMD64 with 16GB of RAM, serving a few clients via NFS (v2
and v3) and Samba from an 800GB ZFS pool on hardware RAID (aac
controller), not RAID-Z.  It runs a GENERIC kernel, but with the
standard deadlock-debugging options enabled.

After one to two weeks of uptime, the system panics with the following:

----------
panic: Bad link elm 0xffffff0011febc00 next->prev != elm
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
xprt_unregister_locked() at xprt_unregister_locked+0xbe
xprt_unregister() at xprt_unregister+0x2c
svc_run_internal() at svc_run_internal+0x42f
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x800695c4c, rsp = 0x7fffffffe8e8, rbp = 0 ---
KDB: enter: panic
[thread pid 920 tid 100272 ]
Stopped at      kdb_enter+0x3d: movq    $0,0x65ba38(%rip)
db> bt
Tracing pid 920 tid 100272 td 0xffffff000649a000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
xprt_unregister_locked() at xprt_unregister_locked+0xbe
xprt_unregister() at xprt_unregister+0x2c
svc_run_internal() at svc_run_internal+0x42f
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x800695c4c, rsp = 0x7fffffffe8e8, rbp = 0 ---
db> ps
  pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
[ ... ]
  920   919   919     0  R       (threaded)                  nfsd
[ ... ]
db> panic
< machine hangs hard and needs to be power cycled >
----------
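As far as I can tell, the panic string matches the list-consistency checks in sys/queue.h that are compiled in with INVARIANTS: before unlinking an element, the kernel verifies that the next element's back-pointer still points at this element. The userland sketch below is a hypothetical stand-in (the names `struct elm`, `checked_remove()` and `double_remove_demo()` are mine, not the kernel's) that mirrors that check and shows one way it can fire: removing the same element twice. It is an illustration of what the message means, not a diagnosis of what xprt_unregister_locked() actually did.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userland stand-ins for a TAILQ entry and head.  As in
 * sys/queue.h, 'prev' points at the previous element's 'next' field
 * (or at head->first for the first element), not at the element itself. */
struct elm {
	int id;
	struct elm *next;
	struct elm **prev;
};

struct head {
	struct elm *first;
	struct elm **last;
};

static void
list_init(struct head *h)
{
	h->first = NULL;
	h->last = &h->first;
}

static void
insert_tail(struct head *h, struct elm *e)
{
	e->next = NULL;
	e->prev = h->last;
	*h->last = e;
	h->last = &e->next;
}

/*
 * Remove 'e', first applying a check modelled on the kernel's
 * "next->prev != elm" test.  The kernel panics when the check fails;
 * this sketch prints the same style of message and returns -1 instead.
 */
static int
checked_remove(struct head *h, struct elm *e)
{
	assert(h != NULL && e != NULL);
	if (e->next != NULL && e->next->prev != &e->next) {
		printf("Bad link elm %p next->prev != elm\n", (void *)e);
		return -1;
	}
	if (e->next != NULL)
		e->next->prev = e->prev;
	else
		h->last = e->prev;
	*e->prev = e->next;
	return 0;
}

/* Build a -> b -> c, then remove b twice; returns the second result. */
static int
double_remove_demo(void)
{
	struct head h;
	struct elm a = { 1, NULL, NULL };
	struct elm b = { 2, NULL, NULL };
	struct elm c = { 3, NULL, NULL };

	list_init(&h);
	insert_tail(&h, &a);
	insert_tail(&h, &b);
	insert_tail(&h, &c);

	if (checked_remove(&h, &b) != 0)	/* first removal is consistent */
		return -2;
	return checked_remove(&h, &b);		/* b's links are stale: check fires */
}
```

Calling double_remove_demo() prints a "Bad link elm ... next->prev != elm" line and returns -1, because after the first removal c's back-pointer refers to a, not b. In the kernel, the same symptom could equally come from an unlocked concurrent update or a use-after-free of the transport; the sketch does not distinguish between these.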

Unfortunately, whenever I try to get the system to write a kernel crash
dump, it simply hangs.

Even when I panic the machine deliberately by sending a break, the dump fails:

----------
db> cont
Uptime: 10m22s
Physical memory: 3056 MB
Dumping 252 MB: 237 221 205 189 173 157 141Error dumping block 0x0

** DUMP FAILED (ERROR 5) **
aac0: shutting down controller...FAILED.
----------

I've searched through the archives, but can't find anything
useful.  Does anyone have any clues on:

1) How to get a kernel crash dump out of KDB on 8-CURRENT at the moment?

2) What might be wrong with nfsd?

Thanks,

Matthew