Date: Wed, 29 Apr 2009 19:24:40 +0100
From: Matthew West <mwest@l.zeeb.org>
To: freebsd-current@freebsd.org
Subject: Re: panic: Bad link elm, nfsd related?
Message-ID: <20090429182440.GA74110@zeeb.org>
In-Reply-To: <20090323140820.GA37093@zeeb.org>
References: <20090323140820.GA37093@zeeb.org>

FreeBSD 8-CURRENT, built from sources around 27/02/2009:

FreeBSD foo.internal 8.0-CURRENT FreeBSD 8.0-CURRENT #5: Fri Apr 17 18:33:02 BST 2009 mwest@foo.internal:/usr/obj/usr/src/sys/DEBUGLOCK amd64

The system is AMD64, with 16GB of RAM, serving a few hundred clients via
NFS (v2 and v3) and Samba, from an 800GB ZFS pool; using hardware RAID
(aac controller), not RAID-Z.

Running a GENERIC kernel, but with the following options enabled:

options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC
options NFS_LEGACYRPC

The last option is per Rick Macklem's suggestion
(http://lists.freebsd.org/pipermail/freebsd-current/2009-March/005074.html).

While I don't think it's related, I also have Jaakko Heinonen's patch to
zfs_znode.c applied, from:

http://www.freebsd.org/cgi/query-pr.cgi?pr=132068

After almost 11 days of active usage, there was a system panic. I did
manage to get a crash dump:

----------
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: Bad link elm 0xffffff00074ef400 next->prev != elm
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
xprt_inactive_locked() at xprt_inactive_locked+0x78
svc_vc_rendezvous_recv() at svc_vc_rendezvous_recv+0x335
svc_run_internal() at svc_run_internal+0x347
svc_run() at svc_run+0x94
nlm_syscall() at nlm_syscall+0x826
syscall() at syscall+0x1e7
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (154, FreeBSD ELF64, nlm_syscall), rip = 0x8008b7c6c, rsp = 0x7fffffffecf8, rbp = 0x7fffffffee20 ---
KDB: enter: panic
Uptime: 11d23h3m42s
Physical memory: 3056 MB
Dumping 1757 MB: 1742 1726 1710 1694 1678 1662 1646 1630 1614 1598 1582 1566 1550 1534 1518 1502 1486 1470 1454 1438 1422 1406 1390 1374 1358 1342 1326 1310 1294 1278 1262 1246 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 1054 1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 814 798 782 766 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 510 494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/pf.ko...Reading symbols from /boot/kernel/pf.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/pf.ko
#0 doadump () at pcpu.h:196
196 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:196
#1 0xffffffff805428c3 in boot (howto=260)
   at /usr/src/sys/kern/kern_shutdown.c:420
#2 0xffffffff80542d6c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:576
#3 0xffffffff8071be38 in xprt_inactive_locked (xprt=Variable "xprt" is not available.
) at /usr/src/sys/rpc/svc.c:380
#4 0xffffffff8071f915 in svc_vc_rendezvous_recv (xprt=0xffffff00074ef400, msg=Variable "msg" is not available.
) at /usr/src/sys/rpc/svc_vc.c:352
#5 0xffffffff8071da17 in svc_run_internal (pool=0xffffff0007bd7600, ismaster=1)
   at /usr/src/sys/rpc/svc.c:787
#6 0xffffffff8071e174 in svc_run (pool=0xffffff0007bd7600)
   at /usr/src/sys/rpc/svc.c:1223
#7 0xffffffff8070b666 in nlm_syscall (td=Variable "td" is not available.
) at /usr/src/sys/nlm/nlm_prot_impl.c:1573
#8 0xffffffff8080bcd7 in syscall (frame=0xfffffffe9b8bec90)
   at /usr/src/sys/amd64/amd64/trap.c:898
#9 0xffffffff807e8e8b in Xfast_syscall ()
   at /usr/src/sys/amd64/amd64/exception.S:338
#10 0x00000008008b7c6c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) list *0xffffffff8070b666
0xffffffff8070b666 is in nlm_syscall (/usr/src/sys/nlm/nlm_prot_impl.c:1577).
1572
1573            svc_run(pool);
1574            error = 0;
1575
1576    #ifdef NFSCLIENT
1577            nfs_advlock_p = old_nfs_advlock;
1578            nfs_reclaim_p = old_nfs_reclaim;
1579    #endif
1580
1581    out:
(kgdb) list *0xffffffff8071e174
0xffffffff8071e174 is in svc_run (/usr/src/sys/rpc/svc.c:1225).
1220                    svc_new_thread(pool);
1221            }
1222
1223            svc_run_internal(pool, TRUE);
1224
1225            mtx_lock(&pool->sp_lock);
1226            while (pool->sp_threadcount > 0)
1227                    msleep(pool, &pool->sp_lock, 0, "svcexit", 0);
1228            mtx_unlock(&pool->sp_lock);
1229    }
----------

Any suggestions? Should I go back to the newer RPC implementation?

Thanks,

Matthew