From owner-freebsd-fs@FreeBSD.ORG Wed Jul 7 21:42:52 2010
From: John Baldwin
To: Nathaniel W Filardo
Cc: alc@freebsd.org, freebsd-fs@freebsd.org
Subject: Re: [sparc64] [ZFS] panic: mutex vnode interlock not owned
Date: Wed, 7 Jul 2010 16:42:28 -0400
User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100217; KDE/4.4.5; amd64; ; )
References: <20100609212747.GF21929@gradx.cs.jhu.edu> <20100703085516.GH21929@gradx.cs.jhu.edu>
In-Reply-To: <20100703085516.GH21929@gradx.cs.jhu.edu>
Message-Id: <201007071642.28847.jhb@freebsd.org>

On
Saturday, July 03, 2010 4:55:16 am Nathaniel W Filardo wrote:
> (hello freebsd-fs@; I'm cc:ing you since the latest part of my story
> involves a ZFS-related panic and I hear you're the right place to go with
> those. It began attempting to debug a VM locking panic and has moved a
> little...)
>
> On Thu, Jun 10, 2010 at 12:23:24PM -0500, Alan Cox wrote:
> > On Thu, Jun 10, 2010 at 7:16 AM, John Baldwin wrote:
> >
> > > On Wednesday 09 June 2010 5:27:47 pm Nathaniel W Filardo wrote:
> > > > Attempting to boot on (2-way SMP; SUN Fire V240) sparc64 a 9.0-CURRENT
> > > > kernel built on Jun 9 at 14:41, and fully csup'd before building (I don't
> > > > have the SVN revision number, sorry) yields, surprisingly late in the boot
> > > > process, this panic:
> > > >
> > > > panic: mutex vm object not owned at /systank/src/sys/vm/vm_object.c:1692
> > > > cpuid = 0
> > > > KDB: stack backtrace:
> > > > panic() at panic+0x1c8
> > > > _mtx_assert() at _mtx_assert+0xb0
> > > > vm_object_collapse() at vm_object_collapse+0x28
> > > > vm_object_deallocate() at vm_object_deallocate+0x538
> > > > _vm_map_unlock() at _vm_map_unlock+0x64
> > > > vm_map_remove() at vm_map_remove+0x64
> > > > vmspace_exit() at vmspace_exit+0x100
> > > > exit1() at exit1+0x788
> > > > sys_exit() at sys_exit+0x10
> > > > syscallenter() at syscallenter+0x268
> > > > syscall() at syscall+0x74
> > > > -- syscall (1, FreeBSD ELF64, sys_exit) %o7=0x11980c --
> > > > userland() at 0x406fe8c8
> > > > user trace: trap %o7=0x11980c
> > > > pc 0x406fe8c8, sp 0x7fdffff7611
> > > > done
> > > > Uptime: 4m7s
> > > >
> > > > The system was, at the time, attempting to bring up its jails.
> > > >
> > > > Anything else that would be helpful to know?
> > >
> > > Can you get a crashdump? If so, it would be good to pull up gdb and check
> > > the values of 'object' and 'robject' in the vm_object_deallocate() frame.
> > >
> > That would be useful. None of the locking changes of the last few weeks
> > have altered the vm object locking, so this assertion failure and stack
> > trace come as something of a surprise.
> >
> > Alan
>
> Well, I thought that no longer delegating ZFS (with "zfs jail") to the jail
> whose startup was causing the above panic might solve the problem and indeed
> the system made it slightly further. A few minutes after reaching the
> login: prompt, though, it produced
>
> panic: mutex vnode interlock not owned at /systank/src/sys/kern/kern_mutex.c:223
> cpuid = 0
> KDB: stack backtrace:
> panic() at panic+0x1c8
> _mtx_assert() at _mtx_assert+0xb0
> _mtx_unlock_flags() at _mtx_unlock_flags+0x144
> vnlru_free() at vnlru_free+0x500
> getnewvnode() at getnewvnode+0x7c
> zfs_znode_cache_constructor() at zfs_znode_cache_constructor+0x4c
> zfs_znode_alloc() at zfs_znode_alloc+0x34
> zfs_zget() at zfs_zget+0x2b8
> zfs_dirent_lock() at zfs_dirent_lock+0x508
> zfs_dirlook() at zfs_dirlook+0x50
> zfs_lookup() at zfs_lookup+0x1bc
> zfs_freebsd_lookup() at zfs_freebsd_lookup+0x6c
> VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x108
> vfs_cache_lookup() at vfs_cache_lookup+0xfc
> VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x110
> lookup() at lookup+0x7d0
> namei() at namei+0x69c
> kern_statat_vnhook() at kern_statat_vnhook+0x48
> kern_statat() at kern_statat+0x1c
> kern_lstat() at kern_lstat+0x18
> lstat() at lstat+0x14
> syscallenter() at syscallenter+0x27c
> syscall() at syscall+0x74
> -- syscall (190, FreeBSD ELF64, lstat) %o7=0x12b830 --
> ...
>
> which at least is consistent with my hunch that the original panic had
> something to do with ZFS. The system is as of svn 209653 (git c65b199...)
> with http://people.freebsd.org/~marius/sparc64_pin_ipis.diff applied. The
> old kernel has uname
> FreeBSD hydra.priv.oc.ietfng.org 9.0-CURRENT FreeBSD 9.0-CURRENT #20: Sun
> Apr 4 20:31:58 EDT 2010
> root@hydra.priv.oc.ietfng.org:/systank/obj/systank/src/sys/NWFKERN sparc64
> which is probably too old to be of use to anybody, but just in case, there
> it is. I don't suspect the machine of having bad hardware since this old
> kernel runs apparently fine on it and zpool scrubs haven't found anything
> yet.
>
> I can't easily get a crash dump on the system (if somebody could tell me how
> to get one from a ddb(4) prompt, I could try that, but otherwise the system
> just ceases to do anything after panic; I have swap and dump set, so I'm not
> sure what's not happening there...).
>
> Anything more I should do?

I really think you might have some sort of hardware issue as all of your
reported panics have been weird "can't happen" cases.

-- 
John Baldwin