From owner-freebsd-fs@FreeBSD.ORG  Wed Jul  7 21:42:52 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4E32B1065672;
	Wed,  7 Jul 2010 21:42:52 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 201218FC08;
	Wed,  7 Jul 2010 21:42:52 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 9267E46B94;
	Wed,  7 Jul 2010 17:42:51 -0400 (EDT)
Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id D79188A04E;
	Wed,  7 Jul 2010 17:42:49 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Nathaniel W Filardo <nwf@cs.jhu.edu>
Date: Wed, 7 Jul 2010 16:42:28 -0400
User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100217; KDE/4.4.5; amd64; ; )
References: <20100609212747.GF21929@gradx.cs.jhu.edu>
	<AANLkTim71FSw51tyzFE6EVwnPCT_b4JnMAdLdF_IkSWT@mail.gmail.com>
	<20100703085516.GH21929@gradx.cs.jhu.edu>
In-Reply-To: <20100703085516.GH21929@gradx.cs.jhu.edu>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201007071642.28847.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Wed, 07 Jul 2010 17:42:49 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham
	version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: alc@freebsd.org, freebsd-fs@freebsd.org
Subject: Re: [sparc64] [ZFS] panic: mutex vnode interlock not owned
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Jul 2010 21:42:52 -0000

On Saturday, July 03, 2010 4:55:16 am Nathaniel W Filardo wrote:
> (hello freebsd-fs@; I'm cc:ing you since the latest part of my story
> involves a ZFS-related panic and I hear you're the right place to go with
> those.  It began as an attempt to debug a VM locking panic and has moved a
> little...)
> 
> On Thu, Jun 10, 2010 at 12:23:24PM -0500, Alan Cox wrote:
> > On Thu, Jun 10, 2010 at 7:16 AM, John Baldwin <jhb@freebsd.org> wrote:
> > 
> > > On Wednesday 09 June 2010 5:27:47 pm Nathaniel W Filardo wrote:
> > > > Attempting to boot on (2-way SMP; SUN Fire V240) sparc64 a 9.0-CURRENT
> > > > kernel built on Jun 9 at 14:41, and fully csup'd before building (I don't
> > > > have the SVN revision number, sorry) yields, surprisingly late in the
> > > boot
> > > > process, this panic:
> > > >
> > > > panic: mutex vm object not owned at /systank/src/sys/vm/vm_object.c:1692
> > > > cpuid = 0
> > > > KDB: stack backtrace:
> > > > panic() at panic+0x1c8
> > > > _mtx_assert() at _mtx_assert+0xb0
> > > > vm_object_collapse() at vm_object_collapse+0x28
> > > > vm_object_deallocate() at vm_object_deallocate+0x538
> > > > _vm_map_unlock() at _vm_map_unlock+0x64
> > > > vm_map_remove() at vm_map_remove+0x64
> > > > vmspace_exit() at vmspace_exit+0x100
> > > > exit1() at exit1+0x788
> > > > sys_exit() at sys_exit+0x10
> > > > syscallenter() at syscallenter+0x268
> > > > syscall() at syscall+0x74
> > > > -- syscall (1, FreeBSD ELF64, sys_exit) %o7=0x11980c --
> > > > userland() at 0x406fe8c8
> > > > user trace: trap %o7=0x11980c
> > > > pc 0x406fe8c8, sp 0x7fdffff7611
> > > > done
> > > > Uptime: 4m7s
> > > >
> > > > The system was, at the time, attempting to bring up its jails.
> > > >
> > > > Anything else that would be helpful to know?
> > >
> > > > Can you get a crashdump?  If so, it would be good to pull up gdb and check
> > > > the values of 'object' and 'robject' in the vm_object_deallocate() frame.
> > >
> > >
> > That would be useful.  None of the locking changes of the last few weeks
> > have altered the vm object locking, so this assertion failure and stack
> > trace come as something of a surprise.
> > 
> > Alan
> 
> Well, I thought that no longer delegating ZFS (with "zfs jail") to the jail
> whose startup was causing the above panic might solve the problem and indeed
> the system made it slightly further.  A few minutes after reaching the
> login: prompt, though, it produced
> 
> panic: mutex vnode interlock not owned at /systank/src/sys/kern/kern_mutex.c:223
> cpuid = 0
> KDB: stack backtrace:
> panic() at panic+0x1c8
> _mtx_assert() at _mtx_assert+0xb0
> _mtx_unlock_flags() at _mtx_unlock_flags+0x144
> vnlru_free() at vnlru_free+0x500
> getnewvnode() at getnewvnode+0x7c
> zfs_znode_cache_constructor() at zfs_znode_cache_constructor+0x4c
> zfs_znode_alloc() at zfs_znode_alloc+0x34
> zfs_zget() at zfs_zget+0x2b8
> zfs_dirent_lock() at zfs_dirent_lock+0x508
> zfs_dirlook() at zfs_dirlook+0x50
> zfs_lookup() at zfs_lookup+0x1bc
> zfs_freebsd_lookup() at zfs_freebsd_lookup+0x6c
> VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x108
> vfs_cache_lookup() at vfs_cache_lookup+0xfc
> VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x110
> lookup() at lookup+0x7d0
> namei() at namei+0x69c
> kern_statat_vnhook() at kern_statat_vnhook+0x48
> kern_statat() at kern_statat+0x1c
> kern_lstat() at kern_lstat+0x18
> lstat() at lstat+0x14
> syscallenter() at syscallenter+0x27c
> syscall() at syscall+0x74
> -- syscall (190, FreeBSD ELF64, lstat) %o7=0x12b830 --
> ...
> 
> which at least is consistent with my hunch that the original panic had
> something to do with ZFS.  The system is as of svn 209653 (git c65b199...)
> with http://people.freebsd.org/~marius/sparc64_pin_ipis.diff applied.  The
> old kernel has uname
>   FreeBSD hydra.priv.oc.ietfng.org 9.0-CURRENT FreeBSD 9.0-CURRENT #20: Sun
>   Apr  4 20:31:58 EDT 2010
>   root@hydra.priv.oc.ietfng.org:/systank/obj/systank/src/sys/NWFKERN  sparc64
> which is probably too old to be of use to anybody, but just in case, there
> it is.  I don't suspect the machine of having bad hardware since this old
> kernel runs apparently fine on it and zpool scrubs haven't found anything
> yet.
> 
> I can't easily get a crash dump on the system (if somebody could tell me how
> to get one from a ddb(4) prompt, I could try that, but otherwise the system
> just ceases to do anything after panic; I have swap and dump set, so I'm not
> sure what's not happening there...).
> 
> Anything more I should do?

I really think you might have some sort of hardware issue, as all of your
reported panics have been weird "can't happen" cases.

-- 
John Baldwin