From owner-freebsd-fs@FreeBSD.ORG Sat Oct 5 23:05:03 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id C1215B38; Sat, 5 Oct 2013 23:05:03 +0000 (UTC) (envelope-from longwitz@incore.de) Received: from dss.incore.de (dss.incore.de [195.145.1.138]) by mx1.freebsd.org (Postfix) with ESMTP id 3C5FF2A4B; Sat, 5 Oct 2013 23:05:03 +0000 (UTC) Received: from inetmail.dmz (inetmail.dmz [10.3.0.3]) by dss.incore.de (Postfix) with ESMTP id 6ECCC5CDC9; Sun, 6 Oct 2013 00:56:51 +0200 (CEST) X-Virus-Scanned: amavisd-new at incore.de Received: from dss.incore.de ([10.3.0.3]) by inetmail.dmz (inetmail.dmz [10.3.0.3]) (amavisd-new, port 10024) with LMTP id lXNWRE-J9wTk; Sun, 6 Oct 2013 00:56:49 +0200 (CEST) Received: from mail.incore (fwintern.dmz [10.0.0.253]) by dss.incore.de (Postfix) with ESMTP id 8C27A5CC5D; Sun, 6 Oct 2013 00:56:49 +0200 (CEST) Received: from bsdmhs.longwitz (unknown [192.168.99.6]) by mail.incore (Postfix) with ESMTP id B1C1D509E7; Sun, 6 Oct 2013 00:56:48 +0200 (CEST) Message-ID: <52509930.2090608@incore.de> Date: Sun, 06 Oct 2013 00:56:48 +0200 From: Andreas Longwitz User-Agent: Thunderbird 2.0.0.19 (X11/20090113) MIME-Version: 1.0 To: Andriy Gapon Subject: Re: zfs panic during find(1) on zfs snapshot directory References: <522DF5A9.4070103@incore.de> <522E0118.5020106@FreeBSD.org> <522F25AE.1080309@incore.de> <523621DD.7050600@FreeBSD.org> In-Reply-To: <523621DD.7050600@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 05 Oct 2013 23:05:03 -0000 Andriy Gapon wrote: > > Please try this patch: > http://people.freebsd.org/~avg/zfs-gfs-8.diff Without your patch my Stable-8 crashes immediately when I run while true; do ls -l /prod/.zfs >/dev/null; done Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x18 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80e1d9cd stack pointer = 0x28:0xffffff82457757e0 frame pointer = 0x28:0xffffff8245775810 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 87276 (ls) [thread pid 87276 tid 100386 ] Stopped at gfs_vop_inactive+0x1d: movl 0x18(%r13),%ecx #10 0xffffffff805c2ba4 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228 #11 0xffffffff80e1d9cd in gfs_vop_inactive (ap=0xffffff8245775850) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/gfs.c:1232 #12 0xffffffff80639638 in VOP_INACTIVE_APV (vop=0xffffffff80f3b3c0, a=0xffffff8245775850) at vnode_if.c:1923 #13 0xffffffff80491671 in vinactive (vp=0xffffff0103ebdb10, td=0xffffff0078da78e0) at vnode_if.h:807 #14 0xffffffff80496038 in vputx (vp=0xffffff0103ebdb10, func=2) at /usr/src/sys/kern/vfs_subr.c:2347 #15 0xffffffff8049a4ea in kern_statat_vnhook (td=0xffffff0078da78e0, flag=, fd=, path=, pathseg=, sbp=0xffffff8245775a80, hook=0) at /usr/src/sys/kern/vfs_syscalls.c:2351 #16 0xffffffff8049a645 in kern_statat (td=, flag=, fd=, path=, pathseg=, sbp=) at /usr/src/sys/kern/vfs_syscalls.c:2320 #17 0xffffffff8049a70a in lstat (td=, uap=0xffffff8245775bb0) at /usr/src/sys/kern/vfs_syscalls.c:2383 #18 0xffffffff805db824 in amd64_syscall (td=0xffffff0078da78e0, traced=0) at subr_syscall.c:114 #19 0xffffffff805c2e9c in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 #20 0x000000018098f97c in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) f 11 #11 0xffffffff80e1d9cd in gfs_vop_inactive (ap=0xffffff8245775850) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/gfs.c:1232 1232 gfs_file_t *fp = vp->v_data; (kgdb) info loc vp = (vnode_t *) 0xffffff0103ebdb10 fp = (gfs_file_t *) 0x0 (kgdb) l 1227 struct vnode *a_vp; 1228 struct thread *a_td; 1229 } */ *ap; 1230 { 1231 vnode_t *vp = ap->a_vp; 1232 gfs_file_t *fp = vp->v_data; 1233 1234 if (fp->gfs_type == GFS_DIR) 1235 gfs_dir_inactive(vp); Then I applied your patch to my system with some minor changes, because compile of zfs.ko was not clean: - vrecycle(vp) --> vrecycle(vp, curthread) - vn_lock --> zfs_vnode_lock - zfs_fhtovp(vfs_t *vfsp, fid_t *fidp, int flags, vnode_t **vpp) --> zfs_fhtovp(vfs_t *vfsp, fid_t *fidp, vnode_t **vpp) - in zfsctl_root_lookup arg list: flags --> 0 After loading a new system with your patch integrated in zfs.ko I could run several processes while true; do ls -l /prod/.zfs >/dev/null; done without problems. Starting one process "sudo find /prod/.zfs -mtime +30" gives immediately a panic: panic: __lockmgr_args: unknown lockmgr request 0x0 cpuid = 2 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 panic() at panic+0x1ce __lockmgr_args() at __lockmgr_args+0x21d vop_stdlock() at vop_stdlock+0x39 VOP_LOCK1_APV() at VOP_LOCK1_APV+0x70 _vn_lock() at _vn_lock+0x47 gfs_dir_lookup() at gfs_dir_lookup+0x15e zfsctl_root_lookup() at zfsctl_root_lookup+0x122 zfs_dirlook() at zfs_dirlook+0x297 zfs_lookup() at zfs_lookup+0x262 zfs_freebsd_lookup() at zfs_freebsd_lookup+0x81 VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x62 vfs_cache_lookup() at vfs_cache_lookup+0xf5 VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x62 lookup() at lookup+0x44c namei() at namei+0x53d vn_open_cred() at vn_open_cred+0x391 kern_openat() at kern_openat+0x16a amd64_syscall() at amd64_syscall+0x1f4 Xfast_syscall() at Xfast_syscall+0xfc --- syscall (5, FreeBSD ELF64, open), rip = 0x1807359fc, rsp = 0x7fffffffe828, rbp = 0xffffffff --- KDB: enter: panic [thread pid 91769 tid 100298 ] Stopped at kdb_enter+0x3b: movq $0,0x48aae2(%rip) Locked vnodes 0xffffff0002d91000: tag ufs, type VDIR usecount 1, writecount 0, refcount 4 mountedhere 0 flags () v_object 0xffffff0002de9ca8 ref 0 pages 3 lock type ufs: SHARED (count 1) ino 635904, on dev mirror/gm0p9.journal 0xffffff010018ece8: tag zfs, type VDIR usecount 3, writecount 0, refcount 4 mountedhere 0 flags () v_object 0xffffff0073f17d80 ref 0 pages 0 lock type zfs: SHARED (count 1) show lockchain thread 100298 (pid 91769, find) running on CPU 2 #10 0xffffffff803ff367 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:616 #11 0xffffffff803e74ad in __lockmgr_args (lk=0xffffff0073df2270, flags=0, ilk=0xffffff0073df2298, wmesg=, pri=982528, timo=1166294704, file=0xffffffff80d1faf8 "/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/gfs.c", line=984) at /usr/src/sys/kern/kern_lock.c:1194 #12 0xffffffff804833e9 in vop_stdlock (ap=) at lockmgr.h:94 #13 0xffffffff8063a870 in VOP_LOCK1_APV (vop=0xffffffff8082af60, a=0xffffff82458441f0) at vnode_if.c:2052 #14 0xffffffff804a3137 in _vn_lock (vp=0xffffff0073df21d8, flags=0, file=0xffffffff80d1faf8 "/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/gfs.c", line=984) at vnode_if.h:859 #15 0xffffffff80c1d5be in gfs_dir_lookup (dvp=0xffffff00739fc588, nm=0xffffffff80d31cb8 "snapshot", vpp=0xffffff8245844a10, cr=0xffffff0002345000, flags=0, direntflags=0x0, realpnp=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/gfs.c:984 #16 0xffffffff80cab1b2 in zfsctl_root_lookup (dvp=0xffffff00739fc588, nm=0xffffffff80d31cb8 "snapshot", vpp=0xffffff8245844a10, pnp=0x0, flags=0, rdir=0x0, cr=0xffffff0002345000, ct=0x0, direntflags=0x0, realpnp=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:544 #17 0xffffffff80cae837 in zfs_dirlook (dzp=0xffffff0073e7d450, name=, vpp=0xffffff8245844a10, flags=, deflg=, rpnp=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:400 #18 0xffffffff80cc0ac2 in zfs_lookup (dvp=0xffffff010018ece8, nm=0xffffff82458444a0 "..", vpp=0xffffff8245844a10, cnp=0xffffff8245844a38, nameiop=0, cr=0xffffff0073e09400, td=0xffffff007367b470, flags=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1481 #19 0xffffffff80cc12f1 in zfs_freebsd_lookup (ap=0xffffff8245844610) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:5839 #20 0xffffffff8063a2b2 in VOP_CACHEDLOOKUP_APV (vop=0xffffffff80d3c100, a=0xffffff8245844610) at vnode_if.c:193 #21 0xffffffff80481215 in vfs_cache_lookup (ap=) at vnode_if.h:80 #22 0xffffffff8063a372 in VOP_LOOKUP_APV (vop=0xffffffff80d3c100, a=0xffffff8245844710) at vnode_if.c:126 #23 0xffffffff8048898c in lookup (ndp=0xffffff82458449e0) at vnode_if.h:54 #24 0xffffffff80489b0d in namei (ndp=0xffffff82458449e0) at /usr/src/sys/kern/vfs_lookup.c:269 #25 0xffffffff804a2641 in vn_open_cred (ndp=0xffffff82458449e0, flagp=0xffffff82458449dc, cmode=0, vn_open_flags=, cred=0xffffff0073e09400, fp=0xffffff0002c682d0) at /usr/src/sys/kern/vfs_vnops.c:192 #26 0xffffffff804a161a in kern_openat (td=0xffffff007367b470, fd=-100, path=0x18074e069
, pathseg=UIO_USERSPACE, flags=1, mode=) at /usr/src/sys/kern/vfs_syscalls.c:1077 #27 0xffffffff805db824 in amd64_syscall (td=0xffffff007367b470, traced=0) at subr_syscall.c:114 #28 0xffffffff805c2e9c in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 #29 0x00000001807359fc in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) f 11 #11 0xffffffff803e74ad in __lockmgr_args (lk=0xffffff0073df2270, flags=0, ilk=0xffffff0073df2298, wmesg=, pri=982528, timo=1166294704, file=0xffffffff80d1faf8 "/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/gfs.c", line=984) at /usr/src/sys/kern/kern_lock.c:1194 1194 panic("%s: unknown lockmgr request 0x%x\n", __func__, op); (kgdb) info loc _i = class = (struct lock_class *) 0x0 iwmesg = 0xffffffff80d2d91c "zfs" tid = 0 v = x = op = 0 realexslp = error = ipri = 80 itimo = 51 queue = wakeup_swapper = __func__ = "__lockmgr_args" (kgdb) list 1189 } 1190 break; 1191 default: 1192 if (flags & LK_INTERLOCK) 1193 class->lc_unlock(ilk); 1194 panic("%s: unknown lockmgr request 0x%x\n", __func__, op); 1195 } 1196 1197 if (flags & LK_INTERLOCK) 1198 class->lc_unlock(ilk); If you like I can give more info from the dumps. >> kern/180060 > I will try to look into this issue. Would be fine, maybe you like also to look at http://lists.freebsd.org/pipermail/freebsd-fs/2013-August/018008.html Regards -- Andreas Longwitz