From owner-freebsd-fs@FreeBSD.ORG Thu Oct 11 09:32:57 2012
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 573A69DC;
	Thu, 11 Oct 2012 09:32:57 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 63C2E8FC08;
	Thu, 11 Oct 2012 09:32:55 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA00482;
	Thu, 11 Oct 2012 12:32:54 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1TMF8P-000Opr-Ry; Thu, 11 Oct 2012 12:32:54 +0300
Message-ID: <50769243.2010208@FreeBSD.org>
Date: Thu, 11 Oct 2012 12:32:51 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1
MIME-Version: 1.0
To: Pawel Jakub Dawidek
Subject: Re: ZFS crashing during snapdir lookup for non-existent snapshot...
References: <5075E3E0.7060706@FreeBSD.org> <0A6567E7-3BA5-4F27-AEB2-1C00EDE00641@chittenden.org> <5075EDDD.4030008@FreeBSD.org> <5075FA8E.10200@FreeBSD.org>
In-Reply-To: <5075FA8E.10200@FreeBSD.org>
X-Enigmail-Version: 1.4.3
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: "freebsd-fs@freebsd.org", Sean Chittenden
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
X-List-Received-Date: Thu, 11 Oct 2012 09:32:57 -0000

on 11/10/2012 01:45 Andriy Gapon said the following:
>
> [restoring mailing list cc]
>
> on 11/10/2012 00:58 Sean Chittenden said the following:
>>>> I don't have a dump from this particular system, only the backtrace from the crash.
>>>> The system is ZFS only and I only have a ZFS swapdir. :-/
>>>>
>>>> I still have the kernel, so I can poke at the code and the compiled kernel
>>>> (kernel.symbols). What are you looking for? -sc
>>>>
>>>
>>> list *zfsctl_snapdir_lookup+0x124 in kgdb
>>
>> (kgdb) list *zfsctl_snapdir_lookup+0x124
>> 0xffffffff816e9384 is in zfsctl_snapdir_lookup
>> (/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:992).
>> 987			*direntflags = ED_CASE_CONFLICT;
>> 988	#endif
>> 989		}
>> 990
>> 991		mutex_enter(&sdp->sd_lock);
>> 992		search.se_name = (char *)nm;
>> 993		if ((sep = avl_find(&sdp->sd_snaps, &search, &where)) != NULL) {
>> 994			*vpp = sep->se_root;
>> 995			VN_HOLD(*vpp);
>> 996			err = traverse(vpp, LK_EXCLUSIVE | LK_RETRY);
>
> It seems that the problem is a Solaris-ism that remained in the code.
> I think that zfsctl_snapdir_inactive should not destroy sdp; that should be the
> job of vop_reclaim.  Otherwise, if the vnode is re-activated, its v_data points
> to freed memory.
> In particular, I have this scenario in mind:
> - one thread, T1, performs a vput-ish operation which leads to vop_inactive on a
>   current vnode that represents ".zfs/snapshot"
> - at the same time, T2 executes a lookup that goes into zfsctl_root_lookup
> - let's assume that at some point T1 is at the very start of
>   zfsctl_snapdir_inactive; it holds just the vnode lock
> - at the same time, T2 is in gfs_dir_lookup->gfs_dir_lookup_static and holds
>   gfs_dir_lock
> - so T2 finds the 'snapshot' static entry in gfsd_static[]
> - T2 finds the cached vnode and adds a reference to it
> - T2 does gfs_dir_unlock and returns the vnode
> - now T1 proceeds through zfsctl_snapdir_inactive and destroys the v_data
>   (without even clearing the pointer)
> - T2 uses the vnode and gets a crash
>
> Possible resolutions:
> - make vop_inactive a noop and have vop_reclaim call the current inactive methods
> - check v_usecount in gfs_file_inactive after gfs_dir_lock is obtained and bail
>   out if it is > 0 (somewhat similar to what zfs_zinactive does)
> - something else?

An easy way to reproduce the problem in one form or another is to run many
instances of the following in parallel:

while true; do ls -l /pool/fs/.zfs/ >/dev/null; done

Here is another panic that is a variation of the above scenario.  A duplicate
gfs_vop_inactive is called after a "harmless" vop_pathconf call (one that
doesn't touch the vnode).
In this case the "shares" entry appears to be a random victim:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0x18
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff825fe7dd
stack pointer		= 0x28:0xffffff80e040b800
frame pointer		= 0x28:0xffffff80e040b830
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, IOPL = 0
current process		= 712 (ls)
trap number		= 12
panic: page fault
cpuid = 1
curthread: 0xfffffe0003d8a9a0
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff802d2bba = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0xffffffff805596fa = kdb_backtrace+0x3a
panic() at 0xffffffff8051c2a6 = panic+0x266
trap_fatal() at 0xffffffff8070741d = trap_fatal+0x3ad
trap_pfault() at 0xffffffff8070756c = trap_pfault+0x12c
trap() at 0xffffffff80707d19 = trap+0x4f9
calltrap() at 0xffffffff806ef903 = calltrap+0x8
--- trap 0xc, rip = 0xffffffff825fe7dd, rsp = 0xffffff80e040b800, rbp = 0xffffff80e040b830 ---
gfs_vop_inactive() at 0xffffffff825fe7dd = gfs_vop_inactive+0x1d
VOP_INACTIVE_APV() at 0xffffffff80782fb4 = VOP_INACTIVE_APV+0x114
vinactive() at 0xffffffff805c84ad = vinactive+0x15d
vputx() at 0xffffffff805ca962 = vputx+0x4d2
vput() at 0xffffffff805ca9ce = vput+0xe
kern_pathconf() at 0xffffffff805cd44e = kern_pathconf+0x10e
sys_lpathconf() at 0xffffffff805cd4aa = sys_lpathconf+0x1a
amd64_syscall() at 0xffffffff80706953 = amd64_syscall+0x313
Xfast_syscall() at 0xffffffff806efbe7 = Xfast_syscall+0xf7

-- 
Andriy Gapon