From: Andriy Gapon
To: Sean Chittenden
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ZFS crashing during snapdir lookup for non-existent snapshot...
Date: Thu, 11 Oct 2012 00:08:48 +0300
Message-ID: <5075E3E0.7060706@FreeBSD.org>

on 10/10/2012 23:57 Sean Chittenden said the following:
> Using a FreeBSD -STABLE build from 2012-09-17, I now have the ability to
> crash FreeBSD/ZFS within a few hours of stress testing. It appears as
> though there's a locking problem when attempting to interrogate stats on
> a ZFS snapshot that doesn't exist any more.
> I believe the scenario is as follows:
>
> Background:
>
> *) `zfs set snapdir=visible` /was/ set on a data set
>
> *) Snapshots were being run once an hour for weeks, long enough for
>    zabbix to auto-discover the snapshots as valid file systems.
>
> *) `zfs inherit snapdir` was recently set (about a week ago), but zabbix
>    is still attempting to inquire about snapshots that are no longer
>    visible or no longer exist.
>
>
> After snapshots were deleted through the normal process of aging, zabbix
> is still interrogating the file system attempting to acquire information
> about the now deleted snapshots.
>
> FreeBSD crashes once every few minutes when zabbix is running and pulling
> ZFS information about the now hidden (or most likely deleted) snapshots.
> I believe that zabbix is using getfsspec(3) with the now stale snapshot
> name in rapid succession and is somehow triggering a race when there are
> two concurrent calls to two different non-existent snapshots.
>
> -sc
>
>
> kernel: Fatal trap 12: page fault while in kernel mode
> kernel: cpuid = 0; apic id = 00
> kernel: fault virtual address = 0x368
> kernel: fault code = supervisor read data, page not present
> kernel: instruction pointer = 0x20:0xffffffff80922be2
> kernel: stack pointer = 0x28:0xffffff8487d7b0d0
> kernel: frame pointer = 0x28:0xffffff8487d7b170
> kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
> kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> kernel: processor eflags = interrupt enabled, resume, IOPL = 0
> kernel: current process = 3536 (zabbix_agentd)
> kernel: trap number = 12
> kernel: panic: page fault
> kernel: cpuid = 0
> kernel: KDB: stack backtrace:
> kernel: #0 0xffffffff80950800 at kdb_backtrace+0x60
> kernel: #1 0xffffffff8091ac2d at panic+0x1fd
> kernel: #2 0xffffffff80c21858 at trap_fatal+0x388
> kernel: #3 0xffffffff80c21b23 at trap_pfault+0x2b3
> kernel: #4 0xffffffff80c212b5 at trap+0x5b5
> kernel: #5 0xffffffff80c0ba22 at calltrap+0x8
> kernel: #6 0xffffffff8092271e at _sx_xlock+0x5e
> kernel: #7 0xffffffff816e9384 at zfsctl_snapdir_lookup+0x124
> kernel: #8 0xffffffff80cb385f at VOP_LOOKUP_APV+0x5f
> kernel: #9 0xffffffff809a307f at lookup+0x5ef
> kernel: #10 0xffffffff809a263d at namei+0x62d
> kernel: #11 0xffffffff809b2b39 at kern_statfs+0x89
> kernel: #12 0xffffffff809b2a80 at sys_statfs+0x20
> kernel: #13 0xffffffff80c22134 at amd64_syscall+0x334
>
> FreeBSD example.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #1: Mon Sep 17 04:34:37 UTC 2012 root@example.com:/usr/obj/usr/src/sys/GENERIC amd64
>
> 0xffffffff80922be2 is in _sx_xlock_hard (/usr/src/sys/kern/kern_sx.c:546).
> 541             x = sx->sx_lock;
> 542             if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0) {
> 543                     if ((x & SX_LOCK_SHARED) == 0) {
> 544                             x = SX_OWNER(x);
> 545                             owner = (struct thread *)x;
> 546                             if (TD_IS_RUNNING(owner)) {
> 547                                     if (LOCK_LOG_TEST(&sx->lock_object, 0))
> 548                                             CTR3(KTR_LOCK,
> 549                                         "%s: spinning on %p held by %p",
> 550                                                 __func__, sx, owner);
>

Could you please list frame #7 (zfsctl_snapdir_lookup+0x124) instead?

-- 
Andriy Gapon
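
One possible way to exercise the code path in the quoted backtrace (sys_statfs -> namei -> zfsctl_snapdir_lookup) is sketched below. It assumes the trigger is two threads calling statfs(2) concurrently on two different snapshot paths that no longer exist, which is only the hypothesis from the quoted report and has not been confirmed to reproduce the panic. The names probe_stale_paths, probe_thread and struct probe are hypothetical and do not come from zabbix or from the FreeBSD sources.

```c
/*
 * Hypothetical stress sketch: two threads call statfs(2) in a loop on
 * paths that are assumed not to exist. Not taken from zabbix or FreeBSD.
 */
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#ifdef __linux__
#include <sys/vfs.h>            /* statfs(2) on Linux */
#else
#include <sys/param.h>
#include <sys/mount.h>          /* statfs(2) on FreeBSD */
#endif

struct probe {
    const char *path;           /* stale path to query */
    int iterations;             /* number of statfs(2) calls to make */
    int enoent;                 /* calls that failed with ENOENT */
};

/* Thread body: query the same path repeatedly and count ENOENT failures. */
static void *probe_thread(void *arg)
{
    struct probe *p = arg;
    struct statfs sfs;

    for (int i = 0; i < p->iterations; i++) {
        if (statfs(p->path, &sfs) == -1 && errno == ENOENT)
            p->enoent++;
    }
    return NULL;
}

/*
 * Run both probes concurrently. Returns the total number of ENOENT
 * failures, or -1 if a thread could not be created.
 */
int probe_stale_paths(const char *a, const char *b, int iterations)
{
    struct probe pa = { a, iterations, 0 };
    struct probe pb = { b, iterations, 0 };
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, probe_thread, &pa) != 0)
        return -1;
    if (pthread_create(&tb, NULL, probe_thread, &pb) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return pa.enoent + pb.enoent;
}
```

To try it, call probe_stale_paths() from a small main() with two paths under a dataset's .zfs/snapshot directory that name snapshots already destroyed (the exact paths depend on the pool layout and are not given in this thread), and build with -pthread. On a system without the bug every call should simply fail with ENOENT.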