Date: Thu, 11 Oct 2012 00:08:48 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: Sean Chittenden <sean@chittenden.org>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ZFS crashing during snapdir lookup for non-existent snapshot...
Message-ID: <5075E3E0.7060706@FreeBSD.org>
In-Reply-To: <B244C0E9-539D-4F7C-8616-378E8469F4BB@chittenden.org>
References: <B244C0E9-539D-4F7C-8616-378E8469F4BB@chittenden.org>
on 10/10/2012 23:57 Sean Chittenden said the following:
> Using a FreeBSD -STABLE build from 2012-09-17, I now have the ability to crash FreeBSD/ZFS within a few hours of stress testing. It appears as though there's a locking problem when attempting to interrogate stats on a ZFS snapshot that doesn't exist any more. I believe the scenario is as follows:
>
> Background:
>
> *) `zfs set snapdir=visible` /was/ set on a data set
>
> *) Snapshots were being run once an hour for weeks, long enough for zabbix to auto-discover the snapshots as valid file systems.
>
> *) `zfs inherit snapdir` was recently set (about a week ago), but zabbix is still attempting to inquire about snapshots that are no longer visible or no longer exist.
>
>
> After snapshots were deleted through the normal process of aging, zabbix is still interrogating the file system, attempting to acquire information about the now-deleted snapshots.
>
> FreeBSD crashes once every few minutes when zabbix is running and pulling ZFS information about the now hidden (or most likely deleted) snapshots. I believe that zabbix is using getfsspec(3) with the now-stale snapshot name in rapid succession and is somehow triggering a race when there are two concurrent calls to two different non-existent snapshots.
>
> -sc
>
>
> kernel: Fatal trap 12: page fault while in kernel mode
> kernel: cpuid = 0; apic id = 00
> kernel: fault virtual address = 0x368
> kernel: fault code = supervisor read data, page not present
> kernel: instruction pointer = 0x20:0xffffffff80922be2
> kernel: stack pointer = 0x28:0xffffff8487d7b0d0
> kernel: frame pointer = 0x28:0xffffff8487d7b170
> kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
> kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> kernel: processor eflags = interrupt enabled, resume, IOPL = 0
> kernel: current process = 3536 (zabbix_agentd)
> kernel: trap number = 12
> kernel: panic: page fault
> kernel: cpuid = 0
> kernel: KDB: stack backtrace:
> kernel: #0 0xffffffff80950800 at kdb_backtrace+0x60
> kernel: #1 0xffffffff8091ac2d at panic+0x1fd
> kernel: #2 0xffffffff80c21858 at trap_fatal+0x388
> kernel: #3 0xffffffff80c21b23 at trap_pfault+0x2b3
> kernel: #4 0xffffffff80c212b5 at trap+0x5b5
> kernel: #5 0xffffffff80c0ba22 at calltrap+0x8
> kernel: #6 0xffffffff8092271e at _sx_xlock+0x5e
> kernel: #7 0xffffffff816e9384 at zfsctl_snapdir_lookup+0x124
> kernel: #8 0xffffffff80cb385f at VOP_LOOKUP_APV+0x5f
> kernel: #9 0xffffffff809a307f at lookup+0x5ef
> kernel: #10 0xffffffff809a263d at namei+0x62d
> kernel: #11 0xffffffff809b2b39 at kern_statfs+0x89
> kernel: #12 0xffffffff809b2a80 at sys_statfs+0x20
> kernel: #13 0xffffffff80c22134 at amd64_syscall+0x334
>
> FreeBSD example.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #1: Mon Sep 17 04:34:37 UTC 2012 root@example.com:/usr/obj/usr/src/sys/GENERIC amd64
>
> 0xffffffff80922be2 is in _sx_xlock_hard (/usr/src/sys/kern/kern_sx.c:546).
> 541         x = sx->sx_lock;
> 542         if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0) {
> 543             if ((x & SX_LOCK_SHARED) == 0) {
> 544                 x = SX_OWNER(x);
> 545                 owner = (struct thread *)x;
> 546                 if (TD_IS_RUNNING(owner)) {
> 547                     if (LOCK_LOG_TEST(&sx->lock_object, 0))
> 548                         CTR3(KTR_LOCK,
> 549                             "%s: spinning on %p held by %p",
> 550                             __func__, sx, owner);

Could you please list the source for frame #7 (zfsctl_snapdir_lookup+0x124) instead?

-- 
Andriy Gapon
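Two numbers in the trap output can be sanity-checked with simple arithmetic. A backtrace entry such as "#7 0xffffffff816e9384 at zfsctl_snapdir_lookup+0x124" means the address lies 0x124 bytes past the start of the function, so subtracting the offset gives the function's start address. Separately, a fault virtual address of 0x368 falls inside the first page, which commonly suggests a NULL base pointer dereferenced at a small structure-field offset (the "page not present" fault code is consistent with that). The sketch below assumes 4 KiB base pages, as on amd64; the helper names are hypothetical and not part of any FreeBSD tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on amd64. */
#define ASSUMED_PAGE_SIZE 4096ULL

/* "sym+off at addr" means addr lies off bytes past the start of sym,
 * so the symbol's start address is addr - off. */
static uint64_t frame_symbol_start(uint64_t frame_addr, uint64_t offset)
{
    return frame_addr - offset;
}

/* Heuristic: a fault address below one page usually means a NULL base
 * pointer plus a small field offset was dereferenced. */
static bool fault_in_null_page(uint64_t fault_va)
{
    return fault_va < ASSUMED_PAGE_SIZE;
}
```

For frame #7 this places the start of zfsctl_snapdir_lookup at 0xffffffff816e9260, while the stack pointer 0xffffff8487d7b0d0 is, as expected, far outside page zero.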
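The suspected trigger, two concurrent lookups of two different non-existent snapshots, might be approximated by a small userland stress program along the following lines. It is only a sketch under stated assumptions: it calls statfs(2) rather than getfsspec(3) because frames #11 and #12 of the backtrace show kern_statfs/sys_statfs, the snapshot paths passed in are hypothetical placeholders, and the struct and function names are invented for illustration. Whether it actually reproduces the panic is untested.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#if defined(__linux__)
#include <sys/vfs.h>      /* statfs(2) on Linux */
#else
#include <sys/param.h>    /* statfs(2) on FreeBSD and macOS */
#include <sys/mount.h>
#endif

/* Hypothetical per-thread work item: one stale snapshot path. */
struct probe_arg {
    const char *path;     /* e.g. <dataset>/.zfs/snapshot/<destroyed name> */
    int iterations;       /* how many lookups to issue */
    int enoent_count;     /* lookups that failed with ENOENT */
};

/* Repeatedly statfs(2) a path that no longer exists, as the
 * monitoring agent is suspected of doing. */
static void *probe_stale_snapshot(void *p)
{
    struct probe_arg *arg = p;
    struct statfs sfs;

    for (int i = 0; i < arg->iterations; i++) {
        if (statfs(arg->path, &sfs) == -1 && errno == ENOENT)
            arg->enoent_count++;
    }
    return NULL;
}

/* Hypothetical driver: run two probers concurrently against two
 * different missing snapshot names. Returns 0 if both threads ran and
 * stores each thread's ENOENT count in counts[0] and counts[1]. */
static int race_two_lookups(const char *path_a, const char *path_b,
                            int iterations, int counts[2])
{
    struct probe_arg args[2] = {
        { path_a, iterations, 0 },
        { path_b, iterations, 0 },
    };
    pthread_t tids[2];
    int started = 0;
    int rc = 0;

    for (; started < 2; started++) {
        if (pthread_create(&tids[started], NULL, probe_stale_snapshot,
                           &args[started]) != 0) {
            rc = -1;
            break;
        }
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    counts[0] = args[0].enoent_count;
    counts[1] = args[1].enoent_count;
    return rc;
}
```

On a system without the bug (or without ZFS), every call should simply fail with ENOENT, so each count equals the iteration count; on an affected 9.1-PRERELEASE box the paths would point under a real dataset's .zfs/snapshot directory, which remains reachable by name even when snapdir is hidden.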