Date: Wed, 10 Oct 2012 13:57:00 -0700 From: Sean Chittenden <sean@chittenden.org> To: freebsd-fs@freebsd.org Subject: ZFS crashing during snapdir lookup for non-existent snapshot... Message-ID: <B244C0E9-539D-4F7C-8616-378E8469F4BB@chittenden.org>
Using a FreeBSD -STABLE build from 2012-09-17, I can now crash FreeBSD/ZFS within a few hours of stress testing. It appears that there is a locking problem when attempting to query stats on a ZFS snapshot that no longer exists. I believe the scenario is as follows:

Background:

*) `zfs set snapdir=visible` /was/ set on a data set.
*) Snapshots were taken once an hour for weeks, long enough for zabbix to auto-discover the snapshots as valid file systems.
*) `zfs inherit snapdir` was run recently (about a week ago), but zabbix is still querying snapshots that are no longer visible or no longer exist.

After the snapshots were deleted through the normal aging process, zabbix kept interrogating the file system for information about them. FreeBSD crashes once every few minutes when zabbix is running and pulling ZFS information about the now hidden (or, most likely, deleted) snapshots. I believe that zabbix is looking up the stale snapshot names (possibly via getfsspec(3)) and calling statfs(2) on them in rapid succession (the backtrace below faults under sys_statfs), and that this somehow triggers a race when there are two concurrent lookups of two different non-existent snapshots.
-sc

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 0; apic id = 00
kernel: fault virtual address   = 0x368
kernel: fault code              = supervisor read data, page not present
kernel: instruction pointer     = 0x20:0xffffffff80922be2
kernel: stack pointer           = 0x28:0xffffff8487d7b0d0
kernel: frame pointer           = 0x28:0xffffff8487d7b170
kernel: code segment            = base 0x0, limit 0xfffff, type 0x1b
kernel:                         = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags        = interrupt enabled, resume, IOPL = 0
kernel: current process         = 3536 (zabbix_agentd)
kernel: trap number             = 12
kernel: panic: page fault
kernel: cpuid = 0
kernel: KDB: stack backtrace:
kernel: #0 0xffffffff80950800 at kdb_backtrace+0x60
kernel: #1 0xffffffff8091ac2d at panic+0x1fd
kernel: #2 0xffffffff80c21858 at trap_fatal+0x388
kernel: #3 0xffffffff80c21b23 at trap_pfault+0x2b3
kernel: #4 0xffffffff80c212b5 at trap+0x5b5
kernel: #5 0xffffffff80c0ba22 at calltrap+0x8
kernel: #6 0xffffffff8092271e at _sx_xlock+0x5e
kernel: #7 0xffffffff816e9384 at zfsctl_snapdir_lookup+0x124
kernel: #8 0xffffffff80cb385f at VOP_LOOKUP_APV+0x5f
kernel: #9 0xffffffff809a307f at lookup+0x5ef
kernel: #10 0xffffffff809a263d at namei+0x62d
kernel: #11 0xffffffff809b2b39 at kern_statfs+0x89
kernel: #12 0xffffffff809b2a80 at sys_statfs+0x20
kernel: #13 0xffffffff80c22134 at amd64_syscall+0x334

FreeBSD example.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #1: Mon Sep 17 04:34:37 UTC 2012     root@example.com:/usr/obj/usr/src/sys/GENERIC  amd64

0xffffffff80922be2 is in _sx_xlock_hard (/usr/src/sys/kern/kern_sx.c:546).
541             x = sx->sx_lock;
542             if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0) {
543                     if ((x & SX_LOCK_SHARED) == 0) {
544                             x = SX_OWNER(x);
545                             owner = (struct thread *)x;
546                             if (TD_IS_RUNNING(owner)) {
547                                     if (LOCK_LOG_TEST(&sx->lock_object, 0))
548                                             CTR3(KTR_LOCK,
549                                         "%s: spinning on %p held by %p",
550                                                 __func__, sx, owner);

--
Sean Chittenden
sean@chittenden.org