Date:      Wed, 10 Oct 2012 13:57:00 -0700
From:      Sean Chittenden <sean@chittenden.org>
To:        freebsd-fs@freebsd.org
Subject:   ZFS crashing during snapdir lookup for non-existent snapshot...
Message-ID:  <B244C0E9-539D-4F7C-8616-378E8469F4BB@chittenden.org>

Using a FreeBSD -STABLE build from 2012-09-17, I now have the ability to
crash FreeBSD/ZFS within a few hours of stress testing. It appears as
though there's a locking problem when attempting to interrogate stats on
a ZFS snapshot that doesn't exist any more. I believe the scenario is as
follows:

Background:

*) `zfs set snapdir=visible` /was/ set on a data set

*) Snapshots were being run once an hour for weeks, long enough for
zabbix to auto-discover the snapshots as valid file systems.

*) `zfs inherit snapdir` was recently run (about a week ago), but zabbix
is still attempting to inquire about snapshots that are no longer
visible or no longer exist.


After snapshots were deleted through the normal process of aging, zabbix
is still interrogating the file system attempting to acquire information
about the now deleted snapshots.

FreeBSD crashes once every few minutes when zabbix is running and
pulling ZFS information about the now hidden (or most likely deleted)
snapshots. I believe that zabbix is using getfsspec(3) with the now
stale snapshot name in rapid succession and is somehow triggering a race
when there are two concurrent calls to two different non-existent
snapshots.
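
The suspected trigger could be approximated outside of zabbix with something
like the sketch below. It is only a sketch under assumptions: it calls
statfs(2) directly (that is what the backtrace shows, not necessarily what
zabbix does internally), it uses one forked child per path to get concurrent
lookups, and the names probe_path() and probe_concurrently() are made up for
illustration. I have not confirmed that this reproduces the panic.

```c
/* Sketch: concurrent statfs(2) lookups on paths that no longer exist. */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/vfs.h>        /* statfs(2) on Linux */
#else
#include <sys/param.h>
#include <sys/mount.h>      /* statfs(2) on FreeBSD */
#endif

/* Call statfs(2) on `path` `iterations` times; return how many calls
 * failed with ENOENT (the expected result for a deleted snapshot). */
static long probe_path(const char *path, long iterations)
{
    struct statfs sb;
    long enoent = 0;

    for (long i = 0; i < iterations; i++)
        if (statfs(path, &sb) == -1 && errno == ENOENT)
            enoent++;
    return enoent;
}

/* Fork one child per path so the lookups run concurrently. Returns the
 * number of children for which every lookup failed with ENOENT, or -1
 * if a fork() or wait() call failed. */
static int probe_concurrently(const char *const paths[], int npaths,
                              long iterations)
{
    int started = 0;
    int all_enoent = 0;

    for (int i = 0; i < npaths; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0)   /* child: hammer one path, report via exit status */
            _exit(probe_path(paths[i], iterations) == iterations ? 0 : 1);
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) == -1)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            all_enoent++;
    }
    return started == npaths ? all_enoent : -1;
}
```

To try it, call probe_concurrently() from a small main() with two stale
snapshot paths, for example /tank/data/.zfs/snapshot/old-a and
/tank/data/.zfs/snapshot/old-b (hypothetical names; substitute snapshots that
have been destroyed on the affected data set) and a large iteration count.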

-sc


kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 0; apic id = 00
kernel: fault virtual address    = 0x368
kernel: fault code               = supervisor read data, page not present
kernel: instruction pointer      = 0x20:0xffffffff80922be2
kernel: stack pointer            = 0x28:0xffffff8487d7b0d0
kernel: frame pointer            = 0x28:0xffffff8487d7b170
kernel: code segment             = base 0x0, limit 0xfffff, type 0x1b
kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags = interrupt enabled, resume, IOPL = 0
kernel: current process          = 3536 (zabbix_agentd)
kernel: trap number              = 12
kernel: panic: page fault
kernel: cpuid = 0
kernel: KDB: stack backtrace:
kernel: #0 0xffffffff80950800 at kdb_backtrace+0x60
kernel: #1 0xffffffff8091ac2d at panic+0x1fd
kernel: #2 0xffffffff80c21858 at trap_fatal+0x388
kernel: #3 0xffffffff80c21b23 at trap_pfault+0x2b3
kernel: #4 0xffffffff80c212b5 at trap+0x5b5
kernel: #5 0xffffffff80c0ba22 at calltrap+0x8
kernel: #6 0xffffffff8092271e at _sx_xlock+0x5e
kernel: #7 0xffffffff816e9384 at zfsctl_snapdir_lookup+0x124
kernel: #8 0xffffffff80cb385f at VOP_LOOKUP_APV+0x5f
kernel: #9 0xffffffff809a307f at lookup+0x5ef
kernel: #10 0xffffffff809a263d at namei+0x62d
kernel: #11 0xffffffff809b2b39 at kern_statfs+0x89
kernel: #12 0xffffffff809b2a80 at sys_statfs+0x20
kernel: #13 0xffffffff80c22134 at amd64_syscall+0x334

FreeBSD example.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #1: Mon Sep 17 04:34:37 UTC 2012     root@example.com:/usr/obj/usr/src/sys/GENERIC  amd64

0xffffffff80922be2 is in _sx_xlock_hard (/usr/src/sys/kern/kern_sx.c:546).
541			x = sx->sx_lock;
542			if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0) {
543				if ((x & SX_LOCK_SHARED) == 0) {
544					x = SX_OWNER(x);
545					owner = (struct thread *)x;
546					if (TD_IS_RUNNING(owner)) {
547						if (LOCK_LOG_TEST(&sx->lock_object, 0))
548							CTR3(KTR_LOCK,
549						    "%s: spinning on %p held by %p",
550							    __func__, sx, owner);




--
Sean Chittenden
sean@chittenden.org



