Date: Thu, 11 Oct 2012 00:08:48 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: Sean Chittenden <sean@chittenden.org>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ZFS crashing during snapdir lookup for non-existent snapshot...
Message-ID: <5075E3E0.7060706@FreeBSD.org>
In-Reply-To: <B244C0E9-539D-4F7C-8616-378E8469F4BB@chittenden.org>
References: <B244C0E9-539D-4F7C-8616-378E8469F4BB@chittenden.org>
on 10/10/2012 23:57 Sean Chittenden said the following:
> Using a FreeBSD -STABLE build from 2012-09-17, I now have the ability to crash FreeBSD/ZFS within a few hours of stress testing. It appears as though there's a locking problem when attempting to interrogate stats on a ZFS snapshot that doesn't exist any more. I believe the scenario is as follows:
> 
> Background:
> 
> *) `zfs set snapdir=visible` /was/ set on a data set
> 
> *) Snapshots were being run once an hour for weeks, long enough for zabbix to auto-discover the snapshots as valid file systems.
> 
> *) `zfs inherit snapdir` was recently run (about a week ago), but zabbix is still attempting to query snapshots that are no longer visible or no longer exist.
> 
> 
> After snapshots were deleted through the normal process of aging, zabbix is still interrogating the file system attempting to acquire information about the now deleted snapshots.
> 
> FreeBSD crashes once every few minutes when zabbix is running and pulling ZFS information about the now hidden (or most likely deleted) snapshots. I believe that zabbix is using getfsspec(3) with the now stale snapshot name in rapid succession and is somehow triggering a race when there are two concurrent calls to two different non-existent snapshots.
> 
> -sc
> 
> 
> kernel: Fatal trap 12: page fault while in kernel mode
> kernel: cpuid = 0; apic id = 00
> kernel: fault virtual address    = 0x368
> kernel: fault code               = supervisor read data, page not present
> kernel: instruction pointer      = 0x20:0xffffffff80922be2
> kernel: stack pointer            = 0x28:0xffffff8487d7b0d0
> kernel: frame pointer            = 0x28:0xffffff8487d7b170
> kernel: code segment             = base 0x0, limit 0xfffff, type 0x1b
> kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> kernel: processor eflags = interrupt enabled, resume, IOPL = 0
> kernel: current process          = 3536 (zabbix_agentd)
> kernel: trap number              = 12
> kernel: panic: page fault
> kernel: cpuid = 0
> kernel: KDB: stack backtrace:
> kernel: #0 0xffffffff80950800 at kdb_backtrace+0x60
> kernel: #1 0xffffffff8091ac2d at panic+0x1fd
> kernel: #2 0xffffffff80c21858 at trap_fatal+0x388
> kernel: #3 0xffffffff80c21b23 at trap_pfault+0x2b3
> kernel: #4 0xffffffff80c212b5 at trap+0x5b5
> kernel: #5 0xffffffff80c0ba22 at calltrap+0x8
> kernel: #6 0xffffffff8092271e at _sx_xlock+0x5e
> kernel: #7 0xffffffff816e9384 at zfsctl_snapdir_lookup+0x124
> kernel: #8 0xffffffff80cb385f at VOP_LOOKUP_APV+0x5f
> kernel: #9 0xffffffff809a307f at lookup+0x5ef
> kernel: #10 0xffffffff809a263d at namei+0x62d
> kernel: #11 0xffffffff809b2b39 at kern_statfs+0x89
> kernel: #12 0xffffffff809b2a80 at sys_statfs+0x20
> kernel: #13 0xffffffff80c22134 at amd64_syscall+0x334
> 
> FreeBSD example.com 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #1: Mon Sep 17 04:34:37 UTC 2012     root@example.com:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> 0xffffffff80922be2 is in _sx_xlock_hard (/usr/src/sys/kern/kern_sx.c:546).
> 541			x = sx->sx_lock;
> 542			if ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0) {
> 543				if ((x & SX_LOCK_SHARED) == 0) {
> 544					x = SX_OWNER(x);
> 545					owner = (struct thread *)x;
> 546					if (TD_IS_RUNNING(owner)) {
> 547						if (LOCK_LOG_TEST(&sx->lock_object, 0))
> 548							CTR3(KTR_LOCK,
> 549						    "%s: spinning on %p held by %p",
> 550							    __func__, sx, owner);
> 
Could you please list frame #7 (zfsctl_snapdir_lookup+0x124) instead?
-- 
Andriy Gapon
