Date:      Tue, 31 Mar 2015 16:17:19 -0500
From:      Lars <lars@odin-corporation.com>
To:        Shane Ambler <FreeBSD@ShaneWare.Biz>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Is a high witness refcount indicative of a missing unlock?
Message-ID:  <8858A68D-D68C-4CD5-A6D9-4886EE746216@odin-corporation.com>
In-Reply-To: <5518957B.4050505@ShaneWare.Biz>
References:  <1117D087-AD76-4A87-8798-AB5526BECF3A@odin-corporation.com> <5518957B.4050505@ShaneWare.Biz>

Hi Shane,
While our configs sound much the same (ignoring 10.1-STABLE vs.
current) and I had the nvidia driver loaded, my lockups did not involve
the nvidia driver. Of course, that does not necessarily mean anything if
the issue is squarely in zfs somewhere.

You can see the reference counts from the ddb kernel debugger (man 8
ddb) using the “show witness” command.

Lars
> On Mar 29, 2015, at 19:14, Shane Ambler <FreeBSD@ShaneWare.Biz> wrote:
>
> On 30/03/2015 05:59, Lars wrote:
>> Hi, I am poking around for a cause for my repeating deadlock issues
>> on my system based on r279869. ddb “show witness” shows the “vnode
>> interlock” and the “zfs” locks both with reference counts over
>> 200K. Obviously they are related, and there is a find running (all
>> the filesystems on this machine are zfs, minus the specialty ones
>> like devfs).
>>
>> I don’t see any other witness entry with reference counts even in
>> the ballpark of these two, so does this indicate that we have a
>> vnode/zfs path where we don’t unlock?
>>
>
> I am running 10.1-STABLE and have bad locking issues. Running a witness
> kernel I got a duplicate lock from nvidia and lock order reversals
> involving zfs. Any chance your issue is related to mine?
>
> What command can give me the witness lock counts?
>
> The debug data I have collected so far is at -
> http://shaneware.biz/freebsddebugdata/
>
> The lock reversal output I had was (after uptime of about 12 mins) -
>
> Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...done
> Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
> Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `syncer' to stop...
> Mar 24 00:24:25 leader kernel: Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 done
> Mar 24 00:24:25 leader kernel: All buffers synced.
> Mar 24 00:24:25 leader kernel: lock order reversal:
> Mar 24 00:24:25 leader kernel: 1st 0xfffff800224555f0 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1229
> Mar 24 00:24:25 leader kernel: 2nd 0xfffff800222d67c8 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2268
> Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
> Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe022df6e4c0
> Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe022df6e570
> Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe022df6e600
> Mar 24 00:24:25 leader kernel: __lockmgr_args() at __lockmgr_args+0x9ea/frame 0xfffffe022df6e740
> Mar 24 00:24:25 leader kernel: vop_stdlock() at vop_stdlock+0x3c/frame 0xfffffe022df6e760
> Mar 24 00:24:25 leader kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfc/frame 0xfffffe022df6e790
> Mar 24 00:24:25 leader kernel: _vn_lock() at _vn_lock+0xaa/frame 0xfffffe022df6e800
> Mar 24 00:24:25 leader kernel: vputx() at vputx+0x232/frame 0xfffffe022df6e860
> Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x301/frame 0xfffffe022df6e8e0
> Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfffffe022df6e910
> Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 0xfffffe022df6e980
> Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 0xfffffe022df6e9a0
> Mar 24 00:24:25 leader kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40f1bc, rsp = 0x7fffffffe6d8, rbp = 0x7fffffffe7d0 ---
> Mar 24 00:24:25 leader kernel: lock order reversal:
> Mar 24 00:24:25 leader kernel: 1st 0xfffff800222d6b78 zfs (zfs) @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1814
> Mar 24 00:24:25 leader kernel: 2nd 0xffffffff818514a8 allproc (allproc) @ /usr/src/sys/kern/kern_descrip.c:2872
> Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
> Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe022df6e690
> Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe022df6e740
> Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe022df6e7d0
> Mar 24 00:24:25 leader kernel: _sx_slock() at _sx_slock+0x76/frame 0xfffffe022df6e810
> Mar 24 00:24:25 leader kernel: mountcheckdirs() at mountcheckdirs+0x47/frame 0xfffffe022df6e860
> Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x36f/frame 0xfffffe022df6e8e0
> Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfffffe022df6e910
> Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 0xfffffe022df6e980
> Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 0xfffffe022df6e9a0
> Mar 24 00:24:25 leader kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40f1bc, rsp = 0x7fffffffe6d8, rbp = 0x7fffffffe7d0 ---
> Mar 24 00:24:25 leader kernel: lock order reversal:
> Mar 24 00:24:25 leader kernel: 1st 0xfffff8001ca8e240 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1229
> Mar 24 00:24:25 leader kernel: 2nd 0xfffff8001ca8e5f0 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2157
> Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
> Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe022df6e460
> Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe022df6e510
> Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe022df6e5a0
> Mar 24 00:24:25 leader kernel: __lockmgr_args() at __lockmgr_args+0x9ea/frame 0xfffffe022df6e6e0
> Mar 24 00:24:25 leader kernel: vop_stdlock() at vop_stdlock+0x3c/frame 0xfffffe022df6e700
> Mar 24 00:24:25 leader kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfc/frame 0xfffffe022df6e730
> Mar 24 00:24:25 leader kernel: _vn_lock() at _vn_lock+0xaa/frame 0xfffffe022df6e7a0
> Mar 24 00:24:25 leader kernel: vget() at vget+0x67/frame 0xfffffe022df6e7e0
> Mar 24 00:24:25 leader kernel: devfs_allocv() at devfs_allocv+0xfd/frame 0xfffffe022df6e830
> Mar 24 00:24:25 leader kernel: devfs_root() at devfs_root+0x43/frame 0xfffffe022df6e860
> Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x345/frame 0xfffffe022df6e8e0
> Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfffffe022df6e910
> Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 0xfffffe022df6e980
> Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 0xfffffe022df6e9a0
> Mar 24 00:24:25 leader kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40f1bc, rsp = 0x7fffffffe6d8, rbp = 0x7fffffffe7d0 ---
> Mar 24 00:24:25 leader kernel: Uptime: 12m42s
>
>
> -- 
> FreeBSD - the place to B...Software Developing
>
> Shane Ambler
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>



