From: Lars
To: Shane Ambler
Cc: freebsd-current@freebsd.org
Date: Tue, 31 Mar 2015 16:17:19 -0500
Subject: Re: Is a high witness refcount indicative of a missing unlock?
Message-Id: <8858A68D-D68C-4CD5-A6D9-4886EE746216@odin-corporation.com>
In-Reply-To: <5518957B.4050505@ShaneWare.Biz>

Hi Shane,

While our configs sound much the same (ignoring 10.1-stable vs
current) and I had the nvidia driver loaded, my lockups did not involve
the nvidia driver. That of course does not necessarily mean anything if
the issue is squarely in zfs somewhere.

You can see the reference counts from the ddb kernel debugger (man 8
ddb) using the “show witness” command.

Lars

> On Mar 29, 2015, at 19:14, Shane Ambler wrote:
>
> On 30/03/2015 05:59, Lars wrote:
>> Hi, I am poking around for a cause for my repeating deadlock issues
>> on my system based on r279869. ddb show witness shows the “vnode
>> interlock” and the “zfs” locks both with reference counts over
>> 200K. Obviously they are related, and there is a find running (all
>> the filesystems on this machine are zfs, minus the specialty ones
>> like devfs).
>>
>> I don’t see any other witness entry with reference counts even in
>> the ballpark of these two, so does this indicate that we have a
>> vnode/zfs path where we don’t unlock?
>>
>
> I am running 10.1-STABLE and have bad locking issues. Running a witness
> kernel I got a duplicate lock from nvidia and lock order reversals
> involving zfs.
> Any chance your issue is related to mine?
>
> What command can give me the witness lock counts?
>
> The debug data I have collected so far is at -
> http://shaneware.biz/freebsddebugdata/
>
> The lock reversal output I had was (after uptime of about 12 mins) -
>
> Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...done
> Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
> Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `syncer' to stop...
> Mar 24 00:24:25 leader kernel: Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 done
> Mar 24 00:24:25 leader kernel: All buffers synced.
> Mar 24 00:24:25 leader kernel: lock order reversal:
> Mar 24 00:24:25 leader kernel: 1st 0xfffff800224555f0 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1229
> Mar 24 00:24:25 leader kernel: 2nd 0xfffff800222d67c8 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2268
> Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
> Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe022df6e4c0
> Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe022df6e570
> Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe022df6e600
> Mar 24 00:24:25 leader kernel: __lockmgr_args() at __lockmgr_args+0x9ea/frame 0xfffffe022df6e740
> Mar 24 00:24:25 leader kernel: vop_stdlock() at vop_stdlock+0x3c/frame 0xfffffe022df6e760
> Mar 24 00:24:25 leader kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfc/frame 0xfffffe022df6e790
> Mar 24 00:24:25 leader kernel: _vn_lock() at _vn_lock+0xaa/frame 0xfffffe022df6e800
> Mar 24 00:24:25 leader kernel: vputx() at vputx+0x232/frame 0xfffffe022df6e860
> Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x301/frame 0xfffffe022df6e8e0
> Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfffffe022df6e910
> Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 0xfffffe022df6e980
> Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 0xfffffe022df6e9a0
> Mar 24 00:24:25 leader kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40f1bc, rsp = 0x7fffffffe6d8, rbp = 0x7fffffffe7d0 ---
> Mar 24 00:24:25 leader kernel: lock order reversal:
> Mar 24 00:24:25 leader kernel: 1st 0xfffff800222d6b78 zfs (zfs) @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1814
> Mar 24 00:24:25 leader kernel: 2nd 0xffffffff818514a8 allproc (allproc) @ /usr/src/sys/kern/kern_descrip.c:2872
> Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
> Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe022df6e690
> Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe022df6e740
> Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe022df6e7d0
> Mar 24 00:24:25 leader kernel: _sx_slock() at _sx_slock+0x76/frame 0xfffffe022df6e810
> Mar 24 00:24:25 leader kernel: mountcheckdirs() at mountcheckdirs+0x47/frame 0xfffffe022df6e860
> Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x36f/frame 0xfffffe022df6e8e0
> Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfffffe022df6e910
> Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 0xfffffe022df6e980
> Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 0xfffffe022df6e9a0
> Mar 24 00:24:25 leader kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40f1bc, rsp = 0x7fffffffe6d8, rbp = 0x7fffffffe7d0 ---
> Mar 24 00:24:25 leader kernel: lock order reversal:
> Mar 24 00:24:25 leader kernel: 1st 0xfffff8001ca8e240 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1229
> Mar 24 00:24:25 leader kernel: 2nd 0xfffff8001ca8e5f0 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2157
> Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
> Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe022df6e460
> Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe022df6e510
> Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe022df6e5a0
> Mar 24 00:24:25 leader kernel: __lockmgr_args() at __lockmgr_args+0x9ea/frame 0xfffffe022df6e6e0
> Mar 24 00:24:25 leader kernel: vop_stdlock() at vop_stdlock+0x3c/frame 0xfffffe022df6e700
> Mar 24 00:24:25 leader kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfc/frame 0xfffffe022df6e730
> Mar 24 00:24:25 leader kernel: _vn_lock() at _vn_lock+0xaa/frame 0xfffffe022df6e7a0
> Mar 24 00:24:25 leader kernel: vget() at vget+0x67/frame 0xfffffe022df6e7e0
> Mar 24 00:24:25 leader kernel: devfs_allocv() at devfs_allocv+0xfd/frame 0xfffffe022df6e830
> Mar 24 00:24:25 leader kernel: devfs_root() at devfs_root+0x43/frame 0xfffffe022df6e860
> Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x345/frame 0xfffffe022df6e8e0
> Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfffffe022df6e910
> Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 0xfffffe022df6e980
> Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 0xfffffe022df6e9a0
> Mar 24 00:24:25 leader kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe022df6eab0
> Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40f1bc, rsp = 0x7fffffffe6d8, rbp = 0x7fffffffe7d0 ---
> Mar 24 00:24:25 leader kernel: Uptime: 12m42s
>
>
> --
> FreeBSD - the place to B...Software Developing
>
> Shane Ambler
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
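
The three reversals quoted above all have a zfs lock as the first lock, which is easier to see once the "1st"/"2nd" lines are pulled out of the backtraces. A minimal sketch of that, assuming only the syslog layout shown in the quoted output; the names `LOCK_LINE`, `parse_reversals` and `summarize` are made up for illustration and are not part of any FreeBSD tool:

```python
import re
from collections import Counter

# Matches witness lines of the form seen in the quoted log, e.g.
#   "... kernel: 1st 0xfffff800224555f0 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1229"
# Groups: ordinal, lock name, lock type, source file, source line.
LOCK_LINE = re.compile(
    r"\b(1st|2nd)\s+0x[0-9a-fA-F]+\s+(.+?)\s+\((.+?)\)\s+@\s+(\S+):(\d+)"
)


def parse_reversals(lines):
    """Return one (first_lock, second_lock) name pair per reported reversal."""
    pairs = []
    first = None
    for line in lines:
        # Drop mail quoting ("> ") so quoted logs can be fed in directly.
        line = line.lstrip("> ").rstrip()
        if "lock order reversal" in line:
            first = None  # a new report starts; forget any dangling "1st"
            continue
        match = LOCK_LINE.search(line)
        if match is None:
            continue  # backtrace frames and other kernel messages
        ordinal, name = match.group(1), match.group(2)
        if ordinal == "1st":
            first = name
        elif first is not None:
            pairs.append((first, name))
            first = None
    return pairs


def summarize(lines):
    """Count how often each (first_lock, second_lock) pair was reported."""
    return Counter(parse_reversals(lines))


if __name__ == "__main__":
    demo = [
        "> Mar 24 00:24:25 leader kernel: lock order reversal:",
        "> Mar 24 00:24:25 leader kernel: 1st 0xfffff800224555f0 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1229",
        "> Mar 24 00:24:25 leader kernel: 2nd 0xfffff800222d67c8 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2268",
    ]
    for (first, second), count in summarize(demo).items():
        print(f"{first} -> {second}: {count}")
```

This only tallies what witness already printed; it says nothing about whether a given reversal is harmless or a real deadlock candidate, and it does not read the reference counts that “show witness” displays in ddb.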