From owner-freebsd-fs@FreeBSD.ORG Mon Aug 19 09:51:08 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id D48E1A6D for ; Mon, 19 Aug 2013 09:51:08 +0000 (UTC) (envelope-from lists@yamagi.org) Received: from mail.yamagi.org (mail.yamagi.org [IPv6:2a01:4f8:121:2102:1::7]) by mx1.freebsd.org (Postfix) with ESMTP id 3C6E22064 for ; Mon, 19 Aug 2013 09:51:08 +0000 (UTC) Received: from lennart.pwag-local.de (unknown [212.48.125.109]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.yamagi.org (Postfix) with ESMTPSA id 44E0A1666312 for ; Mon, 19 Aug 2013 11:51:06 +0200 (CEST) Date: Mon, 19 Aug 2013 11:51:01 +0200 From: Yamagi Burmeister To: freebsd-fs@freebsd.org Subject: 9.2-RC1: LORs / Deadlock with SU+J on HAST in "memsync" mode Message-Id: <20130819115101.ae9c0cf788f881dc4de464c5@yamagi.org> X-Mailer: Sylpheed 3.3.0 (GTK+ 2.24.19; amd64-portbld-freebsd9.2) Mime-Version: 1.0 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg="PGP-SHA1"; boundary="Signature=_Mon__19_Aug_2013_11_51_01_+0200_Zrns0XowhPXw=CDr" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Aug 2013 09:51:08 -0000 --Signature=_Mon__19_Aug_2013_11_51_01_+0200_Zrns0XowhPXw=CDr Content-Type: text/plain; charset=US-ASCII Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hello, in the last week I've seen several deadlocked processes (stucked in state "vofflock" or "wswbuf0" on an UFS2 filesystem with SU+J on HAST with "memsync" mode. TheThe deadlock disappear if either SU+J is disabled (only SU are active) or HAST is switched to "fullsync" mode. The system is: FreeBSD tvtransfer.local 9.2-RC1 FreeBSD 9.2-RC1 #0 r254355M: Fri Aug 16 12:35:30 UTC 2013 support@tvtransfer.local:/usr/obj/usr/src/sys/GENERIC amd64 I've build a kernel with full debugging support and seen this two LOR beside the known false positive between bufwait and dirhash: lock order reversal: 1st 0xfffffe0197310d68 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2223 2nd 0xffffff87be8be0f8 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_vnops.c:261 3rd 0xfffffe019746f098 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2223 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff88715dcd50 kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffff88715dce10 _witness_debugger() at _witness_debugger+0x2c/frame 0xffffff88715dce30 witness_checkorder() at witness_checkorder +0x875/frame 0xffffff88715dcef0 __lockmgr_args() at __lockmgr_args +0x1141/frame 0xffffff88715dcfd0 ffs_lock() at ffs_lock+0x9c/frame 0xffffff88715dd020 VOP_LOCK1_APV() at VOP_LOCK1_APV+0xe3/frame 0xffffff88715dd050 _vn_lock() at _vn_lock+0x55/frame 0xffffff88715dd0b0 vget() at vget+0x7b/frame 0xffffff88715dd100 vfs_hash_get() at vfs_hash_get+0xd5/frame 0xffffff88715dd150 ffs_vgetf() at ffs_vgetf+0x48/frame 0xffffff88715dd1d0 softdep_sync_buf() at softdep_sync_buf+0x547/frame 0xffffff88715dd2a0 ffs_syncvnode() at ffs_syncvnode+0x2c1/frame 0xffffff88715dd310 ffs_truncate() at ffs_truncate+0x10bb/frame 0xffffff88715dd560 ufs_direnter() at ufs_direnter+0x550/frame 0xffffff88715dd630 ufs_makeinode() at ufs_makeinode+0x355/frame 0xffffff88715dd7f0 VOP_CREATE_APV() at VOP_CREATE_APV+0x102/frame 0xffffff88715dd820 vn_open_cred() at vn_open_cred+0x45e/frame 0xffffff88715dd990 kern_openat() at kern_openat+0x20a/frame 0xffffff88715ddb10 amd64_syscall() at amd64_syscall+0x2f9/frame 0xffffff88715ddc30 Xfast_syscall() at Xfast_syscall+0xf7/frame 0xffffff88715ddc30 --- syscall (5, FreeBSD ELF64, sys_open), rip =3D 0x800b38d3c, rsp =3D 0x7fffffffd128, rbp =3D 0x7fffffffda40 ---=20 lock order reversal: 1st 0xfffffe0035dfbf30 so_snd_sx (so_snd_sx) @ /usr/src/sys/kern/uipc_sockbuf.c:145 2nd 0xfffffe02ead2b328 ufs (ufs) @ /usr/src/sys/kern/uipc_syscalls.c:2062 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff8871515530 kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffff88715155f0 _witness_debugger() at _witness_debugger+0x2c/frame 0xffffff8871515610 witness_checkorder() at witness_checkorder +0x875/frame 0xffffff88715156d0 __lockmgr_args() at __lockmgr_args +0x1446/frame 0xffffff88715157b0 ffs_lock() at ffs_lock+0x9c/frame 0xffffff8871515800 VOP_LOCK1_APV() at VOP_LOCK1_APV+0xe3/frame 0xffffff8871515830 _vn_lock() at _vn_lock+0x55/frame 0xffffff8871515890 kern_sendfile() at kern_sendfile+0x8e7/frame 0xffffff8871515ab0 do_sendfile() at do_sendfile+0xdc/frame 0xffffff8871515b10 amd64_syscall() at amd64_syscall+0x2f9/frame 0xffffff8871515c30 Xfast_syscall() at Xfast_syscall+0xf7/frame 0xffffff8871515c30 --- syscall (393, FreeBSD ELF64, sys_sendfile), rip =3D 0x8022075bc, rsp =3D 0x7fffffffb268, rbp =3D 0x7fffffffd570 --- Both LORs can be found in a more readable format here: http://deponie.yamagi.org/freebsd/debug/lor_hast/lor.txt The transcript of a DDB session with all informations requested in chapter "10.7. Debugging Deadlocks" of the developers handbook can be found here: http://deponie.yamagi.org/freebsd/debug/lor_hast/ddb.txt Reproducing this problem is easy: 1. Create a HAST setup in "memsync" mode with both primary and secondary connected 2. "newfs -U -j /dev/hast/$device" on the primary 3. mount "mount /dev/hast/$device /mnt" 4. Create some load on /mnt. For example copy /usr/ports in an endless loop. Sooner or later processes accessing /mnt will deadlock. This may be easer to reproduce while HAST is performing the initial sync. More information can be provided if necessary. Thanks, Yamagi --=20 Homepage: www.yamagi.org XMPP: yamagi@yamagi.org GnuPG/GPG: 0xEFBCCBCB --Signature=_Mon__19_Aug_2013_11_51_01_+0200_Zrns0XowhPXw=CDr Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iEYEARECAAYFAlIR6ooACgkQWTjlg++8y8vSHgCfXlfbW9JY8u8L+wuqp8TmdUFa RPcAoNx26N4sp+J4VrDHqwJ8svriKCpQ =STtg -----END PGP SIGNATURE----- --Signature=_Mon__19_Aug_2013_11_51_01_+0200_Zrns0XowhPXw=CDr--