From owner-freebsd-bugs@FreeBSD.ORG Sun Sep 11 15:50:07 2011
Resent-Date: Sun, 11 Sep 2011 15:50:07 GMT
Resent-To: freebsd-bugs@FreeBSD.org
Message-Id: <201109111540.p8BFehHl089531@red.freebsd.org>
Date: Sun, 11 Sep 2011 15:40:43 GMT
From: Hans Ottevanger
To:
 freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: kern/160662: Snapshots cause a lockup on UFS with SU+J enabled
List-Id: Bug reports

>Number: 160662
>Category: kern
>Synopsis: Snapshots cause a lockup on UFS with SU+J enabled
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sun Sep 11 15:50:07 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator: Hans Ottevanger
>Release: 9.0-BETA2
>Organization:
>Environment:
FreeBSD testp4.beastielabs.net 9.0-BETA2 FreeBSD 9.0-BETA2 #0: Wed Aug 31 17:26:34 UTC 2011 root@obrian.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386
>Description:
On a UFS filesystem with SU+J enabled, attempting to make a snapshot with mksnap_ffs causes the system to lock up completely after a while, needing a reset to recover.

This is not the extreme slowdown due to the snapshot taking all available disk bandwidth: the system becomes fully unresponsive, i.e. there is no reaction to keyboard or mouse and e.g. remote ssh sessions just stop. However, the system remains pingable.

If journalling is disabled by running tunefs -j disable (in single user mode, if needed), making a snapshot succeeds again.

Two lock order reversals occur in both cases (with and without journalling), identical modulo the addresses. These are the ones for an SU+J case:

lock order reversal:
 1st 0xc6347498 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
 2nd 0xdf326728 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
 3rd 0xc603baf8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546
KDB: stack backtrace:
db_trace_self_wrapper(c0efdd0c,616e735f,6f687370,3a632e74,a363435,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c0a415fb,c0f016fc,c5965370,c5969198,c57a8404,...) at kdb_backtrace+0x2a
_witness_debugger(c0f016fc,c603baf8,c0ef0968,c5969198,c0f2c002,...) at _witness_debugger+0x25
witness_checkorder(c603baf8,9,c0f2c002,222,0,...) at witness_checkorder+0x839
__lockmgr_args(c603baf8,80100,c603bb18,0,0,...) at __lockmgr_args+0x824
ffs_lock(c57a852c,c11dd3c8,c5ee5390,80100,c603baa0,...) at ffs_lock+0x8a
VOP_LOCK1_APV(c1047760,c57a852c,c57a854c,c1057e00,c603baa0,...) at VOP_LOCK1_APV+0xb5
_vn_lock(c603baa0,80100,c0f2c002,222,c598de80,...) at _vn_lock+0x5e
ffs_snapshot(c5ec4798,c5aea300,c0f2f410,1a2,0,...) at ffs_snapshot+0x14fc
ffs_mount(c5ec4798,c5ca6000,ff,394,c5962450,...) at ffs_mount+0x1c13
vfs_donmount(c5ee52e0,11000,c5997d80,c5997d80,c5f0d588,...) at vfs_donmount+0x1219
nmount(c5ee52e0,c57a8cec,c57a8d28,c0efffda,0,...) at nmount+0x84
syscallenter(c5ee52e0,c57a8ce4,c57a8ce4,0,0,...) at syscallenter+0x263
syscall(c57a8d28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (378, FreeBSD ELF32, nmount), eip = 0x280dc61b, esp = 0xbfbfe56c, ebp = 0xbfbfece8 ---

lock order reversal:
 1st 0xdf326728 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
 2nd 0xc5995a1c snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
KDB: stack backtrace:
db_trace_self_wrapper(c0efdd0c,662f7366,735f7366,7370616e,2e746f68,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c0a415fb,c0f016e3,c5965370,c5969540,c57a8404,...) at kdb_backtrace+0x2a
_witness_debugger(c0f016e3,c5995a1c,c0f2c064,c5969540,c0f2c002,...) at _witness_debugger+0x25
witness_checkorder(c5995a1c,9,c0f2c002,332,c63474b8,...) at witness_checkorder+0x839
__lockmgr_args(c5995a1c,80400,c63474b8,0,0,...) at __lockmgr_args+0x824
ffs_lock(c57a852c,df2f7f68,100000,80400,c6347440,...) at ffs_lock+0x8a
VOP_LOCK1_APV(c1047760,c57a852c,df2f7fc4,c1057e00,c6347440,...) at VOP_LOCK1_APV+0xb5
_vn_lock(c6347440,80400,c0f2c002,332,0,...) at _vn_lock+0x5e
ffs_snapshot(c5ec4798,c5aea300,c0f2f410,1a2,0,...) at ffs_snapshot+0x298e
ffs_mount(c5ec4798,c5ca6000,ff,394,c5962450,...) at ffs_mount+0x1c13
vfs_donmount(c5ee52e0,11000,c5997d80,c5997d80,c5f0d588,...) at vfs_donmount+0x1219
nmount(c5ee52e0,c57a8cec,c57a8d28,c0efffda,0,...) at nmount+0x84
syscallenter(c5ee52e0,c57a8ce4,c57a8ce4,0,0,...) at syscallenter+0x263
syscall(c57a8d28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (378, FreeBSD ELF32, nmount), eip = 0x280dc61b, esp = 0xbfbfe56c, ebp = 0xbfbfece8 ---

It is not clear whether the LORs are related to the lockup.

This issue occurs on i386 (2.4 GHz P4, 2 GByte RAM, 500 GByte PATA disk) running 9.0-BETA2 as distributed, and is also 100% reproducible on amd64 running a more recent 9.0-BETA2.
>How-To-Repeat:
Attempt to make a snapshot of the /usr filesystem (32 GByte in my case), which is SU+J enabled by default. This can be done by typing (as root):

cd /usr; mksnap_ffs /usr/.snap/testsnap

After a lot of disk activity for a few seconds the system locks up as described.
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:
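
The Description notes that the lock order reversals seen with and without journalling are identical modulo the addresses. That kind of comparison can be done mechanically once the console output has been saved as text. The following is only a minimal sketch in Python under that assumption: the function name lor_signatures and the regular expression are illustrative, not part of any FreeBSD tool, and the pattern covers only the "1st/2nd/3rd" entry layout shown in this report.

```python
import re

# Matches one lock entry of a witness report as it appears in this PR, e.g.
#   1st 0xc6347498 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
# Assumption: entries always follow this layout.
LOCK_RE = re.compile(
    r"\b(1st|2nd|3rd)\s+0x[0-9a-f]+\s+(\S+)\s+\((\S+)\)\s+@\s+(\S+):(\d+)"
)


def lor_signatures(text):
    """Return one tuple per 'lock order reversal:' report found in text.

    Each tuple holds (ordinal, lock name, source file, line) entries with
    the kernel addresses dropped, so two reports compare equal when they
    differ only in addresses.
    """
    signatures = []
    # Text before the first marker does not belong to any report.
    for chunk in text.split("lock order reversal:")[1:]:
        entries = tuple(
            (m.group(1), m.group(2), m.group(4), int(m.group(5)))
            for m in LOCK_RE.finditer(chunk)
        )
        signatures.append(entries)
    return signatures
```

Calling lor_signatures() on the output captured with SU+J enabled and again on the output captured after tunefs -j disable, then comparing the two returned lists with ==, would confirm or refute the "identical modulo the addresses" observation without reading the hexadecimal values by eye.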