From owner-freebsd-current@FreeBSD.ORG Thu Aug 12 07:37:05 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 829E41065688 for ; Thu, 12 Aug 2010 07:37:05 +0000 (UTC) (envelope-from lwindschuh@googlemail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id 3920B8FC16 for ; Thu, 12 Aug 2010 07:37:05 +0000 (UTC) Received: by qyk11 with SMTP id 11so6225929qyk.13 for ; Thu, 12 Aug 2010 00:37:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:mime-version:received:received:date:message-id :subject:from:to:content-type; bh=DkT4DFHewwJ93OiB6MO6OaqMTISd97qbmu8NYWWepnk=; b=KVLt1MhxohE68Vr/qxR2GJAZfHl4Z/wMxzST+Sssqr+6T5EMSmW6w+mlpyy4NlSxEN VWCoR8F8nEszPiGPOwEXpVtjpMNXjBW/FERzUNYxp4EhVCzGZoFKn8ERhiPdC2gqniRk 3glMUlYpkP5jqfdcCS7ITnUrcImwS/DhgJ3fM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=nGI0fDGCnLJnJmlxioctoHkQr7XuO5D2/rcyxUBhzT/7Rj3/anNvyCtXygihjXa5J3 6WeQcayIOdpXVp8D9mBWIO9qumnArKcMX4IZoHFcALEKxknMN6VxoAHBg9tgdUNFo6za riPrJvN6rt4QFw8ZXIK8N6uBjojSC/PWLFnp4= MIME-Version: 1.0 Received: by 10.224.11.66 with SMTP id s2mr11704902qas.211.1281598624374; Thu, 12 Aug 2010 00:37:04 -0700 (PDT) Received: by 10.220.203.130 with HTTP; Thu, 12 Aug 2010 00:37:04 -0700 (PDT) Date: Thu, 12 Aug 2010 09:37:04 +0200 Message-ID: From: Lucius Windschuh To: FreeBSD Current Content-Type: text/plain; charset=ISO-8859-1 Subject: Deadlock in UFS/SU+J? X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Aug 2010 07:37:05 -0000 Dear list members, I tried to reproduce another bug on my test machine (i386, CURRENT r211175), but ran into the following deadlock: lock order reversal: 1st 0xc66677c8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2101 2nd 0xe5ba2460 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11291 3rd 0xc8647af8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2101 KDB: stack backtrace: db_trace_self_wrapper(c08cfabe,f60ee2d4,c06441c5,c06339ab,c08d2b4d,...) at db_trace_self_wrapper+0x26 kdb_backtrace(c06339ab,c08d2b4d,c5112100,c5115b18,f60ee330,...) at kdb_backtrace+0x29 _witness_debugger(c08d2b4d,c8647af8,c08c4963,c5115b18,c08d9dc5,...) at _witness_debugger+0x25 witness_checkorder(c8647af8,9,c08d9dc5,835,0,...) at witness_checkorder+0x839 __lockmgr_args(c8647af8,80100,c8647b18,0,0,...) at __lockmgr_args+0x7f4 ffs_lock(f60ee458,c0643f6b,c08d9267,80100,c8647aa0,...) at ffs_lock+0x8a VOP_LOCK1_APV(c0945da0,f60ee458,c82ea364,c095e280,c8647aa0,...) at VOP_LOCK1_APV+0xb5 _vn_lock(c8647aa0,80100,c08d9dc5,835,4,...) at _vn_lock+0x5e vget(c8647aa0,80100,c82ea2c0,50,0,...) at vget+0xb9 vfs_hash_get(c5ea2a20,5c000e,80000,c82ea2c0,f60ee5a8,...) at vfs_hash_get+0xe6 ffs_vgetf(c5ea2a20,5c000e,80000,f60ee5a8,1,...) at ffs_vgetf+0x49 softdep_sync_metadata(c6667770,0,c08edf39,144,0,...) at softdep_sync_metadata+0xc92 ffs_syncvnode(c6667770,1,f60ee664,246,c666781c,...) at ffs_syncvnode+0x3e2 ffs_truncate(c6667770,200,0,880,c5e4f780,...) at ffs_truncate+0x87e ufs_direnter(c6667770,c8647aa0,f60ee91c,f60eeba4,0,...) at ufs_direnter+0x8d4 ufs_makeinode(f60eeba4,0,f60eeb00,f60eea64,c0879745,...) at ufs_makeinode+0x517 ufs_create(f60eeb00,f60eeb18,0,0,f60eeb78,...) at ufs_create+0x30 VOP_CREATE_APV(c0945da0,f60eeb00,2,c0643f6b,0,...) at VOP_CREATE_APV+0xa5 vn_open_cred(f60eeb78,f60eec2c,180,0,c5e4f780,...) at vn_open_cred+0x1d3 vn_open(f60eeb78,f60eec2c,180,c5f077e0,0,...) at vn_open+0x3b kern_openat(c82ea2c0,ffffff9c,2848cc80,0,a03,...) at kern_openat+0xf5 kern_open(c82ea2c0,2848cc80,0,a02,180,...) at kern_open+0x35 open(c82ea2c0,f60eecec,f60eed28,c08d1426,0,...) at open+0x30 syscallenter(c82ea2c0,f60eece4,f60eece4,0,c097b8f0,...) at syscallenter+0x23f syscall(f60eed28) at syscall+0x34 Xint0x80_syscall() at Xint0x80_syscall+0x21 --- syscall (5, FreeBSD ELF32, open), eip = 0x282a2623, esp = 0xbfbfb08c, ebp = 0xbfbfb538 --- The test, upgrading lang/gcc44 in an endless loop, gut stuck sometime in the night when /etc/periodic/daily/450.status-security was running in parallel: $ ps axl | grep ufs 0 44219 44215 0 76 0 13624 7200 ufs D ?? 0:03.19 find -sx / /usr /zfs/test /dev/null -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} + 0 44713 2638 0 44 0 9752 1304 piperd S+ 3 0:00.01 grep ufs 0 43824 43823 0 76 0 10532 1368 ufs D+ 2 0:00.02 /usr/ports/lang/gcc44/work/build/./prev-gcc/collect2 -dynamic-linker /libexec/ld-elf.so.1 -o cc1 /usr/lib/crt1.o /usr/lib/crti.o /usr/ports/lang/gcc44/work/build/./prev-gcc/crtbegin.o -L/usr/local/lib -L/usr/ports/lang/gcc44/work/build/./prev-gcc -L/usr/local/i386-portbld-freebsd9.0/bin c-lang.o stub-objc.o attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o c-aux-info.o c-common.o c-opts.o c-format.o c-semantics.o c-ppoutput.o c-cppbuiltin.o c-objc-common.o c-dump.o c-pch.o c-parser.o i386-c.o c-gimplify.o tree-mudflap.o c-pretty-print.o c-omp.o cc1-checksum.o main.o libbackend.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a ../libcpp/libcpp.a /usr/local/lib/libiconv.so -rpath /usr/local/lib ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a -lmpfr -lgmp -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/ports/lang/gcc44/work/build/./prev-gcc/crtend.o /usr/lib/crtn.o 0 43828 43780 0 76 0 9608 3580 ufs D+ 2 0:00.02 /usr/local/bin/as -I. -I. -I.././../gcc-4.4-20100803/gcc -I.././../gcc-4.4-20100803/gcc/. -I.././../gcc-4.4-20100803/gcc/../include -I.././../gcc-4.4-20100803/gcc/../libcpp/include -I/usr/local/include -I.././../gcc-4.4-20100803/gcc/../libdecnumber -I.././../gcc-4.4-20100803/gcc/../libdecnumber/dpd -I../libdecnumber -I/usr/local/include -o gcc.o /var/tmp//ccxkWkSs.s Short backtraces of the affected processes: # procstat -kk 44219 PID TID COMM TDNAME KSTACK 44219 100252 find - mi_switch+0x200 sleepq_switch+0x15f sleepq_wait+0x63 __lockmgr_args+0x4bf ffs_lock+0x8a VOP_LOCK1_APV+0xb5 _vn_lock+0x5e vget+0xb9 cache_lookup+0x679 vfs_cache_lookup+0xad VOP_LOOKUP_APV+0xa5 lookup+0x5b3 namei+0x4af kern_statat_vnhook+0x72 kern_statat+0x3c kern_lstat+0x36 lstat+0x2f syscallenter+0x23f # procstat -kk 43824 PID TID COMM TDNAME KSTACK 43824 100082 collect2 - mi_switch+0x200 sleepq_switch+0x15f sleepq_wait+0x63 __lockmgr_args+0x4bf ffs_lock+0x8a VOP_LOCK1_APV+0xb5 _vn_lock+0x5e vget+0xb9 cache_lookup+0x679 vfs_cache_lookup+0xad VOP_LOOKUP_APV+0xa5 lookup+0x5b3 namei+0x4af vn_open_cred+0x2ed vn_open+0x3b kern_openat+0xf5 kern_open+0x35 open+0x30 # procstat -kk 43828 PID TID COMM TDNAME KSTACK 43828 100276 as - mi_switch+0x200 sleepq_switch+0x15f sleepq_wait+0x63 __lockmgr_args+0x4bf ffs_lock+0x8a VOP_LOCK1_APV+0xb5 _vn_lock+0x5e vget+0xb9 cache_lookup+0x679 vfs_cache_lookup+0xad VOP_LOOKUP_APV+0xa5 lookup+0x5b3 namei+0x4af vn_open_cred+0x2ed vn_open+0x3b kern_openat+0xf5 kern_open+0x35 open+0x30 As you see, both gcc-related process hang in open(), the find process hangs in lstat(). I don't know which one got stuck first, but I am confident that I ran into the deadlock described by WITNESS above. Is there anything I can help with? The machine is still hanging in this state, entering the debugger is possible, generating a coredump should work (did hang the last time, but since than, I updated the kernel). Kernel config: https://sites.google.com/site/lwfreebsd/Home/files/kernelconfig-CURRENT.txt.gz?attredirects=0&d=1 DDB output from a textdump simulation: https://sites.google.com/site/lwfreebsd/Home/files/current-r211175-UFS-deadlock.txt.gz?attredirects=0&d=1 Regards Lucius