Date: Sun, 14 Dec 2014 19:37:39 +0100
From: Jilles Tjoelker
To: Walter Hop
Cc: freebsd-stable@freebsd.org
Subject: Re: System hang on shutdown when running freebsd-update
Message-ID: <20141214183739.GC84077@stack.nl>
References: <548846F8.4080208@club-internet.fr> <20141210133658.GA12721@ozzmosis.com> <54885894.2060006@club-internet.fr> <7FE045BA-246F-460F-81F5-CFC312072A92@spam.lifeforms.nl>
In-Reply-To: <7FE045BA-246F-460F-81F5-CFC312072A92@spam.lifeforms.nl>
List-Id: Production branch of FreeBSD source code

On Wed, Dec 10, 2014 at 09:48:18PM +0100, Walter Hop wrote:
> root@current:~ # chflags noschg /sbin/init
> root@current:~ # cp -Rp /sbin/init /sbin/init2
> lock order reversal:
> 1st 0xfffffe007b842fa0 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3093
> 2nd 0xfffff80002b9ea00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000025c270
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe000025c320
> witness_checkorder() at witness_checkorder+0xdad/frame 0xfffffe000025c3b0
> _sx_xlock() at _sx_xlock+0x75/frame 0xfffffe000025c3f0
> ufsdirhash_add() at ufsdirhash_add+0x3a/frame 0xfffffe000025c430
> ufs_direnter() at ufs_direnter+0x6a0/frame 0xfffffe000025c4f0
> ufs_makeinode() at ufs_makeinode+0x560/frame 0xfffffe000025c6a0
> VOP_CREATE_APV() at VOP_CREATE_APV+0xf1/frame 0xfffffe000025c6d0
> vn_open_cred() at vn_open_cred+0x29d/frame 0xfffffe000025c820
> kern_openat() at kern_openat+0x26f/frame 0xfffffe000025c9a0
> amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe000025cab0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe000025cab0
> --- syscall (5, FreeBSD ELF64, sys_open), rip = 0x80094f01a, rsp = 0x7fffffffe958, rbp = 0x7fffffffe9c0 ---
>
> Screenshot is here: http://lf.ms/current-r273635-hang-1.png
>
> Finally, when rebooting, another lock order reversal appears and the
> system hangs. I don’t have a text log of this, so I’ll copy the first
> few lines:
>
> Syncing disks, vnodes remaining…1 0 0 done
> All buffers synced.
> lock order reversal:
> 1st 0xfffff80002e65d50 ufs (ufs) @ /usr/src/sys/kern/vfs_mount.c:1223
> 2nd 0xfffff80002e665f0 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2144
>
> Screenshot is here: http://lf.ms/current-r273635-hang-2.png
>
> I don’t have kernel hacking experience, but these source files look
> awfully related to the parts that we are having problems with.
>
> I would really love some research into this and possibly an errata for
> 10.1. What can we do to make this actionable?

Both of these LORs are false positives.
There is no mechanism in WITNESS to suppress them properly.

I cannot reproduce the problem (VirtualBox, stable/10 amd64 and head
i386), so apparently there is something special about some users'
environments that causes this.

-- 
Jilles Tjoelker