From: Sean Bruno
Reply-To: sbruno@freebsd.org
To: freebsd-current@freebsd.org
Date: Sun, 22 Feb 2015 15:00:25 -0800
Subject: Re: panic on application core dump?

On 02/22/15 10:54, Sean Bruno wrote:
> On 02/22/15 10:53, Konstantin Belousov wrote:
>> On Sun, Feb 22, 2015 at 10:46:53AM -0800, Sean Bruno wrote:
>>> Hmm ... looks unrelated to signals (maybe). This looks like a
>>> common ZFS deadlock that is as yet undiagnosed. I do not have a
>>> show alllocks command available in db> . I will show each
>>> lock's information below:
>> Add witness.
>
>>>
>>> db> show lockedvnods
>>> Locked vnodes
>>>
>>> 0xfffff801141a6588: tag zfs, type VDIR
>>>     usecount 19, writecount 0, refcount 20 mountedhere 0
>>>     flags (VV_ROOT|VI_ACTIVE)
>>>     v_object 0xfffff80079be4500 ref 0 pages 0 cleanbuf 0 dirtybuf 0
>>>     lock type zfs: EXCL by thread 0xfffff801ca10c4a0 (pid 75907, sh, tid 101262) with exclusive waiters pending
>> Without backtraces of the acquisition, it is not useful. You
>> need DEBUG_VFS_LOCKS for this.
>
> Thank you. I will do so and restart my non-deterministic test and
> see what I can find.
>
> sean

Well, that was certainly enlightening. I was able to get a WITNESS
panic in imgact_binmisc.c within an hour or two.

I need to *not* hold the mtx protecting the list of activators across
the bcopy() in imgact_binmisc_exec(): the copy can take a page fault,
and a page fault may sleep, which is not allowed while a non-sleepable
mutex is held. Jiles proposes that we switch to an sx lock here
instead, since that is the simpler change to the code.
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex imgact_binmisc (imgact_binmisc) r = 0 (0xffffffff82012418) locked @ /usr/src/sys/modules/imgact_binmisc/../../kern/imgact_binmisc.c:596
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe046a236280
witness_warn() at witness_warn+0x4ae/frame 0xfffffe046a236350
trap_pfault() at trap_pfault+0x59/frame 0xfffffe046a2363f0
trap() at trap+0x45e/frame 0xfffffe046a236600
calltrap() at calltrap+0x8/frame 0xfffffe046a236600
--- trap 0xc, rip = 0xffffffff80d21279, rsp = 0xfffffe046a2366c0, rbp = 0xfffffe046a2366d0 ---
bcopy() at bcopy+0x39/frame 0xfffffe046a2366d0
imgact_binmisc_exec() at imgact_binmisc_exec+0x23d/frame 0xfffffe046a236720
kern_execve() at kern_execve+0x4c6/frame 0xfffffe046a236a80
sys_execve() at sys_execve+0x37/frame 0xfffffe046a236ae0
amd64_syscall() at amd64_syscall+0x27f/frame 0xfffffe046a236bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe046a236bf0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x4297ba, rsp = 0x7fffffffdaf8, rbp = 0x7fffffffdb00 ---

Fatal trap 12: page fault while in kernel mode
cpuid = 13; apic id = 33
fault virtual address   = 0xfffffe0456c01007
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80d21279
stack pointer           = 0x28:0xfffffe046a2366c0
frame pointer           = 0x28:0xfffffe046a2366d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 27028 (cc)
[ thread pid 27028 tid 100872 ]
Stopped at      bcopy+0x39:     repe movsb      (%rsi),%es:(%rdi)