From owner-freebsd-current@freebsd.org Fri Aug 28 04:36:24 2015 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B24CF9C252A for ; Fri, 28 Aug 2015 04:36:24 +0000 (UTC) (envelope-from lstewart@room52.net) Received: from lauren.room52.net (lauren.room52.net [210.50.193.198]) by mx1.freebsd.org (Postfix) with ESMTP id 1B1862A1; Fri, 28 Aug 2015 04:36:23 +0000 (UTC) (envelope-from lstewart@room52.net) Received: from lgwl-lstewart2.corp.netflix.com (unknown [119.17.36.2]) by lauren.room52.net (Postfix) with ESMTPSA id 2A8407E84F; Fri, 28 Aug 2015 14:36:14 +1000 (EST) Message-ID: <55DFE537.4080200@room52.net> Date: Fri, 28 Aug 2015 14:36:07 +1000 From: Lawrence Stewart User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: Konstantin Belousov , Andriy Gapon CC: John-Mark Gurney , FreeBSD Current , Pawel Pekala , "K. Macy" Subject: Re: Instant panic while trying run ports-mgmt/poudriere References: <20150713231205.627bab36@FreeBSD.org> <20150714223829.GY8523@funkthat.com> <20150715174616.652d0aea@FreeBSD.org> <20150715180526.GM8523@funkthat.com> <20150715223703.78b9197c@FreeBSD.org> <20150806233328.47a02594@FreeBSD.org> <55CB5428.2090505@room52.net> <55D96E24.9060106@FreeBSD.org> <20150823090816.GJ2072@kib.kiev.ua> <20150823125442.GK2072@kib.kiev.ua> In-Reply-To: <20150823125442.GK2072@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=2.4 required=5.0 tests=DNS_FROM_AHBL_RHSBL, UNPARSEABLE_RELAY autolearn=no version=3.3.2 X-Spam-Level: ** X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on lauren.room52.net X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Aug 2015 04:36:24 -0000 On 08/23/15 22:54, Konstantin Belousov wrote: > On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote: >> On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote: >>> On 12/08/2015 17:11, Lawrence Stewart wrote: >>>> On 08/07/15 07:33, Pawel Pekala wrote: >>>>> Hi K., >>>>> >>>>> On 2015-08-06 12:33 -0700, "K. Macy" wrote: >>>>>> Is this still happening? >>>>> >>>>> Still crashes: >>>> >>>> +1 for me running r286617 >>> >>> Here is another +1 with r286922. >>> I can add a couple of bits of debugging data: >>> >>> (kgdb) fr 8 >>> #8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0, >>> hint=2147483648, lockflags=) at >>> /usr/src/sys/kern/kern_event.c:1964 >>> 1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) { >>> (kgdb) p *list >>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0 >>> , kl_unlock = 0xffffffff8063a200 , >>> kl_assert_locked = 0xffffffff8063a220 , >>> kl_assert_unlocked = 0xffffffff8063a240 , >>> kl_lockarg = 0xfffff8019a733bb0} >>> (kgdb) disassemble >>> Dump of assembler code for function knote: >>> 0xffffffff80639d00 : push %rbp >>> 0xffffffff80639d01 : mov %rsp,%rbp >>> 0xffffffff80639d04 : push %r15 >>> 0xffffffff80639d06 : push %r14 >>> 0xffffffff80639d08 : push %r13 >>> 0xffffffff80639d0a : push %r12 >>> 0xffffffff80639d0c : push %rbx >>> 0xffffffff80639d0d : sub $0x18,%rsp >>> 0xffffffff80639d11 : mov %edx,%r12d >>> 0xffffffff80639d14 : mov %rsi,-0x30(%rbp) >>> 0xffffffff80639d18 : mov %rdi,%rbx >>> 0xffffffff80639d1b : test %rbx,%rbx >>> 0xffffffff80639d1e : je 0xffffffff80639ef6 >>> 0xffffffff80639d24 : mov %r12d,%eax >>> 0xffffffff80639d27 : and $0x1,%eax >>> 0xffffffff80639d2a : mov %eax,-0x3c(%rbp) >>> 0xffffffff80639d2d : mov 0x28(%rbx),%rdi >>> 0xffffffff80639d31 : je 0xffffffff80639d38 >>> 0xffffffff80639d33 : callq *0x18(%rbx) >>> 0xffffffff80639d36 : jmp 0xffffffff80639d42 >>> 0xffffffff80639d38 : callq *0x20(%rbx) >>> 0xffffffff80639d3b : mov 0x28(%rbx),%rdi >>> 0xffffffff80639d3f : callq *0x8(%rbx) >>> 0xffffffff80639d42 : mov %rbx,-0x38(%rbp) >>> 0xffffffff80639d46 : mov (%rbx),%rbx >>> 0xffffffff80639d49 : test %rbx,%rbx >>> 0xffffffff80639d4c : je 0xffffffff80639ee5 >>> 0xffffffff80639d52 : and $0x2,%r12d >>> 0xffffffff80639d56 : nopw %cs:0x0(%rax,%rax,1) >>> 0xffffffff80639d60 : mov 0x28(%rbx),%r14 >>> >>> Panic is in the last quoted instruction. >>> And: >>> (kgdb) i reg >>> rax 0x246 582 >>> rbx 0xdeadc0dedeadc0de -2401050962867404578 >>> rcx 0x0 0 >>> rdx 0x12e 302 >>> rsi 0xffffffff80a26a5a -2136839590 >>> rdi 0xffffffff80e81b80 -2132272256 >>> rbp 0xfffffe02b7efea20 0xfffffe02b7efea20 >>> rsp 0xfffffe02b7efe9e0 0xfffffe02b7efe9e0 >>> r8 0xffffffff80a269ce -2136839730 >>> r9 0xffffffff80e82838 -2132269000 >>> r10 0x10000 65536 >>> r11 0xffffffff80fabd10 -2131051248 >>> r12 0x0 0 >>> r13 0xfffff801ff84a818 -8787511171048 >>> r14 0xfffff801ff84a800 -8787511171072 >>> r15 0xfffff8019a6974f0 -8789207452432 >>> rip 0xffffffff80639d60 0xffffffff80639d60 >>> eflags 0x10286 66182 >>> >>> I think that $rbx stands out here (this is a kernel with INVARIANTS). >>> >>> Looking at the code, is it possible that one of the calls from within >>> the loop's body modifies the list? If that is so and provided that is a >>> valid behavior, then maybe using SLIST_FOREACH_SAFE would help. >> >> This is first time a useful debugging data was posted. >> >> The 0x28 offset may indicate either kn_kq member access of the struct >> knote, or kq_list of the struct kqueue. >> >> kl_list.slh_first of the list parameter is NULL, how would a list >> iteration loop even start ? Can you look up the list argument value >> from the previous frame (%rdi is overwritten, so debugger might be >> confused) ? > > After looking at your data closely, I think you are right. The panic > occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT). This is the > only case in the tree where filter uses knlist_remove_inevent() to detach > processed note, so indeed the slist is modified under the iterator. > > Below is the patch with the suggested change and unrelated cleanup of > the uma(9) KPI use. Please test, everybody who has a panic with the > backtrace pointing to the sys_exit(). Fixes the panic for me too, thanks Kostik. Cheers, Lawrence