From owner-freebsd-current@freebsd.org Thu Aug 27 20:22:18 2015 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 70D2A9C369F for ; Thu, 27 Aug 2015 20:22:18 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 5C705121F; Thu, 27 Aug 2015 20:22:16 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA18554; Thu, 27 Aug 2015 23:22:14 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ZV3gb-000Ngg-Qs; Thu, 27 Aug 2015 23:22:13 +0300 Subject: Re: Instant panic while trying run ports-mgmt/poudriere To: John-Mark Gurney References: <20150714223829.GY8523@funkthat.com> <20150715174616.652d0aea@FreeBSD.org> <20150715180526.GM8523@funkthat.com> <20150715223703.78b9197c@FreeBSD.org> <20150806233328.47a02594@FreeBSD.org> <55CB5428.2090505@room52.net> <55D96E24.9060106@FreeBSD.org> <20150826233616.GU33167@funkthat.com> <55DEBA8B.5060009@FreeBSD.org> <20150827180945.GW33167@funkthat.com> Cc: FreeBSD Current , Lawrence Stewart , Pawel Pekala , "K. Macy" From: Andriy Gapon X-Enigmail-Draft-Status: N1110 Message-ID: <55DF713D.7000808@FreeBSD.org> Date: Thu, 27 Aug 2015 23:21:17 +0300 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <20150827180945.GW33167@funkthat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Aug 2015 20:22:18 -0000 On 27/08/2015 21:09, John-Mark Gurney wrote: > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300: >> On 27/08/2015 02:36, John-Mark Gurney wrote: >>> We should/cannot get here w/ an empty list. If we do, then there is >>> something seriously wrong... The current kn (which we must have as we >>> are here) MUST be on the list, but as you just showed, there are no >>> knotes on the list. >>> >>> Can you get me a print of the knote? That way I can see what flags >>> are on it? >> >> Apologies if the following might sound a little bit patronizing, but it >> seems that you have got all the facts correctly, but somehow the >> connection between them did not become clear. >> >> So: >> 1. The list originally is NOT empty. I guess that it has one entry, but >> that's an unimportant detail. >> 2. This is why the loop is entered. It's a fact that it is entered. >> 3. The list becomes empty precisely because the entry is removed during >> the iteration in the loop (as kib has explained). It's a fact that the >> list became empty at least in the panic that I reported. > > On you're latest dump, you said: > Here is another +1 with r286922. > I can add a couple of bits of debugging data: > > (kgdb) fr 8 > #8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0, > hint=2147483648, lockflags=) at > /usr/src/sys/kern/kern_event.c:1964 > 1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) { > > First off, that can't be r286922, per: > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922 > > line 1964 is blank... The line of code above should be at line 1884, > so not sure what is wrong here... No, it can not be indeed, because I am running head. r286922 was the latest version of the repository, not the head branch, at the moment when I pulled the repository via git. > Assuming that the pc really is at the line, f_event has not yet been > called, Even on the second loop iteration? >which is why I said that the list cannot be empty yet, as > f_event hasn't been called yet to remove the knote... It could be that > optimization moved stuff around, but if that is the case, then the > above wasn't useful.. I provided the disassembly of the code as well, it's very obvious how the code was translated. >> 4. The element is not only unlinked from the list, but its memory is >> also freed. > > Where is the memory freed? A knote MUST NOT be freed in an f_event > handler. The only location that a list element is allowed to be > freed is in knote_drop, which must happen after f_detach is called, > but that can't/won't happen from knote (I believe the timer handles > this specially, but we are talking about normal knlist type filters).. Well, right. knote()->filt_proc()->knlist_remove_inevent() just removes the knote from the list. But then there is KNOTE_ACTIVATE() that passes the knote to a different owner (so to say). And given that the knote has EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a stress load on a system, I am not surprised that another thread gets a chance to call knote_drop() on the knote before the original thread proceeds to the next iteration. > The rest of your explination is invalid due to the invalid assumption > of this point... Eagerly waiting for your explanation... > If you can provide to me where the knote is free'd in knote, w/ > function/line number stack trace (does not have to be dump, but a > sample call path), then I'll reconsider, and fix that bug... >> 5. That's why we have the use after free: SLIST_FOREACH is trying to get >> a pointer to a next element from the freed memory. >> 6. This is why the commit for trashing the freed memory made all the >> difference: previously the freed memory was unlikely to be re-used / >> modified, so the use-after-free had a high chance of succeeding. It's a >> fact that in my panic there was an attempt to dereference a trashed pointer. >> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the >> pointer to the next element beforehand and, thus, we do not access the >> freed memory. >> >> Please let me know if you see any fault in above reasoning or if >> something is still no clear. > -- Andriy Gapon