From owner-freebsd-jail@FreeBSD.ORG Sat Aug 20 20:34:56 2011 Return-Path: Delivered-To: freebsd-jail@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 42F871065672; Sat, 20 Aug 2011 20:34:56 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 6FC888FC0C; Sat, 20 Aug 2011 20:34:55 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA14332; Sat, 20 Aug 2011 23:34:52 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1QusFo-000O9N-7B; Sat, 20 Aug 2011 23:34:52 +0300 Message-ID: <4E501A6A.3030801@FreeBSD.org> Date: Sat, 20 Aug 2011 23:34:50 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110819 Thunderbird/6.0 MIME-Version: 1.0 To: Steven Hartland References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@ FreeBSD.org> In-Reply-To: X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 20:34:56 -0000 on 20/08/2011 23:24 Steven Hartland said the following: > ----- Original Message ----- From: "Steven Hartland" >> Looking through the code I believe I may have noticed a scenario which could >> trigger the problem. >> >> Given the following code:- >> >> static void >> prison_deref(struct prison *pr, int flags) >> { >> struct prison *ppr, *tpr; >> int vfslocked; >> >> if (!(flags & PD_LOCKED)) >> mtx_lock(&pr->pr_mtx); >> /* Decrement the user references in a separate loop. */ >> if (flags & PD_DEUREF) { >> for (tpr = pr;; tpr = tpr->pr_parent) { >> if (tpr != pr) >> mtx_lock(&tpr->pr_mtx); >> if (--tpr->pr_uref > 0) >> break; >> KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); >> mtx_unlock(&tpr->pr_mtx); >> } >> /* Done if there were only user references to remove. */ >> if (!(flags & PD_DEREF)) { >> mtx_unlock(&tpr->pr_mtx); >> if (flags & PD_LIST_SLOCKED) >> sx_sunlock(&allprison_lock); >> else if (flags & PD_LIST_XLOCKED) >> sx_xunlock(&allprison_lock); >> return; >> } >> if (tpr != pr) { >> mtx_unlock(&tpr->pr_mtx); >> mtx_lock(&pr->pr_mtx); >> } >> } >> >> If you take a scenario of a simple one level prison setup running a single >> process >> where a prison has just been stopped. >> >> In the above code pr_uref of the processes prison is decremented. As this is the >> last process then pr_uref will hit 0 and the loop continues instead of breaking >> early. >> >> Now at the end of the loop iteration the mtx is unlocked so other process can >> now manipulate the jail, this is where I think the problem may be. >> >> If we now have another process come in and attach to the jail but then instantly >> exit, this process may allow another kernel thread to hit this same bit of code >> and so two process for the same prison get into the section which decrements >> prison0's pr_uref, instead of only one. >> >> In essence I think we can get the following flow where 1# = process1 >> and 2# = process2 >> 1#1. prison1.pr_uref = 1 (single process jail) >> 1#2. prison_deref( prison1,... >> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0) >> 1#3. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref >> 1#3. prison0.pr_uref-- >> 2#1. process1.attach( prison1 ) (prison1.pr_uref = 1) >> 2#2. process1.exit >> 2#3. prison_deref( prison1,... >> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0) >> 2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref >> 2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by prison1) >> >> It seems like the action on the parent prison to decrement the pr_uref is >> happening too early, while the jail can still be used and without the lock on >> the child jails mtx, so causing a race condition. >> >> I think the fix is to the move the decrement of parent prison pr_uref's down >> so it only takes place if the jail is "really" being removed. Either that or >> to change the locking semantics so that once the lock is aquired in this >> prison_deref its not unlocked until the function completes. >> >> What do people think? > > After reviewing the changes to prison_deref in commit which added hierarchical > jails, the removal of the lock by the inital loop on the passed in prison may > be unintentional. > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h > > > If so the following may be all that's needed to fix this issue:- > > diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c > --- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100 > +++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100 > @@ -2455,7 +2455,8 @@ > if (--tpr->pr_uref > 0) > break; > KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); > - mtx_unlock(&tpr->pr_mtx); > + if (tpr != pr) > + mtx_unlock(&tpr->pr_mtx); > } > /* Done if there were only user references to remove. */ > if (!(flags & PD_DEREF)) { Not sure if this would fly as is - please double check the later block where pr->pr_mtx is re-locked. -- Andriy Gapon