Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 20 Aug 2011 23:34:50 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject:   Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID:  <4E501A6A.3030801@FreeBSD.org>
In-Reply-To: <D34B2BC140E14E8A8D450474CCC39CB8@multiplay.co.uk>
References:  eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@ FreeBSD.org> <B367F6CE151B4B6A8B0E048022D7F8AB@multiplay.co.uk> <D34B2BC140E14E8A8D450474CCC39CB8@multiplay.co.uk>

next in thread | previous in thread | raw e-mail | index | archive | help
on 20/08/2011 23:24 Steven Hartland said the following:
> ----- Original Message ----- From: "Steven Hartland" <killing@multiplay.co.uk>
>> Looking through the code I believe I may have noticed a scenario which could
>> trigger the problem.
>>
>> Given the following code:-
>>
>> static void
>> prison_deref(struct prison *pr, int flags)
>> {
>>    struct prison *ppr, *tpr;
>>    int vfslocked;
>>
>>    if (!(flags & PD_LOCKED))
>>        mtx_lock(&pr->pr_mtx);
>>    /* Decrement the user references in a separate loop. */
>>    if (flags & PD_DEUREF) {
>>        for (tpr = pr;; tpr = tpr->pr_parent) {
>>            if (tpr != pr)
>>                mtx_lock(&tpr->pr_mtx);
>>            if (--tpr->pr_uref > 0)
>>                break;
>>            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
>>            mtx_unlock(&tpr->pr_mtx);
>>        }
>>        /* Done if there were only user references to remove. */
>>        if (!(flags & PD_DEREF)) {
>>            mtx_unlock(&tpr->pr_mtx);
>>            if (flags & PD_LIST_SLOCKED)
>>                sx_sunlock(&allprison_lock);
>>            else if (flags & PD_LIST_XLOCKED)
>>                sx_xunlock(&allprison_lock);
>>            return;
>>        }
>>        if (tpr != pr) {
>>            mtx_unlock(&tpr->pr_mtx);
>>            mtx_lock(&pr->pr_mtx);
>>        }
>>    }
>>
>> If you take a scenario of a simple one level prison setup running a single
>> process
>> where a prison has just been stopped.
>>
>> In the above code pr_uref of the processes prison is decremented. As this is the
>> last process then pr_uref will hit 0 and the loop continues instead of breaking
>> early.
>>
>> Now at the end of the loop iteration the mtx is unlocked so other process can
>> now manipulate the jail, this is where I think the problem may be.
>>
>> If we now have another process come in and attach to the jail but then instantly
>> exit, this process may allow another kernel thread to hit this same bit of code
>> and so two process for the same prison get into the section which decrements
>> prison0's pr_uref, instead of only one.
>>
>> In essence I think we can get the following flow where 1# = process1
>> and 2# = process2
>> 1#1. prison1.pr_uref = 1 (single process jail)
>> 1#2. prison_deref( prison1,...
>> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
>> 1#3. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
>> 1#3. prison0.pr_uref--
>> 2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
>> 2#2. process1.exit
>> 2#3. prison_deref( prison1,...
>> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
>> 2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
>> 2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by prison1)
>>
>> It seems like the action on the parent prison to decrement the pr_uref is
>> happening too early, while the jail can still be used and without the lock on
>> the child jails mtx, so causing a race condition.
>>
>> I think the fix is to the move the decrement of parent prison pr_uref's down
>> so it only takes place if the jail is "really" being removed. Either that or
>> to change the locking semantics so that once the lock is aquired in this
>> prison_deref its not unlocked until the function completes.
>>
>> What do people think?
> 
> After reviewing the changes to prison_deref in commit which added hierarchical
> jails, the removal of the lock by the inital loop on the passed in prison may
> be unintentional.
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h
> 
> 
> If so the following may be all that's needed to fix this issue:-
> 
> diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
> --- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
> +++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
> @@ -2455,7 +2455,8 @@
>                        if (--tpr->pr_uref > 0)
>                                break;
>                        KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
> -                       mtx_unlock(&tpr->pr_mtx);
> +                       if (tpr != pr)
> +                               mtx_unlock(&tpr->pr_mtx);
>                }
>                /* Done if there were only user references to remove. */
>                if (!(flags & PD_DEREF)) {

Not sure if this would fly as is - please double check the later block where
pr->pr_mtx is re-locked.

-- 
Andriy Gapon



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4E501A6A.3030801>