Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 20 Aug 2011 21:01:00 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Andriy Gapon" <avg@FreeBSD.org>
Cc:        freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject:   Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID:  <B367F6CE151B4B6A8B0E048022D7F8AB@multiplay.co.uk>
References:  <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk><4E43E272.1060204@FreeBSD.org><62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk><4E440865.1040500@FreeBSD.org><6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk><4E441314.6060606@FreeBSD.org><2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk><4E48D967.9060804@FreeBSD.org><9D034F992B064E8092E5D1D249B3E959@multiplay.co.uk><4E490DAF.1080009@FreeBSD.org><796FD5A096DE4558B57338A8FA1E125B@multiplay.co.uk><4E491D01.1090902@FreeBSD.org><570C5495A5E242F7946E806CA7AC5D68@multiplay.co.uk><4E4AD35C.7020504@FreeBSD.org><6A7238AED44542A880B082A40304D940@multiplay.co.uk><4E4BA21F.6010805@FreeBSD.org><581C95046B0948FC82D6F2E86948F87B@multiplay.co.uk><4E4BBA7F.30907@FreeBSD.org><88A6CE3E8B174E0694A3A9A5283479B4@multiplay.co.uk><4E4C22D6.6070407@FreeBSD.org><4019027648B5493AAC4B654BD821DE88@multiplay.co.uk><4E4F8631.1070300@FreeBSD.org> <4E4F8821.80108@Fre eBSD.org> <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk> <4E4FE55A.9000101@ FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>

> thanks for doing this!  I'll reiterate my suspicion just in case - I think that
> you should look for the cases where you stop a jail, but then re-attach and
> resurrect the jail before it's completely dead.

Yer that's where I think its happening too, but I also suspect its not just
dieing jail that's needed, I think its a dieing jail in the final stages of
cleanup.

Looking through the code I believe I may have noticed a scenario which could
trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
    struct prison *ppr, *tpr;
    int vfslocked;

    if (!(flags & PD_LOCKED))
        mtx_lock(&pr->pr_mtx);
    /* Decrement the user references in a separate loop. */
    if (flags & PD_DEUREF) {
        for (tpr = pr;; tpr = tpr->pr_parent) {
            if (tpr != pr)
                mtx_lock(&tpr->pr_mtx);
            if (--tpr->pr_uref > 0)
                break;
            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
            mtx_unlock(&tpr->pr_mtx);
        }
        /* Done if there were only user references to remove. */
        if (!(flags & PD_DEREF)) {
            mtx_unlock(&tpr->pr_mtx);
            if (flags & PD_LIST_SLOCKED)
                sx_sunlock(&allprison_lock);
            else if (flags & PD_LIST_XLOCKED)
                sx_xunlock(&allprison_lock);
            return;
        }
        if (tpr != pr) {
            mtx_unlock(&tpr->pr_mtx);
            mtx_lock(&pr->pr_mtx);
        }
    }

If you take a scenario of a simple one level prison setup running a single process
where a prison has just been stopped.

In the above code pr_uref of the processes prison is decremented. As this is the
last process then pr_uref will hit 0 and the loop continues instead of breaking
early.

Now at the end of the loop iteration the mtx is unlocked so other process can
now manipulate the jail, this is where I think the problem may be.

If we now have another process come in and attach to the jail but then instantly
exit, this process may allow another kernel thread to hit this same bit of code
and so two process for the same prison get into the section which decrements
prison0's pr_uref, instead of only one.

In essence I think we can get the following flow where 1# = process1
and 2# = process2
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#3. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
1#3. prison0.pr_uref--
2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process1.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by prison1)

It seems like the action on the parent prison to decrement the pr_uref is
happening too early, while the jail can still be used and without the lock on
the child jails mtx, so causing a race condition.

I think the fix is to the move the decrement of parent prison pr_uref's down
so it only takes place if the jail is "really" being removed. Either that or
to change the locking semantics so that once the lock is aquired in this
prison_deref its not unlocked until the function completes.

What do people think?

    Regards
    Steve






================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?B367F6CE151B4B6A8B0E048022D7F8AB>