Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 21 Aug 2011 10:49:51 -0600
From:      Jamie Gritton <jamie@FreeBSD.org>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        "Bjoern A. Zeeb" <bz@FreeBSD.org>, freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org, Andriy Gapon <avg@FreeBSD.org>
Subject:   Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID:  <4E51372F.1020606@FreeBSD.org>
In-Reply-To: <A83D4882F48A432591DC9C05DEF226A0@multiplay.co.uk>
References:  eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@FreeBSD.org> <B367F6CE151B4B6A8B0E048022D7F8AB@multiplay.co.uk><D34B2BC140E14E8A8D450474CCC39CB8@multiplay.co.uk> <4E501A6A.3030801@FreeBSD.org> <3D52B19B71CA49A4BB3BCCDF25961F46@multiplay.co.uk> <4E50ADC5.6050403@FreeBSD.org> <A83D4882F48A432591DC9C05DEF226A0@multiplay.co.uk>

next in thread | previous in thread | raw e-mail | index | archive | help
On 08/21/11 05:01, Steven Hartland wrote:
> ----- Original Message ----- From: "Jamie Gritton" <jamie@FreeBSD.org>
>> The problem isn't with the conditional locking of tpr in prison_deref.
>> That locking is actually correct, and there's no race condition.
>
> Are you sure? I do think that unlocking the mtx half way through the
> call allows the above scenario to create a race condition, all be it
> very briefly, when ignoring the overriding issue.
>
> In addition if the code where changed to so that the pr_uref++ also
> maintained the parents uref this would definitely lead to a potential
> problems in my mind, especially if you had more than one child prison,
> of a given parent, entering the dying state at any one time.
>
> In this case I believe you would have to acquire the locks of all
> the parent prisons before it would be safe to precede.

Lock order requires that I unlock the child if I want to lock the
parent. While that does allow periods where neither is locked, it's safe
in this case. There may be multiple processes dying in one jail, or in
multiple children of a single jail. But as long as a parent jail is
locked while decrementing pr_uref, then only one of these simultaneous
prison_deref calls would set pr_uref to zero and continue in the loop to
that prison's parent. This might be mixed with pr_uref being incremented
elsewhere, but that's not a problem either as long as the jail in
question is locked.

>> The trouble lies in the resurrection of dead jails, as Andriy has noted
>> (though not just attaching, but also by setting its persist flag causes
>> the same problem).
>
> I not sure that persistent prisons actually suffer from this in any
> different way tbh, as they have an additional uref increment so would
> never hit this case unless they have been actively removed and hence
> unpersisted first.

Right - both the attach and persist cases are only a problem when a jail
has disappeared. There are various ways for a jail to be removed,
potentially to be kept around but in the dying state, but only two
related ways for it to be resurrected: attaching a new process or
setting the persist flag, both via jail_set with the JAIL_DYING flag passed.

>> There are two possible fixes to this. One is the patch you've given,
>> which only decrements a parent jail's pr_uref when the child jail
>> completely goes away (as opposed to when it loses its last uref). This
>> provides symmetry with the current way pr_uref is incremented on the
>> parent, which is only when a jail is created.
>>
>> The other fix is to increment a parent's pr_uref when a jail is
>> resurrected, which will match the current logic in prison_deref. I like
>> the external semantics of this solution: a jail isn't visible if it is
>> not persistent and has no processes and no *visible* sub-jails, as
>> opposed to having no sub-jails at all. But this solution ends up pretty
>> complicated - there are a few places where pr_uref is incremented, where
>> I might need to increment parent jails' pr_uref as well, much like the
>> current tpr loop in prison_deref decrements them.
>
> Ahh yes in the hierarchical case my patch would indeed mean that none
> persistent parent jails would remain visible even when its last child
> jail is in a dying state.
>
> As you say making this not the case would likely require replacing all
> instances of pr_uref++ with a prison_uref method that implements the
> opposite of the loop in prison_dref should the prisons pr_uref be 0 when
> called.

Yes, that's the problem. Maybe not all instances, but at least most have
enough times a jail is unlocked that we can't assume the pr_uref hasn't
been set to zero somewhere else, and so we need to do that loop.

>> Your solution removes code instead of adding it, which is generally a
>> good thing. While it does change the semantics of pr_uref in the
>> hierarchical case at least from what I thought it was, those semantics
>> haven't been working properly anyway.
>
> Good to know my interpretation was correct, even if I was missing the
> visibility factor in the hierarchical case :)
>
>> Bjoern, I'm adding you to the CC list for this because the whole pr_uref
>> thing was your idea (though it was pr_nprocs at the time), so you might
>> care about the hierarchical semantics of it - or you may not. Also, this
>> is a panic-inducing bug in current and may interest you for that reason.
>
>  From an admin perspective the current jail dying state does cause
> confusion when your not aware of its existence. You ask a jail to stop it
> appears to have completed that request, but really hasn't, an generally
> due to just a lingering tcp connection.
>
> With the introduction of hierarchical jails that gets a little worse
> where a whole series of jails could disappear from normal view only to
> be resurrected shortly after. Something to bear in mind when deciding
> which solution of the two presented to use.

The good news is that the only time a jail (or perhaps a whole set of
jails) can only come back from the dead when the administrator makes a
concerted effort to do so. So it at least shouldn't surprise the
administrator who did that. To any other program/person that's watching,
it just looks like jails were removed and then other jails with the same
ID were created, the same as could happen when there are no dying-state
issues, i.e. no outstanding TCP connections.

One drawback to the suggested patch is that in the hierarchical case, an
administrator that is naive to the JAIL_DYING flag could remove a jail
and then continue to see it existing for a while anyway. In the typical
single-level jail case, removing a jail always makes it go away, except
to the eyes of the JAIL_DYING search. Preserving those semantics may be
enough to want to go with the more complicated solution of adding a
matching prison_ref call.

So ... not necessarily decided on this issue after all.

- Jamie



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4E51372F.1020606>