From owner-freebsd-jail@FreeBSD.ORG Sun Aug 21 01:29:53 2011 Return-Path: Delivered-To: freebsd-jail@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 169381065670; Sun, 21 Aug 2011 01:29:53 +0000 (UTC) (envelope-from prvs=12149e54fe=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id F2CF18FC15; Sun, 21 Aug 2011 01:29:51 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Sun, 21 Aug 2011 02:18:43 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sun, 21 Aug 2011 02:18:43 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014678950.msg; Sun, 21 Aug 2011 02:18:41 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=12149e54fe=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <3D52B19B71CA49A4BB3BCCDF25961F46@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" , References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@FreeBSD.org> <4E501A6A.3030801@FreeBSD.org> Date: Sun, 21 Aug 2011 02:19:58 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: 
, List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Aug 2011 01:29:53 -0000 ----- Original Message ----- From: "Andriy Gapon" > on 20/08/2011 23:24 Steven Hartland said the following: >> ----- Original Message ----- From: "Steven Hartland" >>> Looking through the code I believe I may have noticed a scenario which could >>> trigger the problem. >>> >>> Given the following code:- >>> >>> static void >>> prison_deref(struct prison *pr, int flags) >>> { >>> struct prison *ppr, *tpr; >>> int vfslocked; >>> >>> if (!(flags & PD_LOCKED)) >>> mtx_lock(&pr->pr_mtx); >>> /* Decrement the user references in a separate loop. */ >>> if (flags & PD_DEUREF) { >>> for (tpr = pr;; tpr = tpr->pr_parent) { >>> if (tpr != pr) >>> mtx_lock(&tpr->pr_mtx); >>> if (--tpr->pr_uref > 0) >>> break; >>> KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); >>> mtx_unlock(&tpr->pr_mtx); >>> } >>> /* Done if there were only user references to remove. */ >>> if (!(flags & PD_DEREF)) { >>> mtx_unlock(&tpr->pr_mtx); >>> if (flags & PD_LIST_SLOCKED) >>> sx_sunlock(&allprison_lock); >>> else if (flags & PD_LIST_XLOCKED) >>> sx_xunlock(&allprison_lock); >>> return; >>> } >>> if (tpr != pr) { >>> mtx_unlock(&tpr->pr_mtx); >>> mtx_lock(&pr->pr_mtx); >>> } >>> } >>> >>> If you take a scenario of a simple one level prison setup running a single >>> process >>> where a prison has just been stopped. >>> >>> In the above code pr_uref of the processes prison is decremented. As this is the >>> last process then pr_uref will hit 0 and the loop continues instead of breaking >>> early. >>> >>> Now at the end of the loop iteration the mtx is unlocked so other process can >>> now manipulate the jail, this is where I think the problem may be. 
>>> >>> If we now have another process come in and attach to the jail but then instantly >>> exit, this process may allow another kernel thread to hit this same bit of code >>> and so two process for the same prison get into the section which decrements >>> prison0's pr_uref, instead of only one. >>> >>> In essence I think we can get the following flow where 1# = process1 >>> and 2# = process2 >>> 1#1. prison1.pr_uref = 1 (single process jail) >>> 1#2. prison_deref( prison1,... >>> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0) >>> 1#3. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref >>> 1#3. prison0.pr_uref-- >>> 2#1. process1.attach( prison1 ) (prison1.pr_uref = 1) >>> 2#2. process1.exit >>> 2#3. prison_deref( prison1,... >>> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0) >>> 2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref >>> 2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by prison1) >>> >>> It seems like the action on the parent prison to decrement the pr_uref is >>> happening too early, while the jail can still be used and without the lock on >>> the child jails mtx, so causing a race condition. >>> >>> I think the fix is to the move the decrement of parent prison pr_uref's down >>> so it only takes place if the jail is "really" being removed. Either that or >>> to change the locking semantics so that once the lock is aquired in this >>> prison_deref its not unlocked until the function completes. >>> >>> What do people think? >> >> After reviewing the changes to prison_deref in commit which added hierarchical >> jails, the removal of the lock by the inital loop on the passed in prison may >> be unintentional. 
>> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h
>>
>> If so the following may be all that's needed to fix this issue:-
>>
>> diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
>> --- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
>> +++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100
>> @@ -2455,7 +2455,8 @@
>>              if (--tpr->pr_uref > 0)
>>                  break;
>>              KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
>> -            mtx_unlock(&tpr->pr_mtx);
>> +            if (tpr != pr)
>> +                mtx_unlock(&tpr->pr_mtx);
>>          }
>>          /* Done if there were only user references to remove. */
>>          if (!(flags & PD_DEREF)) {
>
> Not sure if this would fly as is - please double check the later block where
> pr->pr_mtx is re-locked.

You're right, and it's actually more complex than that. Although changing it
to not unlock in the middle of prison_deref fixes that race condition, it
doesn't prevent pr_uref being incorrectly decremented each time the jail gets
into the dying state, which is really the problem we are seeing.

If hierarchical prisons are used there seems to be an additional problem: the
counters of all prisons in the hierarchy are decremented, but as far as I can
tell only the immediate parent is ever incremented, so there is another
reference problem there as well, I think.

The following patch, I believe, fixes both of these issues.

I've tested with debug added and confirmed prison0's pr_uref is maintained
correctly even when a jail hits the dying state multiple times.

It essentially reverts the changes to the "if (flags & PD_DEUREF)" block made
by 192895 and moves the decrement of the parent's pr_uref to after the jail
has actually been removed.

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-21 01:56:58.429894825 +0100
@@ -2449,27 +2449,16 @@
         mtx_lock(&pr->pr_mtx);
     /* Decrement the user references in a separate loop. */
     if (flags & PD_DEUREF) {
-        for (tpr = pr;; tpr = tpr->pr_parent) {
-            if (tpr != pr)
-                mtx_lock(&tpr->pr_mtx);
-            if (--tpr->pr_uref > 0)
-                break;
-            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-            mtx_unlock(&tpr->pr_mtx);
-        }
+        pr->pr_uref--;
         /* Done if there were only user references to remove. */
         if (!(flags & PD_DEREF)) {
-            mtx_unlock(&tpr->pr_mtx);
+            mtx_unlock(&pr->pr_mtx);
             if (flags & PD_LIST_SLOCKED)
                 sx_sunlock(&allprison_lock);
             else if (flags & PD_LIST_XLOCKED)
                 sx_xunlock(&allprison_lock);
             return;
         }
-        if (tpr != pr) {
-            mtx_unlock(&tpr->pr_mtx);
-            mtx_lock(&pr->pr_mtx);
-        }
     }

     for (;;) {
@@ -2525,6 +2514,8 @@
         /* Removing a prison frees a reference on its parent. */
         pr = ppr;
         mtx_lock(&pr->pr_mtx);
+        /* Ensure user reference added on create is removed */
+        pr->pr_uref--;
         flags = PD_DEREF;
     }
 }

Jamie, from what I can tell you were the original committer of hierarchical
prisons in the following svn change set, so I would really appreciate your
feedback on this.
http://svnweb.freebsd.org/base?view=revision&revision=192895

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.

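The accounting problem described in the message above can be illustrated with a small userspace model. This is only a sketch and not kernel code: `struct toy_prison` and every `toy_*` function are hypothetical stand-ins, locking is ignored entirely, and the root is assumed to start with one user reference of its own. It contrasts the pre-patch behaviour (a parent user reference is dropped every time the child's count reaches zero) with the patched behaviour (the parent reference taken at create time is dropped once, when the child is really removed).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct prison: a user refcount and a parent. */
struct toy_prison {
	int uref;
	struct toy_prison *parent;
};

/* Creating a jail takes one user reference on its immediate parent. */
static void
toy_create(struct toy_prison *pr, struct toy_prison *parent)
{
	assert(parent != NULL);
	pr->uref = 1;		/* the jail's first process */
	pr->parent = parent;
	parent->uref++;
}

/* Resurrecting a dying jail raises its own count but not the parent's. */
static void
toy_resurrect(struct toy_prison *pr)
{
	pr->uref++;
}

/* Pre-patch model: walk upwards while each count drops to zero. */
static void
toy_deuref_old(struct toy_prison *pr)
{
	struct toy_prison *tpr;

	for (tpr = pr; tpr != NULL; tpr = tpr->parent)
		if (--tpr->uref > 0)
			break;
}

/* Patched model: only the jail's own count is touched here. */
static void
toy_deuref_new(struct toy_prison *pr)
{
	pr->uref--;
}

/* Patched model: real removal releases the reference taken at create. */
static void
toy_remove_new(struct toy_prison *pr)
{
	pr->parent->uref--;
}

/*
 * Let a single-process jail die, be resurrected `resurrections` times,
 * and die for good.  Returns the root's remaining user references; a
 * balanced result is 1 (the root's own assumed reference).
 */
static int
toy_root_uref_after(int resurrections, int patched)
{
	struct toy_prison root = { 1, NULL };
	struct toy_prison jail;
	int i;

	toy_create(&jail, &root);
	for (i = 0; i <= resurrections; i++) {
		if (patched)
			toy_deuref_new(&jail);
		else
			toy_deuref_old(&jail);
		if (i < resurrections)
			toy_resurrect(&jail);
	}
	if (patched)
		toy_remove_new(&jail);
	return (root.uref);
}
```

In this model a single resurrection under the old logic, `toy_root_uref_after(1, 0)`, leaves the root with no user references, which is the condition the KASSERT in prison_deref is there to catch. With `patched` set, the root keeps its own reference however many times the jail dies and is revived.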
From owner-freebsd-jail@FreeBSD.ORG Sun Aug 21 07:03:36 2011 Return-Path: Delivered-To: freebsd-jail@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A4044106566B; Sun, 21 Aug 2011 07:03:36 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from m2.gritton.org (gritton.org [64.34.175.71]) by mx1.freebsd.org (Postfix) with ESMTP id 6F9838FC13; Sun, 21 Aug 2011 07:03:36 +0000 (UTC) Received: from glorfindel.gritton.org (c-174-52-133-59.hsd1.ut.comcast.net [174.52.133.59]) (authenticated bits=0) by m2.gritton.org (8.14.4/8.14.4) with ESMTP id p7L73YLe078411; Sun, 21 Aug 2011 01:03:34 -0600 (MDT) (envelope-from jamie@FreeBSD.org) Message-ID: <4E50ADC5.6050403@FreeBSD.org> Date: Sun, 21 Aug 2011 01:03:33 -0600 From: Jamie Gritton User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.12) Gecko/20101110 Thunderbird/3.1.6 MIME-Version: 1.0 To: Steven Hartland References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@FreeBSD.org> <4E501A6A.3030801@FreeBSD.org> <3D52B19B71CA49A4BB3BCCDF25961F46@multiplay.co.uk> In-Reply-To: <3D52B19B71CA49A4BB3BCCDF25961F46@multiplay.co.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "Bjoern A. 
Zeeb" , freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org, Andriy Gapon Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Aug 2011 07:03:36 -0000 On 08/20/11 19:19, Steven Hartland wrote: > ----- Original Message ----- From: "Andriy Gapon" > >> on 20/08/2011 23:24 Steven Hartland said the following: >>> ----- Original Message ----- From: "Steven Hartland" >>>> Looking through the code I believe I may have noticed a scenario >>>> which could >>>> trigger the problem. >>>> >>>> Given the following code:- >>>> >>>> static void >>>> prison_deref(struct prison *pr, int flags) >>>> { >>>> struct prison *ppr, *tpr; >>>> int vfslocked; >>>> >>>> if (!(flags & PD_LOCKED)) >>>> mtx_lock(&pr->pr_mtx); >>>> /* Decrement the user references in a separate loop. */ >>>> if (flags & PD_DEUREF) { >>>> for (tpr = pr;; tpr = tpr->pr_parent) { >>>> if (tpr != pr) >>>> mtx_lock(&tpr->pr_mtx); >>>> if (--tpr->pr_uref > 0) >>>> break; >>>> KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); >>>> mtx_unlock(&tpr->pr_mtx); >>>> } >>>> /* Done if there were only user references to remove. */ >>>> if (!(flags & PD_DEREF)) { >>>> mtx_unlock(&tpr->pr_mtx); >>>> if (flags & PD_LIST_SLOCKED) >>>> sx_sunlock(&allprison_lock); >>>> else if (flags & PD_LIST_XLOCKED) >>>> sx_xunlock(&allprison_lock); >>>> return; >>>> } >>>> if (tpr != pr) { >>>> mtx_unlock(&tpr->pr_mtx); >>>> mtx_lock(&pr->pr_mtx); >>>> } >>>> } >>>> >>>> If you take a scenario of a simple one level prison setup running a >>>> single >>>> process >>>> where a prison has just been stopped. >>>> >>>> In the above code pr_uref of the processes prison is decremented. As >>>> this is the >>>> last process then pr_uref will hit 0 and the loop continues instead >>>> of breaking >>>> early. 
>>>> >>>> Now at the end of the loop iteration the mtx is unlocked so other >>>> process can >>>> now manipulate the jail, this is where I think the problem may be. >>>> >>>> If we now have another process come in and attach to the jail but >>>> then instantly >>>> exit, this process may allow another kernel thread to hit this same >>>> bit of code >>>> and so two process for the same prison get into the section which >>>> decrements >>>> prison0's pr_uref, instead of only one. >>>> >>>> In essence I think we can get the following flow where 1# = process1 >>>> and 2# = process2 >>>> 1#1. prison1.pr_uref = 1 (single process jail) >>>> 1#2. prison_deref( prison1,... >>>> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0) >>>> 1#3. prison1.mtx_unlock <-- this now allows others to change >>>> prison1.pr_uref >>>> 1#3. prison0.pr_uref-- >>>> 2#1. process1.attach( prison1 ) (prison1.pr_uref = 1) >>>> 2#2. process1.exit >>>> 2#3. prison_deref( prison1,... >>>> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0) >>>> 2#5. prison1.mtx_unlock <-- this now allows others to change >>>> prison1.pr_uref >>>> 2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented >>>> twice by prison1) >>>> >>>> It seems like the action on the parent prison to decrement the >>>> pr_uref is >>>> happening too early, while the jail can still be used and without >>>> the lock on >>>> the child jails mtx, so causing a race condition. >>>> >>>> I think the fix is to the move the decrement of parent prison >>>> pr_uref's down >>>> so it only takes place if the jail is "really" being removed. Either >>>> that or >>>> to change the locking semantics so that once the lock is aquired in >>>> this >>>> prison_deref its not unlocked until the function completes. >>>> >>>> What do people think? >>> >>> After reviewing the changes to prison_deref in commit which added >>> hierarchical >>> jails, the removal of the lock by the inital loop on the passed in >>> prison may >>> be unintentional. 
>>> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h >>> >>> >>> >>> If so the following may be all that's needed to fix this issue:- >>> >>> diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c >>> --- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100 >>> +++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100 >>> @@ -2455,7 +2455,8 @@ >>> if (--tpr->pr_uref > 0) >>> break; >>> KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); >>> - mtx_unlock(&tpr->pr_mtx); >>> + if (tpr != pr) >>> + mtx_unlock(&tpr->pr_mtx); >>> } >>> /* Done if there were only user references to remove. */ >>> if (!(flags & PD_DEREF)) { >> >> Not sure if this would fly as is - please double check the later block >> where >> pr->pr_mtx is re-locked. > > Your right, and its actually more complex than that. Although changing > it to > not unlock in the middle of prison_deref fixes that race condition it > doesn't > prevent pr_uref being incorrectly decremented each time the jail gets into > the dying state, which is really the problem we are seeing. > > If hierarchical prisons are used there seems to be an additional problem > where the counter of all prisons in the hierarchy are decremented, but as > far as I can tell only the immediate parent is ever incremented, so another > reference problem there as well I think. > > The following patch I believe fixes both of these issues. > > I've testing with debug added and confirmed prison0's pr_uref is maintained > correctly even when a jail hits dying state multiple times. > > It essentially reverts the changes to the "if (flags & PD_DEUREF)" by > 192895 and moves it to after the jail has been actually removed. 
> > diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c > --- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100 > +++ sys/kern/kern_jail.c 2011-08-21 01:56:58.429894825 +0100 > @@ -2449,27 +2449,16 @@ > mtx_lock(&pr->pr_mtx); > /* Decrement the user references in a separate loop. */ > if (flags & PD_DEUREF) { > - for (tpr = pr;; tpr = tpr->pr_parent) { > - if (tpr != pr) > - mtx_lock(&tpr->pr_mtx); > - if (--tpr->pr_uref > 0) > - break; > - KASSERT(tpr != &prison0, ("prison0 pr_uref=0")); > - mtx_unlock(&tpr->pr_mtx); > - } > + pr->pr_uref--; > /* Done if there were only user references to remove. */ > if (!(flags & PD_DEREF)) { > - mtx_unlock(&tpr->pr_mtx); > + mtx_unlock(&pr->pr_mtx); > if (flags & PD_LIST_SLOCKED) > sx_sunlock(&allprison_lock); > else if (flags & PD_LIST_XLOCKED) > sx_xunlock(&allprison_lock); > return; > } > - if (tpr != pr) { > - mtx_unlock(&tpr->pr_mtx); > - mtx_lock(&pr->pr_mtx); > - } > } > > for (;;) { > @@ -2525,6 +2514,8 @@ > /* Removing a prison frees a reference on its parent. */ > pr = ppr; > mtx_lock(&pr->pr_mtx); > + /* Ensure user reference added on create is removed */ > + pr->pr_uref--; > flags = PD_DEREF; > } > } > > Jamie from what I can tell you where the original committer of hierarchical > prisons and this in the following svn change set so would really appreciate > your feedback on this. > http://svnweb.freebsd.org/base?view=revision&revision=192895 The problem isn't with the conditional locking of tpr in prison_deref. That locking is actually correct, and there's no race condition. The trouble lies in the resurrection of dead jails, as Andriy has noted (though not just attaching, but also by setting its persist flag causes the same problem). There are two possible fixes to this. One is the patch you've given, which only decrements a parent jail's pr_uref when the child jail completely goes away (as opposed to when it loses its last uref). 
This provides symmetry with the current way pr_uref is incremented on the parent, which is only when a jail is created. The other fix is to increment a parent's pr_uref when a jail is resurrected, which will match the current logic in prison_deref. I like the external semantics of this solution: a jail isn't visible if it is not persistent and has no processes and no *visible* sub-jails, as opposed to having no sub-jails at all. But this solution ends up pretty complicated - there are a few places where pr_uref is incremented, where I might need to increment parent jails' pr_uref as well, much like the current tpr loop in prison_deref decrements them. Your solution removes code instead of adding it, which is generally a good thing. While it does change the semantics of pr_uref in the hierarchical case at least from what I thought it was, those semantics haven't been working properly anyway. Bjoern, I'm adding you to the CC list for this because the whole pr_uref thing was your idea (though it was pr_nprocs at the time), so you might care about the hierarchical semantics of it - or you may not. Also, this is a panic-inducing bug in current and may interest you for that reason. 
- Jamie From owner-freebsd-jail@FreeBSD.ORG Sun Aug 21 11:00:58 2011 Return-Path: Delivered-To: freebsd-jail@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 73F3E1065670; Sun, 21 Aug 2011 11:00:58 +0000 (UTC) (envelope-from prvs=12149e54fe=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 3388E8FC13; Sun, 21 Aug 2011 11:00:56 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Sun, 21 Aug 2011 12:00:22 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Sun, 21 Aug 2011 12:00:22 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014681961.msg; Sun, 21 Aug 2011 12:00:21 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=12149e54fe=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: From: "Steven Hartland" To: "Jamie Gritton" References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@FreeBSD.org> <4E501A6A.3030801@FreeBSD.org> <3D52B19B71CA49A4BB3BCCDF25961F46@multiplay.co.uk> <4E50ADC5.6050403@FreeBSD.org> Date: Sun, 21 Aug 2011 12:01:34 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: "Bjoern A. 
Zeeb", freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org, Andriy Gapon
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
X-BeenThere: freebsd-jail@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussion about FreeBSD jail\(8\)"
X-List-Received-Date: Sun, 21 Aug 2011 11:00:58 -0000

----- Original Message -----
From: "Jamie Gritton"

>>>>> In essence I think we can get the following flow where 1# = process1
>>>>> and 2# = process2
>>>>> 1#1. prison1.pr_uref = 1 (single process jail)
>>>>> 1#2. prison_deref( prison1,...
>>>>> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
>>>>> 1#3. prison1.mtx_unlock <-- this now allows others to change
>>>>> prison1.pr_uref
>>>>> 1#3. prison0.pr_uref--
>>>>> 2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
>>>>> 2#2. process1.exit
>>>>> 2#3. prison_deref( prison1,...
>>>>> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
>>>>> 2#5. prison1.mtx_unlock <-- this now allows others to change
>>>>> prison1.pr_uref
>>>>> 2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented
>>>>> twice by prison1)

First off, thanks for the feedback Jamie, most appreciated :)

> The problem isn't with the conditional locking of tpr in prison_deref.
> That locking is actually correct, and there's no race condition.

Are you sure? I do think that unlocking the mtx halfway through the
call allows the above scenario to create a race condition, albeit
very briefly, when ignoring the overriding issue.

In addition, if the code were changed so that the pr_uref++ also
maintained the parent's uref, this would definitely lead to potential
problems in my mind, especially if you had more than one child prison
of a given parent entering the dying state at any one time.

In this case I believe you would have to acquire the locks of all
the parent prisons before it would be safe to proceed.

> The trouble lies in the resurrection of dead jails, as Andriy has noted
> (though not just attaching, but also by setting its persist flag causes
> the same problem).

I'm not sure that persistent prisons actually suffer from this in any
different way tbh, as they have an additional uref increment and so would
never hit this case unless they have been actively removed, and hence
unpersisted, first.

> There are two possible fixes to this. One is the patch you've given,
> which only decrements a parent jail's pr_uref when the child jail
> completely goes away (as opposed to when it loses its last uref). This
> provides symmetry with the current way pr_uref is incremented on the
> parent, which is only when a jail is created.
>
> The other fix is to increment a parent's pr_uref when a jail is
> resurrected, which will match the current logic in prison_deref. I like
> the external semantics of this solution: a jail isn't visible if it is
> not persistent and has no processes and no *visible* sub-jails, as
> opposed to having no sub-jails at all. But this solution ends up pretty
> complicated - there are a few places where pr_uref is incremented, where
> I might need to increment parent jails' pr_uref as well, much like the
> current tpr loop in prison_deref decrements them.

Ahh yes, in the hierarchical case my patch would indeed mean that a
non-persistent parent jail would remain visible even when its last child
jail is in a dying state.

As you say, making this not the case would likely require replacing all
instances of pr_uref++ with a prison_uref method that implements the
opposite of the loop in prison_deref should the prison's pr_uref be 0 when
called.

> Your solution removes code instead of adding it, which is generally a
> good thing. While it does change the semantics of pr_uref in the
> hierarchical case at least from what I thought it was, those semantics
> haven't been working properly anyway.

Good to know my interpretation was correct, even if I was missing the
visibility factor in the hierarchical case :)

> Bjoern, I'm adding you to the CC list for this because the whole pr_uref
> thing was your idea (though it was pr_nprocs at the time), so you might
> care about the hierarchical semantics of it - or you may not. Also, this
> is a panic-inducing bug in current and may interest you for that reason.

From an admin perspective the current jail dying state does cause
confusion when you're not aware of its existence. You ask a jail to stop and
it appears to have completed that request, but it really hasn't, generally
due to just a lingering TCP connection.

With the introduction of hierarchical jails that gets a little worse, as
a whole series of jails could disappear from normal view only to be
resurrected shortly after. Something to bear in mind when deciding which
of the two presented solutions to use.

Regards
Steve

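The alternative raised in the message above, a prison_uref-style helper that does the opposite of the decrement loop in prison_deref, could look roughly like the following userspace sketch. Everything here is an assumption made for illustration: `struct toy_prison`, `toy_uref`, `toy_deuref` and `toy_chain_root_uref` are hypothetical names, no such helper is confirmed to exist in the kernel, and the per-prison mutexes and lock ordering discussed in the thread are left out completely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct prison: a user refcount and a parent. */
struct toy_prison {
	int uref;
	struct toy_prison *parent;
};

/* Drop a user reference; when a count reaches zero, continue upwards. */
static void
toy_deuref(struct toy_prison *pr)
{
	struct toy_prison *tpr;

	for (tpr = pr; tpr != NULL; tpr = tpr->parent)
		if (--tpr->uref > 0)
			break;
}

/* Add a user reference; when a count leaves zero, continue upwards. */
static void
toy_uref(struct toy_prison *pr)
{
	struct toy_prison *tpr;

	for (tpr = pr; tpr != NULL; tpr = tpr->parent)
		if (tpr->uref++ > 0)
			break;
}

/*
 * Three-level chain root -> mid -> leaf.  The leaf dies and is
 * resurrected `cycles` times; if end_dead is nonzero it then dies once
 * more.  Returns the root's user reference count.
 */
static int
toy_chain_root_uref(int cycles, int end_dead)
{
	struct toy_prison root = { 2, NULL };	/* own reference + visible mid */
	struct toy_prison mid = { 1, &root };	/* held only by the visible leaf */
	struct toy_prison leaf = { 1, &mid };	/* one process */
	int i;

	assert(cycles >= 0);
	for (i = 0; i < cycles; i++) {
		toy_deuref(&leaf);	/* leaf and mid drop out of view */
		toy_uref(&leaf);	/* both come back, root is re-referenced */
	}
	if (end_dead)
		toy_deuref(&leaf);
	return (root.uref);
}
```

Because the increment walks up exactly as far as the decrement did, the counts in this model return to their starting values after each die/resurrect pair. The hard part described in the thread, taking and releasing each prison's lock in the right order while walking up, is not modelled here.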
From owner-freebsd-jail@FreeBSD.ORG Sun Aug 21 16:49:55 2011 Return-Path: Delivered-To: freebsd-jail@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 95B0C1065672; Sun, 21 Aug 2011 16:49:55 +0000 (UTC) (envelope-from jamie@FreeBSD.org) Received: from m2.gritton.org (gritton.org [64.34.175.71]) by mx1.freebsd.org (Postfix) with ESMTP id 605D68FC08; Sun, 21 Aug 2011 16:49:54 +0000 (UTC) Received: from glorfindel.gritton.org (c-174-52-133-59.hsd1.ut.comcast.net [174.52.133.59]) (authenticated bits=0) by m2.gritton.org (8.14.4/8.14.4) with ESMTP id p7LGnrL7082438; Sun, 21 Aug 2011 10:49:53 -0600 (MDT) (envelope-from jamie@FreeBSD.org) Message-ID: <4E51372F.1020606@FreeBSD.org> Date: Sun, 21 Aug 2011 10:49:51 -0600 From: Jamie Gritton User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.12) Gecko/20101110 Thunderbird/3.1.6 MIME-Version: 1.0 To: Steven Hartland References: eBSD.org><82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk><4E4FE55A.9000101@FreeBSD.org> <4E501A6A.3030801@FreeBSD.org> <3D52B19B71CA49A4BB3BCCDF25961F46@multiplay.co.uk> <4E50ADC5.6050403@FreeBSD.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "Bjoern A. Zeeb" , freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org, Andriy Gapon Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Aug 2011 16:49:55 -0000 On 08/21/11 05:01, Steven Hartland wrote: > ----- Original Message ----- From: "Jamie Gritton" >> The problem isn't with the conditional locking of tpr in prison_deref. >> That locking is actually correct, and there's no race condition. > > Are you sure? 
I do think that unlocking the mtx half way through the
> call allows the above scenario to create a race condition, all be it
> very briefly, when ignoring the overriding issue.
>
> In addition if the code where changed to so that the pr_uref++ also
> maintained the parents uref this would definitely lead to a potential
> problems in my mind, especially if you had more than one child prison,
> of a given parent, entering the dying state at any one time.
>
> In this case I believe you would have to acquire the locks of all
> the parent prisons before it would be safe to precede.

Lock order requires that I unlock the child if I want to lock the
parent. While that does allow periods where neither is locked, it's safe
in this case. There may be multiple processes dying in one jail, or in
multiple children of a single jail. But as long as a parent jail is
locked while decrementing pr_uref, only one of these simultaneous
prison_deref calls would set pr_uref to zero and continue in the loop to
that prison's parent. This might be mixed with pr_uref being incremented
elsewhere, but that's not a problem either as long as the jail in
question is locked.

>> The trouble lies in the resurrection of dead jails, as Andriy has noted
>> (though not just attaching, but also by setting its persist flag causes
>> the same problem).
>
> I not sure that persistent prisons actually suffer from this in any
> different way tbh, as they have an additional uref increment so would
> never hit this case unless they have been actively removed and hence
> unpersisted first.

Right - both the attach and persist cases are only a problem when a jail
has disappeared. There are various ways for a jail to be removed,
potentially to be kept around but in the dying state, but only two
related ways for it to be resurrected: attaching a new process or
setting the persist flag, both via jail_set with the JAIL_DYING flag
passed.

>> There are two possible fixes to this. One is the patch you've given,
>> which only decrements a parent jail's pr_uref when the child jail
>> completely goes away (as opposed to when it loses its last uref). This
>> provides symmetry with the current way pr_uref is incremented on the
>> parent, which is only when a jail is created.
>>
>> The other fix is to increment a parent's pr_uref when a jail is
>> resurrected, which will match the current logic in prison_deref. I like
>> the external semantics of this solution: a jail isn't visible if it is
>> not persistent and has no processes and no *visible* sub-jails, as
>> opposed to having no sub-jails at all. But this solution ends up pretty
>> complicated - there are a few places where pr_uref is incremented, where
>> I might need to increment parent jails' pr_uref as well, much like the
>> current tpr loop in prison_deref decrements them.
>
> Ahh yes in the hierarchical case my patch would indeed mean that none
> persistent parent jails would remain visible even when its last child
> jail is in a dying state.
>
> As you say making this not the case would likely require replacing all
> instances of pr_uref++ with a prison_uref method that implements the
> opposite of the loop in prison_dref should the prisons pr_uref be 0 when
> called.

Yes, that's the problem. Maybe not all instances, but most of them have
enough points where the jail is unlocked that we can't assume pr_uref
hasn't been set to zero somewhere else, and so we need to do that loop.

>> Your solution removes code instead of adding it, which is generally a
>> good thing. While it does change the semantics of pr_uref in the
>> hierarchical case at least from what I thought it was, those semantics
>> haven't been working properly anyway.
>
> Good to know my interpretation was correct, even if I was missing the
> visibility factor in the hierarchical case :)
>
>> Bjoern, I'm adding you to the CC list for this because the whole pr_uref
>> thing was your idea (though it was pr_nprocs at the time), so you might
>> care about the hierarchical semantics of it - or you may not. Also, this
>> is a panic-inducing bug in current and may interest you for that reason.
>
> From an admin perspective the current jail dying state does cause
> confusion when your not aware of its existence. You ask a jail to stop it
> appears to have completed that request, but really hasn't, an generally
> due to just a lingering tcp connection.
>
> With the introduction of hierarchical jails that gets a little worse
> where a whole series of jails could disappear from normal view only to
> be resurrected shortly after. Something to bear in mind when deciding
> which solution of the two presented to use.

The good news is that a jail (or perhaps a whole set of jails) can only
come back from the dead when the administrator makes a concerted effort
to do so. So it at least shouldn't surprise the administrator who did
that. To any other program/person that's watching, it just looks like
jails were removed and then other jails with the same ID were created,
the same as could happen when there are no dying-state issues, i.e. no
outstanding TCP connections.

One drawback to the suggested patch is that in the hierarchical case, an
administrator who is unaware of the JAIL_DYING flag could remove a jail
and then continue to see it existing for a while anyway. In the typical
single-level jail case, removing a jail always makes it go away, except
to the eyes of the JAIL_DYING search. Preserving those semantics may be
reason enough to go with the more complicated solution of adding a
matching prison_ref call.

So ... not necessarily decided on this issue after all.

- Jamie From owner-freebsd-jail@FreeBSD.ORG Sun Aug 21 17:30:04 2011 Return-Path: Delivered-To: freebsd-jail@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 18CEA106566B; Sun, 21 Aug 2011 17:30:04 +0000 (UTC) (envelope-from marquis@roble.com) Received: from mx5.roble.com (mx5.roble.com [206.40.34.5]) by mx1.freebsd.org (Postfix) with ESMTP id 061028FC13; Sun, 21 Aug 2011 17:30:03 +0000 (UTC) Received: from mx5.roble.com (mx5.roble.com [206.40.34.5]) by mx5.roble.com (Postfix) with ESMTP id A806867895; Sun, 21 Aug 2011 10:30:03 -0700 (PDT) Date: Sun, 21 Aug 2011 10:30:03 -0700 (PDT) From: Roger Marquis To: Steven Hartland In-Reply-To: References: <82E865FBA30747078AF6EE3C1701F973@multiplay.co.uk> <20110820182330.C6852106566B@hub.freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Message-Id: <20110821173004.18CEA106566B@hub.freebsd.org> Cc: freebsd-jail@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: debugging frequent kernel panics on 8.2-RELEASE X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Aug 2011 17:30:04 -0000 On Sat, 20 Aug 2011, Steven Hartland wrote: > Are you seeing a double fault panic? We're seeing both. At least one double (or more) fault finishing with "Fatal Trap 12: page fault while in kernel mode". Subsequent panics have been single fault (all visible on the IPMI console) "Fatal Trap 9: general protection fault while in kernel mode". Could well be unrelated. The system is undergoing hardware diags now. 
Roger Marquis

From owner-freebsd-jail@FreeBSD.ORG Mon Aug 22 11:07:05 2011 Return-Path: Delivered-To: freebsd-jail@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3CEA810656AA for ; Mon, 22 Aug 2011 11:07:05 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 2B11E8FC1D for ; Mon, 22 Aug 2011 11:07:05 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p7MB75XY097180 for ; Mon, 22 Aug 2011 11:07:05 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p7MB74at097178 for freebsd-jail@FreeBSD.org; Mon, 22 Aug 2011 11:07:04 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 22 Aug 2011 11:07:04 GMT Message-Id: <201108221107.p7MB74at097178@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-jail@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-jail@FreeBSD.org X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Aug 2011 11:07:05 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/159918  jail       [jail] inter-jail communication failure
o kern/156111  jail       [jail] procstat -b not supported in jail
o misc/155765  jail       [patch] `buildworld' does not honors WITHOUT_JAIL
o conf/154246  jail       [jail] [patch] Bad symlink created if devfs mount poin
o conf/149050  jail       [jail] rcorder ``nojail'' too coarse for Jail+VNET
s conf/142972  jail       [jail] [patch] Support JAILv2 and vnet in rc.d/jail
o conf/141317  jail       [patch] uncorrect jail stop in /etc/rc.d/jail
o kern/133265  jail       [jail] is there a solution how to run nfs client in ja
o kern/119842  jail       [smbfs] [jail] "Bad address" with smbfs inside a jail
o bin/99566    jail       [jail] [patch] fstat(1) according to specified jid
o bin/32828    jail       [jail] w(1) incorrectly handles stale utmp slots with

11 problems total.

From owner-freebsd-jail@FreeBSD.ORG Tue Aug 23 09:15:45 2011 Return-Path: Delivered-To: jail@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AA2AB106566B for ; Tue, 23 Aug 2011 09:15:45 +0000 (UTC) (envelope-from reddvinylene@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id 699BF8FC12 for ; Tue, 23 Aug 2011 09:15:45 +0000 (UTC) Received: by qyk4 with SMTP id 4so2158655qyk.13 for ; Tue, 23 Aug 2011 02:15:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=PIy2nNNS9AFhUr9K8MoKs/DkGLrxli3x9Q9bD9H/IVI=; b=YUyc2hOQ2BLWB3FPBSaOWys6BJsL1Y9SRLf727RreDAcL0UAR1NXISZqIm8nXTeaVE GQfKJO6/9xmYbF+8Ev+hxcA/8loaokk+pD66txaalszBQFF99bulDxL4VQ3BkCO5JpRp 37lw5GXeG3H7sm2OF7JtndKDbiS/DulE0mVmk= MIME-Version: 1.0 Received: by 10.229.101.20 with SMTP id a20mr2012624qco.222.1314089283339; Tue, 23 Aug 2011 01:48:03 -0700 (PDT) Received: by
10.229.109.203 with HTTP; Tue, 23 Aug 2011 01:48:03 -0700 (PDT) In-Reply-To: References: Date: Tue, 23 Aug 2011 10:48:03 +0200 Message-ID: From: Redd Vinylene To: jail@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: Re: Old jail dir reappears after reboot - why? X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Aug 2011 09:15:45 -0000

On Fri, Jun 10, 2011 at 1:36 PM, Redd Vinylene wrote:

> Hi,
>
> After rebooting my host server, some old dir I once had my jails in
> reappears. What might be the cause of that and how do I stop it?
>
> More specifically, I once had my jails in /jail, but now I've moved them
> all into /jails. rc.conf or fstab does not reference /jail and I can't
> find any file on my system that does - so why does this dir keep reappearing
> all the time?
>
> I have to umount it before I can delete it though.
>
> Anybody know?
>
> Thanks!

Can somebody help me?
Redd From owner-freebsd-jail@FreeBSD.ORG Tue Aug 23 13:33:44 2011 Return-Path: Delivered-To: jail@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 21A5D1065674 for ; Tue, 23 Aug 2011 13:33:44 +0000 (UTC) (envelope-from glarkin@FreeBSD.org) Received: from mail1.sourcehosting.net (mail1.sourcehosting.net [74.205.51.45]) by mx1.freebsd.org (Postfix) with ESMTP id EBAD28FC12 for ; Tue, 23 Aug 2011 13:33:43 +0000 (UTC) Received: from 68-189-245-235.dhcp.oxfr.ma.charter.com ([68.189.245.235] helo=cube.entropy.prv) by mail1.sourcehosting.net with esmtp (Exim 4.73 (FreeBSD)) (envelope-from ) id 1Qvqdm-00056m-GY; Tue, 23 Aug 2011 09:03:42 -0400 Received: from v104.entropy.prv (v104.entropy.prv [192.168.1.104]) by cube.entropy.prv (Postfix) with ESMTP id 371F151D704A; Tue, 23 Aug 2011 09:03:47 -0400 (EDT) Message-ID: <4E53A532.7080801@FreeBSD.org> Date: Tue, 23 Aug 2011 09:03:46 -0400 From: Greg Larkin Organization: The FreeBSD Project User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.20) Gecko/20110804 Lightning/1.0b2 Thunderbird/3.1.12 MIME-Version: 1.0 To: Redd Vinylene References: In-Reply-To: X-Enigmail-Version: 1.1.1 OpenPGP: id=1C940290 X-SA-Exim-Connect-IP: 68.189.245.235 X-SA-Exim-Mail-From: glarkin@FreeBSD.org X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail1.sourcehosting.net X-Spam-Level: *** X-Spam-Status: No, score=3.2 required=5.0 tests=AWL,BAYES_00,RCVD_IN_PBL, RCVD_IN_RP_RNBL, RCVD_IN_SORBS_DUL, RDNS_DYNAMIC, TVD_RCVD_IP autolearn=no version=3.3.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on mail1.sourcehosting.net) Cc: jail@freebsd.org Subject: Re: Old jail dir reappears after reboot - why? 
X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: glarkin@FreeBSD.org List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Aug 2011 13:33:44 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 8/23/11 4:48 AM, Redd Vinylene wrote:
> On Fri, Jun 10, 2011 at 1:36 PM, Redd Vinylene wrote:
>
>> Hi,
>>
>> After rebooting my host server, some old dir I once had my jails in
>> reappears. What might be the cause of that and how do I stop it?
>>
>> More specifically, I once had my jails in /jail, but now I've moved them
>> all into /jails. rc.conf or fstab does not reference /jail and I can't
>> find any file on my system that does - so why does this dir keep reappearing
>> all the time?
>>
>> I have to umount it before I can delete it though.
>>
>> Anybody know?
>>
>> Thanks!
>
>
> Can somebody help me?
>
> Redd

Hi Redd,

What is the output of "df /jail"? Is that directory on the same mounted filesystem as any other directories?

I assume that you executed "rm -rf /jail" after you moved your jail directories to /jails? If you've done that, it may not hurt to fsck the device to make sure there aren't any filesystem problems.

Regards,
Greg
- --
Greg Larkin

http://www.FreeBSD.org/ - The Power To Serve
http://www.sourcehosting.net/ - Ready. Set. Code.
http://twitter.com/cpucycle/ - Follow you, follow me
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5TpTIACgkQ0sRouByUApBdLwCfSTowJ2soh+hskt+urzQMTieT
IMUAmwdHM6z5fdbiUYUaJ2snPJnERCTz
=fbYu
-----END PGP SIGNATURE-----

From owner-freebsd-jail@FreeBSD.ORG Thu Aug 25 15:33:56 2011 Return-Path: Delivered-To: freebsd-jail@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E0E21106566C for ; Thu, 25 Aug 2011 15:33:56 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from mx0.hoeg.nl (mx0.hoeg.nl [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id AB1B48FC16 for ; Thu, 25 Aug 2011 15:33:56 +0000 (UTC) Received: by mx0.hoeg.nl (Postfix, from userid 1000) id 1EB7F2A28CB3; Thu, 25 Aug 2011 17:33:56 +0200 (CEST) Date: Thu, 25 Aug 2011 17:33:56 +0200 From: Ed Schouten To: freebsd-jail@freebsd.org Message-ID: <20110825153356.GA1929@hoeg.nl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="7cR/5cY1igHxmpEa" Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: Jexec and access to tty X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Aug 2011 15:33:57 -0000

--7cR/5cY1igHxmpEa Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

Hi all,

I am not on this list, but to drop in on this discussion: I don't think you need patches of any sort to make jexec + TTYs work in FreeBSD 9.0. This issue has been fixed in r200732. So the changes to jexec(8) are not needed to fix this specific issue.
Best regards, --=20 Ed Schouten WWW: http://80386.nl/ --7cR/5cY1igHxmpEa Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iQIcBAEBAgAGBQJOVmtjAAoJEG5e2P40kaK7OJIP/3tLLEc41PewoL0yd3CzslKy 4XSkS2OnYDQa+Qi+gbrbgXtkM4aTRHSSjGW72p0uUWBepXBtpkWnrQKB7T4BLW9l Dxj6VE3zudVmvNU1jkGoKJ9vvWODi8Xb8glIID7efz6G3dgqk6wC00WcdXwdV6th +eBeL0PilddWDyg0GUUxxYjEDRZs/jHl6HWMXTKlSbt5DwHNn2F2prHTELMys7Hz oIiHg9R0wd8g+u1QgrzNcHszzV9ng6p2hydNehChiAhzvfBJX5BPUHOXY98fEiXn 3gcBfDgSwz3jzel6eSerG408a6TyMYecHqG+2ie6Aw6wxBIzo4j+RgIJHXg46Mvp hdUcUUZ8+HX192Oso3bpfcQnL+EvFKKpM71blilTzL+3clCiws7+gr9l3otaZRzC eWJFOvAQqkd4GkbLkiZMYbDbndwjd2/WUt4Fv2pZ+AJZsbIcUDB2AKU6q0Ow76RE XR9lzmZHHZbPeLzx6eWMWf8u0lHzNb7XeIyEA0H61rhcY1MWAtAnJg09HuKcxmcU x3LFlhefSJIcA9ZlfsyF14x2Nfi3ulIwkFlwS4gVdSaJFfAHPcHejYgbGMuk1Fno IRZpceVa5DY6dAplygH2oVFwNe1NLt6zZznMAhcddiuTM82pQ8Vj221mCZgJLe3J UZJ+zkXyzM7Aeazgh5jR =47DH -----END PGP SIGNATURE----- --7cR/5cY1igHxmpEa-- From owner-freebsd-jail@FreeBSD.ORG Sat Aug 27 17:47:17 2011 Return-Path: Delivered-To: freebsd-jail@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8D6C6106564A; Sat, 27 Aug 2011 17:47:17 +0000 (UTC) (envelope-from Devin.Teske@fisglobal.com) Received: from mx1.fisglobal.com (mx1.fisglobal.com [199.200.24.190]) by mx1.freebsd.org (Postfix) with ESMTP id 4F4A78FC0C; Sat, 27 Aug 2011 17:47:16 +0000 (UTC) Received: from SBHFISLREXT03 ([10.132.254.62]) by SCSFISLTC02 (8.14.3/8.14.3) with ESMTP id p7RHlGYR023236; Sat, 27 Aug 2011 12:47:16 -0500 Received: from sbhfisltcgw01.FNFIS.COM (Not Verified[10.132.248.121]) by SBHFISLREXT03 with MailMarshal (v6, 5, 4, 7535) id ; Sat, 27 Aug 2011 12:47:43 -0500 Received: from smtp.fisglobal.com ([10.132.206.31]) by sbhfisltcgw01.FNFIS.COM with Microsoft SMTPSVC(6.0.3790.4675); Sat, 27 Aug 2011 12:47:15 -0500 Received: from [10.0.0.104] (10.14.152.54) by smtp.fisglobal.com (10.132.206.31) with Microsoft 
SMTP Server (TLS) id 14.1.289.1; Sat, 27 Aug 2011 12:47:08 -0500 From: Devin Teske Content-Type: multipart/mixed; boundary="Apple-Mail-25--994899661" Date: Sat, 27 Aug 2011 10:47:12 -0700 Message-ID: To: FreeBSD Hackers MIME-Version: 1.0 (Apple Message framework v1084) X-Mailer: Apple Mail (2.1084) X-Originating-IP: [10.14.152.54] X-OriginalArrivalTime: 27 Aug 2011 17:47:15.0198 (UTC) FILETIME=[5B7909E0:01CC64E1] Cc: Julian Elischer , FreeBSD Jail , FreeBSD RC , Dave Robison Subject: [PATCH] Add /etc/rc.d/vimage startup script for creating vnet jails X-BeenThere: freebsd-jail@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussion about FreeBSD jail\(8\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 27 Aug 2011 17:47:17 -0000

--Apple-Mail-25--994899661 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="ISO-8859-1"

Hi All,

I'd like to submit a patch for review (attached) that adds a new /etc/rc.d script named "vimage".

_____________
The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.
_____________

--Apple-Mail-25--994899661 Content-Disposition: attachment; filename="vimage_rc.20110827104104.patch" Content-Type: application/octet-stream; name="vimage_rc.20110827104104.patch" Content-Transfer-Encoding: 7bit

--- etc/defaults/rc.conf.orig	Fri Aug 26 20:36:52 2011
+++ etc/defaults/rc.conf	Sat Aug 27 10:34:54 2011
@@ -697,6 +697,43 @@
 #jail_example_flags="-l -U root"	# flags for jail(8)
 
 ##############################################################
+### Vimage Configuration #####################################
+##############################################################
+vimage_enable="NO"	# Set to NO to disable starting of any vimages
+vimage_parallel_start="NO"	# Start vimages in the background
+vimage_list=""	# Space separated list of names of vimages
+vimage_set_hostname_allow="YES"	# Allow root user in a vimage to change its hostname
+vimage_socket_unixiproute_only="NO"	# Route only TCP/IP within a vimage
+vimage_sysvipc_allow="YES"	# Allow SystemV IPC use from within a vimage
+
+#
+# To use rc's built-in vimage infrastructure create entries for
+# each vimage, specified in vimage_list, with the following variables.
+# NOTES:
+# - replace 'example' with the vimage's name.
+# - except rootdir, and hostname, all of the following variables may be made
+#   global vimage variables if you don't specify a vimage name (ie.
+#   vimage_fib, vimage_devfs_ruleset).
+#
+#vimage_example_rootdir="/usr/jail/default"	# Vimage's root directory
+#vimage_example_hostname="default.domain.com"	# Vimage's hostname
+#vimage_example_vnets="epair0b"	# Vimage's vnet interfaces
+#vimage_example_exec_start="/bin/sh /etc/rc"	# command to execute in vimage for starting
+#vimage_example_services="sshd ipfw zfs"	# services to start after starting vimage
+#vimage_example_exec_afterstart0="/bin/sh command"	# command to execute after the one for
+						# starting the vimage. More than one can
+						# be specified using a trailing number
+#vimage_example_exec_stop="/bin/sh /etc/rc.shutdown"	# command to execute in vimage for stopping
+#vimage_example_devfs_enable="NO"	# mount devfs in the vimage
+#vimage_example_devfs_ruleset="ruleset_name"	# devfs ruleset to apply to vimage -
+						# usually you want "devfsrules_jail".
+#vimage_example_fdescfs_enable="NO"	# mount fdescfs in the vimage
+#vimage_example_procfs_enable="NO"	# mount procfs in vimage
+#vimage_example_mount_enable="NO"	# mount/umount vimage's fs
+#vimage_example_fstab=""	# fstab(5) for mount/umount
+#vimage_example_flags="-l -U root"	# flags for jail(8)
+
+##############################################################
 ### Define source_rc_confs, the mechanism used by /etc/rc.* ##
 ### scripts to source rc_conf_files overrides safely. ##
 ##############################################################
--- etc/rc.d/vimage.orig	Sat Aug 27 10:26:53 2011
+++ etc/rc.d/vimage	Sat Aug 27 10:36:03 2011
@@ -0,0 +1,551 @@
+#!/bin/sh
+#
+# $FreeBSD$
+#
+
+# PROVIDE: vimage
+# REQUIRE: LOGIN cleanvar
+# BEFORE: securelevel
+# KEYWORD: nojail shutdown
+
+# WARNING: This script deals with untrusted data (the data and
+# processes inside the vimage) and care must be taken when changing the
+# code related to this! If you have any doubt whether a change is
+# correct and have security impact, please get the patch reviewed by
+# the FreeBSD Security Team prior to commit.
+
+. /etc/rc.subr
+
+name="vimage"
+rcvar=`set_rcvar`
+
+start_precmd="vimage_prestart"
+start_cmd="vimage_start"
+stop_cmd="vimage_stop"
+
+# init_variables _v
+#	Initialize the various vimage variables for vimage _v.
+#
+init_variables()
+{
+	_v="$1"
+
+	if [ -z "$_v" ]; then
+		warn "init_variables: you must specify a vimage"
+		return
+	fi
+
+	eval _rootdir=\"\$vimage_${_v}_rootdir\"
+	_devdir="${_rootdir}/dev"
+	_fdescdir="${_devdir}/fd"
+	_procdir="${_rootdir}/proc"
+	eval _hostname=\"\$vimage_${_v}_hostname\"
+	eval _vnets=\"\$vimage_${_v}_vnets\"
+	eval _exec=\"\$vimage_${_v}_exec\"
+
+	i=0
+	while : ; do
+		eval _exec_prestart${i}=\"\${vimage_${_v}_exec_prestart${i}:-\${vimage_exec_prestart${i}}}\"
+		[ -z "$(eval echo \"\$_exec_prestart${i}\")" ] && break
+		i=$((i + 1))
+	done
+
+	eval _exec_start=\"\${vimage_${_v}_exec_start:-${vimage_exec_start}}\"
+	eval _services=\"\${vimage_${_v}_services:-${vimage_services}}\"
+
+	i=1
+	while : ; do
+		eval _exec_afterstart${i}=\"\${vimage_${_v}_exec_afterstart${i}:-\${vimage_exec_afterstart${i}}}\"
+		[ -z "$(eval echo \"\$_exec_afterstart${i}\")" ] && break
+		i=$((i + 1))
+	done
+
+	i=0
+	while : ; do
+		eval _exec_poststart${i}=\"\${vimage_${_v}_exec_poststart${i}:-\${vimage_exec_poststart${i}}}\"
+		[ -z "$(eval echo \"\$_exec_poststart${i}\")" ] && break
+		i=$((i + 1))
+	done
+
+	i=0
+	while : ; do
+		eval _exec_prestop${i}=\"\${vimage_${_v}_exec_prestop${i}:-\${vimage_exec_prestop${i}}}\"
+		[ -z "$(eval echo \"\$_exec_prestop${i}\")" ] && break
+		i=$((i + 1))
+	done
+
+	eval _exec_stop=\"\${vimage_${_v}_exec_stop:-${vimage_exec_stop}}\"
+
+	i=0
+	while : ; do
+		eval _exec_poststop${i}=\"\${vimage_${_v}_exec_poststop${i}:-\${vimage_exec_poststop${i}}}\"
+		[ -z "$(eval echo \"\$_exec_poststop${i}\")" ] && break
+		i=$((i + 1))
+	done
+
+	if [ -n "${_exec}" ]; then
+		# simple/backward-compatible execution
+		_exec_start="${_exec}"
+		_exec_stop=""
+	else
+		# flexible execution
+		if [ -z "${_exec_start}" ]; then
+			_exec_start="/bin/sh /etc/rc"
+			if [ -z "${_exec_stop}" ]; then
+				_exec_stop="/bin/sh /etc/rc.shutdown"
+			fi
+		fi
+	fi
+
+	# The default jail ruleset will be used by rc.subr if none is specified.
+	eval _ruleset=\"\${vimage_${_v}_devfs_ruleset:-${vimage_devfs_ruleset}}\"
+	eval _devfs=\"\${vimage_${_v}_devfs_enable:-${vimage_devfs_enable}}\"
+	[ -z "${_devfs}" ] && _devfs="NO"
+	eval _fdescfs=\"\${vimage_${_v}_fdescfs_enable:-${vimage_fdescfs_enable}}\"
+	[ -z "${_fdescfs}" ] && _fdescfs="NO"
+	eval _procfs=\"\${vimage_${_v}_procfs_enable:-${vimage_procfs_enable}}\"
+	[ -z "${_procfs}" ] && _procfs="NO"
+
+	eval _mount=\"\${vimage_${_v}_mount_enable:-${vimage_mount_enable}}\"
+	[ -z "${_mount}" ] && _mount="NO"
+	# "/etc/fstab.${_v}" will be used for {,u}mount(8) if none is specified.
+	eval _fstab=\"\${vimage_${_v}_fstab:-${vimage_fstab}}\"
+	[ -z "${_fstab}" ] && _fstab="/etc/fstab.${_v}"
+	eval _flags=\"\${vimage_${_v}_flags:-${vimage_flags}}\"
+	[ -z "${_flags}" ] && _flags="-l -U root"
+	eval _consolelog=\"\${vimage_${_v}_consolelog:-${vimage_consolelog}}\"
+	[ -z "${_consolelog}" ] && _consolelog="/var/log/vimage_${_v}_console.log"
+
+	# Debugging aid
+	#
+	debug "$_v devfs enable: $_devfs"
+	debug "$_v fdescfs enable: $_fdescfs"
+	debug "$_v procfs enable: $_procfs"
+	debug "$_v mount enable: $_mount"
+	debug "$_v hostname: $_hostname"
+	debug "$_v vnets: $_vnets"
+	debug "$_v services: $_services"
+	debug "$_v root: $_rootdir"
+	debug "$_v devdir: $_devdir"
+	debug "$_v fdescdir: $_fdescdir"
+	debug "$_v procdir: $_procdir"
+	debug "$_v ruleset: $_ruleset"
+	debug "$_v fstab: $_fstab"
+
+	i=0
+	while : ; do
+		eval out=\"\${_exec_prestart${i}:-''}\"
+		if [ -z "$out" ]; then
+			break
+		fi
+		debug "$_v exec pre-start #${i}: ${out}"
+		i=$((i + 1))
+	done
+
+	debug "$_v exec start: $_exec_start"
+
+	i=1
+	while : ; do
+		eval out=\"\${_exec_afterstart${i}:-''}\"
+
+		if [ -z "$out" ]; then
+			break;
+		fi
+
+		debug "$_v exec after start #${i}: ${out}"
+		i=$((i + 1))
+	done
+
+	i=0
+	while : ; do
+		eval out=\"\${_exec_poststart${i}:-''}\"
+		if [ -z "$out" ]; then
+			break
+		fi
+		debug "$_v exec post-start #${i}: ${out}"
+		i=$((i + 1))
+	done
+
+	i=0
+	while : ; do
+		eval out=\"\${_exec_prestop${i}:-''}\"
+		if [ -z "$out" ]; then
+			break
+		fi
+		debug "$_v exec pre-stop #${i}: ${out}"
+		i=$((i + 1))
+	done
+
+	debug "$_v exec stop: $_exec_stop"
+
+	i=0
+	while : ; do
+		eval out=\"\${_exec_poststop${i}:-''}\"
+		if [ -z "$out" ]; then
+			break
+		fi
+		debug "$_v exec post-stop #${i}: ${out}"
+		i=$((i + 1))
+	done
+
+	debug "$_v flags: $_flags"
+	debug "$_v consolelog: $_consolelog"
+
+	if [ -z "${_hostname}" ]; then
+		err 3 "$name: No hostname has been defined for ${_v}"
+	fi
+	if [ -z "${_rootdir}" ]; then
+		err 3 "$name: No root directory has been defined for ${_v}"
+	fi
+}
+
+# set_sysctl rc_knob mib msg
+#	If the mib sysctl is set according to what rc_knob
+#	specifies, this function does nothing. However if
+#	rc_knob is set differently than mib, then the mib
+#	is set accordingly and msg is displayed followed by
+#	an '=" sign and the word 'YES' or 'NO'.
+#
+set_sysctl()
+{
+	_knob="$1"
+	_mib="$2"
+	_msg="$3"
+
+	_current=`${SYSCTL} -n $_mib 2>/dev/null`
+	if checkyesno $_knob ; then
+		if [ "$_current" -ne 1 ]; then
+			echo -n " ${_msg}=YES"
+			${SYSCTL} 1>/dev/null ${_mib}=1
+		fi
+	else
+		if [ "$_current" -ne 0 ]; then
+			echo -n " ${_msg}=NO"
+			${SYSCTL} 1>/dev/null ${_mib}=0
+		fi
+	fi
+}
+
+# is_current_mountpoint()
+#	Is the directory mount point for a currently mounted file
+#	system?
+#
+is_current_mountpoint()
+{
+	local _dir _dir2
+
+	_dir=$1
+
+	_dir=`echo $_dir | sed -Ee 's#//+#/#g' -e 's#/$##'`
+	[ ! -d "${_dir}" ] && return 1
+	_dir2=`df ${_dir} | tail +2 | awk '{ print $6 }'`
+	[ "${_dir}" = "${_dir2}" ]
+	return $?
+}
+
+# is_symlinked_mountpoint()
+#	Is a mount point, or any of its parent directories, a symlink?
+#
+is_symlinked_mountpoint()
+{
+	local _dir
+
+	_dir=$1
+
+	[ -L "$_dir" ] && return 0
+	[ "$_dir" = "/" ] && return 1
+	is_symlinked_mountpoint `dirname $_dir`
+	return $?
+}
+
+# secure_umount
+#	Try to unmount a mount point without being vulnerable to
+#	symlink attacks.
+#
+secure_umount()
+{
+	local _dir
+
+	_dir=$1
+
+	if is_current_mountpoint ${_dir}; then
+		umount -f ${_dir} >/dev/null 2>&1
+	else
+		debug "Nothing mounted on ${_dir} - not unmounting"
+	fi
+}
+
+
+# vimage_umount_fs
+#	This function unmounts certain special filesystems in the
+#	currently selected vimage. The caller must call the init_variables()
+#	routine before calling this one.
+#
+vimage_umount_fs()
+{
+	local _device _mountpt _rest
+
+	if checkyesno _fdescfs; then
+		if [ -d "${_fdescdir}" ] ; then
+			secure_umount ${_fdescdir}
+		fi
+	fi
+	if checkyesno _devfs; then
+		if [ -d "${_devdir}" ] ; then
+			secure_umount ${_devdir}
+		fi
+	fi
+	if checkyesno _procfs; then
+		if [ -d "${_procdir}" ] ; then
+			secure_umount ${_procdir}
+		fi
+	fi
+	if checkyesno _mount; then
+		[ -f "${_fstab}" ] || warn "${_fstab} does not exist"
+		tail -r ${_fstab} | while read _device _mountpt _rest; do
+			case ":${_device}" in
+			:#* | :)
+				continue
+				;;
+			esac
+			secure_umount ${_mountpt}
+		done
+	fi
+}
+
+# vimage_mount_fstab()
+#	Mount file systems from a per vimage fstab while trying to
+#	secure against symlink attacks at the mount points.
+#
+#	If we are certain we cannot secure against symlink attacks we
+#	do not mount all of the file systems (since we cannot just not
+#	mount the file system with the problematic mount point).
+#
+#	The caller must call the init_variables() routine before
+#	calling this one.
+#
+vimage_mount_fstab()
+{
+	local _device _mountpt _rest
+
+	while read _device _mountpt _rest; do
+		case ":${_device}" in
+		:#* | :)
+			continue
+			;;
+		esac
+		if is_symlinked_mountpoint ${_mountpt}; then
+			warn "${_mountpt} has symlink as parent - not mounting from ${_fstab}"
+			return
+		fi
+	done <${_fstab}
+	mount -a -F "${_fstab}"
+}
+
+vimage_prestart()
+{
+	if checkyesno vimage_parallel_start; then
+		command_args="&"
+	fi
+}
+
+vimage_start()
+{
+	echo -n 'Configuring vimages:'
+	set_sysctl vimage_set_hostname_allow \
+	    security.jail.set_hostname_allowed \
+	    set_hostname_allow
+	set_sysctl vimage_socket_unixiproute_only \
+	    security.jail.socket_unixiproute_only unixiproute_only
+	set_sysctl vimage_sysvipc_allow security.jail.sysvipc_allowed \
+	    sysvipc_allow
+	echo '.'
+
+	echo -n 'Starting vimages:'
+	_tmp_dir=`mktemp -d /tmp/vimage.XXXXXXXX` || \
+	    err 3 "$name: Can't create temp dir, exiting..."
+	for _vimage in ${vimage_list}
+	do
+		init_variables $_vimage
+		if [ -f /var/run/vimage_${_vimage}.id ]; then
+			echo -n " [${_hostname} already running (/var/run/vimage_${_vimage}.id exists)]"
+			continue;
+		fi
+		if checkyesno _mount; then
+			info "Mounting fstab for vimage ${_vimage} (${_fstab})"
+			if [ ! -f "${_fstab}" ]; then
+				err 3 "$name: ${_fstab} does not exist"
+			fi
+			vimage_mount_fstab
+		fi
+		if checkyesno _devfs; then
+			# If devfs is already mounted here, skip it.
+			df -t devfs "${_devdir}" >/dev/null
+			if [ $? -ne 0 ]; then
+				if is_symlinked_mountpoint ${_devdir}; then
+					warn "${_devdir} has symlink as parent - not starting vimage ${_vimage}"
+					continue
+				fi
+				info "Mounting devfs on ${_devdir}"
+				devfs_mount_jail "${_devdir}" ${_ruleset}
+				# Transitional symlink for old binaries
+				if [ ! -L "${_devdir}/log" ]; then
+					__pwd="`pwd`"
+					cd "${_devdir}"
+					ln -sf ../var/run/log log
+					cd "$__pwd"
+				fi
+			fi
+
+			# XXX - It seems symlinks don't work when there
+			#	is a devfs(5) device of the same name.
+			# Jail console output
+			#	__pwd="`pwd`"
+			#	cd "${_devdir}"
+			#	ln -sf ../var/log/console console
+			#	cd "$__pwd"
+		fi
+		if checkyesno _fdescfs; then
+			if is_symlinked_mountpoint ${_fdescdir}; then
+				warn "${_fdescdir} has symlink as parent, not mounting"
+			else
+				info "Mounting fdescfs on ${_fdescdir}"
+				mount -t fdescfs fdesc "${_fdescdir}"
+			fi
+		fi
+		if checkyesno _procfs; then
+			if is_symlinked_mountpoint ${_procdir}; then
+				warn "${_procdir} has symlink as parent, not mounting"
+			else
+				info "Mounting procfs onto ${_procdir}"
+				if [ -d "${_procdir}" ] ; then
+					mount -t procfs proc "${_procdir}"
+				fi
+			fi
+		fi
+		_tmp_vimage=${_tmp_dir}/vimage.$$
+
+		i=0
+		while : ; do
+			eval out=\"\${_exec_prestart${i}:-''}\"
+			[ -z "$out" ] && break
+			${out}
+			i=$((i + 1))
+		done
+
+		eval jail ${_flags} -i -c vnet name=\"${_vimage}\" \
+			host.hostname=\"${_hostname}\" \
+			path=\"${_rootdir}\" persist > ${_tmp_vimage} 2>&1
+
+		if [ "$?" -eq 0 ] ; then
+			_vimage_id=$(head -1 ${_tmp_vimage})
+
+			for _vnet in ${_vnets}; do
+				ifconfig ${_vnet} vnet "${_vimage_id}" \
+					> /dev/null 2>&1
+
+				case ${_vnet} in
+				epair[0-9]*[ab])
+					ifconfig ${_vnet%?}a up \
+						> /dev/null 2>&1;;
+				esac
+			done
+
+			eval jexec \"${_vimage_id}\" \
+				${_exec_start} >> ${_tmp_vimage} 2>&1
+
+			for _service in netif routing ${_services}; do
+				eval jexec \"${_vimage_id}\" /bin/sh \
+					/usr/sbin/service ${_service} start \
+					>> ${_tmp_vimage} 2>&1
+			done
+
+			i=1
+			while : ; do
+				eval out=\"\${_exec_afterstart${i}:-''}\"
+
+				if [ -z "$out" ]; then
+					break;
+				fi
+
+				jexec "${_vimage_id}" ${out}
+				i=$((i + 1))
+			done
+
+			echo -n " $_hostname"
+			tail +2 ${_tmp_vimage} >${_consolelog}
+			echo ${_vimage_id} > /var/run/vimage_${_vimage}.id
+
+			i=0
+			while : ; do
+				eval out=\"\${_exec_poststart${i}:-''}\"
+				[ -z "$out" ] && break
+				${out}
+				i=$((i + 1))
+			done
+		else
+			vimage_umount_fs
+			echo " cannot start vimage \"${_vimage}\": "
+			tail +2 ${_tmp_vimage}
+		fi
+		rm -f ${_tmp_vimage}
+	done
+	rmdir ${_tmp_dir}
+	echo '.'
+}
+
+vimage_stop()
+{
+	echo -n 'Stopping vimages:'
+	for _vimage in ${vimage_list}
+	do
+		if [ -f "/var/run/vimage_${_vimage}.id" ]; then
+			_vimage_id=$(cat /var/run/vimage_${_vimage}.id)
+			if [ ! -z "${_vimage_id}" ]; then
+				init_variables $_vimage
+
+				i=0
+				while : ; do
+					eval out=\"\${_exec_prestop${i}:-''}\"
+					[ -z "$out" ] && break
+					${out}
+					i=$((i + 1))
+				done
+
+				if [ -n "${_exec_stop}" ]; then
+					eval env -i /usr/sbin/jexec ${_vimage_id} ${_exec_stop} \
+						>> ${_consolelog} 2>&1
+				fi
+				killall -j ${_vimage_id} -TERM > /dev/null 2>&1
+				sleep 1
+				killall -j ${_vimage_id} -KILL > /dev/null 2>&1
+				vimage_umount_fs
+				echo -n " $_hostname"
+
+				i=0
+				while : ; do
+					eval out=\"\${_exec_poststop${i}:-''}\"
+					[ -z "$out" ] && break
+					${out}
+					i=$((i + 1))
+				done
+			fi
+			rm /var/run/vimage_${_vimage}.id
+			jail -r ${_vimage}
+		else
+			echo " cannot stop vimage ${_vimage}. No vimage id in /var/run"
+		fi
+	done
+	echo '.'
+}
+
+load_rc_config $name
+cmd="$1"
+if [ $# -gt 0 ]; then
+	shift
+fi
+if [ -n "$*" ]; then
+	vimage_list="$*"
+fi
+
+run_rc_command "${cmd}"

--Apple-Mail-25--994899661 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="us-ascii"

Essentially, a hand-tweaked version of /etc/rc.d/jail with added/removed features.
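
As a rough illustration of the lookup order that init_variables applies to most knobs (vimage_<name>_<knob> first, then the global vimage_<knob>, then a built-in default), here is a minimal standalone sketch. The helper get_vimage_knob is hypothetical and not part of the patch; only the variable naming scheme is taken from the script.

```sh
#!/bin/sh
# get_vimage_knob NAME KNOB [DEFAULT]
#	Hypothetical helper (not in the patch): print the per-vimage
#	setting vimage_NAME_KNOB if it is set and non-empty, else the
#	global vimage_KNOB, else DEFAULT.
get_vimage_knob()
{
	_gv_name="$1"
	_gv_knob="$2"
	_gv_default="$3"

	# Per-vimage value first, then the global knob of the same name.
	eval _gv_value=\"\${vimage_${_gv_name}_${_gv_knob}:-\${vimage_${_gv_knob}}}\"
	echo "${_gv_value:-${_gv_default}}"
}

# Example settings, as they might appear in rc.conf:
vimage_flags="-l -U root"		# global, applies to all vimages
vimage_vnettest_devfs_enable="YES"	# only for the "vnettest" vimage

get_vimage_knob vnettest devfs_enable NO		# per-vimage value wins
get_vimage_knob vnettest flags "-l"			# falls back to the global
get_vimage_knob vnettest fstab /etc/fstab.vnettest	# falls back to the default
```

The three calls print YES, "-l -U root" and /etc/fstab.vnettest respectively. The vimage name has to be usable inside a shell variable name for this kind of eval lookup to work.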
Here's how we're using it in /etc/rc.conf to successfully start up vimage jails at boot time:

========== BEGIN EXCERPT ==========
#
# Vimages
#
vimage_enable="YES"	# Set to NO to disable starting of any vimages
vimage_list=" vnettest "	# Space-separated list of names of vimages
cloned_interfaces=""	# Initialize list of epair/bridge interfaces to create

#
# Global settings for all Vimages
#
vimage_services="sshd"

####################### VIMAGE: vnettest
cloned_interfaces="$cloned_interfaces epair0 bridge0"
ifconfig_bridge0="addm fxp0 addm epair0a"
vimage_vnettest_rootdir="/usr/jails/vnettest"	# root directory
vimage_vnettest_hostname="vnettest.jbsd.vicor.com"	# hostname
vimage_vnettest_devfs_enable="YES"	# mount devfs
vimage_vnettest_vnets="epair0b"	# network interfaces

####################### VIMAGE: {name}
#cloned_interfaces="$cloned_interfaces epair{N} bridge{N}"
#ifconfig_bridge{N}="addm {iface} addm epair{N}a"
#vimage_{name}_rootdir="/usr/jails/{name}"	# root directory
#vimage_{name}_hostname="{hostname}"	# hostname
#vimage_{name}_devfs_enable="YES"	# mount devfs
#vimage_{name}_vnets="epair{N}b"	# network interfaces
========== END EXCERPT ==========

-- 
Cheers,
Devin

--Apple-Mail-25--994899661--

From owner-freebsd-jail@FreeBSD.ORG Sat Aug 27 17:59:05 2011 Return-Path: Delivered-To: freebsd-jail@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1F6A9106564A for ; Sat, 27 Aug 2011 17:59:05 +0000 (UTC) (envelope-from Devin.Teske@fisglobal.com) Received: from mx1.fisglobal.com (mx1.fisglobal.com [199.200.24.190]) by mx1.freebsd.org (Postfix) with ESMTP id DF6F98FC18 for ; Sat, 27 Aug 2011 17:59:04 +0000 (UTC) Received: from sbhfislrext01.fnfis.com ([192.168.249.167]) by SCSFISLTC01 (8.14.3/8.14.3) with ESMTP id p7RH5IDt026791 for ; Sat, 27 Aug 2011 12:05:18 -0500
From: Devin Teske
Date: Sat, 27 Aug 2011 10:05:15 -0700
To: FreeBSD Jail
Cc: Dave Robison
Subject: VIMAGE versus Jail w/respect to SYSCTL security_jail OIDs

I'm finding a systemic problem with VIMAGE jails in comparison to regular jails in FreeBSD-8.1.

All of the following sysctls appear to correctly affect regular jails (either created via /etc/rc.d/jail or manually via jail(8)):

	security.jail.mount_allowed
	security.jail.chflags_allowed
	security.jail.allow_raw_sockets
	security.jail.sysvipc_allowed
	security.jail.socket_unixiproute_only
	security.jail.set_hostname_allowed
	security.jail.jail_max_af_ips

Indeed, when interrogated within the jail, they show the value that was inherited from the underlying host at jail startup.

However, none of the above sysctls appear to be inherited by vnet jails. These would be jails that are created with the "jail -c vnet ..." syntax of jail(8) with VIMAGE enabled in the kernel.

Interrogating any of the above sysctls from within a vnet jail always produces the following default values, regardless of what you set the host values to and regardless of how many times you bounce the vimage:

	vnettest# sysctl security.jail | grep -v param
	security.jail.enforce_statfs: 1
	security.jail.mount_allowed: 1
	security.jail.chflags_allowed: 0
	security.jail.allow_raw_sockets: 0
	security.jail.sysvipc_allowed: 0
	security.jail.socket_unixiproute_only: 1
	security.jail.set_hostname_allowed: 1
	security.jail.jail_max_af_ips: 255
	security.jail.jailed: 1

Any ideas are welcome.

I think I'm going to go delve into the jail(8) code now, because I've slogged all through the kernel and can't find anything in the kernel that passes these values from host to jail (it must be jail(8) that's doing this).

-- 
Devin

NOTE: This comes on the back of trying to get nfsd running within a vimage jail. I suspect that the inability to change one or more of the above sysctls is the reason why we can't get nfsd to fire up. Firing up nfsd within a vimage jail produces no results (no error status, no error text, no log entries, nada, zip, zilch, nothing). rpcbind runs, mountd runs, but nfsd refuses for some reason.
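
For anyone who wants to repeat the host-versus-jail comparison described above, here is a minimal sketch in sh. It assumes the output of "sysctl security.jail" has been saved once on the host and once inside the jail; the function name diff_jail_sysctls and the file names are hypothetical, chosen only for illustration. It prints every OID whose value differs between the two captures and skips the security.jail.param.* entries, much as the "grep -v param" above does.

```shell
#!/bin/sh
# Sketch only: compare two saved captures of `sysctl security.jail`
# ("name: value" per line) and print the OIDs whose values differ.
# diff_jail_sysctls is a hypothetical helper, not part of jail(8) or sysctl(8).
diff_jail_sysctls() {
    host_file=$1    # capture taken on the host
    jail_file=$2    # capture taken inside the jail
    awk -F': ' '
        # Ignore the security.jail.param.* subtree in both captures.
        $1 ~ /^security\.jail\.param\./ { next }
        # First file: remember the host value of every OID.
        NR == FNR { host[$1] = $2; next }
        # Second file: report OIDs whose jail value differs from the host value.
        ($1 in host) && host[$1] != $2 {
            printf "%s host=%s jail=%s\n", $1, host[$1], $2
        }
    ' "$host_file" "$jail_file"
}
```

One way to use it, assuming the jail's JID is known: run "sysctl security.jail > host.txt" on the host, run "jexec JID sysctl security.jail > jail.txt" with the real JID substituted, then call "diff_jail_sysctls host.txt jail.txt". Note that security.jail.jailed is expected to differ (0 on the host, 1 in the jail), so that line alone does not indicate a problem.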