Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 26 Sep 2007 14:34:58 -0400
From:      "Chris Thunes" <cthunes@tqhosting.com>
To:        "Miroslav Lachman" <000.fbsd@quip.cz>
Cc:        cdjones@novusordo.net, freebsd-jail@freebsd.org
Subject:   Re: jtune not showing resource usage - fixed
Message-ID:  <b722430e0709261134j7d60cc3br8bcc628a07ba43a1@mail.gmail.com>
In-Reply-To: <46FA8EEA.4090302@quip.cz>
References:  <b722430e0708131550i12ac1f25ke7197c2f4cb28857@mail.gmail.com> <46FA8EEA.4090302@quip.cz>

index | next in thread | previous in thread | raw e-mail

[-- Attachment #1 --]
  I do indeed have an idea. I think I should learn how to properly make a
patch :P
All those - should be + instead.

  I've actually made a number of other changes in the past month or so to
fix a
number of problems with page faults from the original patch. These page
faults
were caused by a null pointer in thread stuctures during forking. It seems
to be
fairly easy to demonstrate this issue by executing any program that does a
decent
amount of
forking from inside a jail. I'm not exactly sure why these values are null,
but it seems to be well outside the context of these patches. I added some
simple
checks for this null value and that seems to have cleared things up.
  I've attached an updated (and hopefully functional) patch that is against
the
original version of src/sys/kern/kern_jail.c so you will need to revert this
one file
before applying it. I've done my work on 6.2 so I can't say if this patch
will work on
other versions. This patch also includes the changes to get memory limits to
be
displayed.
  Let me know how it goes and hopefully the patch is correct this time :)

- Chris Thunes

http://www.rootbsd.net


On 9/26/07, Miroslav Lachman <000.fbsd@quip.cz> wrote:
>
> Chris Thunes wrote:
> > On 8/13/07, *Miroslav Lachman* <000.fbsd@quip.cz
> > <mailto:000.fbsd@quip.cz>> wrote:
> >
> >     Chris Thunes wrote:
> >      > Hey all,
> >      >   I've been working with the resource limiting patches on a 6.2
> >     installation
> >      > and haven't been able to get jtune to show memory usage for jails
> >     at all.
> >      >
> >      > [root@virt1] ~ # jtune -j 15 -i
> >      > JID Hostname Memory Used / Limit CPU Shares
> >      > 15 jail0.rootbsd.net <http://jail0.rootbsd.net>; 0 M / 256 M 0
> >      >
> >      > I have the limits enabled in sysctl and really have idea as to
> >     why this
> >      > wouldn't be displaying correctly. If there is anyone who can
> >     point me in the
> >      > right direction the help would be greatly appreciated.
> >
> >     Hi,
> >     I had same question more than month ago, but no answer (2007-06-29).
> So
> >     I think no competent person is subscribed to this list.
> >     [I CCed cdjones now = maybe he knows :)]
> >
> >     Miroslav Lachman
> >
> >
> > I found the problem and was able to fix it and created a small patch for
> > anyone who needs this fixed. A function called prison_memory in
> > sys/kern/kern_jail.c is called to calculate the memory usage for a given
> > jail but this value is never stored back to the corresponding prison
> > object which is used by jtune to check the memory usage. This patch just
> > drops a few lines in at the end of prison_memory to store this value to
> > the structure. If anyone knows any adverse side effects this would cause
> > please let me know.
> >
> > - Chris
>
> Hi,
>
> I tried your patch, but without success. My version of kern_jail.c
> (after patching by cdjones_jail_soc2006.patch) does not have those lines
> which your patch wants to remove. (jtune still showing 0M usage [freebsd
> 6.2])
>
> Any ideas?
>
> Miroslav Lachman
>

[-- Attachment #2 --]
--- src/sys/kern/kern_jail.c	Sat Nov 12 22:12:32 2005
+++ src/sys/kern/kern_jail.c	Thu Sep 20 17:47:21 2007
@@ -5,6 +5,35 @@
  * can do whatever you want with this stuff. If we meet some day, and you think
  * this stuff is worth it, you can buy me a beer in return.   Poul-Henning Kamp
  * ----------------------------------------------------------------------------
+ *
+ *  Portions copyright (c) 2006 Chris Jones
+ *  All rights reserved.
+ *
+ * This software was developed for the FreeBSD Project by Chris Jones
+ * thanks to the support of Google's Summer of Code program and
+ * mentoring by Kip Macy.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE. 
+ *
  */
 
 #include <sys/cdefs.h>
@@ -15,12 +44,19 @@
 #include <sys/param.h>
 #include <sys/types.h>
 #include <sys/kernel.h>
+#include <sys/kthread.h>
 #include <sys/systm.h>
 #include <sys/errno.h>
 #include <sys/sysproto.h>
 #include <sys/mac.h>
 #include <sys/malloc.h>
 #include <sys/proc.h>
+#include <vm/vm.h>
+#include <vm/vm_extern.h>
+#include <vm/vm_page.h>
+#include <vm/vm_object.h>
+#include <vm/vm_map.h>
+#include <vm/vm_pageout.h>
 #include <sys/taskqueue.h>
 #include <sys/jail.h>
 #include <sys/lock.h>
@@ -71,6 +107,17 @@
     &jail_chflags_allowed, 0,
     "Processes in jail can alter system file flags");
 
+int     jail_limit_memory = 0;
+SYSCTL_INT(_security_jail, OID_AUTO, limit_jail_memory, CTLFLAG_RW,
+	   &jail_limit_memory, 0,
+	   "Limit jails' memory usage");
+
+int     jail_memory_pager_interval = 5;
+SYSCTL_INT(_security_jail, OID_AUTO, jail_pager_interval,
+	   CTLTYPE_INT | CTLFLAG_RW,
+	   &jail_memory_pager_interval, 0,
+	   "Interval between jail memory limit checks");
+
 /* allprison, lastprid, and prisoncount are protected by allprison_mtx. */
 struct	prisonlist allprison;
 struct	mtx allprison_mtx;
@@ -92,6 +139,105 @@
 
 SYSINIT(prison, SI_SUB_INTRINSIC, SI_ORDER_ANY, init_prison, NULL);
 
+static void
+jpager_td(void *arg)
+{
+	struct proc *p;
+	struct prison *pr = arg;
+	struct thread *td;
+	long limit, cursize, newsize, usage;
+	int breakout;
+	int flags = J_PAGER_TD_ACTIVE;
+	pr->pr_pager_flags_ptr = &flags;
+	
+	for (;;) {
+		if (flags & J_PAGER_TD_DIE)
+			break;
+	       
+		if (jail_limit_memory && pr->pr_mem_limit) {
+			/* 
+			 * TODO: consider whether it might be better to start
+			 * pushing back when we approach the limit, rather than
+			 * when we hit it.
+			 * 
+			 */
+			limit = prison_memory_limit(pr);
+			
+			sx_slock(&allproc_lock);
+			usage = prison_memory(pr);
+
+			mtx_lock(&pr->pr_mtx);
+			pr->pr_mem_usage = usage;
+			mtx_unlock(&pr->pr_mtx);
+
+			/*
+			 * The logic from vm_daemon() really needs to go here.
+			 * Problem: we want to push things below their rlimits,
+			 * and vm_daemon doesn't do that.  It'd be better to 
+			 * refactor vm_daemon to fit, but this'll do for now.
+			 *
+			 */
+			
+			if ((usage - limit) > 0) {
+
+				LIST_FOREACH(p, &allproc, p_list) {
+					
+				  if ( p->p_ucred == NULL || pr != p->p_ucred->cr_prison)
+						continue;
+					
+					PROC_LOCK(p);
+					if (p->p_flag & (P_SYSTEM | P_WEXIT)) {
+						PROC_UNLOCK(p);
+						continue;
+					}
+					
+					mtx_lock_spin(&sched_lock);
+					breakout = 0;
+					FOREACH_THREAD_IN_PROC(p, td) {
+						if (!TD_ON_RUNQ(td) &&
+						    !TD_IS_RUNNING(td) &&
+						    !TD_IS_SLEEPING(td)) {
+							breakout = 1;
+							break;
+						}
+					}
+					mtx_unlock_spin(&sched_lock);
+					if (breakout) {
+						PROC_UNLOCK(p);
+						continue;
+					}
+					
+					/* NOTE: we differ here from vm_daemon b/c we don't 
+					 * care about the rlimit; things that are exceeding that will
+					 * get caught in due course.  We need, however, to decrease
+					 * the pressure on our permitted memory allocation.  Fortunately, 
+					 * we only care about eventually hitting the limit, so if we
+					 * don't get there right away, it's okay.
+					 */      
+					
+					/* TODO: this arbitrarily reduces each process's space by
+					 * 6.25% (until it's completely swapped out) while
+					 * we're under memory pressure.  A better way would be 
+					 * to either hit large processes first, or to hit the
+					 * least-active processes first, or go proportionally,
+					 * or .... 
+					 */
+					newsize = cursize = vmspace_resident_count(p->p_vmspace);
+					newsize -= newsize / 16;
+					if (cursize < 0)
+						newsize = 0;
+					PROC_UNLOCK(p);
+					vm_pageout_map_deactivate_pages(&p->p_vmspace->vm_map, newsize);
+				} /* end LIST_FOREACH procs */
+			}
+			sx_sunlock(&allproc_lock);
+		}
+		tsleep(pr, 0, "-", jail_memory_pager_interval * hz); 
+	}
+	
+	kthread_exit(0);
+}
+
 /*
  * MPSAFE
  *
@@ -106,6 +252,7 @@
 	struct prison *pr, *tpr;
 	struct jail j;
 	struct jail_attach_args jaa;
+	struct proc *j_pager_proc = NULL;
 	int vfslocked, error, tryprid;
 
 	error = copyin(uap->jail, &j, sizeof(j));
@@ -135,7 +282,9 @@
 		goto e_dropvnref;
 	pr->pr_ip = j.ip_number;
 	pr->pr_linux = NULL;
+	pr->pr_sched_shares = j.sched_shares;
 	pr->pr_securelevel = securelevel;
+	pr->pr_mem_limit = j.mem_limit;
 
 	/* Determine next pr_id and add prison to allprison list. */
 	mtx_lock(&allprison_mtx);
@@ -159,6 +308,11 @@
 	prisoncount++;
 	mtx_unlock(&allprison_mtx);
 
+	if (kthread_create(jpager_td, pr, (void *) j_pager_proc, 0, 0, "jpager %d", pr->pr_id))
+		goto e_dropprref;
+	KASSERT(j_pager_proc != NULL, ("NULL j_pager_proc"));
+	pr->pr_pager = j_pager_proc;
+
 	error = jail_attach(td, &jaa);
 	if (error)
 		goto e_dropprref;
@@ -168,6 +322,10 @@
 	td->td_retval[0] = jaa.jid;
 	return (0);
 e_dropprref:
+	if (j_pager_proc != NULL) {
+		*pr->pr_pager_flags_ptr = J_PAGER_TD_DIE;
+		wakeup(pr);
+	}
 	mtx_lock(&allprison_mtx);
 	LIST_REMOVE(pr, pr_list);
 	prisoncount--;
@@ -282,6 +440,10 @@
 		prisoncount--;
 		mtx_unlock(&allprison_mtx);
 
+                /* Tell scheduler, pager to die.  No need to wait. */
+		*pr->pr_pager_flags_ptr = J_PAGER_TD_DIE;
+		wakeup(pr);
+
 		TASK_INIT(&pr->pr_task, 0, prison_complete, pr);
 		taskqueue_enqueue(taskqueue_thread, &pr->pr_task);
 		return;
@@ -393,6 +555,45 @@
 	return (ok);
 }
 
+/* Given credential, return memory usage in bytes. */
+long
+prison_memory(struct prison *pr)
+{
+	struct proc *p;
+	long mem_used = 0;
+	
+	/* 
+	 * TODO: this is a really bad way of doing the
+	 * search, as we end up going across all processes
+	 * for each jail.  It'd be more efficient to just do 
+	 * this once in a period and update the relevant jail.
+	 *
+	 */
+	FOREACH_PROC_IN_SYSTEM(p) {
+		if ( p->p_ucred == NULL ||
+		     p->p_vmspace == NULL ||
+		     !jailed(p->p_ucred) ||
+		     (pr != p->p_ucred->cr_prison))
+			continue;
+
+		mem_used += vmspace_resident_count(p->p_vmspace);
+	}
+       	mem_used *= PAGE_SIZE;
+
+	return mem_used;
+}
+
+/* Given credential, return permitted memory usage in bytes. */
+long
+prison_memory_limit(struct prison *pr)
+{
+	vm_pindex_t memlimit;
+	mtx_lock(&pr->pr_mtx);
+	memlimit = (vm_pindex_t) pr->pr_mem_limit;
+	mtx_unlock(&pr->pr_mtx);
+	return memlimit;
+}
+
 /*
  * Return 0 if jails permit p1 to frob p2, otherwise ESRCH.
  */
@@ -523,6 +724,52 @@
 	}
 }
 
+/* 
+ * Change resource limit for a prison.
+ * 
+ * unsigned int jid: id of jail to mess with
+ *
+ * int cpushares:  0 -> remove prison from cpu limits
+ *                -1 -> don't change existing shares
+ *                >0 -> set cpu shares
+ *
+ * int memlimit:   0 -> remove prison from mem limits
+ *                -1 -> don't change existing limit
+ *                >1 -> set memory limit (bytes)
+ *
+ * TODO: might this be better handled via a writable 
+ * sysctl than with a new syscall?
+ */
+int
+jail_set_resource_limits(struct thread *td, struct jail_set_resource_limits_args *uap)
+{
+	struct prison *pr;
+	int error;
+
+	error = suser(td);
+	if (error)
+		return (error);
+
+	mtx_lock(&allprison_mtx);
+	LIST_FOREACH(pr, &allprison, pr_list) {
+		if (pr->pr_id == uap->jid)
+			break;
+	}
+	if (NULL == pr) {
+		mtx_unlock(&allprison_mtx);
+		return 1;
+	}
+	
+	mtx_lock(&pr->pr_mtx);
+	if (-1 != uap->cpushares)
+		pr->pr_sched_shares = uap->cpushares;
+	if (-1 != uap->memlimit)
+		pr->pr_mem_limit = uap->memlimit;
+	mtx_unlock(&pr->pr_mtx);
+	mtx_unlock(&allprison_mtx);
+	return 0;
+}
+
 static int
 sysctl_jail_list(SYSCTL_HANDLER_ARGS)
 {
@@ -555,6 +802,10 @@
 		strlcpy(xp->pr_path, pr->pr_path, sizeof(xp->pr_path));
 		strlcpy(xp->pr_host, pr->pr_host, sizeof(xp->pr_host));
 		xp->pr_ip = pr->pr_ip;
+		xp->pr_sched_shares = pr->pr_sched_shares;
+		xp->pr_estcpu = pr->pr_estcpu;
+		xp->pr_mem_limit = pr->pr_mem_limit;
+		xp->pr_mem_usage = pr->pr_mem_usage;
 		mtx_unlock(&pr->pr_mtx);
 		xp++;
 	}
help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?b722430e0709261134j7d60cc3br8bcc628a07ba43a1>