From owner-freebsd-arch@FreeBSD.ORG  Wed May 30 11:18:00 2007
Date: Wed, 30 May 2007 06:52:42 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Kip Macy <kip.macy@gmail.com>
In-Reply-To: <b1fa29170705291204t76b9eb95jb6e391c3145455d2@mail.gmail.com>
Message-ID: <20070530062757.L93410@delplex.bde.org>
References: <20070529105856.L661@10.0.0.1> <200705291456.38515.jhb@freebsd.org>
	<b1fa29170705291204t76b9eb95jb6e391c3145455d2@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-arch@freebsd.org
Subject: Re: rusage breakdown and cpu limits.

On Tue, 29 May 2007, Kip Macy wrote:

>> I think using a per-process spin lock (or a pool of spin locks) would be a
>> good first step.  I wouldn't do anything more complicated unless the simple
>> approach doesn't work.  The only reason to not move the check into
>> userret() would be if one is worried about threads chewing up CPU time
>> while they are in the kernel w/o bouncing out to userland.  Also, it
>> matters which one happens more often (userret() vs mi_switch()).  If on
>> average threads perform multiple system calls during a single time slice
>> (no idea if this is true or not), then moving the check to userret()
>> would actually hurt performance.
>
> Processes can certainly make numerous system calls within a single
> time slice.

Not many more than a few hundred million syscalls can be made within a
timeslice of 100 ms.  FreeBSD does too many context switches for
interrupts, so in practice the number of syscalls per timeslice seems
to be mostly in the range of 1-10, but I hope for 100-1000.

> However, in userret it would be protected by a per process
> or per thread blocking mutex as opposed to a global spin mutex. It
> would be surprising if it isn't a net win, although it is quite
> possible that on a 2-way system the extra locking could have an
> adverse effect on some workloads.

Any locking within userret() would be a significant pessimization.
There is no locking there now, but there is still a lot of bloat.

In this case, correct proc locking isn't even possible, since the
runtime update must occur while something like a global scheduler lock
is held.  When a context switch occurs, the lock must protect at least
the old process and the new process, and somehow prevent interference
from other processes.  The update of the runtime needs essentially the
same lock.  Any locking in userret() would need to use the same lock
as the update to be perfectly correct.  Fortunately, the cpulimit limit
check only needs to be correct to within seconds or even minutes.  A
sloppy unlocked check done often enough would work OK, at least if
you re-check with correct locking before killing the process.
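A minimal userspace sketch of the sloppy unlocked check followed by a
locked re-check, assuming a pthread mutex stands in for the
scheduler-level lock and C11 relaxed atomics for the unlocked reads.
struct proc_acct, charge_runtime(), set_cpulimit() and check_cpulimit()
are hypothetical names, not FreeBSD's struct proc or its real rusage
code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-process accounting state (not FreeBSD's struct proc). */
struct proc_acct {
	pthread_mutex_t lock;		/* stands in for the full scheduler lock */
	_Atomic uint64_t runtime_us;	/* written under lock, read without it */
	_Atomic uint64_t cpulimit_us;	/* written under lock, read without it */
	bool killed;			/* written under lock */
};

/* Context-switch path: charge runtime while holding the full lock. */
static void
charge_runtime(struct proc_acct *p, uint64_t delta_us)
{
	pthread_mutex_lock(&p->lock);
	uint64_t rt = atomic_load_explicit(&p->runtime_us, memory_order_relaxed);
	atomic_store_explicit(&p->runtime_us, rt + delta_us, memory_order_relaxed);
	pthread_mutex_unlock(&p->lock);
}

/* setrlimit()-like path: the limit may change after runtime was charged. */
static void
set_cpulimit(struct proc_acct *p, uint64_t limit_us)
{
	pthread_mutex_lock(&p->lock);
	atomic_store_explicit(&p->cpulimit_us, limit_us, memory_order_relaxed);
	pthread_mutex_unlock(&p->lock);
}

/*
 * userret()-style path: a sloppy unlocked check that may see stale
 * values.  Only when it looks over the limit is the lock taken, and the
 * decision is re-made under the lock before "killing" the process.
 */
static bool
check_cpulimit(struct proc_acct *p)
{
	if (atomic_load_explicit(&p->runtime_us, memory_order_relaxed) <=
	    atomic_load_explicit(&p->cpulimit_us, memory_order_relaxed))
		return (false);		/* common case: no locking at all */

	pthread_mutex_lock(&p->lock);
	bool over = atomic_load_explicit(&p->runtime_us, memory_order_relaxed) >
	    atomic_load_explicit(&p->cpulimit_us, memory_order_relaxed);
	if (over)
		p->killed = true;	/* stands in for sending SIGXCPU/killing */
	pthread_mutex_unlock(&p->lock);
	return (over);
}
```

In the common case the fast path returns without touching the lock; the
locked re-check only matters when another CPU changes the runtime or the
limit between the unlocked read and the decision to act.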
Alternatively, the sloppiness can come from delayed updates: let the
rusage data seen by the check lag by up to a second or so.  The runtime
would still accumulate accurately somewhere, but the check wouldn't see
all of it until the next accumulation step.  That step would need the
full lock for reading and writing the scattered data and a lesser lock
for updating the accumulated data.  userret() still shouldn't be
pessimized by acquiring the lesser lock.
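The delayed-update variant might look roughly like this, again as a
userspace sketch under stated assumptions: a fixed hypothetical thread
count NTD, per-thread delta counters as the "scattered data", one pthread
mutex as the full lock and another as the lesser lock.  charge_td(),
accumulate() and over_cpulimit() are illustrative names only.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NTD 4	/* hypothetical fixed number of threads in the process */

struct proc_rusage {
	pthread_mutex_t full_lock;	/* protects the scattered td_delta_us[] */
	pthread_mutex_t acct_lock;	/* lesser lock for the accumulated total */
	uint64_t td_delta_us[NTD];	/* accurate, not yet accumulated */
	_Atomic uint64_t total_us;	/* lags by up to one accumulation period */
};

/* Context-switch path: charge only the scattered per-thread counter. */
static void
charge_td(struct proc_rusage *p, int td, uint64_t delta_us)
{
	pthread_mutex_lock(&p->full_lock);
	p->td_delta_us[td] += delta_us;
	pthread_mutex_unlock(&p->full_lock);
}

/*
 * Periodic (e.g. once a second) accumulation step: read and clear the
 * scattered data under the full lock, then fold the sum into the total
 * under the lesser lock.  Deltas are cleared as they are taken, so
 * concurrent callers never count the same time twice.
 */
static void
accumulate(struct proc_rusage *p)
{
	uint64_t sum = 0;

	pthread_mutex_lock(&p->full_lock);
	for (int i = 0; i < NTD; i++) {
		sum += p->td_delta_us[i];
		p->td_delta_us[i] = 0;
	}
	pthread_mutex_unlock(&p->full_lock);

	pthread_mutex_lock(&p->acct_lock);
	uint64_t t = atomic_load_explicit(&p->total_us, memory_order_relaxed);
	atomic_store_explicit(&p->total_us, t + sum, memory_order_relaxed);
	pthread_mutex_unlock(&p->acct_lock);
}

/* userret()-style check: reads the lagging total and takes no lock. */
static bool
over_cpulimit(struct proc_rusage *p, uint64_t limit_us)
{
	return (atomic_load_explicit(&p->total_us, memory_order_relaxed) >
	    limit_us);
}
```

Until accumulate() runs, the check simply sees an older total; that lag
is the price paid for keeping both locks off the return-to-user path.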

I still think this misses the point -- the check is the easy part,
and can be done at no extra locking cost while the full lock is held.
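One way to read "the check is the easy part" is sketched below, assuming
the comparison is folded into the runtime update that already runs under
the full lock at context-switch time, and userret() only tests a flag.
The sketch takes a pthread mutex explicitly where a real context switch
would already hold the scheduler lock; struct proc_cpu,
account_switch() and userret_should_kill() are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; FreeBSD's real names and fields differ. */
struct proc_cpu {
	pthread_mutex_t sched_lock;	/* the full lock held at switch time */
	uint64_t runtime_us;		/* protected by sched_lock */
	uint64_t cpulimit_us;		/* protected by sched_lock */
	atomic_bool over_limit;		/* set under sched_lock, read unlocked */
};

/*
 * mi_switch()-style accounting: the limit comparison piggybacks on the
 * lock that the runtime update already needs, so it adds no locking.
 */
static void
account_switch(struct proc_cpu *p, uint64_t ran_us)
{
	pthread_mutex_lock(&p->sched_lock);
	p->runtime_us += ran_us;
	if (p->runtime_us > p->cpulimit_us)
		atomic_store_explicit(&p->over_limit, true, memory_order_relaxed);
	pthread_mutex_unlock(&p->sched_lock);
}

/* userret()-style hook: one unlocked flag test on the return path. */
static bool
userret_should_kill(struct proc_cpu *p)
{
	return (atomic_load_explicit(&p->over_limit, memory_order_relaxed));
}
```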

Bruce