Date:      Wed, 20 Feb 2008 10:30:14 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Jeff Roberson <jroberson@chesapeake.net>
Cc:        Daniel Eischen <deischen@freebsd.org>, arch@freebsd.org, Andrew Gallatin <gallatin@cs.duke.edu>
Subject:   Re: Linux compatible setaffinity.
Message-ID:  <20080220101348.D44565@fledge.watson.org>
In-Reply-To: <20080219234101.D920@desktop>
References:  <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop> <Pine.GSO.4.64.0801122240510.15683@sea.ntplx.net> <20080112194521.I957@desktop> <20080219234101.D920@desktop>


On Tue, 19 Feb 2008, Jeff Roberson wrote:

>> Yes, I would prefer that as well I believe.  So I'll add an extra parameter 
>> and in the linux code we'll use whatever their default is.  Of course the 
>> initial implementation will still only support curthread but I plan on 
>> finishing the rest before 8.0 is done.
>
> So what does everyone think of something like this:
>
> int cpuaffinity(int cmd, long which, int masksize, unsigned *mask);
>
> #define AFFINITY_GET	0x1
> #define	AFFINITY_SET	0x2
> #define	AFFINITY_PID	0x4
> #define	AFFINITY_TID	0x8
>
> I'm not married to any of these names.  If you know of something that would 
> be more regular please comment.
>
> Behavior according to flags would be as such:
>
> Get or set affinity and fetch from or store into mask.  Error if mask is not 
> large enough.  Fill with zeros if it's too large.
>
> If a pid is specified on set, all threads in the process are set to the 
> requested affinity.  On get it doesn't make much sense, but I guess I'll 
> make it the union of all the threads' affinities.
>
> If tid is specified the mask applies only to the requested tid.
>
> The mask is always inherited from the creating thread and propagates on 
> fork().
>
> I have these semantics implemented and appearing to work in ULE.  I can 
> implement them in 4BSD but it will be very inefficient in some edge cases 
> since each cpu doesn't have its own run queue.
>
> Binding and pinning are still both supported via the same kernel interfaces 
> as they were.  They are considered to override user specified affinity. 
> This means the kernel can temporarily bind a thread to a cpu that it does 
> not have affinity for.  I may add an assert to verify that we never leave 
> the kernel with binding still set so userspace sees only the cpus it 
> requests.
>
> The thread's affinity is stored in a cpumask variable in the thread 
> structure.  If someone wanted to implement restricting a jail to a 
> particular cpu they could add an affinity cmd that would walk all processes 
> belonging to a jail and restrict their masks appropriately. You'd also want 
> to check a jail mask on each call to affinity().
>
> Linux sched_setaffinity() should be a subset of this functionality and thus 
> easily supported.
>
> Comments appreciated.  This will go in late next week.

A few thoughts:

- It would be good to have an interface to request what CPUs are available to
   use, not just what CPUs are in use.

- It would be useful to have an availability mask constraining which CPUs the
   thread/process is allowed to use.

The former is simply useful for applications -- in using your previous patch, 
one immediate question you want to ask as an application programmer is "tell 
me what CPUs are available so I can figure out how to distribute work, how 
many threads to start, where to bind them, etc".  The latter is useful for 
system administrators, who may want to say things like "Start apache with the 
following mask of CPUs, and let Apache determine its policy with respect to 
that bound as though the other CPUs don't exist".  It could also be used to 
impose a CPU bound on a jail.

So perhaps this means a slightly more complex API, but not much more complex. 
How about:

int cpuaffinity_get(scope, id, length, mask)
int cpuaffinity_getmax(scope, id, length, mask)
int cpuaffinity_set(scope, id, length, mask)
int cpuaffinity_setmax(scope, id, length, mask)

Scope would be something on the order of process (representing individual 
processes or process groups, potentially), id would be the id in that scope 
namespace, length and mask would be as you propose.  You could imagine adding 
a further field to indicate whether it's the current affinity or the maximum 
affinity, but I'm not sure the details matter all that much.  Here might be 
some application logic, though:

 	cpumask_t max;
 	int cpu, i;

 	(void)cpuaffinity_getmax(CMASK_PROC, getpid(), sizeof(max), &max);
 	for (i = 0; i < CMASK_CPUCOUNT(&max); i++) {
 		cpu = CMASK_CPUINDEX(&max, i);
 		/* Start a thread, bind it to 'cpu'. */
 		/* Or, migrate CPUs sequentially looking at data. */
 	}

In the balance between all-doing system calls and multiple system calls, this 
also makes me a bit happier, and it's not an entirely aesthetic concern. 
Differentiating get and set methods is fairly useful for tracking down 
problems when debugging, or if doing things like masking process system calls 
for security reasons.

There are two things I like from the other systems that I don't believe this 
captures well:

(1) The solaris notion of CPU sets, so that policy can be expressed in terms
     of a global CPU set namespace administered by the system administrator.
     I.e., create a CPU set "Apache", then use a tool to modify the set at
     runtime.

(2) The Darwin notion of defining CPU use policy rather than masks -- i.e., "I
     don't care what CPU it is, but run these threads on the same CPU", or "the
     same core", etc.

I'm happy for us to move ahead with the lower level interface you've defined 
without addressing these concerns, but I think we should be keeping them in 
mind as well.

Robert N M Watson
Computer Laboratory
University of Cambridge


