Date: Wed, 20 Feb 2008 10:30:14 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Jeff Roberson <jroberson@chesapeake.net>
Cc: Daniel Eischen <deischen@freebsd.org>, arch@freebsd.org,
    Andrew Gallatin <gallatin@cs.duke.edu>
Subject: Re: Linux compatible setaffinity.
Message-ID: <20080220101348.D44565@fledge.watson.org>
In-Reply-To: <20080219234101.D920@desktop>
References: <20071219211025.T899@desktop>
    <18311.49715.457070.397815@grasshopper.cs.duke.edu>
    <20080112182948.F36731@fledge.watson.org> <20080112170831.A957@desktop>
    <Pine.GSO.4.64.0801122240510.15683@sea.ntplx.net>
    <20080112194521.I957@desktop> <20080219234101.D920@desktop>
On Tue, 19 Feb 2008, Jeff Roberson wrote:

>> Yes, I would prefer that as well, I believe.  So I'll add an extra
>> parameter, and in the linux code we'll use whatever their default is.  Of
>> course the initial implementation will still only support curthread, but
>> I plan on finishing the rest before 8.0 is done.
>
> So what does everyone think of something like this:
>
> int cpuaffinity(int cmd, long which, int masksize, unsigned *mask);
>
> #define AFFINITY_GET	0x1
> #define AFFINITY_SET	0x2
> #define AFFINITY_PID	0x4
> #define AFFINITY_TID	0x8
>
> I'm not married to any of these names.  If you know of something that
> would be more regular, please comment.
>
> Behavior according to the flags would be as follows:
>
> Get or set affinity, fetching from or storing into mask.  Error if the
> mask is not large enough; fill with zeros if it's too large.
>
> If a pid is specified on set, all threads in the pid are set to the
> requested affinity.  On get it doesn't make much sense, but I guess I'll
> make it the union of all of the threads' affinities.
>
> If a tid is specified, the mask applies only to the requested tid.
>
> The mask is always inherited from the creating thread and propagates on
> fork().
>
> I have these semantics implemented and appearing to work in ULE.  I can
> implement them in 4BSD, but it will be very inefficient in some edge
> cases since each CPU doesn't have its own run queue.
>
> Binding and pinning are still both supported via the same kernel
> interfaces as before.  They are considered to override user-specified
> affinity.  This means the kernel can temporarily bind a thread to a CPU
> that it does not have affinity for.  I may add an assert to verify that
> we never leave the kernel with binding still set, so userspace sees only
> the CPUs it requests.
>
> The thread's affinity is stored in a cpumask variable in the thread
> structure.
> If someone wanted to implement restricting a jail to a particular CPU,
> they could add an affinity cmd that would walk all processes belonging to
> a jail and restrict their masks appropriately.  You'd also want to check
> a jail mask on each call to affinity().
>
> Linux sched_setaffinity() should be a subset of this functionality and
> thus easily supported.
>
> Comments appreciated.  This will go in late next week.

A few thoughts:

- It would be good to have an interface to request what CPUs are available
  to use, not just what CPUs are in use.

- It would be useful to have a way to set an availability mask for what
  CPUs the thread/process is allowed to use.

The former is simply useful for applications -- in using your previous
patch, one immediate question you want to ask as an application programmer
is "tell me what CPUs are available so I can figure out how to distribute
work, how many threads to start, where to bind them, etc.".

The latter is useful for system administrators, who may want to say things
like "start Apache with the following mask of CPUs, and let Apache
determine its policy within that bound as though the other CPUs don't
exist".  It could also be used to create a jail bound.

So perhaps this means a slightly more complex API, but not much more
complex.  How about:

	int cpuaffinity_get(scope, id, length, mask)
	int cpuaffinity_getmax(scope, id, length, mask)
	int cpuaffinity_set(scope, id, length, mask)
	int cpuaffinity_setmax(scope, id, length, mask)

Scope would be something on the order of a process (representing individual
processes or process groups, potentially), id would be the id in that scope
namespace, and length and mask would be as you propose.  You could imagine
adding a further field to indicate whether it's the current affinity or the
maximum affinity, but I'm not sure the details matter all that much.
Here might be some application logic, though:

	cpumask_t max;
	int cpu, i;

	(void)cpuaffinity_getmax(CMASK_PROC, getpid(), sizeof(max), &max);
	for (i = 0; i < CMASK_CPUCOUNT(&max); i++) {
		cpu = CMASK_CPUINDEX(&max, i);
		/* Start a thread, bind it to 'cpu'. */
		/* Or, migrate CPUs sequentially looking at data. */
	}

In the balance between all-doing system calls and multiple system calls,
this also makes me a bit happier, and it's not an entirely aesthetic
concern.  Differentiating get and set methods is fairly useful for tracking
down problems when debugging, or when doing things like masking process
system calls for security reasons.

There are two things I like from the other systems that I don't believe
this captures well:

(1) The Solaris notion of CPU sets, so that policy can be expressed in
    terms of a global CPU set namespace administered by the system
    administrator.  I.e., create a CPU set "Apache", then use a tool to
    modify the set at runtime.

(2) The Darwin notion of defining CPU use policy rather than masks --
    i.e., "I don't care what CPU it is, but run these threads on the same
    CPU", or "the same core", etc.

I'm happy for us to move ahead with the lower-level interface you've
defined without addressing these concerns, but I think we should keep them
in mind as well.

Robert N M Watson
Computer Laboratory
University of Cambridge