From owner-freebsd-arch@FreeBSD.ORG Sun Jan 13 03:12:24 2008
Date: Sat, 12 Jan 2008 17:14:40 -1000 (HST)
From: Jeff Roberson
To: Robert Watson
In-Reply-To: <20080112182948.F36731@fledge.watson.org>
Message-ID: <20080112170831.A957@desktop>
References: <20071219211025.T899@desktop> <18311.49715.457070.397815@grasshopper.cs.duke.edu> <20080112182948.F36731@fledge.watson.org>
Cc: arch@FreeBSD.org, Andrew Gallatin
Subject: Re: Linux compatible setaffinity.
List-Id: Discussion related to FreeBSD architecture

On Sat, 12 Jan 2008, Robert Watson wrote:

>
> On Fri, 11 Jan 2008, Andrew Gallatin wrote:
>
>> I'm somewhat surprised that this has not hit the tree yet.  What happened?
>> Wasn't the consensus that it was a good thing?
>
> I think Jeff just got busy with other stuff.
>
>> FWIW, I was too busy to reply at the time, but I agree that the Apple
>> interface is nice.
>> However, sometimes one needs a hard CPU binding
>> interface like this one, and I don't see any reason to defer adding this
>> interface in favor of the Apple one, since they are somewhat orthogonal.
>> I'd be strongly in favor of having a hard CPU binding interface.
>
> The Apple API is nice in terms of capabilities, but we wouldn't be able to
> use it directly as it is Mach-esque (as I understand it).  Of course, Jeff's
> implementation of the Linux API doesn't actually fully implement the API (it
> doesn't support constraining the CPU set vs. binding to one CPU, and the
> patch as-provided didn't support querying the binding).  I agree I'd like to
> see it in the tree, if only because it would let me eliminate local hacks I
> have that do the same thing, but we should think about other interfaces that
> are more expressive in the longer term.
>
> For example, one thing I like about the Apple interface is the ability to
> specify general strategies for affinity rather than specific affinities --
> "these threads like to be together, but they don't mind where that is".
> Likewise, the Solaris facility to be able to change a CPU set and have all
> the things pinned to it follow the centrally-administered set is a nice match
> for our concept of Jail.  Finally, if we do want it to work well with Jail,
> and we want Jails to be able to be pinned to sets of CPUs, we also need a
> nested concept of how to handle affinity, in the event that the set of CPUs a
> Jail is running on changes, in which case perhaps you want relative numbering
> within the jail, or some other similar notion.
>
> Sounds like a nice whiteboard session at the BSDCan developer summit...

Robert raises some excellent points.  I really see three different
components here that may not fit within the same API.

1)  Binding to a specific CPU or specific set of CPUs.  This is well
    handled by the Linux API and is necessary under some circumstances.
2)  CPU sets and resource management for jails or provisioning.
3)  An interface for cache-aware applications to hint the scheduler.  Sort
    of like madvise() for CPUs.

Hopefully most applications would use 3, and 2 would be more of a
system-level configuration item.  The cache-aware scheduling would just look
at the possible set of CPUs for a thread and make some decision about where
to put it.  There are certainly a lot of edge cases, as Robert mentioned.

Now, there is one problem with the Linux API that I want to discuss before I
commit it.  The current patch always works on curthread.  However, the API
allows for setting the binding of a pid.  I believe, although I'm not
certain, that pids and tids in Linux are in the same number space.  It's not
clear to me whether you can set an affinity for an entire process and have
it affect an individual thread, or whether you set it on a thread-by-thread
basis.  When supplying a non-curproc pid, do you bind all threads in the
target process?

Are our tids and pids in the same number space?  And are they available to
application programmers?  I haven't followed that very carefully.

Regardless of what Linux does, we should figure out what we want to do and
hopefully implement that in a way that works with the linuxulator, or with
minimal (hopefully no) application porting effort.

Thanks,
Jeff

>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>
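
For reference, the hard-binding case (component 1 in the message) can be
sketched roughly as follows.  This is not the patch discussed in the thread;
it is a minimal illustration using Python's os.sched_getaffinity() and
os.sched_setaffinity(), which wrap the Linux calls and are only available on
Linux.  The helper name bind_to_one_cpu is made up for this example.  Passing
0 as the pid targets the caller, which is the only case the patch is described
as handling.

```python
import os


def bind_to_one_cpu(pid=0):
    """Hard-bind `pid` to a single CPU and return (previous, new) masks.

    Hypothetical helper for illustration only.  pid=0 means "the caller".
    Linux-only: os.sched_*affinity() does not exist on other platforms.
    """
    previous = os.sched_getaffinity(pid)   # query the current set of CPUs
    target = {min(previous)}               # pick one CPU we may already use
    os.sched_setaffinity(pid, target)      # constrain the set to that CPU
    return previous, os.sched_getaffinity(pid)


if __name__ == "__main__":
    before, after = bind_to_one_cpu()
    print("allowed before:", sorted(before))
    print("allowed after: ", sorted(after))
    os.sched_setaffinity(0, before)        # restore the original mask
```

On the pid question raised above: the Linux sched_setaffinity(2) man page
describes the pid argument as a thread ID, with 0 meaning the calling thread,
so a binding applies to one thread rather than to every thread in a process.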