From owner-freebsd-hackers@FreeBSD.ORG Fri Jun 6 22:12:07 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CB94FF3B; Fri, 6 Jun 2014 22:12:07 +0000 (UTC) Received: from mail.ipfw.ru (mail.ipfw.ru [IPv6:2a01:4f8:120:6141::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 631A4254A; Fri, 6 Jun 2014 22:12:07 +0000 (UTC) Received: from secured.by.ipfw.ru ([95.143.220.47] helo=ws.su29.net) by mail.ipfw.ru with esmtpsa (TLSv1:DHE-RSA-AES128-SHA:128) (Exim 4.82 (FreeBSD)) (envelope-from ) id 1WsyRq-0003kD-Te; Fri, 06 Jun 2014 22:01:02 +0400 Message-ID: <53923C09.6020302@FreeBSD.org> Date: Sat, 07 Jun 2014 02:09:13 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: John Baldwin Subject: Re: Permit init(8) use its own cpuset group. References: <538C8F9A.4020301@FreeBSD.org> <201406051009.59432.jhb@freebsd.org> <5390C907.1070405@FreeBSD.org> <201406051559.11274.jhb@freebsd.org> <5390DB64.6010704@FreeBSD.org> In-Reply-To: <5390DB64.6010704@FreeBSD.org> X-Enigmail-Version: 1.6 Content-Type: multipart/mixed; boundary="------------000900090005030503080000" Cc: Konstantin Belousov , freebsd-hackers@freebsd.org, hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Jun 2014 22:12:07 -0000 This is a multi-part message in MIME format. --------------000900090005030503080000 Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit On 06.06.2014 01:04, Alexander V. Chernikov wrote: > On 05.06.2014 23:59, John Baldwin wrote: >> On Thursday, June 05, 2014 3:46:15 pm Alexander V. Chernikov wrote: >>> On 05.06.2014 18:09, John Baldwin wrote: >>>> On Wednesday, June 04, 2014 3:16:59 pm Alexander V. Chernikov wrote: >>>>> On 04.06.2014 19:06, John Baldwin wrote: >>>>>> On Monday, June 02, 2014 12:48:50 pm Konstantin Belousov wrote: >>>>>>> On Mon, Jun 02, 2014 at 06:52:10PM +0400, Alexander V. Chernikov wrote: >>>>>>>> Hello list! >>>>>>>> >>>>>>>> Currently init(8) uses group 1 which is root group. >>>>>>>> Modifications of this group affects both kernel and userland threads. >>>>>>>> Additionally, such modifications are impossible, for example, in >>>> presence >>>>>>>> of multi-queue NIC drivers (like igb or ixgbe) which binds their >> threads >>>>>> to >>>>>>>> particular cpus. >>>>>>>> >>>>>>>> Proposed change ("init_cpuset" loader tunable) permits changing cpu >>>>>>>> masks for >>>>>>>> userland more easily. Restricting user processes to migrate to/from >> CPU >>>>>>>> cores >>>>>>>> used for network traffic processing is one of the cases. >>>>>>>> >>>>>>>> Phabricator: https://phabric.freebsd.org/D141 (the same version >> attached >>>>>>>> inline) >>>>>>>> >>>>>>>> If there are no objections, I'll commit this next week. >>>>>>> Why is the tunable needed ? >>>>>> Because some people already depend on doing 'cpuset -l 0 -s 1'. It is >>>> also >>>>>> documented in our manpages that processes start in cpuset 1 by default >> so >>>>>> that you can use 'cpuset -l 0 -s 1' to move all processes, etc. >>>>>> >>>>>> For the stated problem (bound ithreads in drivers), I would actually >> like >>>> to >>>>>> fix ithreads that are bound to a specific CPU to create a different >> cpuset >>>>>> instead so they don't conflict with set 1. >>>>> Yes, this seem to be much better approach. >>>>> Please take a look on the new patch (here or in the phabricator). >>>>> Comments: >>>>> >>>>> Use different approach for modifyable root set: >>>>> >>>>> * Make sets in 0..15 internal >>>>> * Define CPUSET_SET1 & CPUSET_ITHREAD in that range >>>>> * Add cpuset_lookup_builtin() to retrieve such cpu sets by id >>>>> * Create additional root set for ithreads >>>>> * Use this set in ithread_create() >>>>> * Change intr_setaffinity() to use cpuset_iroot (do we really need >> this)? >>>>> >>>>> We can probably do the same for kprocs, but I'm unsure if we really need >> it. >>>> >>>> I imagined something a bit simpler. Just create a new set in >> intr_event_bind >>>> and bind the ithread to the new set. No need to have more magic set ids, >> etc. >>> Well, we also have userland which can modify given changes via `cpuset >>> -x', so we need to be able to add some more logic on set >>> allocation/keeping. Additionally, we can try to do the same via `cpuset >>> -t', so introducing something like cpuset_setIthread() and hooking into >>> intr_event_bind() won't probably be enough. At least I can't think out a >>> quick and easy way to do this. >> >> cpuset -x calls intr_event_bind(). If you just do it there you fix both >> places. Yup. I was wrong. This version seems to be simpler while doing the right thing. > 1:04 [0] ra# procstat -t 12 | grep irq275 > 12 100121 intr irq275: ix0:qu1 2 127 wait - > 1:04 [0] ra# cpuset -g -x 275 > irq 275 mask: 2 > 1:04 [0] ra# cpuset -g -t 100121 > tid 100121 mask: 2 > 1:04 [0] ra# cpuset -l 3 -t 100121 > -------------------------^^^------ > 1:05 [0] ra# cpuset -g -t 100121 > tid 100121 mask: 3 > >> >>>> That also means that an ithread that isn't bound to a specific CPU via >> either >>>> 'cpuset -x' or BUS_BIND_INTR() will honor 'cpuset -s 1' like other >>>> kernel processes. I think that's probably fine and sensible. The issue >> is >>> Well, it is questionable. Kernel threads are a bit different in terms of >>> TLB changes, memory working set and so on. (Personally I'd prefer to >>> separate user / kthreads / ithreads to different sets in HEAD but that's >>> another story). >>> >>> Anyway, we probably can (and should) MFC a bit different version which >>> tries to change several sets at once if user supplied set 1 as argument. >> >> No, I don't think we need umpteen special sets. I think we just need to fix >> this one specific case of bound ithreads and everything else will work fine. >> If someone wants to move kprocs out of set 1, they can already do that with >> the existing tools via cpuset -C, etc. >> > --------------000900090005030503080000 Content-Type: text/x-patch; name="D141_6.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="D141_6.diff" Index: sys/kern/kern_cpuset.c =================================================================== --- sys/kern/kern_cpuset.c +++ sys/kern/kern_cpuset.c @@ -716,6 +716,66 @@ } /* + * Apply new cpumask to the ithread. + */ +int +cpuset_setithread(lwpid_t id, cpuset_t *mask) +{ + struct cpuset *nset, *rset; + struct cpuset *parent, *old_set; + struct thread *td; + struct proc *p; + cpusetid_t cs_id; + int error; + + nset = uma_zalloc(cpuset_zone, M_WAITOK); + rset = uma_zalloc(cpuset_zone, M_WAITOK); + error = cpuset_which(CPU_WHICH_TID, id, &p, &td, &old_set); + if (((cs_id = alloc_unr(cpuset_unr)) == CPUSET_INVALID) || error != 0) + goto out; + thread_lock(td); + old_set = td->td_cpuset; + if (old_set->cs_id == 1 || (old_set->cs_id == CPUSET_INVALID && + old_set->cs_parent->cs_id == 1)) { + /* Default mask, we need to use new root set */ + error = _cpuset_create(rset, cpuset_zero, + &cpuset_zero->cs_mask, cs_id); + if (error != 0) { + PROC_UNLOCK(p); + goto out; + } + rset->cs_flags |= CPU_SET_ROOT; + parent = rset; + rset = NULL; + cs_id = CPUSET_INVALID; + } else { + /* Assume existing set was already allocated by previous call */ + parent = td->td_cpuset; + old_set = NULL; + } + + error = cpuset_shadow(parent, nset, mask); + if (error == 0) { + td->td_cpuset = nset; + sched_affinity(td); + nset = NULL; + } + thread_unlock(td); + PROC_UNLOCK(p); + if (old_set != NULL) + cpuset_rel(old_set); +out: + if (nset != NULL) + uma_zfree(cpuset_zone, nset); + if (rset != NULL) + uma_zfree(cpuset_zone, rset); + if (cs_id != CPUSET_INVALID) + free_unr(cpuset_unr, cs_id); + return (error); +} + + +/* * Creates the cpuset for thread0. We make two sets: * * 0 - The root set which should represent all valid processors in the Index: sys/kern/kern_intr.c =================================================================== --- sys/kern/kern_intr.c +++ sys/kern/kern_intr.c @@ -323,7 +323,7 @@ CPU_SET(cpu, &mask); id = ie->ie_thread->it_thread->td_tid; mtx_unlock(&ie->ie_lock); - error = cpuset_setthread(id, &mask); + error = cpuset_setithread(id, &mask); if (error) return (error); } else @@ -339,7 +339,7 @@ CPU_SET(ie->ie_cpu, &mask); id = ie->ie_thread->it_thread->td_tid; mtx_unlock(&ie->ie_lock); - (void)cpuset_setthread(id, &mask); + (void)cpuset_setithread(id, &mask); } else mtx_unlock(&ie->ie_lock); return (error); Index: sys/sys/cpuset.h =================================================================== --- sys/sys/cpuset.h +++ sys/sys/cpuset.h @@ -118,6 +118,7 @@ struct cpuset *cpuset_ref(struct cpuset *); void cpuset_rel(struct cpuset *); int cpuset_setthread(lwpid_t id, cpuset_t *); +int cpuset_setithread(lwpid_t id, cpuset_t *); int cpuset_create_root(struct prison *, struct cpuset **); int cpuset_setproc_update_set(struct proc *, struct cpuset *); char *cpusetobj_strprint(char *, const cpuset_t *); --------------000900090005030503080000--