From owner-freebsd-hackers@FreeBSD.ORG Wed Jun 11 16:54:05 2014
From: "Alexander V. Chernikov"
Date: Wed, 11 Jun 2014 20:52:03 +0400
To: John Baldwin
Cc: Konstantin Belousov, freebsd-hackers@freebsd.org, hackers@freebsd.org
Subject: Re: Permit init(8) use its own cpuset group.
Message-ID: <53988933.5010902@FreeBSD.org>
In-Reply-To: <201406091343.39584.jhb@freebsd.org>
References: <538C8F9A.4020301@FreeBSD.org> <5390DB64.6010704@FreeBSD.org> <53923C09.6020302@FreeBSD.org> <201406091343.39584.jhb@freebsd.org>
List-Id: Technical Discussions relating to FreeBSD
On 09.06.2014 21:43, John Baldwin wrote:
> On Friday, June 06, 2014 6:09:13 pm Alexander V. Chernikov wrote:
>> On 06.06.2014 01:04, Alexander V. Chernikov wrote:
>>> On 05.06.2014 23:59, John Baldwin wrote:
>>>> On Thursday, June 05, 2014 3:46:15 pm Alexander V. Chernikov wrote:
>>>>> On 05.06.2014 18:09, John Baldwin wrote:
>>>>>> On Wednesday, June 04, 2014 3:16:59 pm Alexander V. Chernikov wrote:
>>>>>>> On 04.06.2014 19:06, John Baldwin wrote:
>>>>>>>> On Monday, June 02, 2014 12:48:50 pm Konstantin Belousov wrote:
>>>>>>>>> On Mon, Jun 02, 2014 at 06:52:10PM +0400, Alexander V. Chernikov wrote:
>>>>>>>>>> Hello list!
>>>>>>>>>>
>>>>>>>>>> Currently init(8) uses group 1, which is the root group.
>>>>>>>>>> Modifications of this group affect both kernel and userland threads.
>>>>>>>>>> Additionally, such modifications are impossible, for example, in the
>>>>>>>>>> presence of multi-queue NIC drivers (like igb or ixgbe) which bind
>>>>>>>>>> their threads to particular CPUs.
>>>>>>>>>>
>>>>>>>>>> The proposed change (an "init_cpuset" loader tunable) permits changing
>>>>>>>>>> CPU masks for userland more easily. Restricting user processes from
>>>>>>>>>> migrating to/from the CPU cores used for network traffic processing is
>>>>>>>>>> one such case.
>>>>>>>>>>
>>>>>>>>>> Phabricator: https://phabric.freebsd.org/D141 (the same version
>>>>>>>>>> attached inline)
>>>>>>>>>>
>>>>>>>>>> If there are no objections, I'll commit this next week.
>>>>>>>>> Why is the tunable needed?
>>>>>>>> Because some people already depend on doing 'cpuset -l 0 -s 1'. It is
>>>>>>>> also documented in our manpages that processes start in cpuset 1 by
>>>>>>>> default, so that you can use 'cpuset -l 0 -s 1' to move all processes,
>>>>>>>> etc.
>>>>>>>>
>>>>>>>> For the stated problem (bound ithreads in drivers), I would actually
>>>>>>>> like to fix ithreads that are bound to a specific CPU to create a
>>>>>>>> different cpuset instead so they don't conflict with set 1.
>>>>>>> Yes, this seems to be a much better approach.
>>>>>>> Please take a look at the new patch (here or in Phabricator).
>>>>>>> Comments:
>>>>>>>
>>>>>>> Use a different approach for the modifiable root set:
>>>>>>>
>>>>>>> * Make sets 0..15 internal
>>>>>>> * Define CPUSET_SET1 & CPUSET_ITHREAD in that range
>>>>>>> * Add cpuset_lookup_builtin() to retrieve such cpu sets by id
>>>>>>> * Create an additional root set for ithreads
>>>>>>> * Use this set in ithread_create()
>>>>>>> * Change intr_setaffinity() to use cpuset_iroot (do we really need this?)
>>>>>>>
>>>>>>> We can probably do the same for kprocs, but I'm unsure if we really
>>>>>>> need it.
>>>>>> I imagined something a bit simpler. Just create a new set in
>>>>>> intr_event_bind and bind the ithread to the new set. No need to have
>>>>>> more magic set ids, etc.
>>>>> Well, we also have userland, which can modify these bindings via `cpuset
>>>>> -x', so we need to be able to add some more logic on set
>>>>> allocation/keeping. Additionally, we can try to do the same via `cpuset
>>>>> -t', so introducing something like cpuset_setithread() and hooking into
>>>>> intr_event_bind() probably won't be enough. At least I can't think of a
>>>>> quick and easy way to do this.
>>>> cpuset -x calls intr_event_bind(). If you just do it there you fix both
>>>> places.
>> Yup. I was wrong. This version seems to be simpler while doing the right
>> thing.
> Hmm, I do think this patch is doing the right thing, but I'm worried you
> might have weird effects, e.g.
> I worry that because the 'intr' proc is in
> set 1 still, the thread masks will be (incorrectly) checked when changing
> set 1 and you still won't be able to 'cpuset -l 0 -s 1'.

Well, as far as I understand, we don't have any cpu sets bound to a program
(a "pid" search returns the set of the first thread). Anyway, each original
ithread mask is released via cpuset_rel(), so I'm a bit unsure about the
remaining thread masks. Playing with cpuset and the ixgbe driver does not
reveal the problem you describe:

12:59 [0] test15# cpuset -g -s 1
cpuset 1 mask: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
13:00 [0] test15# kldload ixgbe
13:00 [0] test15# vmstat -i | grep ix
irq263: ix0:que 0                      116          0
irq264: ix0:que 1                      108          0
irq265: ix0:que 2                      108          0
irq266: ix0:que 3                      108          0
irq267: ix0:que 4                      108          0
irq268: ix0:que 5                      108          0
irq269: ix0:que 6                      108          0
irq270: ix0:que 7                      108          0
...
13:00 [0] test15# cpuset -g -x 263
irq 263 mask: 0
13:00 [0] test15# cpuset -g -x 264
irq 264 mask: 1
13:00 [0] test15# cpuset -l 21,22,23 -s 1
13:01 [0] test15# cpuset -g -s 1
cpuset 1 mask: 21, 22, 23
13:01 [0] test15# cpuset -g -x 263
irq 263 mask: 0
13:01 [0] test15# cpuset -g -x 264
irq 264 mask: 1
13:01 [0] test15# cpuset -l 12 -x 264
13:01 [0] test15# cpuset -g -x 264
irq 264 mask: 12
13:01 [0] test15# cpuset -l 22,23 -s 1
13:01 [0] test15# echo $?
0
13:01 [0] test15# cpuset -g -x 264
irq 264 mask: 12
13:01 [0] test15# cpuset -g -s 1
cpuset 1 mask: 22, 23

> Also, strictly speaking, it would probably be best to return threads to
> set 1 when NOCPU is passed to intr_event_bind() to unbind an interrupt.

Yup.
[Attachment: D141_7.diff]

Index: sys/kern/kern_cpuset.c
===================================================================
--- sys/kern/kern_cpuset.c
+++ sys/kern/kern_cpuset.c
@@ -106,7 +106,7 @@
 static struct mtx cpuset_lock;
 static struct setlist cpuset_ids;
 static struct unrhdr *cpuset_unr;
-static struct cpuset *cpuset_zero;
+static struct cpuset *cpuset_zero, *cpuset_default;
 
 /* Return the size of cpuset_t at the kernel level */
 SYSCTL_INT(_kern_sched, OID_AUTO, cpusetsize, CTLFLAG_RD,
@@ -716,6 +716,89 @@
 }
 
 /*
+ * Apply new cpumask to the ithread.
+ */
+int
+cpuset_setithread(lwpid_t id, u_char cpu)
+{
+	struct cpuset *nset, *rset;
+	struct cpuset *parent, *old_set;
+	struct thread *td;
+	struct proc *p;
+	cpusetid_t cs_id;
+	cpuset_t mask;
+	int error;
+
+	nset = uma_zalloc(cpuset_zone, M_WAITOK);
+	rset = uma_zalloc(cpuset_zone, M_WAITOK);
+
+	CPU_ZERO(&mask);
+	if (cpu == NOCPU)
+		CPU_COPY(cpuset_root, &mask);
+	else
+		CPU_SET(cpu, &mask);
+
+	error = cpuset_which(CPU_WHICH_TID, id, &p, &td, &old_set);
+	if (((cs_id = alloc_unr(cpuset_unr)) == CPUSET_INVALID) || error != 0)
+		goto out;
+
+	thread_lock(td);
+	old_set = td->td_cpuset;
+
+	if (cpu == NOCPU) {
+		/*
+		 * roll back to default set. We're not using cpuset_shadow()
+		 * here because we can fail CPU_SUBSET() check. This can happen
+		 * if default set does not contain all CPUs.
+		 */
+		error = _cpuset_create(nset, cpuset_default, &mask,
+		    CPUSET_INVALID);
+
+		goto applyset;
+	}
+
+	if (old_set->cs_id == 1 || (old_set->cs_id == CPUSET_INVALID &&
+	    old_set->cs_parent->cs_id == 1)) {
+		/* Default mask, we need to use new root set */
+		error = _cpuset_create(rset, cpuset_zero,
+		    &cpuset_zero->cs_mask, cs_id);
+		if (error != 0) {
+			PROC_UNLOCK(p);
+			goto out;
+		}
+		rset->cs_flags |= CPU_SET_ROOT;
+		parent = rset;
+		rset = NULL;
+		cs_id = CPUSET_INVALID;
+	} else {
+		/* Assume existing set was already allocated by previous call */
+		parent = td->td_cpuset;
+		old_set = NULL;
+	}
+
+	error = cpuset_shadow(parent, nset, &mask);
+applyset:
+	if (error == 0) {
+		td->td_cpuset = nset;
+		sched_affinity(td);
+		nset = NULL;
+	}
+	thread_unlock(td);
+	PROC_UNLOCK(p);
+	if (old_set != NULL)
+		cpuset_rel(old_set);
+out:
+	if (nset != NULL)
+		uma_zfree(cpuset_zone, nset);
+	if (rset != NULL)
+		uma_zfree(cpuset_zone, rset);
+	if (cs_id != CPUSET_INVALID)
+		free_unr(cpuset_unr, cs_id);
+	return (error);
+}
+
+
+/*
  * Creates the cpuset for thread0. We make two sets:
  *
  * 0 - The root set which should represent all valid processors in the
@@ -735,6 +818,7 @@
 	cpuset_zone = uma_zcreate("cpuset", sizeof(struct cpuset), NULL, NULL,
 	    NULL, NULL, UMA_ALIGN_PTR, 0);
 	mtx_init(&cpuset_lock, "cpuset", NULL, MTX_SPIN | MTX_RECURSE);
+
 	/*
 	 * Create the root system set for the whole machine. Doesn't use
 	 * cpuset_create() due to NULL parent.
@@ -747,12 +831,15 @@
 	set->cs_flags = CPU_SET_ROOT;
 	cpuset_zero = set;
 	cpuset_root = &set->cs_mask;
+
 	/*
 	 * Now derive a default, modifiable set from that to give out.
 	 */
 	set = uma_zalloc(cpuset_zone, M_WAITOK);
 	error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, 1);
 	KASSERT(error == 0, ("Error creating default set: %d\n", error));
+	cpuset_default = set;
+
 	/*
 	 * Initialize the unit allocator. 0 and 1 are allocated above.
 	 */
Index: sys/kern/kern_intr.c
===================================================================
--- sys/kern/kern_intr.c
+++ sys/kern/kern_intr.c
@@ -295,7 +295,6 @@
 int
 intr_event_bind(struct intr_event *ie, u_char cpu)
 {
-	cpuset_t mask;
 	lwpid_t id;
 	int error;
 
@@ -316,30 +315,21 @@
 	 */
 	mtx_lock(&ie->ie_lock);
 	if (ie->ie_thread != NULL) {
-		CPU_ZERO(&mask);
-		if (cpu == NOCPU)
-			CPU_COPY(cpuset_root, &mask);
-		else
-			CPU_SET(cpu, &mask);
 		id = ie->ie_thread->it_thread->td_tid;
 		mtx_unlock(&ie->ie_lock);
-		error = cpuset_setthread(id, &mask);
+		error = cpuset_setithread(id, cpu);
 		if (error)
 			return (error);
 	} else
 		mtx_unlock(&ie->ie_lock);
 	error = ie->ie_assign_cpu(ie->ie_source, cpu);
 	if (error) {
 		mtx_lock(&ie->ie_lock);
 		if (ie->ie_thread != NULL) {
-			CPU_ZERO(&mask);
-			if (ie->ie_cpu == NOCPU)
-				CPU_COPY(cpuset_root, &mask);
-			else
-				CPU_SET(ie->ie_cpu, &mask);
+			cpu = ie->ie_cpu;
 			id = ie->ie_thread->it_thread->td_tid;
 			mtx_unlock(&ie->ie_lock);
-			(void)cpuset_setthread(id, &mask);
+			(void)cpuset_setithread(id, cpu);
 		} else
 			mtx_unlock(&ie->ie_lock);
 		return (error);
Index: sys/sys/cpuset.h
===================================================================
--- sys/sys/cpuset.h
+++ sys/sys/cpuset.h
@@ -118,6 +118,7 @@
 struct cpuset *cpuset_ref(struct cpuset *);
 void	cpuset_rel(struct cpuset *);
 int	cpuset_setthread(lwpid_t id, cpuset_t *);
+int	cpuset_setithread(lwpid_t id, u_char cpu);
 int	cpuset_create_root(struct prison *, struct cpuset **);
 int	cpuset_setproc_update_set(struct proc *, struct cpuset *);
 char	*cpusetobj_strprint(char *, const cpuset_t *);