From owner-freebsd-hackers@FreeBSD.ORG Wed Jun 4 19:18:39 2014 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 14123FC1; Wed, 4 Jun 2014 19:18:39 +0000 (UTC) Received: from mail.ipfw.ru (mail.ipfw.ru [IPv6:2a01:4f8:120:6141::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 985D32613; Wed, 4 Jun 2014 19:18:38 +0000 (UTC) Received: from [2a02:6b8:0:401:222:4dff:fe50:cd2f] (helo=ptichko.yndx.net) by mail.ipfw.ru with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1WsCmx-0002I5-6i; Wed, 04 Jun 2014 19:07:39 +0400 Message-ID: <538F70AB.5030701@FreeBSD.org> Date: Wed, 04 Jun 2014 23:16:59 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0 MIME-Version: 1.0 To: John Baldwin , freebsd-hackers@freebsd.org Subject: Re: Permit init(8) use its own cpuset group. References: <538C8F9A.4020301@FreeBSD.org> <20140602164850.GS3991@kib.kiev.ua> <201406041106.11659.jhb@freebsd.org> In-Reply-To: <201406041106.11659.jhb@freebsd.org> Content-Type: multipart/mixed; boundary="------------090404000700090306000808" X-Content-Filtered-By: Mailman/MimeDel 2.1.18 Cc: Konstantin Belousov , hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Jun 2014 19:18:39 -0000 This is a multi-part message in MIME format. --------------090404000700090306000808 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit On 04.06.2014 19:06, John Baldwin wrote: > On Monday, June 02, 2014 12:48:50 pm Konstantin Belousov wrote: >> On Mon, Jun 02, 2014 at 06:52:10PM +0400, Alexander V. Chernikov wrote: >>> Hello list! >>> >>> Currently init(8) uses group 1 which is root group. >>> Modifications of this group affects both kernel and userland threads. >>> Additionally, such modifications are impossible, for example, in presence >>> of multi-queue NIC drivers (like igb or ixgbe) which binds their threads > to >>> particular cpus. >>> >>> Proposed change ("init_cpuset" loader tunable) permits changing cpu >>> masks for >>> userland more easily. Restricting user processes to migrate to/from CPU >>> cores >>> used for network traffic processing is one of the cases. >>> >>> Phabricator: https://phabric.freebsd.org/D141 (the same version attached >>> inline) >>> >>> If there are no objections, I'll commit this next week. >> Why is the tunable needed ? > Because some people already depend on doing 'cpuset -l 0 -s 1'. It is also > documented in our manpages that processes start in cpuset 1 by default so > that you can use 'cpuset -l 0 -s 1' to move all processes, etc. > > For the stated problem (bound ithreads in drivers), I would actually like to > fix ithreads that are bound to a specific CPU to create a different cpuset > instead so they don't conflict with set 1. Yes, this seem to be much better approach. Please take a look on the new patch (here or in the phabricator). Comments: Use different approach for modifyable root set: * Make sets in 0..15 internal * Define CPUSET_SET1 & CPUSET_ITHREAD in that range * Add cpuset_lookup_builtin() to retrieve such cpu sets by id * Create additional root set for ithreads * Use this set in ithread_create() * Change intr_setaffinity() to use cpuset_iroot (do we really need this)? We can probably do the same for kprocs, but I'm unsure if we really need it. > --------------090404000700090306000808 Content-Type: text/x-patch; name="D141_3.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="D141_3.diff" Index: sys/kern/kern_cpuset.c =================================================================== --- sys/kern/kern_cpuset.c +++ sys/kern/kern_cpuset.c @@ -112,7 +112,7 @@ SYSCTL_INT(_kern_sched, OID_AUTO, cpusetsize, CTLFLAG_RD, 0, sizeof(cpuset_t), "sizeof(cpuset_t)"); -cpuset_t *cpuset_root; +cpuset_t *cpuset_root, *cpuset_iroot; /* * Acquire a reference to a cpuset, all pointers must be tracked with refs. @@ -213,7 +213,7 @@ * Find a set based on an id. Returns it with a ref. */ static struct cpuset * -cpuset_lookup(cpusetid_t setid, struct thread *td) +cpuset_lookup_id(cpusetid_t setid) { struct cpuset *set; @@ -227,6 +227,17 @@ cpuset_ref(set); mtx_unlock_spin(&cpuset_lock); + return (set); +} + +static struct cpuset * +cpuset_lookup(cpusetid_t setid, struct thread *td) +{ + struct cpuset *set; + + set = cpuset_lookup_id(setid); + + KASSERT(td != NULL, ("[%s:%d] td is NULL", __func__, __LINE__)); if (set != NULL && jailed(td->td_ucred)) { struct cpuset *jset, *tset; @@ -245,6 +256,25 @@ } /* + * Find a set based on an id. Returns it with a ref. + */ +struct cpuset * +cpuset_lookup_builtin(cpusetid_t setid) +{ + struct cpuset *set; + + KASSERT(setid <= CPUSET_RESERVED_MAX, + ("[%s:%d] wrong set id: %d", __func__, __LINE__, setid)); + + set = cpuset_lookup_id(setid); + + KASSERT(set != NULL, + ("[%s:%d] set id %d not found", __func__, __LINE__, setid)); + + return (set); +} + +/* * Create a set in the space provided in 'set' with the provided parameters. * The set is returned with a single ref. May return EDEADLK if the set * will have no valid cpu based on restrictions from the parent. @@ -725,9 +755,10 @@ * 1 - The default set which all processes are a member of until changed. * This allows an administrator to move all threads off of given cpus to * dedicate them to high priority tasks or save power etc. + * 2 - The root set which should be used for all interruot/ithreads. */ -struct cpuset * -cpuset_thread0(void) +static void +cpuset_init_sets(void) { struct cpuset *set; int error; @@ -751,14 +782,31 @@ * Now derive a default, modifiable set from that to give out. */ set = uma_zalloc(cpuset_zone, M_WAITOK); - error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, 1); + error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, + CPUSET_SET1); KASSERT(error == 0, ("Error creating default set: %d\n", error)); /* - * Initialize the unit allocator. 0 and 1 are allocated above. + * Create another root set for interrupt bindings. */ - cpuset_unr = new_unrhdr(2, INT_MAX, NULL); + set = uma_zalloc(cpuset_zone, M_WAITOK); + error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, + CPUSET_ITHREAD); + KASSERT(error == 0, ("Error creating ithread cpuset: %d\n", error)); + set->cs_flags |= CPU_SET_ROOT; + cpuset_iroot = &set->cs_mask; + /* + * Initialize the unit allocator. Reserve 0..15 for special sets. + */ + cpuset_unr = new_unrhdr(CPUSET_RESERVED_MAX + 1, INT_MAX, NULL); +} - return (set); +struct cpuset * +cpuset_thread0(void) +{ + + cpuset_init_sets(); + + return (cpuset_lookup_builtin(CPUSET_SET1)); } /* Index: sys/kern/kern_intr.c =================================================================== --- sys/kern/kern_intr.c +++ sys/kern/kern_intr.c @@ -318,7 +318,7 @@ if (ie->ie_thread != NULL) { CPU_ZERO(&mask); if (cpu == NOCPU) - CPU_COPY(cpuset_root, &mask); + CPU_COPY(cpuset_iroot, &mask); else CPU_SET(cpu, &mask); id = ie->ie_thread->it_thread->td_tid; @@ -334,7 +334,7 @@ if (ie->ie_thread != NULL) { CPU_ZERO(&mask); if (ie->ie_cpu == NOCPU) - CPU_COPY(cpuset_root, &mask); + CPU_COPY(cpuset_iroot, &mask); else CPU_SET(ie->ie_cpu, &mask); id = ie->ie_thread->it_thread->td_tid; @@ -381,7 +381,7 @@ * If we're setting all cpus we can unbind. Otherwise make sure * only one cpu is in the set. */ - if (CPU_CMP(cpuset_root, mask)) { + if (CPU_CMP(cpuset_iroot, mask)) { for (n = 0; n < CPU_SETSIZE; n++) { if (!CPU_ISSET(n, mask)) continue; @@ -409,7 +409,7 @@ CPU_ZERO(mask); mtx_lock(&ie->ie_lock); if (ie->ie_cpu == NOCPU) - CPU_COPY(cpuset_root, mask); + CPU_COPY(cpuset_iroot, mask); else CPU_SET(ie->ie_cpu, mask); mtx_unlock(&ie->ie_lock); @@ -461,6 +461,7 @@ TD_SET_IWAIT(td); thread_unlock(td); td->td_pflags |= TDP_ITHREAD; + td->td_cpuset = cpuset_lookup_builtin(CPUSET_ITHREAD); ithd->it_thread = td; CTR2(KTR_INTR, "%s: created %s", __func__, name); return (ithd); @@ -485,6 +486,7 @@ TD_SET_IWAIT(td); thread_unlock(td); td->td_pflags |= TDP_ITHREAD; + td->td_cpuset = cpuset_lookup_builtin(CPUSET_ITHREAD); ithd->it_thread = td; CTR2(KTR_INTR, "%s: created %s", __func__, name); return (ithd); Index: sys/sys/cpuset.h =================================================================== --- sys/sys/cpuset.h +++ sys/sys/cpuset.h @@ -81,6 +81,10 @@ */ #define CPUSET_INVALID -1 #define CPUSET_DEFAULT 0 +#define CPUSET_SET1 1 +#define CPUSET_ITHREAD 2 + +#define CPUSET_RESERVED_MAX 15 #ifdef _KERNEL LIST_HEAD(setlist, cpuset); @@ -110,11 +114,12 @@ #define CPU_SET_ROOT 0x0001 /* Set is a root set. */ #define CPU_SET_RDONLY 0x0002 /* No modification allowed. */ -extern cpuset_t *cpuset_root; +extern cpuset_t *cpuset_root, *cpuset_iroot; struct prison; struct proc; struct cpuset *cpuset_thread0(void); +struct cpuset *cpuset_lookup_builtin(cpusetid_t setid); struct cpuset *cpuset_ref(struct cpuset *); void cpuset_rel(struct cpuset *); int cpuset_setthread(lwpid_t id, cpuset_t *); --------------090404000700090306000808--