From owner-freebsd-hackers@FreeBSD.ORG Thu Jun 5 09:15:31 2014
Message-ID: <539034CD.6080002@FreeBSD.org>
Date: Thu, 05 Jun 2014 13:13:49 +0400
From: "Alexander V. Chernikov"
To: Konstantin Belousov
Cc: freebsd-hackers@freebsd.org, hackers@freebsd.org
Subject: Re: Permit init(8) use its own cpuset group.
In-Reply-To: <20140605003122.GF3991@kib.kiev.ua>
References: <538C8F9A.4020301@FreeBSD.org> <20140602164850.GS3991@kib.kiev.ua>
 <201406041106.11659.jhb@freebsd.org> <538F70AB.5030701@FreeBSD.org>
 <20140605003122.GF3991@kib.kiev.ua>
List-Id: Technical Discussions relating to FreeBSD

On 05.06.2014 04:31, Konstantin Belousov wrote:
> On Wed, Jun 04, 2014 at 11:16:59PM +0400, Alexander V. Chernikov wrote:
>> On 04.06.2014 19:06, John Baldwin wrote:
>>> On Monday, June 02, 2014 12:48:50 pm Konstantin Belousov wrote:
>>>> On Mon, Jun 02, 2014 at 06:52:10PM +0400, Alexander V. Chernikov wrote:
>>>>> Hello list!
>>>>>
>>>>> Currently init(8) uses group 1, which is the root group.
>>>>> Modifications of this group affect both kernel and userland threads.
>>>>> Additionally, such modifications are impossible, for example, in the
>>>>> presence of multi-queue NIC drivers (like igb or ixgbe) which bind
>>>>> their threads to particular cpus.
>>>>>
>>>>> The proposed change (an "init_cpuset" loader tunable) permits changing
>>>>> cpu masks for userland more easily. Restricting user processes from
>>>>> migrating to/from CPU cores used for network traffic processing is one
>>>>> of the use cases.
>>>>>
>>>>> Phabricator: https://phabric.freebsd.org/D141 (the same version attached
>>>>> inline)
>>>>>
>>>>> If there are no objections, I'll commit this next week.
>>>> Why is the tunable needed?
>>> Because some people already depend on doing 'cpuset -l 0 -s 1'. It is also
>>> documented in our manpages that processes start in cpuset 1 by default, so
>>> that you can use 'cpuset -l 0 -s 1' to move all processes, etc.
>>>
>>> For the stated problem (bound ithreads in drivers), I would actually like
>>> to fix ithreads that are bound to a specific CPU to create a different
>>> cpuset instead so they don't conflict with set 1.
>> Yes, this seems to be a much better approach.
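For context on the bound-ithread situation described at the top of the thread,
this is roughly how a multi-queue driver such as igb(4) or ixgbe(4) ends up
with per-queue interrupt threads pinned to particular CPUs. It is an
illustration only, not part of D141: the per-queue structure, handler and
function names are hypothetical, while bus_setup_intr(9) and bus_bind_intr(9)
are the real newbus interfaces used for this.

/* Hypothetical per-queue state; real drivers keep this in their softc. */
struct example_queue {
        struct resource *eq_res;        /* MSI-X interrupt resource */
        void            *eq_tag;        /* interrupt cookie */
};

static void example_queue_intr(void *arg);      /* per-queue handler, defined elsewhere */

static int
example_bind_queue_interrupts(device_t dev, struct example_queue *queues,
    int nqueues)
{
        int error, i;

        for (i = 0; i < nqueues; i++) {
                error = bus_setup_intr(dev, queues[i].eq_res,
                    INTR_TYPE_NET | INTR_MPSAFE, NULL, example_queue_intr,
                    &queues[i], &queues[i].eq_tag);
                if (error != 0)
                        return (error);
                /* Pin this queue's ithread to one core. */
                error = bus_bind_intr(dev, queues[i].eq_res, i % mp_ncpus);
                if (error != 0)
                        return (error);
        }
        return (0);
}

Once ithreads are pinned this way while still belonging to set 1, shrinking
set 1 with cpuset(1) conflicts with those bindings, which is what the separate
ithread root set discussed below is meant to avoid.
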
>> Please take a look at the new patch (here or in the phabricator).
>> Comments:
>>
>> Use a different approach for the modifiable root set:
>>
>> * Make sets in 0..15 internal
>> * Define CPUSET_SET1 & CPUSET_ITHREAD in that range
>> * Add cpuset_lookup_builtin() to retrieve such cpu sets by id
>> * Create an additional root set for ithreads
>> * Use this set in ithread_create()
>> * Change intr_setaffinity() to use cpuset_iroot (do we really need this?)
>>
>> We can probably do the same for kprocs, but I'm unsure if we really need it.
> I think this is fine. See below for the style comments.

Patch updated.

> With the proliferation of the special cpuset ids, it is probably time
> to add an assert to cpuset_rel() and cpuset_rel_defer() that it never
> tries to free the preallocated set.

Added.

It seems that the cpuset_rel() / _cpuset_create() pair is racy.

Imagine we have set X with a single reference and the following two
operations about to happen:
1) cpuset_rel() is called (for example, while changing the process CPU set
   from userland);
2) the same process is about to fork, creating a shadow set.

Then the following can happen:
1. Thread 1 calls refcount_release(), dropping the reference count to 0.
2. Thread 2 calls cpuset_shadow() -> _cpuset_create(), which finishes
   successfully, adding another reference to set X, and returns to its caller.
3. Thread 1 proceeds with set deletion: it removes X from the list, releases
   the index and frees X via uma_zfree().

I'm not sure how we can fix this easily. We could read the current refcount
value directly inside cpuset_lock and return if it is != 0, which is more or
less safe (a rough sketch of that re-check is appended after the attached
diff), but more complex cases, like multiple cpuset_rel() calls running at
the same time (each calling refcount_release()), are still possible.

> Since you use a static pre-known set id for ithreads, it should be documented
> along with the set 1 in the man page ?

Yes, I'll do that.

>>
>> Index: sys/kern/kern_cpuset.c
>> ===================================================================
>> --- sys/kern/kern_cpuset.c
>> +++ sys/kern/kern_cpuset.c
>> @@ -112,7 +112,7 @@
>>  SYSCTL_INT(_kern_sched, OID_AUTO, cpusetsize, CTLFLAG_RD,
>>      0, sizeof(cpuset_t), "sizeof(cpuset_t)");
>>
>> -cpuset_t *cpuset_root;
>> +cpuset_t *cpuset_root, *cpuset_iroot;
>>
>>  /*
>>   * Acquire a reference to a cpuset, all pointers must be tracked with refs.
>> @@ -213,7 +213,7 @@
>>   * Find a set based on an id. Returns it with a ref.
>>   */
>>  static struct cpuset *
>> -cpuset_lookup(cpusetid_t setid, struct thread *td)
>> +cpuset_lookup_id(cpusetid_t setid)
>>  {
>>          struct cpuset *set;
>>
>> @@ -227,6 +227,17 @@
>>          cpuset_ref(set);
>>          mtx_unlock_spin(&cpuset_lock);
>>
>> +        return (set);
>> +}
>> +
>> +static struct cpuset *
>> +cpuset_lookup(cpusetid_t setid, struct thread *td)
>> +{
>> +        struct cpuset *set;
>> +
>> +        set = cpuset_lookup_id(setid);
>> +
>> +
> Too many empty lines. The statement above should probably go after the
> KASSERT() line below.
>
>>          KASSERT(td != NULL, ("[%s:%d] td is NULL", __func__, __LINE__));
>>          if (set != NULL && jailed(td->td_ucred)) {
>>                  struct cpuset *jset, *tset;
>> @@ -245,6 +256,25 @@
>>  }
>>
>>  /*
>> + * Find a set based on an id. Returns it with a ref.
>> + */
>> +struct cpuset *
>> +cpuset_lookup_builtin(cpusetid_t setid)
>> +{
>> +        struct cpuset *set;
>> +
>> +        KASSERT(setid <= CPUSET_RESERVED_MAX,
>> +            ("[%s:%d] wrong set id: %d", __func__, __LINE__, setid));
>> +
>> +        set = cpuset_lookup_id(setid);
>> +
>> +        KASSERT(set != NULL,
>> +            ("[%s:%d] set id %d not found", __func__, __LINE__, setid));
>> +
>> +        return (set);
> Again, the empty lines are not needed.
>
>> +}
>> +
>> +/*
>>   * Create a set in the space provided in 'set' with the provided parameters.
>>   * The set is returned with a single ref. May return EDEADLK if the set
>>   * will have no valid cpu based on restrictions from the parent.
>> @@ -725,9 +755,10 @@
>>   * 1 - The default set which all processes are a member of until changed.
>>   *     This allows an administrator to move all threads off of given cpus to
>>   *     dedicate them to high priority tasks or save power etc.
>> + * 2 - The root set which should be used for all interruot/ithreads.
> Typo in the 'interrupt'.
>
>>   */
>> -struct cpuset *
>> -cpuset_thread0(void)
>> +static void
>> +cpuset_init_sets(void)
>>  {
>>          struct cpuset *set;
>>          int error;
>> @@ -751,14 +782,31 @@
>>           * Now derive a default, modifiable set from that to give out.
>>           */
>>          set = uma_zalloc(cpuset_zone, M_WAITOK);
>> -        error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, 1);
>> +        error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask,
>> +            CPUSET_SET1);
>>          KASSERT(error == 0, ("Error creating default set: %d\n", error));
>>          /*
>> -         * Initialize the unit allocator. 0 and 1 are allocated above.
>> +         * Create another root set for interrupt bindings.
>>           */
>> -        cpuset_unr = new_unrhdr(2, INT_MAX, NULL);
>> +        set = uma_zalloc(cpuset_zone, M_WAITOK);
>> +        error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask,
>> +            CPUSET_ITHREAD);
>> +        KASSERT(error == 0, ("Error creating ithread cpuset: %d\n", error));
>> +        set->cs_flags |= CPU_SET_ROOT;
>> +        cpuset_iroot = &set->cs_mask;
>> +        /*
>> +         * Initialize the unit allocator. Reserve 0..15 for special sets.
>> +         */
>> +        cpuset_unr = new_unrhdr(CPUSET_RESERVED_MAX + 1, INT_MAX, NULL);
> Style requests that multi-line block comments have empty line before.
> Spelling CPUSET_RESERVED_MAX as 15 in the comment looks strange.
>
>> +}
>>
>> -        return (set);
>> +struct cpuset *
>> +cpuset_thread0(void)
>> +{
>> +
>> +        cpuset_init_sets();
>> +
> Empty line.
>
>> +        return (cpuset_lookup_builtin(CPUSET_SET1));
>>  }
>>
>>  /*
>> Index: sys/kern/kern_intr.c
>> ===================================================================
>> --- sys/kern/kern_intr.c
>> +++ sys/kern/kern_intr.c
>> @@ -318,7 +318,7 @@
>>          if (ie->ie_thread != NULL) {
>>                  CPU_ZERO(&mask);
>>                  if (cpu == NOCPU)
>> -                        CPU_COPY(cpuset_root, &mask);
>> +                        CPU_COPY(cpuset_iroot, &mask);
>>                  else
>>                          CPU_SET(cpu, &mask);
>>                  id = ie->ie_thread->it_thread->td_tid;
>> @@ -334,7 +334,7 @@
>>          if (ie->ie_thread != NULL) {
>>                  CPU_ZERO(&mask);
>>                  if (ie->ie_cpu == NOCPU)
>> -                        CPU_COPY(cpuset_root, &mask);
>> +                        CPU_COPY(cpuset_iroot, &mask);
>>                  else
>>                          CPU_SET(ie->ie_cpu, &mask);
>>                  id = ie->ie_thread->it_thread->td_tid;
>> @@ -381,7 +381,7 @@
>>           * If we're setting all cpus we can unbind. Otherwise make sure
>>           * only one cpu is in the set.
>>           */
>> -        if (CPU_CMP(cpuset_root, mask)) {
>> +        if (CPU_CMP(cpuset_iroot, mask)) {
>>                  for (n = 0; n < CPU_SETSIZE; n++) {
>>                          if (!CPU_ISSET(n, mask))
>>                                  continue;
>> @@ -409,7 +409,7 @@
>>          CPU_ZERO(mask);
>>          mtx_lock(&ie->ie_lock);
>>          if (ie->ie_cpu == NOCPU)
>> -                CPU_COPY(cpuset_root, mask);
>> +                CPU_COPY(cpuset_iroot, mask);
>>          else
>>                  CPU_SET(ie->ie_cpu, mask);
>>          mtx_unlock(&ie->ie_lock);
>> @@ -461,6 +461,7 @@
>>          TD_SET_IWAIT(td);
>>          thread_unlock(td);
>>          td->td_pflags |= TDP_ITHREAD;
>> +        td->td_cpuset = cpuset_lookup_builtin(CPUSET_ITHREAD);
>>          ithd->it_thread = td;
>>          CTR2(KTR_INTR, "%s: created %s", __func__, name);
>>          return (ithd);
>> @@ -485,6 +486,7 @@
>>          TD_SET_IWAIT(td);
>>          thread_unlock(td);
>>          td->td_pflags |= TDP_ITHREAD;
>> +        td->td_cpuset = cpuset_lookup_builtin(CPUSET_ITHREAD);
>>          ithd->it_thread = td;
>>          CTR2(KTR_INTR, "%s: created %s", __func__, name);
>>          return (ithd);
>> Index: sys/sys/cpuset.h
>> ===================================================================
>> --- sys/sys/cpuset.h
>> +++ sys/sys/cpuset.h
>> @@ -81,6 +81,10 @@
>>   */
>>  #define CPUSET_INVALID   -1
>>  #define CPUSET_DEFAULT   0
>> +#define CPUSET_SET1      1
>> +#define CPUSET_ITHREAD   2
>> +
>> +#define CPUSET_RESERVED_MAX 15
>>
>>  #ifdef _KERNEL
>>  LIST_HEAD(setlist, cpuset);
>> @@ -110,11 +114,12 @@
>>  #define CPU_SET_ROOT    0x0001  /* Set is a root set. */
>>  #define CPU_SET_RDONLY  0x0002  /* No modification allowed. */
>>
>> -extern cpuset_t *cpuset_root;
>> +extern cpuset_t *cpuset_root, *cpuset_iroot;
>>  struct prison;
>>  struct proc;
>>
>>  struct cpuset *cpuset_thread0(void);
>> +struct cpuset *cpuset_lookup_builtin(cpusetid_t setid);
>>  struct cpuset *cpuset_ref(struct cpuset *);
>>  void cpuset_rel(struct cpuset *);
>>  int cpuset_setthread(lwpid_t id, cpuset_t *);

Content-Type: text/x-patch; name="D141_4.diff"
Content-Disposition: attachment; filename="D141_4.diff"

Index: sys/kern/kern_cpuset.c
===================================================================
--- sys/kern/kern_cpuset.c
+++ sys/kern/kern_cpuset.c
@@ -112,7 +112,7 @@
 SYSCTL_INT(_kern_sched, OID_AUTO, cpusetsize, CTLFLAG_RD,
     0, sizeof(cpuset_t), "sizeof(cpuset_t)");

-cpuset_t *cpuset_root;
+cpuset_t *cpuset_root, *cpuset_iroot;

 /*
  * Acquire a reference to a cpuset, all pointers must be tracked with refs.
@@ -167,9 +167,11 @@

         if (refcount_release(&set->cs_ref) == 0)
                 return;
+        id = set->cs_id;
+        KASSERT(id > CPUSET_RESERVED_MAX || id == CPUSET_INVALID,
+            ("[%s:%d] freeing reserved set id: %d", __func__, __LINE__, id));
         mtx_lock_spin(&cpuset_lock);
         LIST_REMOVE(set, cs_siblings);
-        id = set->cs_id;
         if (id != CPUSET_INVALID)
                 LIST_REMOVE(set, cs_link);
         mtx_unlock_spin(&cpuset_lock);
@@ -186,12 +188,16 @@
 static void
 cpuset_rel_defer(struct setlist *head, struct cpuset *set)
 {
+        cpusetid_t id;

         if (refcount_release(&set->cs_ref) == 0)
                 return;
+        id = set->cs_id;
+        KASSERT(id > CPUSET_RESERVED_MAX || id == CPUSET_INVALID,
+            ("[%s:%d] freeing reserved set id: %d", __func__, __LINE__, id));
         mtx_lock_spin(&cpuset_lock);
         LIST_REMOVE(set, cs_siblings);
-        if (set->cs_id != CPUSET_INVALID)
+        if (id != CPUSET_INVALID)
                 LIST_REMOVE(set, cs_link);
         LIST_INSERT_HEAD(head, set, cs_link);
         mtx_unlock_spin(&cpuset_lock);
@@ -213,7 +219,7 @@
  * Find a set based on an id. Returns it with a ref.
  */
 static struct cpuset *
-cpuset_lookup(cpusetid_t setid, struct thread *td)
+cpuset_lookup_id(cpusetid_t setid)
 {
         struct cpuset *set;

@@ -227,7 +233,17 @@
                 cpuset_ref(set);
         mtx_unlock_spin(&cpuset_lock);

+        return (set);
+}
+
+static struct cpuset *
+cpuset_lookup(cpusetid_t setid, struct thread *td)
+{
+        struct cpuset *set;
+
         KASSERT(td != NULL, ("[%s:%d] td is NULL", __func__, __LINE__));
+
+        set = cpuset_lookup_id(setid);
         if (set != NULL && jailed(td->td_ucred)) {
                 struct cpuset *jset, *tset;
@@ -245,6 +261,23 @@
 }

 /*
+ * Find reserved set based on its id. Returns it with a ref.
+ */
+struct cpuset *
+cpuset_lookup_reserved(cpusetid_t setid)
+{
+        struct cpuset *set;
+
+        KASSERT(setid <= CPUSET_RESERVED_MAX,
+            ("[%s:%d] wrong set id: %d", __func__, __LINE__, setid));
+        set = cpuset_lookup_id(setid);
+        KASSERT(set != NULL,
+            ("[%s:%d] set id %d not found", __func__, __LINE__, setid));
+
+        return (set);
+}
+
+/*
  * Create a set in the space provided in 'set' with the provided parameters.
  * The set is returned with a single ref. May return EDEADLK if the set
  * will have no valid cpu based on restrictions from the parent.
@@ -725,16 +758,18 @@
  * 1 - The default set which all processes are a member of until changed.
  *     This allows an administrator to move all threads off of given cpus to
  *     dedicate them to high priority tasks or save power etc.
+ * 2 - The root set which should be used for all interrupts/ithreads.
  */
-struct cpuset *
-cpuset_thread0(void)
+static void
+cpuset_init_sets(void)
 {
         struct cpuset *set;
         int error;

         cpuset_zone = uma_zcreate("cpuset", sizeof(struct cpuset), NULL, NULL,
             NULL, NULL, UMA_ALIGN_PTR, 0);
         mtx_init(&cpuset_lock, "cpuset", NULL, MTX_SPIN | MTX_RECURSE);
+
         /*
          * Create the root system set for the whole machine. Doesn't use
          * cpuset_create() due to NULL parent.
@@ -747,18 +782,38 @@
         set->cs_flags = CPU_SET_ROOT;
         cpuset_zero = set;
         cpuset_root = &set->cs_mask;
+
         /*
          * Now derive a default, modifiable set from that to give out.
          */
         set = uma_zalloc(cpuset_zone, M_WAITOK);
-        error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, 1);
+        error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask,
+            CPUSET_DEFAULT);
         KASSERT(error == 0, ("Error creating default set: %d\n", error));
+
         /*
-         * Initialize the unit allocator. 0 and 1 are allocated above.
+         * Create another root set for interrupt bindings.
          */
-        cpuset_unr = new_unrhdr(2, INT_MAX, NULL);
+        set = uma_zalloc(cpuset_zone, M_WAITOK);
+        error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask,
+            CPUSET_ITHREAD);
+        KASSERT(error == 0, ("Error creating ithread cpuset: %d\n", error));
+        set->cs_flags |= CPU_SET_ROOT;
+        cpuset_iroot = &set->cs_mask;

-        return (set);
+        /*
+         * Initialize the unit allocator.
+         * Reserve 0..CPUSET_RESERVED_MAX for special sets.
+         */
+        cpuset_unr = new_unrhdr(CPUSET_RESERVED_MAX + 1, INT_MAX, NULL);
+}
+
+struct cpuset *
+cpuset_thread0(void)
+{
+
+        cpuset_init_sets();
+        return (cpuset_lookup_reserved(CPUSET_DEFAULT));
 }

 /*
Index: sys/kern/kern_intr.c
===================================================================
--- sys/kern/kern_intr.c
+++ sys/kern/kern_intr.c
@@ -318,7 +318,7 @@
         if (ie->ie_thread != NULL) {
                 CPU_ZERO(&mask);
                 if (cpu == NOCPU)
-                        CPU_COPY(cpuset_root, &mask);
+                        CPU_COPY(cpuset_iroot, &mask);
                 else
                         CPU_SET(cpu, &mask);
                 id = ie->ie_thread->it_thread->td_tid;
@@ -334,7 +334,7 @@
         if (ie->ie_thread != NULL) {
                 CPU_ZERO(&mask);
                 if (ie->ie_cpu == NOCPU)
-                        CPU_COPY(cpuset_root, &mask);
+                        CPU_COPY(cpuset_iroot, &mask);
                 else
                         CPU_SET(ie->ie_cpu, &mask);
                 id = ie->ie_thread->it_thread->td_tid;
@@ -381,7 +381,7 @@
          * If we're setting all cpus we can unbind. Otherwise make sure
          * only one cpu is in the set.
          */
-        if (CPU_CMP(cpuset_root, mask)) {
+        if (CPU_CMP(cpuset_iroot, mask)) {
                 for (n = 0; n < CPU_SETSIZE; n++) {
                         if (!CPU_ISSET(n, mask))
                                 continue;
@@ -409,7 +409,7 @@
         CPU_ZERO(mask);
         mtx_lock(&ie->ie_lock);
         if (ie->ie_cpu == NOCPU)
-                CPU_COPY(cpuset_root, mask);
+                CPU_COPY(cpuset_iroot, mask);
         else
                 CPU_SET(ie->ie_cpu, mask);
         mtx_unlock(&ie->ie_lock);
@@ -461,6 +461,7 @@
         TD_SET_IWAIT(td);
         thread_unlock(td);
         td->td_pflags |= TDP_ITHREAD;
+        td->td_cpuset = cpuset_lookup_reserved(CPUSET_ITHREAD);
         ithd->it_thread = td;
         CTR2(KTR_INTR, "%s: created %s", __func__, name);
         return (ithd);
@@ -485,6 +486,7 @@
         TD_SET_IWAIT(td);
         thread_unlock(td);
         td->td_pflags |= TDP_ITHREAD;
+        td->td_cpuset = cpuset_lookup_reserved(CPUSET_ITHREAD);
         ithd->it_thread = td;
         CTR2(KTR_INTR, "%s: created %s", __func__, name);
         return (ithd);
Index: sys/sys/cpuset.h
===================================================================
--- sys/sys/cpuset.h
+++ sys/sys/cpuset.h
@@ -80,7 +80,11 @@
  * Reserved cpuset identifiers.
  */
 #define CPUSET_INVALID   -1
-#define CPUSET_DEFAULT   0
+#define CPUSET_SET0      0
+#define CPUSET_DEFAULT   1
+#define CPUSET_ITHREAD   2
+
+#define CPUSET_RESERVED_MAX 15

 #ifdef _KERNEL
 LIST_HEAD(setlist, cpuset);
@@ -110,11 +114,12 @@
 #define CPU_SET_ROOT    0x0001  /* Set is a root set. */
 #define CPU_SET_RDONLY  0x0002  /* No modification allowed. */

-extern cpuset_t *cpuset_root;
+extern cpuset_t *cpuset_root, *cpuset_iroot;
 struct prison;
 struct proc;

 struct cpuset *cpuset_thread0(void);
+struct cpuset *cpuset_lookup_reserved(cpusetid_t setid);
 struct cpuset *cpuset_ref(struct cpuset *);
 void cpuset_rel(struct cpuset *);
 int cpuset_setthread(lwpid_t id, cpuset_t *);
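
Regarding the cpuset_rel() / _cpuset_create() race discussed earlier in this
mail: the following is only a rough sketch of the "re-check the refcount under
cpuset_lock" idea floated above, not something that is part of D141. The tail
of the function is elided, and, as already noted, this only narrows the window
for a single racing cpuset_ref(); more complex interleavings (several
cpuset_rel() callers racing each other) are still possible.

static void
cpuset_rel(struct cpuset *set)
{
        cpusetid_t id;

        if (refcount_release(&set->cs_ref) == 0)
                return;
        mtx_lock_spin(&cpuset_lock);
        if (set->cs_ref != 0) {
                /*
                 * Somebody (e.g. _cpuset_create() building a shadow set)
                 * re-referenced us after the count hit zero; keep the set.
                 */
                mtx_unlock_spin(&cpuset_lock);
                return;
        }
        LIST_REMOVE(set, cs_siblings);
        id = set->cs_id;
        if (id != CPUSET_INVALID)
                LIST_REMOVE(set, cs_link);
        mtx_unlock_spin(&cpuset_lock);
        /* ... drop the parent reference, uma_zfree() the set, free the id. */
}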