Date: Wed, 11 Jun 2014 20:52:03 +0400 From: "Alexander V. Chernikov" <melifaro@FreeBSD.org> To: John Baldwin <jhb@freebsd.org> Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-hackers@freebsd.org, hackers@freebsd.org Subject: Re: Permit init(8) use its own cpuset group. Message-ID: <53988933.5010902@FreeBSD.org> In-Reply-To: <201406091343.39584.jhb@freebsd.org> References: <538C8F9A.4020301@FreeBSD.org> <5390DB64.6010704@FreeBSD.org> <53923C09.6020302@FreeBSD.org> <201406091343.39584.jhb@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
[-- Attachment #1 --]
On 09.06.2014 21:43, John Baldwin wrote:
> On Friday, June 06, 2014 6:09:13 pm Alexander V. Chernikov wrote:
>> On 06.06.2014 01:04, Alexander V. Chernikov wrote:
>>> On 05.06.2014 23:59, John Baldwin wrote:
>>>> On Thursday, June 05, 2014 3:46:15 pm Alexander V. Chernikov wrote:
>>>>> On 05.06.2014 18:09, John Baldwin wrote:
>>>>>> On Wednesday, June 04, 2014 3:16:59 pm Alexander V. Chernikov wrote:
>>>>>>> On 04.06.2014 19:06, John Baldwin wrote:
>>>>>>>> On Monday, June 02, 2014 12:48:50 pm Konstantin Belousov wrote:
>>>>>>>>> On Mon, Jun 02, 2014 at 06:52:10PM +0400, Alexander V. Chernikov wrote:
>>>>>>>>>> Hello list!
>>>>>>>>>>
>>>>>>>>>> Currently init(8) uses group 1 which is root group.
>>>>>>>>>> Modifications of this group affects both kernel and userland threads.
>>>>>>>>>> Additionally, such modifications are impossible, for example, in
>>>>>> presence
>>>>>>>>>> of multi-queue NIC drivers (like igb or ixgbe) which binds their
>>>> threads
>>>>>>>> to
>>>>>>>>>> particular cpus.
>>>>>>>>>>
>>>>>>>>>> Proposed change ("init_cpuset" loader tunable) permits changing cpu
>>>>>>>>>> masks for
>>>>>>>>>> userland more easily. Restricting user processes to migrate to/from
>>>> CPU
>>>>>>>>>> cores
>>>>>>>>>> used for network traffic processing is one of the cases.
>>>>>>>>>>
>>>>>>>>>> Phabricator: https://phabric.freebsd.org/D141 (the same version
>>>> attached
>>>>>>>>>> inline)
>>>>>>>>>>
>>>>>>>>>> If there are no objections, I'll commit this next week.
>>>>>>>>> Why is the tunable needed ?
>>>>>>>> Because some people already depend on doing 'cpuset -l 0 -s 1'. It is
>>>>>> also
>>>>>>>> documented in our manpages that processes start in cpuset 1 by default
>>>> so
>>>>>>>> that you can use 'cpuset -l 0 -s 1' to move all processes, etc.
>>>>>>>>
>>>>>>>> For the stated problem (bound ithreads in drivers), I would actually
>>>> like
>>>>>> to
>>>>>>>> fix ithreads that are bound to a specific CPU to create a different
>>>> cpuset
>>>>>>>> instead so they don't conflict with set 1.
>>>>>>> Yes, this seem to be much better approach.
>>>>>>> Please take a look on the new patch (here or in the phabricator).
>>>>>>> Comments:
>>>>>>>
>>>>>>> Use different approach for modifyable root set:
>>>>>>>
>>>>>>> * Make sets in 0..15 internal
>>>>>>> * Define CPUSET_SET1 & CPUSET_ITHREAD in that range
>>>>>>> * Add cpuset_lookup_builtin() to retrieve such cpu sets by id
>>>>>>> * Create additional root set for ithreads
>>>>>>> * Use this set in ithread_create()
>>>>>>> * Change intr_setaffinity() to use cpuset_iroot (do we really need
>>>> this)?
>>>>>>> We can probably do the same for kprocs, but I'm unsure if we really need
>>>> it.
>>>>>> I imagined something a bit simpler. Just create a new set in
>>>> intr_event_bind
>>>>>> and bind the ithread to the new set. No need to have more magic set ids,
>>>> etc.
>>>>> Well, we also have userland which can modify given changes via `cpuset
>>>>> -x', so we need to be able to add some more logic on set
>>>>> allocation/keeping. Additionally, we can try to do the same via `cpuset
>>>>> -t', so introducing something like cpuset_setIthread() and hooking into
>>>>> intr_event_bind() won't probably be enough. At least I can't think out a
>>>>> quick and easy way to do this.
>>>> cpuset -x calls intr_event_bind(). If you just do it there you fix both
>>>> places.
>> Yup. I was wrong. This version seems to be simpler while doing the right
>> thing.
> Hmm, I do think this patch is doing the right thing, but I'm worried you
> might have weird effects, e.g. I worry that because the 'intr' proc is in
> set 1 still, the thread masks will be (incorrectly) checked when changing
Well, as far as I understand we don't have any cpu sets bound to program
("pid" search returns set of first thread).
Anyway, each original ithread mask is released via cpuset_rel() so I'm a
bit unsure about remaining thread masks.
Playing with cpuset and ixgbe driver, does not reveal given problem:
12:59 [0] test15# cpuset -g -s 1
cpuset 1 mask: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23
13:00 [0] test15# kldload ixgbe
13:00 [0] test15# vmstat -i | grep ix
irq263: ix0:que 0 116 0
irq264: ix0:que 1 108 0
irq265: ix0:que 2 108 0
irq266: ix0:que 3 108 0
irq267: ix0:que 4 108 0
irq268: ix0:que 5 108 0
irq269: ix0:que 6 108 0
irq270: ix0:que 7 108 0
...
13:00 [0] test15# cpuset -g -x 263
irq 263 mask: 0
13:00 [0] test15# cpuset -g -x 264
irq 264 mask: 1
13:00 [0] test15# cpuset -l 21,22,23 -s 1
13:01 [0] test15# cpuset -g -s 1
cpuset 1 mask: 21, 22, 23
13:01 [0] test15# cpuset -g -x 263
irq 263 mask: 0
13:01 [0] test15# cpuset -g -x 264
irq 264 mask: 1
13:01 [0] test15# cpuset -l 12 -x 264
13:01 [0] test15# cpuset -g -x 264
irq 264 mask: 12
13:01 [0] test15# cpuset -l 22,23 -s 1
13:01 [0] test15# echo $?
0
13:01 [0] test15# cpuset -g -x 264
irq 264 mask: 12
13:01 [0] test15# cpuset -g -s 1
cpuset 1 mask: 22, 23
> set 1 and you still won't be able to 'cpuset -l 0 -s 1'.
>
> Also, strictly speaking, it would probably be best to return threads to
> set 1 when NOCPU is passed to intr_event_bind() to unbind an interrupt.
Yup.
>
[-- Attachment #2 --]
Index: sys/kern/kern_cpuset.c
===================================================================
--- sys/kern/kern_cpuset.c
+++ sys/kern/kern_cpuset.c
@@ -106,7 +106,7 @@
static struct mtx cpuset_lock;
static struct setlist cpuset_ids;
static struct unrhdr *cpuset_unr;
-static struct cpuset *cpuset_zero;
+static struct cpuset *cpuset_zero, *cpuset_default;
/* Return the size of cpuset_t at the kernel level */
SYSCTL_INT(_kern_sched, OID_AUTO, cpusetsize, CTLFLAG_RD,
@@ -716,6 +716,89 @@
}
/*
+ * Apply new cpumask to the ithread.
+ */
+int
+cpuset_setithread(lwpid_t id, u_char cpu)
+{
+ struct cpuset *nset, *rset;
+ struct cpuset *parent, *old_set;
+ struct thread *td;
+ struct proc *p;
+ cpusetid_t cs_id;
+ cpuset_t mask;
+ int error;
+
+ nset = uma_zalloc(cpuset_zone, M_WAITOK);
+ rset = uma_zalloc(cpuset_zone, M_WAITOK);
+
+ CPU_ZERO(&mask);
+ if (cpu == NOCPU)
+ CPU_COPY(cpuset_root, &mask);
+ else
+ CPU_SET(cpu, &mask);
+
+ error = cpuset_which(CPU_WHICH_TID, id, &p, &td, &old_set);
+ if (((cs_id = alloc_unr(cpuset_unr)) == CPUSET_INVALID) || error != 0)
+ goto out;
+
+ thread_lock(td);
+ old_set = td->td_cpuset;
+
+ if (cpu == NOCPU) {
+ /*
+ * roll back to default set. We're not using cpuset_shadow()
+ * here because we can fail CPU_SUBSET() check. This can happen
+ * if default set does not contain all CPUs.
+ */
+ error = _cpuset_create(nset, cpuset_default, &mask,
+ CPUSET_INVALID);
+
+ goto applyset;
+ }
+
+ if (old_set->cs_id == 1 || (old_set->cs_id == CPUSET_INVALID &&
+ old_set->cs_parent->cs_id == 1)) {
+ /* Default mask, we need to use new root set */
+ error = _cpuset_create(rset, cpuset_zero,
+ &cpuset_zero->cs_mask, cs_id);
+ if (error != 0) {
+ PROC_UNLOCK(p);
+ goto out;
+ }
+ rset->cs_flags |= CPU_SET_ROOT;
+ parent = rset;
+ rset = NULL;
+ cs_id = CPUSET_INVALID;
+ } else {
+ /* Assume existing set was already allocated by previous call */
+ parent = td->td_cpuset;
+ old_set = NULL;
+ }
+
+ error = cpuset_shadow(parent, nset, &mask);
+applyset:
+ if (error == 0) {
+ td->td_cpuset = nset;
+ sched_affinity(td);
+ nset = NULL;
+ }
+ thread_unlock(td);
+ PROC_UNLOCK(p);
+ if (old_set != NULL)
+ cpuset_rel(old_set);
+out:
+ if (nset != NULL)
+ uma_zfree(cpuset_zone, nset);
+ if (rset != NULL)
+ uma_zfree(cpuset_zone, rset);
+ if (cs_id != CPUSET_INVALID)
+ free_unr(cpuset_unr, cs_id);
+ return (error);
+}
+
+
+/*
* Creates the cpuset for thread0. We make two sets:
*
* 0 - The root set which should represent all valid processors in the
@@ -735,6 +818,7 @@
cpuset_zone = uma_zcreate("cpuset", sizeof(struct cpuset), NULL, NULL,
NULL, NULL, UMA_ALIGN_PTR, 0);
mtx_init(&cpuset_lock, "cpuset", NULL, MTX_SPIN | MTX_RECURSE);
+
/*
* Create the root system set for the whole machine. Doesn't use
* cpuset_create() due to NULL parent.
@@ -747,12 +831,15 @@
set->cs_flags = CPU_SET_ROOT;
cpuset_zero = set;
cpuset_root = &set->cs_mask;
+
/*
* Now derive a default, modifiable set from that to give out.
*/
set = uma_zalloc(cpuset_zone, M_WAITOK);
error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, 1);
KASSERT(error == 0, ("Error creating default set: %d\n", error));
+ cpuset_default = set;
+
/*
* Initialize the unit allocator. 0 and 1 are allocated above.
*/
Index: sys/kern/kern_intr.c
===================================================================
--- sys/kern/kern_intr.c
+++ sys/kern/kern_intr.c
@@ -295,7 +295,6 @@
int
intr_event_bind(struct intr_event *ie, u_char cpu)
{
- cpuset_t mask;
lwpid_t id;
int error;
@@ -316,30 +315,21 @@
*/
mtx_lock(&ie->ie_lock);
if (ie->ie_thread != NULL) {
- CPU_ZERO(&mask);
- if (cpu == NOCPU)
- CPU_COPY(cpuset_root, &mask);
- else
- CPU_SET(cpu, &mask);
id = ie->ie_thread->it_thread->td_tid;
mtx_unlock(&ie->ie_lock);
- error = cpuset_setthread(id, &mask);
+ error = cpuset_setithread(id, cpu);
if (error)
return (error);
} else
mtx_unlock(&ie->ie_lock);
error = ie->ie_assign_cpu(ie->ie_source, cpu);
if (error) {
mtx_lock(&ie->ie_lock);
if (ie->ie_thread != NULL) {
- CPU_ZERO(&mask);
- if (ie->ie_cpu == NOCPU)
- CPU_COPY(cpuset_root, &mask);
- else
- CPU_SET(ie->ie_cpu, &mask);
+ cpu = ie->ie_cpu;
id = ie->ie_thread->it_thread->td_tid;
mtx_unlock(&ie->ie_lock);
- (void)cpuset_setthread(id, &mask);
+ (void)cpuset_setithread(id, cpu);
} else
mtx_unlock(&ie->ie_lock);
return (error);
Index: sys/sys/cpuset.h
===================================================================
--- sys/sys/cpuset.h
+++ sys/sys/cpuset.h
@@ -118,6 +118,7 @@
struct cpuset *cpuset_ref(struct cpuset *);
void cpuset_rel(struct cpuset *);
int cpuset_setthread(lwpid_t id, cpuset_t *);
+int cpuset_setithread(lwpid_t id, u_char cpu);
int cpuset_create_root(struct prison *, struct cpuset **);
int cpuset_setproc_update_set(struct proc *, struct cpuset *);
char *cpusetobj_strprint(char *, const cpuset_t *);
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?53988933.5010902>
