Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 11 Jun 2014 20:52:03 +0400
From:      "Alexander V. Chernikov" <melifaro@FreeBSD.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-hackers@freebsd.org, hackers@freebsd.org
Subject:   Re: Permit init(8) use its own cpuset group.
Message-ID:  <53988933.5010902@FreeBSD.org>
In-Reply-To: <201406091343.39584.jhb@freebsd.org>
References:  <538C8F9A.4020301@FreeBSD.org> <5390DB64.6010704@FreeBSD.org> <53923C09.6020302@FreeBSD.org> <201406091343.39584.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------030603020702090205010604
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit

On 09.06.2014 21:43, John Baldwin wrote:
> On Friday, June 06, 2014 6:09:13 pm Alexander V. Chernikov wrote:
>> On 06.06.2014 01:04, Alexander V. Chernikov wrote:
>>> On 05.06.2014 23:59, John Baldwin wrote:
>>>> On Thursday, June 05, 2014 3:46:15 pm Alexander V. Chernikov wrote:
>>>>> On 05.06.2014 18:09, John Baldwin wrote:
>>>>>> On Wednesday, June 04, 2014 3:16:59 pm Alexander V. Chernikov wrote:
>>>>>>> On 04.06.2014 19:06, John Baldwin wrote:
>>>>>>>> On Monday, June 02, 2014 12:48:50 pm Konstantin Belousov wrote:
>>>>>>>>> On Mon, Jun 02, 2014 at 06:52:10PM +0400, Alexander V. Chernikov wrote:
>>>>>>>>>> Hello list!
>>>>>>>>>>
>>>>>>>>>> Currently init(8) uses group 1 which is root group.
>>>>>>>>>> Modifications of this group affects both kernel and userland threads.
>>>>>>>>>> Additionally, such modifications are impossible, for example, in
>>>>>> presence
>>>>>>>>>> of multi-queue NIC drivers (like igb or ixgbe) which binds their
>>>> threads
>>>>>>>> to
>>>>>>>>>> particular cpus.
>>>>>>>>>>
>>>>>>>>>> Proposed change ("init_cpuset" loader tunable) permits changing cpu
>>>>>>>>>> masks for
>>>>>>>>>> userland more easily. Restricting user processes to migrate to/from
>>>> CPU
>>>>>>>>>> cores
>>>>>>>>>> used for network traffic processing is one of the cases.
>>>>>>>>>>
>>>>>>>>>> Phabricator: https://phabric.freebsd.org/D141 (the same version
>>>> attached
>>>>>>>>>> inline)
>>>>>>>>>>
>>>>>>>>>> If there are no objections, I'll commit this next week.
>>>>>>>>> Why is the tunable needed ?
>>>>>>>> Because some people already depend on doing 'cpuset -l 0 -s 1'.  It is
>>>>>> also
>>>>>>>> documented in our manpages that processes start in cpuset 1 by default
>>>> so
>>>>>>>> that you can use 'cpuset -l 0 -s 1' to move all processes, etc.
>>>>>>>>
>>>>>>>> For the stated problem (bound ithreads in drivers), I would actually
>>>> like
>>>>>> to
>>>>>>>> fix ithreads that are bound to a specific CPU to create a different
>>>> cpuset
>>>>>>>> instead so they don't conflict with set 1.
>>>>>>> Yes, this seem to be much better approach.
>>>>>>> Please take a look on the new patch (here or in the phabricator).
>>>>>>> Comments:
>>>>>>>
>>>>>>> Use different approach for modifyable root set:
>>>>>>>
>>>>>>>    * Make sets in 0..15 internal
>>>>>>>    * Define CPUSET_SET1 & CPUSET_ITHREAD in that range
>>>>>>>    * Add cpuset_lookup_builtin() to retrieve such cpu sets by id
>>>>>>>    * Create additional root set for ithreads
>>>>>>>    * Use this set in ithread_create()
>>>>>>>    * Change intr_setaffinity() to use cpuset_iroot (do we really need
>>>> this)?
>>>>>>> We can probably do the same for kprocs, but I'm unsure if we really need
>>>> it.
>>>>>> I imagined something a bit simpler.  Just create a new set in
>>>> intr_event_bind
>>>>>> and bind the ithread to the new set.  No need to have more magic set ids,
>>>> etc.
>>>>> Well, we also have userland which can modify given changes via `cpuset
>>>>> -x', so we need to be able to add some more logic on set
>>>>> allocation/keeping. Additionally, we can try to do the same via `cpuset
>>>>> -t', so introducing something like cpuset_setIthread() and hooking into
>>>>> intr_event_bind() won't probably be enough. At least I can't think out a
>>>>> quick and easy way to do this.
>>>> cpuset -x calls intr_event_bind().  If you just do it there you fix both
>>>> places.
>> Yup. I was wrong. This version seems to be simpler while doing the right
>> thing.
> Hmm, I do think this patch is doing the right thing, but I'm worried you
> might have weird effects, e.g. I worry that because the 'intr' proc is in
> set 1 still, the thread masks will be (incorrectly) checked when changing
Well, as far as I understand we don't have any cpu sets bound to program 
("pid" search returns set of first thread).
Anyway, each original ithread mask is released via cpuset_rel() so I'm a 
bit unsure about remaining thread masks.
Playing with cpuset and ixgbe driver, does not reveal given problem:

12:59 [0] test15# cpuset -g -s 1
cpuset 1 mask: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 
17, 18, 19, 20, 21, 22, 23
13:00 [0] test15# kldload ixgbe
13:00 [0] test15# vmstat -i | grep ix
irq263: ix0:que 0                    116          0
irq264: ix0:que 1                    108          0
irq265: ix0:que 2                    108          0
irq266: ix0:que 3                    108          0
irq267: ix0:que 4                    108          0
irq268: ix0:que 5                    108          0
irq269: ix0:que 6                    108          0
irq270: ix0:que 7                    108          0
...
13:00 [0] test15# cpuset -g -x 263
irq 263 mask: 0
13:00 [0] test15# cpuset -g -x 264
irq 264 mask: 1
13:00 [0] test15# cpuset -l 21,22,23 -s 1
13:01 [0] test15# cpuset -g -s 1
cpuset 1 mask: 21, 22, 23
13:01 [0] test15# cpuset -g -x 263
irq 263 mask: 0
13:01 [0] test15# cpuset -g -x 264
irq 264 mask: 1
13:01 [0] test15# cpuset -l 12 -x 264
13:01 [0] test15# cpuset -g -x 264
irq 264 mask: 12
13:01 [0] test15# cpuset -l 22,23 -s 1
13:01 [0] test15# echo $?
0
13:01 [0] test15# cpuset -g -x 264
irq 264 mask: 12
13:01 [0] test15# cpuset -g -s 1
cpuset 1 mask: 22, 23


> set 1 and you still won't be able to 'cpuset -l 0 -s 1'.
>
> Also, strictly speaking, it would probably be best to return threads to
> set 1 when NOCPU is passed to intr_event_bind() to unbind an interrupt.
Yup.
>


--------------030603020702090205010604
Content-Type: text/x-patch;
 name="D141_7.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="D141_7.diff"

Index: sys/kern/kern_cpuset.c
===================================================================
--- sys/kern/kern_cpuset.c
+++ sys/kern/kern_cpuset.c
@@ -106,7 +106,7 @@
 static struct mtx cpuset_lock;
 static struct setlist cpuset_ids;
 static struct unrhdr *cpuset_unr;
-static struct cpuset *cpuset_zero;
+static struct cpuset *cpuset_zero, *cpuset_default;
 
 /* Return the size of cpuset_t at the kernel level */
 SYSCTL_INT(_kern_sched, OID_AUTO, cpusetsize, CTLFLAG_RD,
@@ -716,6 +716,89 @@
 }
 
 /*
+ * Apply new cpumask to the ithread.
+ */
+int
+cpuset_setithread(lwpid_t id, u_char cpu)
+{
+	struct cpuset *nset, *rset;
+	struct cpuset *parent, *old_set;
+	struct thread *td;
+	struct proc *p;
+	cpusetid_t cs_id;
+	cpuset_t mask;
+	int error;
+
+	nset = uma_zalloc(cpuset_zone, M_WAITOK);
+	rset = uma_zalloc(cpuset_zone, M_WAITOK);
+
+	CPU_ZERO(&mask);
+	if (cpu == NOCPU)
+		CPU_COPY(cpuset_root, &mask);
+	else
+		CPU_SET(cpu, &mask);
+
+	error = cpuset_which(CPU_WHICH_TID, id, &p, &td, &old_set);
+	if (((cs_id = alloc_unr(cpuset_unr)) == CPUSET_INVALID) || error != 0)
+		goto out;
+
+	thread_lock(td);
+	old_set = td->td_cpuset;
+
+	if (cpu == NOCPU) {
+		/*
+		 * roll back to default set. We're not using cpuset_shadow()
+		 * here because we can fail CPU_SUBSET() check. This can happen
+		 * if default set does not contain all CPUs.
+		 */
+		error = _cpuset_create(nset, cpuset_default, &mask,
+		    CPUSET_INVALID);
+
+		goto applyset;
+	}
+
+	if (old_set->cs_id == 1 || (old_set->cs_id == CPUSET_INVALID &&
+	    old_set->cs_parent->cs_id == 1)) {
+		/* Default mask, we need to use new root set */
+		error = _cpuset_create(rset, cpuset_zero,
+		    &cpuset_zero->cs_mask, cs_id);
+		if (error != 0) {
+			PROC_UNLOCK(p);
+			goto out;
+		}
+		rset->cs_flags |= CPU_SET_ROOT;
+		parent = rset;
+		rset = NULL;
+		cs_id = CPUSET_INVALID;
+	} else {
+		/* Assume existing set was already allocated by previous call */
+		parent = td->td_cpuset;
+		old_set = NULL;
+	}
+
+	error = cpuset_shadow(parent, nset, &mask);
+applyset:
+	if (error == 0) {
+		td->td_cpuset = nset;
+		sched_affinity(td);
+		nset = NULL;
+	}
+	thread_unlock(td);
+	PROC_UNLOCK(p);
+	if (old_set != NULL)
+		cpuset_rel(old_set);
+out:
+	if (nset != NULL)
+		uma_zfree(cpuset_zone, nset);
+	if (rset != NULL)
+		uma_zfree(cpuset_zone, rset);
+	if (cs_id != CPUSET_INVALID)
+		free_unr(cpuset_unr, cs_id);
+	return (error);
+}
+
+
+/*
  * Creates the cpuset for thread0.  We make two sets:
  * 
  * 0 - The root set which should represent all valid processors in the
@@ -735,6 +818,7 @@
 	cpuset_zone = uma_zcreate("cpuset", sizeof(struct cpuset), NULL, NULL,
 	    NULL, NULL, UMA_ALIGN_PTR, 0);
 	mtx_init(&cpuset_lock, "cpuset", NULL, MTX_SPIN | MTX_RECURSE);
+
 	/*
 	 * Create the root system set for the whole machine.  Doesn't use
 	 * cpuset_create() due to NULL parent.
@@ -747,12 +831,15 @@
 	set->cs_flags = CPU_SET_ROOT;
 	cpuset_zero = set;
 	cpuset_root = &set->cs_mask;
+
 	/*
 	 * Now derive a default, modifiable set from that to give out.
 	 */
 	set = uma_zalloc(cpuset_zone, M_WAITOK);
 	error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, 1);
 	KASSERT(error == 0, ("Error creating default set: %d\n", error));
+	cpuset_default = set;
+
 	/*
 	 * Initialize the unit allocator. 0 and 1 are allocated above.
 	 */
Index: sys/kern/kern_intr.c
===================================================================
--- sys/kern/kern_intr.c
+++ sys/kern/kern_intr.c
@@ -295,7 +295,6 @@
 int
 intr_event_bind(struct intr_event *ie, u_char cpu)
 {
-	cpuset_t mask;
 	lwpid_t id;
 	int error;
 
@@ -316,30 +315,21 @@
 	 */
 	mtx_lock(&ie->ie_lock);
 	if (ie->ie_thread != NULL) {
-		CPU_ZERO(&mask);
-		if (cpu == NOCPU)
-			CPU_COPY(cpuset_root, &mask);
-		else
-			CPU_SET(cpu, &mask);
 		id = ie->ie_thread->it_thread->td_tid;
 		mtx_unlock(&ie->ie_lock);
-		error = cpuset_setthread(id, &mask);
+		error = cpuset_setithread(id, cpu);
 		if (error)
 			return (error);
 	} else
 		mtx_unlock(&ie->ie_lock);
 	error = ie->ie_assign_cpu(ie->ie_source, cpu);
 	if (error) {
 		mtx_lock(&ie->ie_lock);
 		if (ie->ie_thread != NULL) {
-			CPU_ZERO(&mask);
-			if (ie->ie_cpu == NOCPU)
-				CPU_COPY(cpuset_root, &mask);
-			else
-				CPU_SET(ie->ie_cpu, &mask);
+			cpu = ie->ie_cpu;
 			id = ie->ie_thread->it_thread->td_tid;
 			mtx_unlock(&ie->ie_lock);
-			(void)cpuset_setthread(id, &mask);
+			(void)cpuset_setithread(id, cpu);
 		} else
 			mtx_unlock(&ie->ie_lock);
 		return (error);
Index: sys/sys/cpuset.h
===================================================================
--- sys/sys/cpuset.h
+++ sys/sys/cpuset.h
@@ -118,6 +118,7 @@
 struct cpuset *cpuset_ref(struct cpuset *);
 void	cpuset_rel(struct cpuset *);
 int	cpuset_setthread(lwpid_t id, cpuset_t *);
+int	cpuset_setithread(lwpid_t id, u_char cpu);
 int	cpuset_create_root(struct prison *, struct cpuset **);
 int	cpuset_setproc_update_set(struct proc *, struct cpuset *);
 char	*cpusetobj_strprint(char *, const cpuset_t *);

--------------030603020702090205010604--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?53988933.5010902>