Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 05 Jun 2014 13:13:49 +0400
From:      "Alexander V. Chernikov" <melifaro@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-hackers@freebsd.org, hackers@freebsd.org
Subject:   Re: Permit init(8) use its own cpuset group.
Message-ID:  <539034CD.6080002@FreeBSD.org>
In-Reply-To: <20140605003122.GF3991@kib.kiev.ua>
References:  <538C8F9A.4020301@FreeBSD.org> <20140602164850.GS3991@kib.kiev.ua> <201406041106.11659.jhb@freebsd.org> <538F70AB.5030701@FreeBSD.org> <20140605003122.GF3991@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------090403010601010707050801
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 05.06.2014 04:31, Konstantin Belousov wrote:
> On Wed, Jun 04, 2014 at 11:16:59PM +0400, Alexander V. Chernikov wrote:
>> On 04.06.2014 19:06, John Baldwin wrote:
>>> On Monday, June 02, 2014 12:48:50 pm Konstantin Belousov wrote:
>>>> On Mon, Jun 02, 2014 at 06:52:10PM +0400, Alexander V. Chernikov wrote:
>>>>> Hello list!
>>>>>
>>>>> Currently init(8) uses group 1 which is root group.
>>>>> Modifications of this group affects both kernel and userland threads.
>>>>> Additionally, such modifications are impossible, for example, in presence
>>>>> of multi-queue NIC drivers (like igb or ixgbe) which binds their threads
>>> to
>>>>> particular cpus.
>>>>>
>>>>> Proposed change ("init_cpuset" loader tunable) permits changing cpu
>>>>> masks for
>>>>> userland more easily. Restricting user processes to migrate to/from CPU
>>>>> cores
>>>>> used for network traffic processing is one of the cases.
>>>>>
>>>>> Phabricator: https://phabric.freebsd.org/D141 (the same version attached
>>>>> inline)
>>>>>
>>>>> If there are no objections, I'll commit this next week.
>>>> Why is the tunable needed ?
>>> Because some people already depend on doing 'cpuset -l 0 -s 1'.  It is also
>>> documented in our manpages that processes start in cpuset 1 by default so
>>> that you can use 'cpuset -l 0 -s 1' to move all processes, etc.
>>>
>>> For the stated problem (bound ithreads in drivers), I would actually like to
>>> fix ithreads that are bound to a specific CPU to create a different cpuset
>>> instead so they don't conflict with set 1.
>> Yes, this seem to be much better approach.
>> Please take a look on the new patch (here or in the phabricator).
>> Comments:
>>
>> Use different approach for modifyable root set:
>>
>>    * Make sets in 0..15 internal
>>    * Define CPUSET_SET1 & CPUSET_ITHREAD in that range
>>    * Add cpuset_lookup_builtin() to retrieve such cpu sets by id
>>    * Create additional root set for ithreads
>>    * Use this set in ithread_create()
>>    * Change intr_setaffinity() to use cpuset_iroot (do we really need this)?
>>
>> We can probably do the same for kprocs, but I'm unsure if we really need it.
> I think this is fine.  See below for the style comments.
Patch updated.
>
> With the proliferation of the special cpuset ids, it is probably time
> to add an assert to cpuset_rel() and cpuset_rel_defer() that it never
> try to free  the preallocated set.
Added.

It seems that cpuset_rel() / _cpuset_create() pair is racy:

Let's imagine we have set X with single reference and the following 2 
processes are going to happen:
1) cpuset_rel() is called (for example, while changing process CPU set 
from userland)
2) given process is going to fork creating shadow set

Then, the folowing can happen:
1 calls refcount_release returning 0
2 calls cpuset_shadow() -> _cpuset_create() which finishes successfully 
adding another reference to set X and returning to caller
1 proceeds for set deletion removing it from list, releasing index and 
freeing it via uma_zfree()

I'm not sure how we can fix this easily
(
well, we can read current refcount value directly inside cpuset_lock, 
return if != 0 which is more or less safe,
but more complex cases like multiple cpuset_rel() running on the same 
time (and calling refcount_release())
are still possible
)

>
> Since you use static pre-known set id for ithreads, it should be documented
> along with the set 1 in the man page ?
Yes, I'll do that.
>
>>
>> Index: sys/kern/kern_cpuset.c
>> ===================================================================
>> --- sys/kern/kern_cpuset.c
>> +++ sys/kern/kern_cpuset.c
>> @@ -112,7 +112,7 @@
>>   SYSCTL_INT(_kern_sched, OID_AUTO, cpusetsize, CTLFLAG_RD,
>>   	0, sizeof(cpuset_t), "sizeof(cpuset_t)");
>>   
>> -cpuset_t *cpuset_root;
>> +cpuset_t *cpuset_root, *cpuset_iroot;
>>   
>>   /*
>>    * Acquire a reference to a cpuset, all pointers must be tracked with refs.
>> @@ -213,7 +213,7 @@
>>    * Find a set based on an id.  Returns it with a ref.
>>    */
>>   static struct cpuset *
>> -cpuset_lookup(cpusetid_t setid, struct thread *td)
>> +cpuset_lookup_id(cpusetid_t setid)
>>   {
>>   	struct cpuset *set;
>>   
>> @@ -227,6 +227,17 @@
>>   		cpuset_ref(set);
>>   	mtx_unlock_spin(&cpuset_lock);
>>   
>> +	return (set);
>> +}
>> +
>> +static struct cpuset *
>> +cpuset_lookup(cpusetid_t setid, struct thread *td)
>> +{
>> +	struct cpuset *set;
>> +
>> +	set = cpuset_lookup_id(setid);
>> +
>> +
> Too many emtpy lines.  The statement above should probably go after the
> KASSERT() line below.
>
>>   	KASSERT(td != NULL, ("[%s:%d] td is NULL", __func__, __LINE__));
>>   	if (set != NULL && jailed(td->td_ucred)) {
>>   		struct cpuset *jset, *tset;
>> @@ -245,6 +256,25 @@
>>   }
>>   
>>   /*
>> + * Find a set based on an id.  Returns it with a ref.
>> + */
>> +struct cpuset *
>> +cpuset_lookup_builtin(cpusetid_t setid)
>> +{
>> +	struct cpuset *set;
>> +
>> +	KASSERT(setid <= CPUSET_RESERVED_MAX,
>> +	    ("[%s:%d] wrong set id: %d", __func__, __LINE__, setid));
>> +
>> +	set = cpuset_lookup_id(setid);
>> +
>> +	KASSERT(set != NULL,
>> +	    ("[%s:%d] set id %d not found", __func__, __LINE__, setid));
>> +
>> +	return (set);
> Again the empty lines are not needed.
>
>> +}
>> +
>> +/*
>>    * Create a set in the space provided in 'set' with the provided parameters.
>>    * The set is returned with a single ref.  May return EDEADLK if the set
>>    * will have no valid cpu based on restrictions from the parent.
>> @@ -725,9 +755,10 @@
>>    * 1 - The default set which all processes are a member of until changed.
>>    *     This allows an administrator to move all threads off of given cpus to
>>    *     dedicate them to high priority tasks or save power etc.
>> + * 2 - The root set which should be used for all interruot/ithreads.
> Typo in the 'interrupt'.
>
>>    */
>> -struct cpuset *
>> -cpuset_thread0(void)
>> +static void
>> +cpuset_init_sets(void)
>>   {
>>   	struct cpuset *set;
>>   	int error;
>> @@ -751,14 +782,31 @@
>>   	 * Now derive a default, modifiable set from that to give out.
>>   	 */
>>   	set = uma_zalloc(cpuset_zone, M_WAITOK);
>> -	error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, 1);
>> +	error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask,
>> +	    CPUSET_SET1);
>>   	KASSERT(error == 0, ("Error creating default set: %d\n", error));
>>   	/*
>> -	 * Initialize the unit allocator. 0 and 1 are allocated above.
>> +	 * Create another root set for interrupt bindings.
>>   	 */
>> -	cpuset_unr = new_unrhdr(2, INT_MAX, NULL);
>> +	set = uma_zalloc(cpuset_zone, M_WAITOK);
>> +	error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask,
>> +	    CPUSET_ITHREAD);
>> +	KASSERT(error == 0, ("Error creating ithread cpuset: %d\n", error));
>> +	set->cs_flags |= CPU_SET_ROOT;
>> +	cpuset_iroot = &set->cs_mask;
>> +	/*
>> +	 * Initialize the unit allocator. Reserve 0..15 for special sets.
>> +	 */
>> +	cpuset_unr = new_unrhdr(CPUSET_RESERVED_MAX + 1, INT_MAX, NULL);
> Style requests that multi-line block comments have empty line before.
> Spelling CPUSET_RESERVED_MAX as 15 in the comment looks strange.
>
>> +}
>>   
>> -	return (set);
>> +struct cpuset *
>> +cpuset_thread0(void)
>> +{
>> +
>> +	cpuset_init_sets();
>> +
> Empty line.
>
>> +	return (cpuset_lookup_builtin(CPUSET_SET1));
>>   }
>>   
>>   /*
>> Index: sys/kern/kern_intr.c
>> ===================================================================
>> --- sys/kern/kern_intr.c
>> +++ sys/kern/kern_intr.c
>> @@ -318,7 +318,7 @@
>>   	if (ie->ie_thread != NULL) {
>>   		CPU_ZERO(&mask);
>>   		if (cpu == NOCPU)
>> -			CPU_COPY(cpuset_root, &mask);
>> +			CPU_COPY(cpuset_iroot, &mask);
>>   		else
>>   			CPU_SET(cpu, &mask);
>>   		id = ie->ie_thread->it_thread->td_tid;
>> @@ -334,7 +334,7 @@
>>   		if (ie->ie_thread != NULL) {
>>   			CPU_ZERO(&mask);
>>   			if (ie->ie_cpu == NOCPU)
>> -				CPU_COPY(cpuset_root, &mask);
>> +				CPU_COPY(cpuset_iroot, &mask);
>>   			else
>>   				CPU_SET(ie->ie_cpu, &mask);
>>   			id = ie->ie_thread->it_thread->td_tid;
>> @@ -381,7 +381,7 @@
>>   	 * If we're setting all cpus we can unbind.  Otherwise make sure
>>   	 * only one cpu is in the set.
>>   	 */
>> -	if (CPU_CMP(cpuset_root, mask)) {
>> +	if (CPU_CMP(cpuset_iroot, mask)) {
>>   		for (n = 0; n < CPU_SETSIZE; n++) {
>>   			if (!CPU_ISSET(n, mask))
>>   				continue;
>> @@ -409,7 +409,7 @@
>>   	CPU_ZERO(mask);
>>   	mtx_lock(&ie->ie_lock);
>>   	if (ie->ie_cpu == NOCPU)
>> -		CPU_COPY(cpuset_root, mask);
>> +		CPU_COPY(cpuset_iroot, mask);
>>   	else
>>   		CPU_SET(ie->ie_cpu, mask);
>>   	mtx_unlock(&ie->ie_lock);
>> @@ -461,6 +461,7 @@
>>   	TD_SET_IWAIT(td);
>>   	thread_unlock(td);
>>   	td->td_pflags |= TDP_ITHREAD;
>> +	td->td_cpuset = cpuset_lookup_builtin(CPUSET_ITHREAD);
>>   	ithd->it_thread = td;
>>   	CTR2(KTR_INTR, "%s: created %s", __func__, name);
>>   	return (ithd);
>> @@ -485,6 +486,7 @@
>>   	TD_SET_IWAIT(td);
>>   	thread_unlock(td);
>>   	td->td_pflags |= TDP_ITHREAD;
>> +	td->td_cpuset = cpuset_lookup_builtin(CPUSET_ITHREAD);
>>   	ithd->it_thread = td;
>>   	CTR2(KTR_INTR, "%s: created %s", __func__, name);
>>   	return (ithd);
>> Index: sys/sys/cpuset.h
>> ===================================================================
>> --- sys/sys/cpuset.h
>> +++ sys/sys/cpuset.h
>> @@ -81,6 +81,10 @@
>>    */
>>   #define	CPUSET_INVALID	-1
>>   #define	CPUSET_DEFAULT	0
>> +#define	CPUSET_SET1	1
>> +#define	CPUSET_ITHREAD	2
>> +
>> +#define	CPUSET_RESERVED_MAX	15
>>   
>>   #ifdef _KERNEL
>>   LIST_HEAD(setlist, cpuset);
>> @@ -110,11 +114,12 @@
>>   #define CPU_SET_ROOT    0x0001  /* Set is a root set. */
>>   #define CPU_SET_RDONLY  0x0002  /* No modification allowed. */
>>   
>> -extern cpuset_t *cpuset_root;
>> +extern cpuset_t *cpuset_root, *cpuset_iroot;
>>   struct prison;
>>   struct proc;
>>   
>>   struct cpuset *cpuset_thread0(void);
>> +struct cpuset *cpuset_lookup_builtin(cpusetid_t setid);
>>   struct cpuset *cpuset_ref(struct cpuset *);
>>   void	cpuset_rel(struct cpuset *);
>>   int	cpuset_setthread(lwpid_t id, cpuset_t *);


--------------090403010601010707050801
Content-Type: text/x-patch;
 name="D141_4.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="D141_4.diff"

Index: sys/kern/kern_cpuset.c
===================================================================
--- sys/kern/kern_cpuset.c
+++ sys/kern/kern_cpuset.c
@@ -112,7 +112,7 @@
 SYSCTL_INT(_kern_sched, OID_AUTO, cpusetsize, CTLFLAG_RD,
 	0, sizeof(cpuset_t), "sizeof(cpuset_t)");
 
-cpuset_t *cpuset_root;
+cpuset_t *cpuset_root, *cpuset_iroot;
 
 /*
  * Acquire a reference to a cpuset, all pointers must be tracked with refs.
@@ -167,9 +167,11 @@
 
 	if (refcount_release(&set->cs_ref) == 0)
 		return;
+	id = set->cs_id;
+	KASSERT(id > CPUSET_RESERVED_MAX || id == CPUSET_INVALID,
+	    ("[%s:%d] freeing reserved set id: %d", __func__, __LINE__, id));
 	mtx_lock_spin(&cpuset_lock);
 	LIST_REMOVE(set, cs_siblings);
-	id = set->cs_id;
 	if (id != CPUSET_INVALID)
 		LIST_REMOVE(set, cs_link);
 	mtx_unlock_spin(&cpuset_lock);
@@ -186,12 +188,16 @@
 static void
 cpuset_rel_defer(struct setlist *head, struct cpuset *set)
 {
+	cpusetid_t id;
 
 	if (refcount_release(&set->cs_ref) == 0)
 		return;
+	id = set->cs_id;
+	KASSERT(id > CPUSET_RESERVED_MAX || id == CPUSET_INVALID,
+	    ("[%s:%d] freeing reserved set id: %d", __func__, __LINE__, id));
 	mtx_lock_spin(&cpuset_lock);
 	LIST_REMOVE(set, cs_siblings);
-	if (set->cs_id != CPUSET_INVALID)
+	if (id != CPUSET_INVALID)
 		LIST_REMOVE(set, cs_link);
 	LIST_INSERT_HEAD(head, set, cs_link);
 	mtx_unlock_spin(&cpuset_lock);
@@ -213,7 +219,7 @@
  * Find a set based on an id.  Returns it with a ref.
  */
 static struct cpuset *
-cpuset_lookup(cpusetid_t setid, struct thread *td)
+cpuset_lookup_id(cpusetid_t setid)
 {
 	struct cpuset *set;
 
@@ -227,7 +233,17 @@
 		cpuset_ref(set);
 	mtx_unlock_spin(&cpuset_lock);
 
+	return (set);
+}
+
+static struct cpuset *
+cpuset_lookup(cpusetid_t setid, struct thread *td)
+{
+	struct cpuset *set;
+
 	KASSERT(td != NULL, ("[%s:%d] td is NULL", __func__, __LINE__));
+
+	set = cpuset_lookup_id(setid);
 	if (set != NULL && jailed(td->td_ucred)) {
 		struct cpuset *jset, *tset;
 
@@ -245,6 +261,23 @@
 }
 
 /*
+ * Find reserved set based on its id. Returns it with a ref.
+ */
+struct cpuset *
+cpuset_lookup_reserved(cpusetid_t setid)
+{
+	struct cpuset *set;
+
+	KASSERT(setid <= CPUSET_RESERVED_MAX,
+	    ("[%s:%d] wrong set id: %d", __func__, __LINE__, setid));
+	set = cpuset_lookup_id(setid);
+	KASSERT(set != NULL,
+	    ("[%s:%d] set id %d not found", __func__, __LINE__, setid));
+
+	return (set);
+}
+
+/*
  * Create a set in the space provided in 'set' with the provided parameters.
  * The set is returned with a single ref.  May return EDEADLK if the set
  * will have no valid cpu based on restrictions from the parent.
@@ -725,16 +758,18 @@
  * 1 - The default set which all processes are a member of until changed.
  *     This allows an administrator to move all threads off of given cpus to
  *     dedicate them to high priority tasks or save power etc.
+ * 2 - The root set which should be used for all interrupts/ithreads.
  */
-struct cpuset *
-cpuset_thread0(void)
+static void
+cpuset_init_sets(void)
 {
 	struct cpuset *set;
 	int error;
 
 	cpuset_zone = uma_zcreate("cpuset", sizeof(struct cpuset), NULL, NULL,
 	    NULL, NULL, UMA_ALIGN_PTR, 0);
 	mtx_init(&cpuset_lock, "cpuset", NULL, MTX_SPIN | MTX_RECURSE);
+
 	/*
 	 * Create the root system set for the whole machine.  Doesn't use
 	 * cpuset_create() due to NULL parent.
@@ -747,18 +782,38 @@
 	set->cs_flags = CPU_SET_ROOT;
 	cpuset_zero = set;
 	cpuset_root = &set->cs_mask;
+
 	/*
 	 * Now derive a default, modifiable set from that to give out.
 	 */
 	set = uma_zalloc(cpuset_zone, M_WAITOK);
-	error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask, 1);
+	error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask,
+	    CPUSET_DEFAULT);
 	KASSERT(error == 0, ("Error creating default set: %d\n", error));
+
 	/*
-	 * Initialize the unit allocator. 0 and 1 are allocated above.
+	 * Create another root set for interrupt bindings.
 	 */
-	cpuset_unr = new_unrhdr(2, INT_MAX, NULL);
+	set = uma_zalloc(cpuset_zone, M_WAITOK);
+	error = _cpuset_create(set, cpuset_zero, &cpuset_zero->cs_mask,
+	    CPUSET_ITHREAD);
+	KASSERT(error == 0, ("Error creating ithread cpuset: %d\n", error));
+	set->cs_flags |= CPU_SET_ROOT;
+	cpuset_iroot = &set->cs_mask;
 
-	return (set);
+	/*
+	 * Initialize the unit allocator.
+	 * Reserve 0..CPUSET_RESERVED_MAX for special sets.
+	 */
+	cpuset_unr = new_unrhdr(CPUSET_RESERVED_MAX + 1, INT_MAX, NULL);
+}
+
+struct cpuset *
+cpuset_thread0(void)
+{
+
+	cpuset_init_sets();
+	return (cpuset_lookup_reserved(CPUSET_DEFAULT));
 }
 
 /*
Index: sys/kern/kern_intr.c
===================================================================
--- sys/kern/kern_intr.c
+++ sys/kern/kern_intr.c
@@ -318,7 +318,7 @@
 	if (ie->ie_thread != NULL) {
 		CPU_ZERO(&mask);
 		if (cpu == NOCPU)
-			CPU_COPY(cpuset_root, &mask);
+			CPU_COPY(cpuset_iroot, &mask);
 		else
 			CPU_SET(cpu, &mask);
 		id = ie->ie_thread->it_thread->td_tid;
@@ -334,7 +334,7 @@
 		if (ie->ie_thread != NULL) {
 			CPU_ZERO(&mask);
 			if (ie->ie_cpu == NOCPU)
-				CPU_COPY(cpuset_root, &mask);
+				CPU_COPY(cpuset_iroot, &mask);
 			else
 				CPU_SET(ie->ie_cpu, &mask);
 			id = ie->ie_thread->it_thread->td_tid;
@@ -381,7 +381,7 @@
 	 * If we're setting all cpus we can unbind.  Otherwise make sure
 	 * only one cpu is in the set.
 	 */
-	if (CPU_CMP(cpuset_root, mask)) {
+	if (CPU_CMP(cpuset_iroot, mask)) {
 		for (n = 0; n < CPU_SETSIZE; n++) {
 			if (!CPU_ISSET(n, mask))
 				continue;
@@ -409,7 +409,7 @@
 	CPU_ZERO(mask);
 	mtx_lock(&ie->ie_lock);
 	if (ie->ie_cpu == NOCPU)
-		CPU_COPY(cpuset_root, mask);
+		CPU_COPY(cpuset_iroot, mask);
 	else
 		CPU_SET(ie->ie_cpu, mask);
 	mtx_unlock(&ie->ie_lock);
@@ -461,6 +461,7 @@
 	TD_SET_IWAIT(td);
 	thread_unlock(td);
 	td->td_pflags |= TDP_ITHREAD;
+	td->td_cpuset = cpuset_lookup_reserved(CPUSET_ITHREAD);
 	ithd->it_thread = td;
 	CTR2(KTR_INTR, "%s: created %s", __func__, name);
 	return (ithd);
@@ -485,6 +486,7 @@
 	TD_SET_IWAIT(td);
 	thread_unlock(td);
 	td->td_pflags |= TDP_ITHREAD;
+	td->td_cpuset = cpuset_lookup_reserved(CPUSET_ITHREAD);
 	ithd->it_thread = td;
 	CTR2(KTR_INTR, "%s: created %s", __func__, name);
 	return (ithd);
Index: sys/sys/cpuset.h
===================================================================
--- sys/sys/cpuset.h
+++ sys/sys/cpuset.h
@@ -80,7 +80,11 @@
  * Reserved cpuset identifiers.
  */
 #define	CPUSET_INVALID	-1
-#define	CPUSET_DEFAULT	0
+#define	CPUSET_SET0	0
+#define	CPUSET_DEFAULT	1
+#define	CPUSET_ITHREAD	2
+
+#define	CPUSET_RESERVED_MAX	15
 
 #ifdef _KERNEL
 LIST_HEAD(setlist, cpuset);
@@ -110,11 +114,12 @@
 #define CPU_SET_ROOT    0x0001  /* Set is a root set. */
 #define CPU_SET_RDONLY  0x0002  /* No modification allowed. */
 
-extern cpuset_t *cpuset_root;
+extern cpuset_t *cpuset_root, *cpuset_iroot;
 struct prison;
 struct proc;
 
 struct cpuset *cpuset_thread0(void);
+struct cpuset *cpuset_lookup_reserved(cpusetid_t setid);
 struct cpuset *cpuset_ref(struct cpuset *);
 void	cpuset_rel(struct cpuset *);
 int	cpuset_setthread(lwpid_t id, cpuset_t *);

--------------090403010601010707050801--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?539034CD.6080002>