From owner-freebsd-bugs@FreeBSD.ORG Tue Oct 2 08:50:06 2012
Date: Tue, 2 Oct 2012 08:50:06 GMT
Message-Id: <201210020850.q928o6JS031695@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Alexander Motin
Subject: Re: kern/172166: Deadlock in the networking code, possible due to a bug in the SCHED_ULE
Reply-To: Alexander Motin
List-Id: Bug reports

The following reply was made to PR kern/172166; it has been noted by GNATS.

From: Alexander Motin
To: Eugene Grosbein
Cc: bug-followup@FreeBSD.org, eugen@eg.sd.rdtc.ru, Andriy Gapon
Subject: Re: kern/172166: Deadlock in the networking code, possible due to a bug in the SCHED_ULE
Date: Tue, 02 Oct 2012 11:45:23 +0300

On 02.10.2012 10:59, Eugene Grosbein wrote:
> 02.10.2012 14:53, Alexander Motin wrote:
>> On 02.10.2012 10:48, Eugene Grosbein wrote:
>>> 02.10.2012 13:58, Alexander Motin wrote:
>>>> About rw_lock priority propagation locking(9) tells:
>>>> The rw_lock locks have priority propagation like mutexes, but priority
>>>> can be propagated only to an exclusive holder. This limitation comes
>>>> from the fact that shared owners are anonymous.
>>>>
>>>> What's about idle stealing threshold, it was fixed in HEAD at r239194,
>>>> but wasn't merged yet. It should be trivial to merge it.
>>>
>>> Would it fix my problem with 6-CPU box?
>>> Your commit log talks about "8 or more cores".
>>
>> Hmm. Then I see no reason why threads were not stolen, unless they are
>> bound to specific CPU. Check `sysctl kern.sched.steal_thresh` output to
>> be sure.
>
> All NIC's threads and dummynet are bound in my boxes.
> igb(4) in RELENG_8 binds its threads by default in very wrong way,
> so I rebound them. dummynet(8) in RELENG_8 goes wild under severe load
> unless bound to single or two cores.

That may be the answer. An active (currently running) thread can never be
stolen, and if it has a high absolute priority and never sleeps
voluntarily, it will run on that CPU forever. If all the other threads are
bound to that CPU, they cannot be stolen either and will wait forever.

> kern.sched.steal_thresh: 2

This should not prevent stealing.

PS: I've just noticed that for some reason I haven't merged my scheduler
improvements to the 8-STABLE branch, so behavior may differ from that in
HEAD or 9-STABLE. I will recheck the commit history to recall what stopped
me from merging. But I don't remember all the details well enough to
predict whether it may affect your problem.

-- 
Alexander Motin
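
The starvation scenario described in the reply can be illustrated with a toy model. This is only a sketch under stated assumptions, not the SCHED_ULE code: the names `Thread`, `Cpu`, `stealable`, and `idle_steal` are hypothetical, and the rule that a CPU's load is its running thread plus its queued threads is a simplification made for the example. It shows that an idle CPU can take an unbound queued thread, while threads bound to the busy CPU stay queued behind a running thread that never yields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Thread:
    name: str
    priority: int                    # lower value = higher priority
    bound_cpu: Optional[int] = None  # None means the thread may migrate


@dataclass
class Cpu:
    ident: int
    running: Optional[Thread] = None
    runq: List[Thread] = field(default_factory=list)


def stealable(cpu: Cpu, steal_thresh: int) -> List[Thread]:
    """Queued threads on `cpu` that an idle CPU is allowed to take.

    Assumption of this toy model: load = running thread + queued threads.
    The running thread itself is never a candidate, and neither is any
    thread bound to a CPU.
    """
    load = len(cpu.runq) + (1 if cpu.running is not None else 0)
    if load < steal_thresh:
        return []
    return [t for t in cpu.runq if t.bound_cpu is None]


def idle_steal(idle: Cpu, cpus: List[Cpu], steal_thresh: int = 2) -> Optional[Thread]:
    """Move the best stealable thread from some other CPU onto `idle`."""
    for victim in cpus:
        if victim is idle:
            continue
        candidates = stealable(victim, steal_thresh)
        if candidates:
            chosen = min(candidates, key=lambda t: t.priority)
            victim.runq.remove(chosen)
            idle.running = chosen
            return chosen
    return None


# Hypothetical workload: CPU 0 runs a bound, high-priority thread that never
# sleeps; one bound and one unbound thread wait behind it.
cpu0 = Cpu(0,
           running=Thread("busy-bound", priority=10, bound_cpu=0),
           runq=[Thread("waiting-bound", priority=20, bound_cpu=0),
                 Thread("waiting-free", priority=30)])
cpu1 = Cpu(1)

first = idle_steal(cpu1, [cpu0, cpu1])
print(first.name if first else None)
print([t.name for t in cpu0.runq])
```

In this model the threshold of 2 is met on CPU 0, so the threshold is not what blocks stealing; the bound thread stays in the queue only because binding excludes it from the candidate list.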