Date: Thu, 4 Oct 2012 10:30:12 GMT
To: freebsd-bugs@FreeBSD.org
From: Andriy Gapon
Subject: Re: kern/172166: Deadlock in the networking code, possible due to a bug in the SCHED_ULE

The following reply was made to PR kern/172166; it has been noted by GNATS.

From: Andriy Gapon
To: Eugene Grosbein
Cc: bug-followup@FreeBSD.org, Alexander Motin
Subject: Re: kern/172166: Deadlock in the networking code, possible due to a bug in the SCHED_ULE
Date: Thu, 04 Oct 2012 13:23:55 +0300

on 04/10/2012 09:12 Eugene Grosbein said the following:
> 03.10.2012 21:56, Andriy Gapon wrote:
>> on 02/10/2012 09:58 Alexander Motin said the following:
>>> About rw_lock priority propagation locking(9) tells:
>>> The rw_lock locks have priority propagation like mutexes, but priority can be
>>> propagated only to an exclusive holder.
>>> This limitation comes from the fact that
>>> shared owners are anonymous.
>>
>> Yeah... and as we see it has a potential to result in priority inversion.
>>
>>> What's about idle stealing threshold, it was fixed in HEAD at r239194, but wasn't
>>> merged yet. It should be trivial to merge it.
>>
>> And I've also misread the code, confused 6 CPUs case with 8 CPUs case.
>> BTW, I've just noticed that the syslogd thread had td_pinned == 1 and I can't
>> explain why... But that probably explains why it was not stolen.
>
> Can I have any advice/workaround/bugfix on how to reconfigure my routers
> to prevent them from locking this way?

As I said, the primary problem here is the ipmi thread going insane.
You can try to remove the ipmi driver, if you can afford that.
Or you can try to hack on it, so that
(1) it voluntarily yields even when it thinks that it always has work to do
(2) there is some diagnostic on what keeps it running

You may also try to set the thread's priority to PUSER (using sched_prio), but I
am not sure what bad side effects may happen because of that.

No magic bullet here, sorry.
-- 
Andriy Gapon
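
Points (1) and (2) in the reply above can be illustrated with a small userspace analogy. This is only a sketch under stated assumptions, not ipmi driver code: the names worker_loop, loop_stats, have_work and WORK_BUDGET are hypothetical, and POSIX sched_yield() stands in for whatever yield primitive the kernel thread would actually use. The loop gives up the CPU after a fixed budget of work items even though it still believes it has work, and it counts how often that happens so there is some diagnostic on what keeps it running.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical budget: work items handled before a forced yield. */
#define WORK_BUDGET 64

/* Hypothetical diagnostic counters for the loop. */
struct loop_stats {
    unsigned long items;          /* work items processed so far */
    unsigned long forced_yields;  /* times the budget ran out and we yielded */
};

/* Stand-in for a driver that "thinks it always has work to do". */
static int have_work(unsigned long remaining)
{
    return remaining > 0;
}

/*
 * Process `total` items, yielding the CPU every WORK_BUDGET items
 * even though more work is still pending.
 */
static void worker_loop(unsigned long total, struct loop_stats *st)
{
    unsigned long remaining = total;
    unsigned int budget = WORK_BUDGET;

    assert(st != NULL);

    while (have_work(remaining)) {
        remaining--;                /* "handle" one work item */
        st->items++;

        if (--budget == 0) {
            st->forced_yields++;    /* diagnostic: the loop did not go idle on its own */
            sched_yield();          /* let other runnable threads make progress */
            budget = WORK_BUDGET;
        }
    }
}
```

A forced_yields count that keeps growing would indicate that the loop never runs out of work by itself, which is the kind of behaviour described for the ipmi thread.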