Date: Sun, 10 Aug 2003 23:15:22 +1000 (EST)
From: Bruce Evans
To: Peter Holm
cc: freebsd-current@freebsd.org
Subject: Re: Deadlock [PATCH]
In-Reply-To: <20030810070311.GA97363@peter.osted.lan>
Message-ID: <20030810214320.A57453@delplex.bde.org>

On Sun, 10 Aug 2003, Peter Holm wrote:

> I have tracked down, what I believe to be the cause of several
> deadlock situations I have encountered, like
> http://people.freebsd.org/~pho/stress/cons40.html.
>
> The problem seems to be the 4bsd scheduler, that does not preempt correctly.
>
> I've included a patch that fixes the problem for me.

% --- sched_4bsd.c~	Sun Jun 15 16:57:17 2003
% +++ sched_4bsd.c	Sun Aug 10 08:41:06 2003
% @@ -448,7 +448,8 @@
%
%  	ke->ke_sched->ske_cpticks++;
%  	kg->kg_estcpu = ESTCPULIM(kg->kg_estcpu + 1);
% -	if ((kg->kg_estcpu % INVERSE_ESTCPU_WEIGHT) == 0) {
% +	if (((kg->kg_estcpu + 1) % INVERSE_ESTCPU_WEIGHT) == 0) {
% +		curthread->td_flags |= TDF_NEEDRESCHED;
%  		resetpriority(kg);
%  		if (td->td_priority >= PUSER)
%  			td->td_priority = kg->kg_user_pri;

I wonder how this makes much difference.  Setting of the rescheduling
flag is supposed to be handled by resetpriority(), so the above at most
works around bugs in resetpriority().  There are plenty of bugs in
resetpriority(), but they are minor and mostly overcome by larger bugs
AFAIK.

resetpriority() was first broken in rev.1.48 of kern_synch.c
(1999/03/04) and the bugs have moved around a bit since then.  The main
point of kern_synch.c rev.1.48 was to prevent excessive context switches
for non-PRI_TIMESHARE scheduling classes so that POSIX roundrobin and
fifo scheduling classes could be implemented using these classes.  This
has been broken in -current by context switching even more excessively
to switch to interrupt tasks.

The main bugs in kern_synch.c were that it broke the PRI_TIMESHARE case
by not passing the new priority to maybe_resched() (still broken), and
it didn't get the logic for the non-PRI_TIMESHARE case right (now
fixed).  The excessive context switching in -current makes these bugs
mostly moot -- we normally reschedule on the next (non-clock) interrupt,
so rescheduling is just delayed a bit.  The delay is normally quite
small too, at least in the case that the rescheduling is triggered by
sched_clock().  roundrobin() is supposed to cause rescheduling every
quantum (default 1/10 seconds).  sched_clock() calls resetpriority()
every 1/16 seconds on i386's and rescheduling 16 times per second isn't
much different from rescheduling 10 times per second.
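
A rough sketch of the 1/16 second arithmetic, under assumptions that are
not stated in this mail: a statclock rate of 128 Hz on i386 and
INVERSE_ESTCPU_WEIGHT equal to 8.  The function count_resets() and the
STATHZ macro are hypothetical names made up for the illustration, and
the clamping done by ESTCPULIM() is ignored.  It counts how often the
resetpriority() branch would be taken with the original test (offset 0)
and with the patched test (offset 1):

```c
/* Assumed values, not taken from the mail: i386 statclock rate and weight. */
#define STATHZ			128
#define INVERSE_ESTCPU_WEIGHT	8

/*
 * Count how many times the resetpriority() branch is taken over `ticks`
 * statclock ticks, starting from an estcpu of 0.  `offset` is 0 for the
 * original condition and 1 for the patched one.  ESTCPULIM() clamping
 * is deliberately left out of this sketch.
 */
int
count_resets(int ticks, int offset)
{
	int estcpu = 0;
	int resets = 0;
	int i;

	for (i = 0; i < ticks; i++) {
		estcpu++;	/* models kg_estcpu + 1 on each tick */
		if ((estcpu + offset) % INVERSE_ESTCPU_WEIGHT == 0)
			resets++;
	}
	return (resets);
}
```

Under those assumptions, count_resets(STATHZ, 0) and
count_resets(STATHZ, 1) both give 16, so the changed modulus test only
moves the branch one tick earlier.  The rate stays the same, which
leaves the explicit TDF_NEEDRESCHED as the only part of the patch that
can change behaviour.
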
But sched_clock() should never cause rescheduling since it should only
decrease (numerically increase) the priority of the current thread.

I think your deadlock must be related to roundrobin() never running,
which might be caused by the bugfeature that userret() doesn't consider
rescheduling.  roundrobin() is just a timeout routine, and its current
implementation depends on excessive context switching.  All timeout
routines are called from a low priority task, and switching to this task
causes roundrobin scheduling of user processes as a side effect.
roundrobin() just ensures that there is a timeout every quantum in case
the system doesn't otherwise generate timeouts frequently enough.

The bugfeature lets processes return to user mode even though their user
priority is much lower (numerically higher) than that of other user
tasks or perhaps even some kernel tasks.  They effectively run with a
high kernel priority until they give up control or are kicked by an
interrupt.  Normally there are enough interrupts to prevent problems.
But -current has the curious behaviour of forcibly rescheduling on
almost all interrupts _except_ the clock (scheduling) one.  Clock
interrupts are fast, so they don't cause context switches.  Hardclock
interrupts schedule timeouts every quantum, but your bug apparently
blocks timeouts.  Softclock interrupts are not supposed to cause
rescheduling (that is roundrobin()'s job).  Your change does forcible
rescheduling for statclock interrupts.

In my version of FreeBSD, clock interrupt handlers are non-fast so they
do forcible context switching as a side effect and I probably wouldn't
notice this problem.  My clock interrupt tasks have a much higher
priority than the timeout task so deadlocks are less likely and would be
more obvious.  I'll try running your stress test.

Bruce