From owner-freebsd-arch@FreeBSD.ORG Sun Sep 26 03:50:48 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8D6A216A4CE for ; Sun, 26 Sep 2004 03:50:47 +0000 (GMT) Received: from ylpvm43.prodigy.net (ylpvm43-ext.prodigy.net [207.115.57.74]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7A08543D39 for ; Sun, 26 Sep 2004 03:50:47 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (adsl-67-124-49-205.dsl.snfc21.pacbell.net [67.124.49.205])i8Q3orCE026142; Sat, 25 Sep 2004 23:50:54 -0400 Message-ID: <41563C95.2020501@elischer.org> Date: Sat, 25 Sep 2004 20:50:45 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524 X-Accept-Language: en, hu MIME-Version: 1.0 To: Stephan Uphoff References: <1095468747.31297.241.camel@palm.tree.com> <414B8D5E.7000700@elischer.org> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> In-Reply-To: <1096135220.53798.17754.camel@palm.tree.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: peter@holm.cc cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Sep 2004 03:50:48 -0000 Stephan Uphoff wrote: >>Maybe something brutal like: >> if ((curthread->td_ksegrp == kg) && >> (td->td_priority > curthread->td_priority)) >> curthread->td_flags |= TDF_NEEDRESCHED; >> >>in setrunqueue for >>the else case of "if (kg->kg_avail_opennings > 0)" >>would do the trick (without preemption) for the easy but probably more >>common cases? >> >>Maybe I can find some time next week to think about a clean >>fix. 
I find it always helpful having a small task in mind while reading
>>source code.
>
>
> I wrote a fix that should cover all cases.
> However I would like to test it a little bit before posting the patch.
> Is there any multi-threaded kernel torture program that you can
> recommend?

Peter Holm (CC'd) has a really cool set of torture tests.
He has also seen all sorts of failures others have not (yet) triggered. :-)

I'm "busy" for the next couple of weeks, so you may want to communicate
directly with him and see if the two of you can figure out some of the
things he's been seeing :-)

His tests are at:
http://www.holm.cc/stress/src/stress.tgz

>
> Thanks
>
> Stephan
>

From owner-freebsd-arch@FreeBSD.ORG Sun Sep 26 07:52:28 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C631F16A4CE for ; Sun, 26 Sep 2004 07:52:28 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id 4DB1B43D58 for ; Sun, 26 Sep 2004 07:52:26 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 57081 invoked from network); 26 Sep 2004 07:52:24 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 26 Sep 2004 07:52:24 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8Q7qLCs086060; Sun, 26 Sep 2004 09:52:22 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i8Q7qIuj086059; Sun, 26 Sep 2004 09:52:18 +0200 (CEST) (envelope-from pho) Date: Sun, 26 Sep 2004 09:52:18 +0200 From: Peter Holm To: Julian Elischer Message-ID: <20040926075218.GA85983@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <414B8D5E.7000700@elischer.org>
<1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <41563C95.2020501@elischer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <41563C95.2020501@elischer.org> User-Agent: Mutt/1.4.1i cc: peter@holm.cc cc: "freebsd-arch@freebsd.org" cc: Stephan Uphoff Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Sep 2004 07:52:28 -0000 On Sat, Sep 25, 2004 at 08:50:45PM -0700, Julian Elischer wrote: > Stephan Uphoff wrote: > > >>Maybe something brutal like: > >> if ((curthread->td_ksegrp == kg) && > >> (td->td_priority > curthread->td_priority)) > >> curthread->td_flags |= TDF_NEEDRESCHED; > >> > >>in setrunqueue for > >>the else case of "if (kg->kg_avail_opennings > 0)" > >>would do the trick (without preemption) for the easy but probably more > >>common cases? > >> > >>Maybe I can find some time next week to think about a clean > >>fix. I find it always helpful having a small task in mind while reading > >>source code. > > > > > >I wrote a fix that should cover all cases. > >However I would like to test it a little bit before posting the patch. > >Is there any multi-threaded kernel torture program that you can > >recommend? > > > Peter Holm (CC'd) has a really cool set of torture tests. > he has also seen all sorts of failures others have not (yet) triggered. :-) > > I'm 'busy" for the next couple of weeks so you may want to communicate > directly with him and see if you and he together can figure out some of the > things he's > been seeing :-) > > his tests are at: > http://www.holm.cc/stress/src/stress.tgz > > > > >Thanks > > > > Stephan > > I'll be glad to test any patches. 
-- Peter Holm From owner-freebsd-arch@FreeBSD.ORG Sun Sep 26 17:26:35 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DAEC116A4CE for ; Sun, 26 Sep 2004 17:26:35 +0000 (GMT) Received: from harmony.village.org (rover.village.org [168.103.84.182]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8229643D48 for ; Sun, 26 Sep 2004 17:26:35 +0000 (GMT) (envelope-from imp@bsdimp.com) Received: from localhost (harmony.village.org [10.0.0.6]) by harmony.village.org (8.13.1/8.13.1) with ESMTP id i8QHNYRr067513; Sun, 26 Sep 2004 11:23:35 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Sun, 26 Sep 2004 11:24:43 -0600 (MDT) Message-Id: <20040926.112443.96451447.imp@bsdimp.com> To: phk@phk.freebsd.dk From: "M. Warner Losh" In-Reply-To: <41458.1096016465@critter.freebsd.dk> References: <41458.1096016465@critter.freebsd.dk> X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: arch@freebsd.org Subject: Re: I'm counting my threads, one, two, three, four, five... [1] X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Sep 2004 17:26:36 -0000 In message: <41458.1096016465@critter.freebsd.dk> "Poul-Henning Kamp" writes: : I belive this gives us the handle we need to unload drivers and remove : hardware without panicing in the lower layers of the kernel. The : higher layers may still have a thing or two to learn in this respect. I've been extremely worried about the dev interface into the driver for a long time. This proposal looks excellent, and I can't think of anything else to add to it. 
It is good to see all the concerns in this area you and I have talked about over the years appear to be addressed by this. The biggest problem now is that I need to address the device_t level locking. I think with network layer locking and dev_t locking being under control, it is close to time to tackle it. The other big problem may happen in the device detach routines of bus drivers not being happy with new-found sleeps. However, these two problems always existed :-( Well done! Warner From owner-freebsd-arch@FreeBSD.ORG Mon Sep 27 13:05:08 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5862916A4CE for ; Mon, 27 Sep 2004 13:05:08 +0000 (GMT) Received: from green.homeunix.org (pcp04368961pcs.nrockv01.md.comcast.net [69.140.212.7]) by mx1.FreeBSD.org (Postfix) with ESMTP id C417B43D39 for ; Mon, 27 Sep 2004 13:05:07 +0000 (GMT) (envelope-from green@green.homeunix.org) Received: from green.homeunix.org (green@localhost [127.0.0.1]) by green.homeunix.org (8.13.1/8.13.1) with ESMTP id i8RD54Bw022815; Mon, 27 Sep 2004 09:05:04 -0400 (EDT) (envelope-from green@green.homeunix.org) Received: (from green@localhost) by green.homeunix.org (8.13.1/8.13.1/Submit) id i8RD54U5022814; Mon, 27 Sep 2004 09:05:04 -0400 (EDT) (envelope-from green) Date: Mon, 27 Sep 2004 09:05:03 -0400 From: Brian Fundakowski Feldman To: Stephan Uphoff Message-ID: <20040927130503.GD1164@green.homeunix.org> References: <1095468747.31297.241.camel@palm.tree.com> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1096135220.53798.17754.camel@palm.tree.com> User-Agent: Mutt/1.5.6i cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: 
list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2004 13:05:08 -0000 On Sat, Sep 25, 2004 at 02:00:20PM -0400, Stephan Uphoff wrote: > On Sat, 2004-09-18 at 13:42, Stephan Uphoff wrote: > > On Fri, 2004-09-17 at 21:20, Julian Elischer wrote: > > > Stephan Uphoff wrote: > > > >I am also stomped by the special case of adding a thread X with better > > > >priority than the current thread to the runqueue if they belong to the > > > >same ksegroup. In this case both kg_last_assigned and kg_avail_opennings > > > >might be zero and setrunqueue() will not call sched_add(). > > > >Because of this it looks like the current thread will neither be > > > >preempted not will TDF_NEEDRESCHED be set to force rescheduling at the > > > >kernel boundary. > > > >This situation should resolve itself at the next sched_switch - however > > > >this might take a long time. (Especially if essential interrupt threads > > > >are blocked by mutexes held by thread X) > > > > > > > > > > you are correct. I am not yet preempting a running thread with a lesser > > > priority if they are siblings > > > (unless there is a slot available) Thsi is not becasue I don't want to > > > do it, but simply because it has not been done yet.. > > > we did have NO preemption, so having "some" preemption is still better > > > than where we were. > > > Special case code to check curthread for a preemption could be done but > > > at the moment the decision code for > > > whether to preempt or not is in maybe_preempt() and I don't want to > > > duplicate that. it is on th edrawing board though. > > > The other thing is, that even if we should be able to preempt a running > > > thread, there is no guarantee that it is on THIS > > > CPU. It may be on another CPU and that gets nasty in a hurry. > > > > Yes .. this could get nasty. 
> > This happens when the thread is bound to another cpu or someone changed > > thr_concurrency - otherwise the current thread must be a sibling right ? > > > > Maybe something brutal like: > > if ((curthread->td_ksegrp == kg) && > > (td->td_priority > curthread->td_priority)) > > curthread->td_flags |= TDF_NEEDRESCHED; > > > > in setrunqueue for > > the else case of "if (kg->kg_avail_opennings > 0)" > > would do the trick (without preemption) for the easy but probably more > > common cases? > > > > Maybe I can find some time next week to think about a clean > > fix. I find it always helpful having a small task in mind while reading > > source code. > > I wrote a fix that should cover all cases. > However I would like to test it a little bit before posting the patch. > Is there any multi-threaded kernel torture program that you can > recommend? It wasn't particularly designed as such but the utility in the src/tools/regression/gaithrstress/ directory is very quick at provoking thread/SMP/scheduler bugs if you give it a high thread count (and use a pretty fast DNS, I suppose). -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org \ The Power to Serve! \ Opinions expressed are my own. 
\,,,,,,,,,,,,,,,,,,,,,,\ From owner-freebsd-arch@FreeBSD.ORG Mon Sep 27 14:15:57 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8E9E316A4CE for ; Mon, 27 Sep 2004 14:15:57 +0000 (GMT) Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205]) by mx1.FreeBSD.org (Postfix) with ESMTP id 519BC43D1F for ; Mon, 27 Sep 2004 14:15:57 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 27172 invoked from network); 27 Sep 2004 14:15:56 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 27 Sep 2004 14:15:56 -0000 Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8REFoCE012318; Mon, 27 Sep 2004 10:15:52 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Mon, 27 Sep 2004 10:16:13 -0400 User-Agent: KMail/1.6.2 References: <1096133353.53798.17613.camel@palm.tree.com> In-Reply-To: <1096133353.53798.17613.camel@palm.tree.com> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200409271016.13345.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: "freebsd-arch@freebsd.org" cc: Julian Elischer cc: Stephan Uphoff Subject: Re: sched_userret priority adjustment patch for sched_4bsd X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2004 14:15:57 -0000 On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote: > When a thread is about to return to user space it resets its priority to > the user level 
priority.
> However after lowering the permission its priority it needs to check if
> its priority is still better than all other runable threads.
> This is currently not implemented.
> Without the check the thread can block kernel or user threads with
> better priority until a switch is forced by by an interrupt.
>
> The attached patch checks the relevant runqueues and threads without
> slots in the same ksegrp and forces a thread switch if the currently
> running thread is no longer the best thread to run after it changed its
> priority.
>
> The patch should improve interactive response under heavy load somewhat.
> It needs a lot of testing.

Perhaps the better fix is to teach the schedulers to set TDF_NEEDRESCHED
based on a comparison against user_pri rather than td_priority inside of
sched_add()? Having the flag set by sched_add() is supposed to make this
sort of check unnecessary. Even 4.x has the same bug, I think, as a process
can make another process runnable after its priority has been boosted by a
tsleep(), and need_resched() is only called based on a comparison of p_pri.
Ah, 4.x doesn't have the bug because it caches the priority of curproc when
it enters the kernel and compares against that.
Thus, I think the correct fix is more like this:

Index: sched_4bsd.c
===================================================================
RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v
retrieving revision 1.63
diff -u -r1.63 sched_4bsd.c
--- sched_4bsd.c 11 Sep 2004 10:07:22 -0000 1.63
+++ sched_4bsd.c 27 Sep 2004 14:12:03 -0000
@@ -272,7 +272,7 @@
 {
 
     mtx_assert(&sched_lock, MA_OWNED);
-    if (td->td_priority < curthread->td_priority)
+    if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
         curthread->td_flags |= TDF_NEEDRESCHED;
 }
 
Index: sched_ule.c
===================================================================
RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v
retrieving revision 1.129
diff -u -r1.129 sched_ule.c
--- sched_ule.c 11 Sep 2004 10:07:22 -0000 1.129
+++ sched_ule.c 27 Sep 2004 14:13:01 -0000
@@ -723,7 +723,7 @@
      */
     pcpu = pcpu_find(cpu);
     td = pcpu->pc_curthread;
-    if (ke->ke_thread->td_priority < td->td_priority ||
+    if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri ||
         td == pcpu->pc_idlethread) {
         td->td_flags |= TDF_NEEDRESCHED;
         ipi_selected(1 << cpu, IPI_AST);

An even better fix might be to fix td_base_pri by having it be set on kernel
entry similar to how 4.x sets curpriority. The above fix should be
sufficient for now, however.
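The effect of the changed comparison in the sched_4bsd.c hunk can be
illustrated with a small userland model. This is only a sketch under stated
assumptions, not kernel code: the struct layouts, the needresched boolean
(standing in for TDF_NEEDRESCHED), the function names, and the priority
values used with it are hypothetical. As in the kernel, a numerically lower
priority is the better one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures (not the real layouts). */
struct ksegrp_model {
    int kg_user_pri;                 /* priority used when back in user mode */
};

struct thread_model {
    int td_priority;                 /* current, possibly kernel-boosted */
    struct ksegrp_model *td_ksegrp;
    bool needresched;                /* models TDF_NEEDRESCHED */
};

/* Old check: compare the woken thread against the current priority. */
static void
maybe_resched_old(struct thread_model *cur, const struct thread_model *td)
{
    if (td->td_priority < cur->td_priority)
        cur->needresched = true;
}

/* Patched check: compare against the priority cur will have in user mode. */
static void
maybe_resched_new(struct thread_model *cur, const struct thread_model *td)
{
    if (td->td_priority < cur->td_ksegrp->kg_user_pri)
        cur->needresched = true;
}
```

With a current thread boosted to 100 in the kernel but with a user priority
of 180, waking a thread at priority 150 leaves the flag clear under the old
check and sets it under the patched one, so the current thread yields when
it drops back to 180 on its way to user mode.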
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Mon Sep 27 17:20:30 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8407216A4CF for ; Mon, 27 Sep 2004 17:20:30 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id B3DCC43D3F for ; Mon, 27 Sep 2004 17:20:29 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 18258 invoked by uid 89); 27 Sep 2004 17:20:28 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 27 Sep 2004 17:20:28 -0000 Received: (qmail 18241 invoked by uid 89); 27 Sep 2004 17:20:28 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 27 Sep 2004 17:20:28 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8RHKQmt003500; Mon, 27 Sep 2004 13:20:27 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: John Baldwin In-Reply-To: <200409271016.13345.jhb@FreeBSD.org> References: <1096133353.53798.17613.camel@palm.tree.com> <200409271016.13345.jhb@FreeBSD.org> Content-Type: text/plain Message-Id: <1096305626.95152.163.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Mon, 27 Sep 2004 13:20:26 -0400 Content-Transfer-Encoding: 7bit cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: sched_userret priority adjustment patch for sched_4bsd X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2004 17:20:30 -0000 On Mon, 2004-09-27 at 10:16, John Baldwin wrote: > On 
Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote: > > When a thread is about to return to user space it resets its priority to > > the user level priority. > > However after lowering the permission its priority it needs to check if > > its priority is still better than all other runable threads. > > This is currently not implemented. > > Without the check the thread can block kernel or user threads with > > better priority until a switch is forced by by an interrupt. > > > > The attached patch checks the relevant runqueues and threads without > > slots in the same ksegrp and forces a thread switch if the currently > > running thread is no longer the best thread to run after it changed its > > priority. > > > > The patch should improve interactive response under heavy load somewhat. > > It needs a lot of testing. > > Perhaps the better fix is to teach the schedulers to set TDF_NEEDRESCHED based > on on a comparison against user_pri rather than td_priority inside of > sched_add()? Having the flag set by sched_add() is supposed to make this > sort of check unnecessary. Even 4.x has the same bug I think as a process > can make another process runnable after it's priority has been boosted by a > tsleep() and need_resched() is only called based on a comparison of p_pri. > Ah, 4.x doesn't have the bug because it caches the priority of curproc when > it enters the kernel and compares against that. 
Thus, I think the correct
> fix is more like this:
>
> Index: sched_4bsd.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v
> retrieving revision 1.63
> diff -u -r1.63 sched_4bsd.c
> --- sched_4bsd.c 11 Sep 2004 10:07:22 -0000 1.63
> +++ sched_4bsd.c 27 Sep 2004 14:12:03 -0000
> @@ -272,7 +272,7 @@
>  {
>
>      mtx_assert(&sched_lock, MA_OWNED);
> -    if (td->td_priority < curthread->td_priority)
> +    if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
>          curthread->td_flags |= TDF_NEEDRESCHED;
>  }
>
> Index: sched_ule.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v
> retrieving revision 1.129
> diff -u -r1.129 sched_ule.c
> --- sched_ule.c 11 Sep 2004 10:07:22 -0000 1.129
> +++ sched_ule.c 27 Sep 2004 14:13:01 -0000
> @@ -723,7 +723,7 @@
>       */
>      pcpu = pcpu_find(cpu);
>      td = pcpu->pc_curthread;
> -    if (ke->ke_thread->td_priority < td->td_priority ||
> +    if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri ||
>          td == pcpu->pc_idlethread) {
>          td->td_flags |= TDF_NEEDRESCHED;
>          ipi_selected(1 << cpu, IPI_AST);
>
> An even better fix might be to fix td_base_pri by having it be set on kernel
> entry similar to how 4.x sets curpriority. The above fix should be
> sufficient for now, however.

I don't think that this is enough, since TDF_NEEDRESCHED is thread
specific and not cpu specific: the thread marked with TDF_NEEDRESCHED
might not be the next thread leaving the kernel.
(I can't really talk about ULE since I am trying to avoid looking at
another shiny irresistible time sink this week ;-)

I think we agree that td_priority should be set to td_base_pri on
kernel entry. Since td_base_pri is changed by sleep and condvar
functions it should also be reset on kernel entry. (Probably from a new
ksegrp field.) Condvar waits should currently not cause the base
priority to change to the current priority of the thread; otherwise
td_base_pri could get stuck at a really bad user priority.
(td->td_base_pri might end up being worse than
td->td_ksegrp->kg_user_pri when the ksegrp priority improves.)

Stephan

From owner-freebsd-arch@FreeBSD.ORG Mon Sep 27 18:56:56 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DCEF016A4CE for ; Mon, 27 Sep 2004 18:56:56 +0000 (GMT) Received: from mail3.speakeasy.net (mail3.speakeasy.net [216.254.0.203]) by mx1.FreeBSD.org (Postfix) with ESMTP id 96A1E43D1D for ; Mon, 27 Sep 2004 18:56:56 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 7086 invoked from network); 27 Sep 2004 18:56:56 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 27 Sep 2004 18:56:42 -0000 Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8RIsVY8014057; Mon, 27 Sep 2004 14:56:34 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: Stephan Uphoff Date: Mon, 27 Sep 2004 14:43:22 -0400 User-Agent: KMail/1.6.2 References: <1096133353.53798.17613.camel@palm.tree.com> <200409271016.13345.jhb@FreeBSD.org> <1096305626.95152.163.camel@palm.tree.com> In-Reply-To: <1096305626.95152.163.camel@palm.tree.com> MIME-Version: 1.0 Content-Disposition: inline Message-Id: <200409271443.22667.jhb@FreeBSD.org> Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: sched_userret priority adjustment patch for sched_4bsd X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id:
Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2004 18:56:57 -0000 On Monday 27 September 2004 01:20 pm, Stephan Uphoff wrote: > On Mon, 2004-09-27 at 10:16, John Baldwin wrote: > > On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote: > > > When a thread is about to return to user space it resets its priority > > > to the user level priority. > > > However after lowering the permission its priority it needs to check if > > > its priority is still better than all other runable threads. > > > This is currently not implemented. > > > Without the check the thread can block kernel or user threads with > > > better priority until a switch is forced by by an interrupt. > > > > > > The attached patch checks the relevant runqueues and threads without > > > slots in the same ksegrp and forces a thread switch if the currently > > > running thread is no longer the best thread to run after it changed its > > > priority. > > > > > > The patch should improve interactive response under heavy load > > > somewhat. It needs a lot of testing. > > > > Perhaps the better fix is to teach the schedulers to set TDF_NEEDRESCHED > > based on on a comparison against user_pri rather than td_priority inside > > of sched_add()? Having the flag set by sched_add() is supposed to make > > this sort of check unnecessary. Even 4.x has the same bug I think as a > > process can make another process runnable after it's priority has been > > boosted by a tsleep() and need_resched() is only called based on a > > comparison of p_pri. Ah, 4.x doesn't have the bug because it caches the > > priority of curproc when it enters the kernel and compares against that. 
> > Thus, I think the correct fix is more like this: > > > > Index: sched_4bsd.c > > =================================================================== > > RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v > > retrieving revision 1.63 > > diff -u -r1.63 sched_4bsd.c > > --- sched_4bsd.c 11 Sep 2004 10:07:22 -0000 1.63 > > +++ sched_4bsd.c 27 Sep 2004 14:12:03 -0000 > > @@ -272,7 +272,7 @@ > > { > > > > mtx_assert(&sched_lock, MA_OWNED); > > - if (td->td_priority < curthread->td_priority) > > + if (td->td_priority < curthread->td_ksegrp->kg_user_pri) > > curthread->td_flags |= TDF_NEEDRESCHED; > > } > > > > Index: sched_ule.c > > =================================================================== > > RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v > > retrieving revision 1.129 > > diff -u -r1.129 sched_ule.c > > --- sched_ule.c 11 Sep 2004 10:07:22 -0000 1.129 > > +++ sched_ule.c 27 Sep 2004 14:13:01 -0000 > > @@ -723,7 +723,7 @@ > > */ > > pcpu = pcpu_find(cpu); > > td = pcpu->pc_curthread; > > - if (ke->ke_thread->td_priority < td->td_priority || > > + if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri || > > td == pcpu->pc_idlethread) { > > td->td_flags |= TDF_NEEDRESCHED; > > ipi_selected(1 << cpu, IPI_AST); > > > > An even better fix might be to fix td_base_pri by having it be set on > > kernel entry similar to how 4.x sets curpriority. The above fix should > > be sufficient for now, however. > > I don't think that this is enough since TDF_NEEDRESCHED is thread > specific and not cpu specific. Hmm, it is CPU specific in 4.x. It could be changed back to being a per-cpu flag easily. > However the thread marked with TDF_NEEDRESCHED might not be the next > thread leaving the kernel. > ( Can't really talk about ULE since I am trying to avoid looking at > another shiny irresistible time sink this week ;-) > > I think we agree that that td_priority should be set to td_base_pri on > kernel entry. 
Since td_base_pri is changed by sleep and condvar
> functions it should also be reset on kernel entry. (Probably from a new
> ksegrp field). Condvar waits should currently non cause the base
> priority to change to the current priority of the thread - otherwise
> td_base_pri could get stuck at a really bad user priority.
> ( td->td_base_pri might end up being worse than
> td->td_ksegrp->kg_user_pri when the ksegrp priority improves)

Well, I think instead that td_base_pri should be set to td_priority on
kernel entry (rather than the other way around). td_priority should not
change just because the thread enters the kernel. I think the sleep
functions could then leave td_base_pri alone. (I think setting it there is
wrong because td_base_pri is not quite the same as curpriority in 4.x.)
What td_base_pri is really supposed to provide, btw, is the priority that
the thread should go back to once it has unlocked a mutex and had its
priority boosted while it held the mutex. Arguably it should just be using
kg_user_pri for this, but then you lose priority "boosts" from tsleep(),
which is why td_base_pri is set in msleep(). I guess what should happen is
something more like this:

kernel_entry()
{
    KASSERT(td->td_priority == td->td_ksegrp->kg_user_pri);
    td->td_base_pri = td->td_priority;
}

msleep()
{
    sched_prio(...);
    td->td_base_pri = td->td_priority;
}

The TDF_NEEDRESCHED checks should be using kg_user_pri as in my previous
e-mail. Also, in sched_prio(), if our priority is ever raised (numerically,
i.e. made logically less important), we should set TDF_NEEDRESCHED since we
may need to switch (4.x does this in maybe_needresched()). Then,
TDF_NEEDRESCHED could become a per-cpu flag that is not cleared in
mi_switch() but only in userret(). Hmm, I think all of the TDF_NEEDRESCHED
handling actually belongs in sched_userret() btw.
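The proposal above can be walked through with a small userland model. This
is a minimal sketch under stated assumptions, not the kernel implementation:
the struct layouts, the "_model" function names, the needresched boolean
(standing in for TDF_NEEDRESCHED), and the behaviour of userret_model() are
hypothetical, and the priority numbers are arbitrary. A plain assert()
replaces KASSERT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures (not the real layouts). */
struct ksegrp_model {
    int kg_user_pri;
};

struct thread_model {
    int td_priority;                 /* current priority, lower is better */
    int td_base_pri;                 /* priority to fall back to */
    bool needresched;                /* models TDF_NEEDRESCHED */
    struct ksegrp_model *td_ksegrp;
};

/* Raising the number makes the thread less important, so it may have to yield. */
static void
sched_prio_model(struct thread_model *td, int prio)
{
    if (prio > td->td_priority)
        td->needresched = true;
    td->td_priority = prio;
}

/* On kernel entry the thread is expected to run at its user priority. */
static void
kernel_entry_model(struct thread_model *td)
{
    assert(td->td_priority == td->td_ksegrp->kg_user_pri);
    td->td_base_pri = td->td_priority;
}

/* A sleep boosts the priority and records the boost as the new base. */
static void
msleep_model(struct thread_model *td, int sleep_prio)
{
    sched_prio_model(td, sleep_prio);
    td->td_base_pri = td->td_priority;
}

/* Assumed: returning to user mode drops back to the user priority. */
static void
userret_model(struct thread_model *td)
{
    sched_prio_model(td, td->td_ksegrp->kg_user_pri);
}
```

In this model a thread that enters the kernel at user priority 180, sleeps
at priority 100, and then returns to user mode ends up back at 180 with the
flag set, because the drop from 100 to 180 goes through sched_prio_model().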
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Mon Sep 27 21:28:11 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4FDBE16A4CE for ; Mon, 27 Sep 2004 21:28:11 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id A451D43D46 for ; Mon, 27 Sep 2004 21:28:10 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 18284 invoked by uid 89); 27 Sep 2004 21:28:09 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 27 Sep 2004 21:28:09 -0000 Received: (qmail 18262 invoked by uid 89); 27 Sep 2004 21:28:09 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 27 Sep 2004 21:28:09 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8RLS7mt004724; Mon, 27 Sep 2004 17:28:07 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: John Baldwin In-Reply-To: <200409271443.22667.jhb@FreeBSD.org> References: <1096133353.53798.17613.camel@palm.tree.com> <200409271016.13345.jhb@FreeBSD.org> <1096305626.95152.163.camel@palm.tree.com> <200409271443.22667.jhb@FreeBSD.org> Content-Type: text/plain Message-Id: <1096320486.3733.58.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Mon, 27 Sep 2004 17:28:07 -0400 Content-Transfer-Encoding: 7bit cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: sched_userret priority adjustment patch for sched_4bsd X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 
2004 21:28:11 -0000 On Mon, 2004-09-27 at 14:43, John Baldwin wrote: > On Monday 27 September 2004 01:20 pm, Stephan Uphoff wrote: > > On Mon, 2004-09-27 at 10:16, John Baldwin wrote: > > > On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote: > > > > When a thread is about to return to user space it resets its priority > > > > to the user level priority. > > > > However after lowering the permission its priority it needs to check if > > > > its priority is still better than all other runable threads. > > > > This is currently not implemented. > > > > Without the check the thread can block kernel or user threads with > > > > better priority until a switch is forced by by an interrupt. > > > > > > > > The attached patch checks the relevant runqueues and threads without > > > > slots in the same ksegrp and forces a thread switch if the currently > > > > running thread is no longer the best thread to run after it changed its > > > > priority. > > > > > > > > The patch should improve interactive response under heavy load > > > > somewhat. It needs a lot of testing. > > > > > > Perhaps the better fix is to teach the schedulers to set TDF_NEEDRESCHED > > > based on on a comparison against user_pri rather than td_priority inside > > > of sched_add()? Having the flag set by sched_add() is supposed to make > > > this sort of check unnecessary. Even 4.x has the same bug I think as a > > > process can make another process runnable after it's priority has been > > > boosted by a tsleep() and need_resched() is only called based on a > > > comparison of p_pri. Ah, 4.x doesn't have the bug because it caches the > > > priority of curproc when it enters the kernel and compares against that. 
> > > Thus, I think the correct fix is more like this: > > > > > > Index: sched_4bsd.c > > > =================================================================== > > > RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v > > > retrieving revision 1.63 > > > diff -u -r1.63 sched_4bsd.c > > > --- sched_4bsd.c 11 Sep 2004 10:07:22 -0000 1.63 > > > +++ sched_4bsd.c 27 Sep 2004 14:12:03 -0000 > > > @@ -272,7 +272,7 @@ > > > { > > > > > > mtx_assert(&sched_lock, MA_OWNED); > > > - if (td->td_priority < curthread->td_priority) > > > + if (td->td_priority < curthread->td_ksegrp->kg_user_pri) > > > curthread->td_flags |= TDF_NEEDRESCHED; > > > } > > > > > > Index: sched_ule.c > > > =================================================================== > > > RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v > > > retrieving revision 1.129 > > > diff -u -r1.129 sched_ule.c > > > --- sched_ule.c 11 Sep 2004 10:07:22 -0000 1.129 > > > +++ sched_ule.c 27 Sep 2004 14:13:01 -0000 > > > @@ -723,7 +723,7 @@ > > > */ > > > pcpu = pcpu_find(cpu); > > > td = pcpu->pc_curthread; > > > - if (ke->ke_thread->td_priority < td->td_priority || > > > + if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri || > > > td == pcpu->pc_idlethread) { > > > td->td_flags |= TDF_NEEDRESCHED; > > > ipi_selected(1 << cpu, IPI_AST); > > > > > > An even better fix might be to fix td_base_pri by having it be set on > > > kernel entry similar to how 4.x sets curpriority. The above fix should > > > be sufficient for now, however. > > > > I don't think that this is enough since TDF_NEEDRESCHED is thread > > specific and not cpu specific. > > Hmm, it is CPU specific in 4.x. It could be changed back to being a per-cpu > flag easily. But this might not help :-(. Example: Thread A is running in the kernel and is preempted by an interrupt Thread I. Thread I wakes up thread B. If I->td_ksegrp->kg_user_pri <= B->td_priority TDF_NEEDRESCHED will not be set. 
If A->td_priority < B->td_priority thread A will run once I is finished
serving interrupts.
Thread A can now leave the kernel even though
A->td_ksegrp->kg_user_pri > B->td_priority may be true.

> > However the thread marked with TDF_NEEDRESCHED might not be the next
> > thread leaving the kernel.
> > ( Can't really talk about ULE since I am trying to avoid looking at
> > another shiny irresistible time sink this week ;-)
> >
> > I think we agree that td_priority should be set to td_base_pri on
> > kernel entry. Since td_base_pri is changed by sleep and condvar
> > functions it should also be reset on kernel entry. (Probably from a new
> > ksegrp field). Condvar waits should currently not cause the base
> > priority to change to the current priority of the thread - otherwise
> > td_base_pri could get stuck at a really bad user priority.
> > ( td->td_base_pri might end up being worse than
> > td->td_ksegrp->kg_user_pri when the ksegrp priority improves)
>
> Well, I think instead that td_base_pri should be set to td_priority on kernel
> entry (rather than the other way around). td_priority should be unchanged
> just because it enters the kernel.

I guess we disagree here.
There are just too many resource dependencies in the kernel that can lead
to priority inversion (vnode locks, disk buffer ownership, etc).
It would be nice to delay the priority boost until a thread acquires
such a resource (or even trace resource dependencies and implement
priority inheritance) ... but this would be a huge task.
Boosting the priority on kernel entry is easy and less error prone. I
guess we had this discussion last week and we just disagree on the
issue.

> I think the sleep functions could then
> leave td_base_pri alone. (I think setting it there is wrong because
> td_base_pri is not quite the same as curpriority in 4.x.)
What td_base_pri > is really supposed to provide, btw, is the priority that the thread should go > back to once it has unlocked a mutex and had its priorty boosted while it > held the mutex. Arguably it should just be using kg_user_pri for this, but > then you loose priority "boosts" from tsleep(), which is why td_base_pri is > set in msleep(). I guess what should happen is something more like this: > > kernel_entry() > { > KASSERT(td->td_priority == td->td_ksegrp->kg_user_pri); > td->td_base_pri = td->td_priority; > } > > msleep() > { > sched_prio(...); > td_base_pri = td->td_priority; > } > > The TDF_NEEDRESCHED checks should be using kg_user_pri as in my previous > e-mail. Also, in sched_prio(), if our priority is ever raised (numerically, > logically less important), we should set TDF_NEEDRESCHED since we may need to > switch (4.x does this in maybe_needresched()). Then, TDF_NEEDRESCHED could > become a per-cpu flag and have it not be cleared in mi_switch() but be > cleared only in userret(). Hmm, I think all of the TDF_NEEDRESCHED handling > actually beings in sched_userret() btw. Wouldn't this lead to unnecessary round-robin switches between threads with the same priority on sched_4bsd? 
Stephan From owner-freebsd-arch@FreeBSD.ORG Mon Sep 27 22:10:43 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 42EBD16A4E1; Mon, 27 Sep 2004 22:10:43 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id CD74143D2D; Mon, 27 Sep 2004 22:10:13 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id A86637A41E; Mon, 27 Sep 2004 15:10:13 -0700 (PDT) Message-ID: <41588FC5.6090203@elischer.org> Date: Mon, 27 Sep 2004 15:10:13 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: Stephan Uphoff References: <1096133353.53798.17613.camel@palm.tree.com> <200409271016.13345.jhb@FreeBSD.org> <1096305626.95152.163.camel@palm.tree.com> <200409271443.22667.jhb@FreeBSD.org> <1096320486.3733.58.camel@palm.tree.com> In-Reply-To: <1096320486.3733.58.camel@palm.tree.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: "freebsd-arch@freebsd.org" Subject: Re: sched_userret priority adjustment patch for sched_4bsd X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Sep 2004 22:10:43 -0000 Stephan Uphoff wrote: >On Mon, 2004-09-27 at 14:43, John Baldwin wrote: > > >>>>@@ -272,7 +272,7 @@ >>>> { >>>> >>>> mtx_assert(&sched_lock, MA_OWNED); >>>>- if (td->td_priority < curthread->td_priority) >>>>+ if (td->td_priority < curthread->td_ksegrp->kg_user_pri) >>>> curthread->td_flags |= TDF_NEEDRESCHED; >>>> } >>>> >>>> in sched_userret() we do: kg = td->td_ksegrp; if 
(td->td_priority != kg->kg_user_pri) {
                mtx_lock_spin(&sched_lock);
                td->td_priority = kg->kg_user_pri;
                mtx_unlock_spin(&sched_lock);
        }

but we don't actually take any action in the case where the thread is
heading out to userland with a priority of less importance than a
waiting thread. That happens in AST(), where we also set it down, but
only in the case of TDF_NEEDRESCHED being set.

It would make more sense to ALWAYS do the TDF_NEEDRESCHED clause in
userret(), based on the user priority... i.e. the priority would be
reduced going to userland. Unfortunately this would defeat one of the
reasons for priority raising in BSD.

The priority of a thread that waits for IO is raised not only to make it
start again in the kernel as an interactive thread, but also so that it
can run into userland too and get some priority for actually USING the
new data/input.

It may be that we should consider the priority as a number of different
components that should not be added together until needed. The
"interactive IO priority boost" that comes from having done an msleep()
should wear off very quickly. Maybe we knock it down again at the first
or second clock tick, but to do that we need to track that "interactive
boost" separately from the general priority.
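One way to picture the "separate components" idea is the small
user-space model below. It is a sketch only, not a proposal for the
actual data structures: the model_* names, the two-tick lifetime and the
boost of 40 are all assumptions chosen for illustration. The user
priority and the interactive boost are stored apart, combined only when
a scheduling decision needs a single number, and the boost is dropped
after a couple of clock ticks.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Toy user-space model, NOT FreeBSD kernel code.  Every name here is
 * hypothetical.  Lower numbers mean more important priorities.
 */
#define MODEL_BOOST_TICKS	2	/* assumed lifetime of a boost */

struct model_pri {
	int user_pri;		/* long-term user priority */
	int io_boost;		/* how much a recent sleep improves it */
	int boost_ticks;	/* clock ticks until the boost expires */
};

/* Combine the components only at the point where a decision is made. */
static int
model_effective_pri(const struct model_pri *p)
{
	int pri = p->user_pri - p->io_boost;

	return (pri < 0 ? 0 : pri);
}

/* Waking from an I/O sleep grants a short-lived boost. */
static void
model_io_wakeup(struct model_pri *p, int boost)
{
	p->io_boost = boost;
	p->boost_ticks = MODEL_BOOST_TICKS;
}

/* Each clock tick ages the boost; the user priority is left alone. */
static void
model_clock_tick(struct model_pri *p)
{
	if (p->boost_ticks > 0 && --p->boost_ticks == 0)
		p->io_boost = 0;
}

/* Show the boost appearing at wakeup and wearing off after two ticks. */
static void
model_boost_demo(void)
{
	struct model_pri p = { 180, 0, 0 };

	model_io_wakeup(&p, 40);
	printf("after wakeup:  %d\n", model_effective_pri(&p));
	model_clock_tick(&p);
	printf("after 1 tick:  %d\n", model_effective_pri(&p));
	model_clock_tick(&p);
	printf("after 2 ticks: %d\n", model_effective_pri(&p));
}
```

Calling model_boost_demo() from a main() prints 140, 140 and 180. Because
user_pri is never overwritten, returning to user space does not need to
guess how much of the current priority was a boost.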
>>>>Index: sched_ule.c >>>>=================================================================== >>>>RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v >>>>retrieving revision 1.129 >>>>diff -u -r1.129 sched_ule.c >>>>--- sched_ule.c 11 Sep 2004 10:07:22 -0000 1.129 >>>>+++ sched_ule.c 27 Sep 2004 14:13:01 -0000 >>>>@@ -723,7 +723,7 @@ >>>> */ >>>> pcpu = pcpu_find(cpu); >>>> td = pcpu->pc_curthread; >>>>- if (ke->ke_thread->td_priority < td->td_priority || >>>>+ if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri || >>>> td == pcpu->pc_idlethread) { >>>> td->td_flags |= TDF_NEEDRESCHED; >>>> ipi_selected(1 << cpu, IPI_AST); >>>> >>>>An even better fix might be to fix td_base_pri by having it be set on >>>>kernel entry similar to how 4.x sets curpriority. The above fix should >>>>be sufficient for now, however. >>>> >>>> >>>I don't think that this is enough since TDF_NEEDRESCHED is thread >>>specific and not cpu specific. >>> >>> >>Hmm, it is CPU specific in 4.x. It could be changed back to being a per-cpu >>flag easily. >> >> > >I guess we disagree here. >There are just to many resource dependencies in the kernel that can lead >to priority inversion (vnode locks, disk buffer ownership, etc). >It would be nice to delay the priority boost until a thread acquires >such a resource (or even trace resource dependencies and implement >priority inheritance) ... but this would be a huge task. >Boosting the priority on kernel entry is easy and less error prone. I >guess we had this discussion last week and we just disagree on the >issue. > > > >>I think the sleep functions could then >>leave td_base_pri alone. (I think setting it there is wrong because >>td_base_pri is not quite the same as curpriority in 4.x.) What td_base_pri >>is really supposed to provide, btw, is the priority that the thread should go >>back to once it has unlocked a mutex and had its priorty boosted while it >>held the mutex. 
Arguably it should just be using kg_user_pri for this, but
>>then you lose priority "boosts" from tsleep(), which is why td_base_pri is
>>set in msleep().  I guess what should happen is something more like this:
>>
>>kernel_entry()
>>{
>>	KASSERT(td->td_priority == td->td_ksegrp->kg_user_pri);
>>	td->td_base_pri = td->td_priority;
>>}
>>
>>msleep()
>>{
>>	sched_prio(...);
>>	td->td_base_pri = td->td_priority;
>>}
>>
>>The TDF_NEEDRESCHED checks should be using kg_user_pri as in my previous
>>e-mail.  Also, in sched_prio(), if our priority is ever raised (numerically,
>>logically less important), we should set TDF_NEEDRESCHED since we may need to
>>switch (4.x does this in maybe_needresched()).  Then, TDF_NEEDRESCHED could
>>become a per-cpu flag and have it not be cleared in mi_switch() but be
>>cleared only in userret().  Hmm, I think all of the TDF_NEEDRESCHED handling
>>actually belongs in sched_userret() btw.
>>
>>
>
>Wouldn't this lead to unnecessary round-robin switches between threads
>with the same priority on sched_4bsd?
>

Maybe. I added code to be "kind" to preempted threads (by putting them
back on the head of their queue) but overall the 4bsd scheduler doesn't
translate very well into a multithreaded and multiprocessor world.

If you have multiple threads in a process, how much of the interactive
boost do you assign to the thread and how much to the process, which may
be doing work derived from the IO but in another thread?

It gets even more difficult when you realise that user threads can
switch between kernel threads without the kernel being aware.

I have considered whether the kernel SHOULD be aware of user threads. By
which I mean that while we might have only a few FULL kernel threads
(with stacks etc.) it might be a worthwhile thing to keep a small
structure in the kernel to correspond with each user thread, that can
hold a couple of basic parameters. Sort of a hybrid between the current
"Only the UTS knows about all threads" and "full kernel knowledge of all
threads". The kernel knows about them and their history, but doesn't
need to supply full running resources for them, just a small amount of
info (probably only 8 ints' worth may be enough) that can be looked up
on kernel entry using the mailbox.

>
> Stephan
>
>_______________________________________________
>freebsd-arch@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>
>

From owner-freebsd-arch@FreeBSD.ORG Tue Sep 28 02:27:54 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AD8BF16A4CE for ; Tue, 28 Sep 2004 02:27:54 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id 358A543D3F for ; Tue, 28 Sep 2004 02:27:54 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 17397 invoked by uid 89); 28 Sep 2004 02:27:52 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 28 Sep 2004 02:27:52 -0000 Received: (qmail 17357 invoked by uid 89); 28 Sep 2004 02:27:52 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 28 Sep 2004 02:27:52 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8S2Rqmt005983; Mon, 27 Sep 2004 22:27:52 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Julian Elischer In-Reply-To: <41588FC5.6090203@elischer.org> References: <1096133353.53798.17613.camel@palm.tree.com> <200409271016.13345.jhb@FreeBSD.org> <1096305626.95152.163.camel@palm.tree.com> <200409271443.22667.jhb@FreeBSD.org><41588FC5.6090203@elischer.org> Content-Type: text/plain Message-Id: 
<1096338471.3733.254.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Mon, 27 Sep 2004 22:27:51 -0400 Content-Transfer-Encoding: 7bit cc: "freebsd-arch@freebsd.org" Subject: Re: sched_userret priority adjustment patch for sched_4bsd X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2004 02:27:54 -0000 On Mon, 2004-09-27 at 18:10, Julian Elischer wrote: > Stephan Uphoff wrote: > > >On Mon, 2004-09-27 at 14:43, John Baldwin wrote: > > > > > >>>>@@ -272,7 +272,7 @@ > >>>> { > >>>> > >>>> mtx_assert(&sched_lock, MA_OWNED); > >>>>- if (td->td_priority < curthread->td_priority) > >>>>+ if (td->td_priority < curthread->td_ksegrp->kg_user_pri) > >>>> curthread->td_flags |= TDF_NEEDRESCHED; > >>>> } > >>>> > >>>> > > in sched_userret() we do: > kg = td->td_ksegrp; > if (td->td_priority != kg->kg_user_pri) { > mtx_lock_spin(&sched_lock); > td->td_priority = kg->kg_user_pri; > mtx_unlock_spin(&sched_lock); > } > > but we don't actually take any action in the case where the thread is > heading out to userland with > a priority of less importance than a waiting thread. That happens in > AST() where we also set it down > but only in the case of TDF_NEEDRESCHED being set. > > it would make more sense to ALWAYS to the TDF_NEEDRESCHED clause, in > userret() > based on the user priority... i.e. the priority would be reduced going > to userland. > Unfortunatly this would stop one of the reasons to for priorityu > raisning in BSD. > > The priority of a thread that waits for IO is raised not only to make it > start again in the kernel > as an interactive thread, but also so that it can run into userland too > and get some priority > for actually USING the new data/input.. Thanks - I wasn't aware of this. Isn't there a high potential for abuse? 
A client/server program that constantly refreshes its priority by
waiting for requests/replies comes to mind.
If a client/server pair constantly talk to each other they could eat a
lot of cpu time.

I have to think about this some more.

	Stephan

From owner-freebsd-arch@FreeBSD.ORG Tue Sep 28 02:52:19 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B4C4F16A4CE for ; Tue, 28 Sep 2004 02:52:19 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id 58C2A43D49 for ; Tue, 28 Sep 2004 02:52:19 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 18056 invoked by uid 89); 28 Sep 2004 02:52:18 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 28 Sep 2004 02:52:18 -0000 Received: (qmail 18036 invoked by uid 89); 28 Sep 2004 02:52:18 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 28 Sep 2004 02:52:18 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8S2qGmt006097; Mon, 27 Sep 2004 22:52:17 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Peter Holm In-Reply-To: <20040926075218.GA85983@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <20040926075218.GA85983@peter.osted.lan> Content-Type: multipart/mixed; boundary="=-rj4c4DOGe8hfA+4+RE6p" Message-Id: <1096339936.3733.279.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Mon, 27 Sep 2004 22:52:16 -0400 cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to 
FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2004 02:52:19 -0000 --=-rj4c4DOGe8hfA+4+RE6p Content-Type: text/plain Content-Transfer-Encoding: 7bit On Sun, 2004-09-26 at 03:52, Peter Holm wrote: > On Sat, Sep 25, 2004 at 08:50:45PM -0700, Julian Elischer wrote: > > Stephan Uphoff wrote: > > > > >>Maybe something brutal like: > > >> if ((curthread->td_ksegrp == kg) && > > >> (td->td_priority > curthread->td_priority)) > > >> curthread->td_flags |= TDF_NEEDRESCHED; > > >> > > >>in setrunqueue for > > >>the else case of "if (kg->kg_avail_opennings > 0)" > > >>would do the trick (without preemption) for the easy but probably more > > >>common cases? > > >> > > >>Maybe I can find some time next week to think about a clean > > >>fix. I find it always helpful having a small task in mind while reading > > >>source code. > > > > > > > > >I wrote a fix that should cover all cases. > > >However I would like to test it a little bit before posting the patch. > > >Is there any multi-threaded kernel torture program that you can > > >recommend? > > > > > > Peter Holm (CC'd) has a really cool set of torture tests. > > he has also seen all sorts of failures others have not (yet) triggered. :-) > > > > I'm 'busy" for the next couple of weeks so you may want to communicate > > directly with him and see if you and he together can figure out some of the > > things he's > > been seeing :-) > > > > his tests are at: > > http://www.holm.cc/stress/src/stress.tgz > > > > > > > >Thanks > > > > > > Stephan > > > > > I'll be glad to test any patches. Great. Can you try the attached patch to see if it changes any of your previously observed behaviour? 
Thanks

	Stephan

--=-rj4c4DOGe8hfA+4+RE6p
Content-Disposition: attachment; filename=switch_patch
Content-Type: text/x-patch; name=switch_patch; charset=ASCII
Content-Transfer-Encoding: 7bit

Index: sys/kern/kern_switch.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_switch.c,v
retrieving revision 1.95
diff -u -r1.95 kern_switch.c
--- sys/kern/kern_switch.c	19 Sep 2004 18:34:17 -0000	1.95
+++ sys/kern/kern_switch.c	28 Sep 2004 02:48:43 -0000
@@ -315,6 +315,94 @@
 	td->td_priority = newpri;
 	setrunqueue(td, SRQ_BORING);
 }
+
+
+/*
+ * This function is called when a thread is about to be put on a
+ * ksegrp run queue because it has been made runnable or its
+ * priority has been adjusted and the ksegrp does not have a
+ * free kse slot.  It determines if a thread from the same ksegrp
+ * should be preempted.  If so, it tries to switch threads
+ * if the thread is on the same cpu or notifies another cpu that
+ * it should switch threads.
+ */
+
+static void
+maybe_preempt_in_ksegrp(struct thread *td)
+{
+#if defined(SMP)
+	int highest_pri;
+	struct ksegrp *kg;
+	cpumask_t cpumask,dontuse;
+	struct pcpu *pc;
+	struct pcpu *highest_pcpu;
+
+	mtx_assert(&sched_lock, MA_OWNED);
+
+#if !defined(KSEG_PEEMPT_BEST_CPU)
+	if(curthread->td_ksegrp != td->td_ksegrp)
+#endif
+	{
+		kg = td->td_ksegrp;
+
+		/* Anyone waiting in front ? */
+		if(td != TAILQ_FIRST(&kg->kg_runq)) {
+			return; /* Yes - wait your turn*/
+		}
+		highest_pri = td->td_priority;
+		highest_pcpu = NULL;
+		dontuse = stopped_cpus | idle_cpus_mask;
+
+		/* Find a cpu with the worst priority that runs at thread from the
+		 * same ksegrp - if multiple exist give first the last run cpu and then
+		 * the current cpu priority
+		 */
+
+		SLIST_FOREACH(pc, &cpuhead, pc_allcpu) {
+			cpumask = pc->pc_cpumask;
+			if ( (cpumask & dontuse) == 0 &&
+			    pc->pc_curthread->td_ksegrp == kg) {
+				if (pc->pc_curthread->td_priority > highest_pri) {
+					highest_pri = pc->pc_curthread->td_priority;
+					highest_pcpu = pc;
+				} else if (pc->pc_curthread->td_priority == highest_pri &&
+				    highest_pcpu != NULL) {
+					if (td->td_lastcpu == pc->pc_cpuid ||
+					    (PCPU_GET(cpumask) == cpumask &&
+					    td->td_lastcpu != highest_pcpu->pc_cpuid)) {
+						highest_pcpu = pc;
+					}
+				}
+			}
+		}
+
+		/* Check if we need to preempt someone */
+		if (highest_pcpu == NULL) return;
+
+		if (PCPU_GET(cpuid) != highest_pcpu->pc_cpuid) {
+			pc->pc_curthread->td_flags |= TDF_NEEDRESCHED;
+			ipi_selected(highest_pcpu->pc_cpumask, IPI_AST);
+			return;
+		}
+	}
+#else
+	KASSERT(curthread->td_ksegrp == td->td_ksegrp,("maybe_preempt_in_ksegrp: No chance to run thread"));
+#endif
+
+	if (td->td_priority <= curthread->td_priority)
+		return;
+#ifdef PREEMPTION
+	if (td->td_critnest > 1) {
+		td->td_pflags |= TDP_OWEPREEMPT;
+	} else {
+		mi_switch(SW_INVOL, NULL);
+	}
+#else
+	curthread->td_flags |= TDF_NEEDRESCHED;
+#endif
+	return;
+}
+
 int limitcount;
 void
 setrunqueue(struct thread *td, int flags)
@@ -422,6 +510,7 @@
 	} else {
 		CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d",
 		    td, td->td_ksegrp, td->td_proc->p_pid);
+		maybe_preempt_in_ksegrp(td);
 	}
 }

--=-rj4c4DOGe8hfA+4+RE6p--

From owner-freebsd-arch@FreeBSD.ORG Tue Sep 28 07:49:29 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CA26716A4CE for ; Tue, 
28 Sep 2004 07:49:29 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id 571C543D31 for ; Tue, 28 Sep 2004 07:49:29 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 44107 invoked from network); 28 Sep 2004 07:49:27 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 28 Sep 2004 07:49:27 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8S7nRCs099983; Tue, 28 Sep 2004 09:49:27 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i8S7nQxS099982; Tue, 28 Sep 2004 09:49:26 +0200 (CEST) (envelope-from pho) Date: Tue, 28 Sep 2004 09:49:26 +0200 From: Peter Holm To: Stephan Uphoff Message-ID: <20040928074926.GA99957@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <414B8D5E.7000700@elischer.org> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <41563C95.2020501@elischer.org> <20040926075218.GA85983@peter.osted.lan> <1096339936.3733.279.camel@palm.tree.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1096339936.3733.279.camel@palm.tree.com> User-Agent: Mutt/1.4.1i cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2004 07:49:29 -0000 On Mon, Sep 27, 2004 at 10:52:16PM -0400, Stephan Uphoff wrote: > On Sun, 2004-09-26 at 03:52, Peter Holm wrote: > > On Sat, Sep 25, 2004 at 08:50:45PM -0700, Julian Elischer wrote: > > > Stephan Uphoff 
wrote: > > > > > > >>Maybe something brutal like: > > > >> if ((curthread->td_ksegrp == kg) && > > > >> (td->td_priority > curthread->td_priority)) > > > >> curthread->td_flags |= TDF_NEEDRESCHED; > > > >> > > > >>in setrunqueue for > > > >>the else case of "if (kg->kg_avail_opennings > 0)" > > > >>would do the trick (without preemption) for the easy but probably more > > > >>common cases? > > > >> > > > >>Maybe I can find some time next week to think about a clean > > > >>fix. I find it always helpful having a small task in mind while reading > > > >>source code. > > > > > > > > > > > >I wrote a fix that should cover all cases. > > > >However I would like to test it a little bit before posting the patch. > > > >Is there any multi-threaded kernel torture program that you can > > > >recommend? > > > > > > > > > Peter Holm (CC'd) has a really cool set of torture tests. > > > he has also seen all sorts of failures others have not (yet) triggered. :-) > > > > > > I'm 'busy" for the next couple of weeks so you may want to communicate > > > directly with him and see if you and he together can figure out some of the > > > things he's > > > been seeing :-) > > > > > > his tests are at: > > > http://www.holm.cc/stress/src/stress.tgz > > > > > > > > > > >Thanks > > > > > > > > Stephan > > > > > > > > I'll be glad to test any patches. > > Great. > Can you try the attached patch to see if it changes any of your > previously observed behaviour? 
> The system still freezes and can be unfrozen by a ping: http://www.holm.cc/stress/log/stephan.html -- Peter Holm From owner-freebsd-arch@FreeBSD.ORG Tue Sep 28 14:52:02 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AF7D916A4CE for ; Tue, 28 Sep 2004 14:52:02 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id 31A2843D55 for ; Tue, 28 Sep 2004 14:52:02 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 18039 invoked by uid 89); 28 Sep 2004 14:51:53 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 28 Sep 2004 14:51:53 -0000 Received: (qmail 17736 invoked by uid 89); 28 Sep 2004 14:51:45 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 28 Sep 2004 14:51:45 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8SEphmt009097; Tue, 28 Sep 2004 10:51:43 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Peter Holm In-Reply-To: <20040928074926.GA99957@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <20040926075218.GA85983@peter.osted.lan> <1096339936.3733.279.camel@palm.tree.com> <20040928074926.GA99957@peter.osted.lan> Content-Type: text/plain Message-Id: <1096383103.3733.312.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Tue, 28 Sep 2004 10:51:43 -0400 Content-Transfer-Encoding: 7bit cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Sep 2004 14:52:02 -0000

On Tue, 2004-09-28 at 03:49, Peter Holm wrote:
> The system still freezes and can be unfrozen by a ping:
> http://www.holm.cc/stress/log/stephan.html

Could you try the sched_userret_patch in addition to the switch_patch?
( The patch is in the email to arch on 25 September with the subject
"sched_userret priority adjustment patch for sched_4bsd" )

Are you running with the current GENERIC configuration?
( If not could you send me your config file?)

My debug target is currently diskless - I will reconfigure it later this
week and hopefully will be able to reproduce your freezes.

Thanks

	Stephan

From owner-freebsd-arch@FreeBSD.ORG Tue Sep 28 15:43:21 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 692CA16A4CE for ; Tue, 28 Sep 2004 15:43:21 +0000 (GMT) Received: from mail4.speakeasy.net (mail4.speakeasy.net [216.254.0.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2B10B43D54 for ; Tue, 28 Sep 2004 15:43:21 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 27103 invoked from network); 28 Sep 2004 15:43:20 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 28 Sep 2004 15:43:19 -0000 Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8SFhF2M002165; Tue, 28 Sep 2004 11:43:15 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Tue, 28 Sep 2004 10:56:00 -0400 User-Agent: KMail/1.6.2 References: <1096133353.53798.17613.camel@palm.tree.com> <200409271443.22667.jhb@FreeBSD.org> <1096320486.3733.58.camel@palm.tree.com> In-Reply-To: <1096320486.3733.58.camel@palm.tree.com> MIME-Version: 1.0 Content-Disposition: inline 
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200409281056.00870.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: "freebsd-arch@freebsd.org"
cc: Julian Elischer
cc: Stephan Uphoff
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 28 Sep 2004 15:43:21 -0000

On Monday 27 September 2004 05:28 pm, Stephan Uphoff wrote:
> On Mon, 2004-09-27 at 14:43, John Baldwin wrote:
> > On Monday 27 September 2004 01:20 pm, Stephan Uphoff wrote:
> > > On Mon, 2004-09-27 at 10:16, John Baldwin wrote:
> > > > On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote:
> > > > > When a thread is about to return to user space it resets its
> > > > > priority to the user level priority.
> > > > > However, after lowering its priority it needs to
> > > > > check if its priority is still better than all other runnable
> > > > > threads. This is currently not implemented.
> > > > > Without the check the thread can block kernel or user threads with
> > > > > better priority until a switch is forced by an interrupt.
> > > > >
> > > > > The attached patch checks the relevant runqueues and threads
> > > > > without slots in the same ksegrp and forces a thread switch if the
> > > > > currently running thread is no longer the best thread to run after
> > > > > it changed its priority.
> > > > >
> > > > > The patch should improve interactive response under heavy load
> > > > > somewhat. It needs a lot of testing.
> > > >
> > > > Perhaps the better fix is to teach the schedulers to set
> > > > TDF_NEEDRESCHED based on a comparison against user_pri rather than
> > > > td_priority inside of sched_add()?
Having the flag set by > > > > sched_add() is supposed to make this sort of check unnecessary. Even > > > > 4.x has the same bug I think as a process can make another process > > > > runnable after it's priority has been boosted by a tsleep() and > > > > need_resched() is only called based on a comparison of p_pri. Ah, 4.x > > > > doesn't have the bug because it caches the priority of curproc when > > > > it enters the kernel and compares against that. Thus, I think the > > > > correct fix is more like this: > > > > > > > > Index: sched_4bsd.c > > > > =================================================================== > > > > RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v > > > > retrieving revision 1.63 > > > > diff -u -r1.63 sched_4bsd.c > > > > --- sched_4bsd.c 11 Sep 2004 10:07:22 -0000 1.63 > > > > +++ sched_4bsd.c 27 Sep 2004 14:12:03 -0000 > > > > @@ -272,7 +272,7 @@ > > > > { > > > > > > > > mtx_assert(&sched_lock, MA_OWNED); > > > > - if (td->td_priority < curthread->td_priority) > > > > + if (td->td_priority < curthread->td_ksegrp->kg_user_pri) > > > > curthread->td_flags |= TDF_NEEDRESCHED; > > > > } > > > > > > > > Index: sched_ule.c > > > > =================================================================== > > > > RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v > > > > retrieving revision 1.129 > > > > diff -u -r1.129 sched_ule.c > > > > --- sched_ule.c 11 Sep 2004 10:07:22 -0000 1.129 > > > > +++ sched_ule.c 27 Sep 2004 14:13:01 -0000 > > > > @@ -723,7 +723,7 @@ > > > > */ > > > > pcpu = pcpu_find(cpu); > > > > td = pcpu->pc_curthread; > > > > - if (ke->ke_thread->td_priority < td->td_priority || > > > > + if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri > > > > || td == pcpu->pc_idlethread) { > > > > td->td_flags |= TDF_NEEDRESCHED; > > > > ipi_selected(1 << cpu, IPI_AST); > > > > > > > > An even better fix might be to fix td_base_pri by having it be set on > > > > kernel entry similar to how 4.x sets curpriority. 
The above fix
> > > > should be sufficient for now, however.
> > >
> > > I don't think that this is enough since TDF_NEEDRESCHED is thread
> > > specific and not cpu specific.
> >
> > Hmm, it is CPU specific in 4.x. It could be changed back to being a
> > per-cpu flag easily.
>
> But this might not help :-(.
> Example:
>
> Thread A is running in the kernel and is preempted by an interrupt
> Thread I. Thread I wakes up thread B.
> If I->td_ksegrp->kg_user_pri <= B->td_priority TDF_NEEDRESCHED will not
> be set.
> If A->td_priority < B->td_priority thread A will run once I is finished
> serving interrupts.
> Thread A can now leave the kernel even though
> A->td_ksegrp->kg_user_pri > B->td_priority may be true.

If A has a priority boost from tsleep() this is intentional, however. The
priority boosts from tsleep() are _supposed_ to do this so as to favor
interactive tasks. Note that if you add the code to always raise td_priority
while in the kernel as below you may end up defeating this well-known feature
of the 4BSD scheduler.

> > > However the thread marked with TDF_NEEDRESCHED might not be the next
> > > thread leaving the kernel.
> > > ( Can't really talk about ULE since I am trying to avoid looking at
> > > another shiny irresistible time sink this week ;-)
> > >
> > > I think we agree that td_priority should be set to td_base_pri on
> > > kernel entry. Since td_base_pri is changed by sleep and condvar
> > > functions it should also be reset on kernel entry. (Probably from a new
> > > ksegrp field). Condvar waits should currently not cause the base
> > > priority to change to the current priority of the thread - otherwise
> > > td_base_pri could get stuck at a really bad user priority.
> > > ( td->td_base_pri might end up being worse than
> > > td->td_ksegrp->kg_user_pri when the ksegrp priority improves)
> >
> > Well, I think instead that td_base_pri should be set to td_priority on
> > kernel entry (rather than the other way around).
td_priority should not
> > change just because the thread enters the kernel.
>
> I guess we disagree here.
> There are just too many resource dependencies in the kernel that can lead
> to priority inversion (vnode locks, disk buffer ownership, etc).
> It would be nice to delay the priority boost until a thread acquires
> such a resource (or even trace resource dependencies and implement
> priority inheritance) ... but this would be a huge task.
> Boosting the priority on kernel entry is easy and less error prone. I
> guess we had this discussion last week and we just disagree on the
> issue.

Well, I think you don't understand exactly what td_base_pri is supposed to
do. :) If you want to boost td_priority on kernel entry that is fine, but it
is completely orthogonal to this discussion. If you wanted to do that then
you might do something like this:

kernel_entry()
{
	/* numerically larger means lower priority: boost to kernel level */
	if (td->td_priority > PRI_KERN_MAX)
		sched_prio(td, PRI_KERN_MAX);
	td->td_base_pri = td->td_priority;
}

i.e., just add the boost to the kernel_entry function.

> > I think the sleep functions could then
> > leave td_base_pri alone. (I think setting it there is wrong because
> > td_base_pri is not quite the same as curpriority in 4.x.) What
> > td_base_pri is really supposed to provide, btw, is the priority that the
> > thread should go back to once it has unlocked a mutex and had its priority
> > boosted while it held the mutex. Arguably it should just be using
> > kg_user_pri for this, but then you lose priority "boosts" from tsleep(),
> > which is why td_base_pri is set in msleep(). I guess what should happen
> > is something more like this:
> >
> > kernel_entry()
> > {
> > 	KASSERT(td->td_priority == td->td_ksegrp->kg_user_pri);
> > 	td->td_base_pri = td->td_priority;
> > }
> >
> > msleep()
> > {
> > 	sched_prio(...);
> > 	td->td_base_pri = td->td_priority;
> > }
> >
> > The TDF_NEEDRESCHED checks should be using kg_user_pri as in my previous
> > e-mail.
Also, in sched_prio(), if our priority is ever raised
> > (numerically, logically less important), we should set TDF_NEEDRESCHED
> > since we may need to switch (4.x does this in maybe_needresched()).
> > Then, TDF_NEEDRESCHED could become a per-cpu flag and have it not be
> > cleared in mi_switch() but be cleared only in userret(). Hmm, I think
> > all of the TDF_NEEDRESCHED handling actually belongs in sched_userret()
> > btw.
>
> Wouldn't this lead to unnecessary round-robin switches between threads
> with the same priority on sched_4bsd?

Then it can happen on 4.x as well. The idea is that the occasional spurious
context switch is cheaper than doing a lot of work on each kernel exit.

--
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 08:57:52 2004
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 566BC16A4CE for ; Wed, 29 Sep 2004 08:57:52 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id DE83643D2F for ; Wed, 29 Sep 2004 08:57:51 +0000 (GMT) (envelope-from pho@holm.cc)
Received: (qmail 93406 invoked from network); 29 Sep 2004 08:57:50 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 29 Sep 2004 08:57:50 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8T8vnCs019885; Wed, 29 Sep 2004 10:57:49 +0200 (CEST) (envelope-from pho@peter.osted.lan)
Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i8T8vmjA019884; Wed, 29 Sep 2004 10:57:48 +0200 (CEST) (envelope-from pho)
Date: Wed, 29 Sep 2004 10:57:48 +0200
From: Peter Holm
To: Stephan Uphoff
Message-ID: <20040929085748.GA19695@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com> <414B8D5E.7000700@elischer.org> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <41563C95.2020501@elischer.org> <20040926075218.GA85983@peter.osted.lan> <1096339936.3733.279.camel@palm.tree.com> <20040928074926.GA99957@peter.osted.lan> <1096383103.3733.312.camel@palm.tree.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1096383103.3733.312.camel@palm.tree.com> User-Agent: Mutt/1.4.1i cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 08:57:52 -0000 On Tue, Sep 28, 2004 at 10:51:43AM -0400, Stephan Uphoff wrote: > On Tue, 2004-09-28 at 03:49, Peter Holm wrote: > > The system still freezes and can be unfrozen by a ping: > > http://www.holm.cc/stress/log/stephan.html > > Could you try the sched_userret_patch in addition to the switch_patch? > ( Patch is in email to arch on 25 September with the subject > "sched_userret priority adjustment patch for sched_4bsd" > Done. > Are you running with the current GENERIC configuration? > ( If not could you send me your config file?) > Yes. GENERIC + BREAK_TO_DEBUGGER. > My debug target is currently diskless - I will reconfigure it later this > week and hopefully will be able to reproduce your freezes. > > Thanks > > Stephan Sorry for the late reply, but I ran into a few other problems along the way: http://www.holm.cc/stress/log/cons79.html http://www.holm.cc/stress/log/cons80.html It's hard for me to tell if your patch has made any difference. The freeze is still there. I'll try to make the same test once more without your patches to see if I get the same pattern in freezes. 
- Peter From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 14:24:10 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6007D16A4CE for ; Wed, 29 Sep 2004 14:24:10 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id EA70843D46 for ; Wed, 29 Sep 2004 14:24:09 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 27716 invoked by uid 89); 29 Sep 2004 14:24:06 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 29 Sep 2004 14:24:06 -0000 Received: (qmail 27698 invoked by uid 89); 29 Sep 2004 14:24:05 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 29 Sep 2004 14:24:05 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8TEO3mt015217; Wed, 29 Sep 2004 10:24:03 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Peter Holm In-Reply-To: <20040929085748.GA19695@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <20040926075218.GA85983@peter.osted.lan> <1096339936.3733.279.camel@palm.tree.com> <20040928074926.GA99957@peter.osted.lan> <1096383103.3733.312.camel@palm.tree.com> <20040929085748.GA19695@peter.osted.lan> Content-Type: text/plain Message-Id: <1096467843.3733.1145.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Wed, 29 Sep 2004 10:24:03 -0400 Content-Transfer-Encoding: 7bit cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 14:24:10 -0000 On Wed, 2004-09-29 at 04:57, Peter Holm wrote: > On Tue, Sep 28, 2004 at 10:51:43AM -0400, Stephan Uphoff wrote: > > On Tue, 2004-09-28 at 03:49, Peter Holm wrote: > > > The system still freezes and can be unfrozen by a ping: > > > http://www.holm.cc/stress/log/stephan.html > > > > Could you try the sched_userret_patch in addition to the switch_patch? > > ( Patch is in email to arch on 25 September with the subject > > "sched_userret priority adjustment patch for sched_4bsd" > > > > Done. > > > Are you running with the current GENERIC configuration? > > ( If not could you send me your config file?) > > > > Yes. GENERIC + BREAK_TO_DEBUGGER. > > > My debug target is currently diskless - I will reconfigure it later this > > week and hopefully will be able to reproduce your freezes. > > > > Thanks > > > > Stephan > > Sorry for the late reply, but I ran into a few other problems along the way: Late reply? !!! What late reply? Are you trying to scare me? ;-) > http://www.holm.cc/stress/log/cons79.html > http://www.holm.cc/stress/log/cons80.html > > It's hard for me to tell if your patch has made any difference. > The freeze is still there. I'll try to make the same test once more > without your patches to see if I get the same pattern in freezes. I found some problems yesterday with mutex priority inheritance that could potentially cause your freeze patterns. I will try to roll a preliminary patch as soon as the caffeine does its magic. 
Stephan From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 14:39:01 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 839DA16A4CF for ; Wed, 29 Sep 2004 14:39:01 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id E793643D48 for ; Wed, 29 Sep 2004 14:39:00 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 5008 invoked by uid 89); 29 Sep 2004 14:38:56 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 29 Sep 2004 14:38:56 -0000 Received: (qmail 4937 invoked by uid 89); 29 Sep 2004 14:38:55 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 29 Sep 2004 14:38:55 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8TEcsmt015285; Wed, 29 Sep 2004 10:38:54 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: John Baldwin In-Reply-To: <200409281056.00870.jhb@FreeBSD.org> References: <1096133353.53798.17613.camel@palm.tree.com> <200409271443.22667.jhb@FreeBSD.org> <1096320486.3733.58.camel@palm.tree.com> <200409281056.00870.jhb@FreeBSD.org> Content-Type: text/plain Message-Id: <1096468734.3733.1177.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Wed, 29 Sep 2004 10:38:54 -0400 Content-Transfer-Encoding: 7bit cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: sched_userret priority adjustment patch for sched_4bsd X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 14:39:01 -0000 On Tue, 2004-09-28 at 10:56, John Baldwin wrote: > If A has a priority boost from 
tsleep() this is intentional, however. The
> priority boosts from tsleep() are _supposed_ to do this so as to favor
> interactive tasks. Note that if you add the code to always raise td_priority
> while in the kernel as below you may end up defeating this well-known feature
> of the 4BSD scheduler.

OK - you and Julian convinced me that this is a feature that I should
have known about. Without test cases or interactivity benchmarks,
discussing whether this is still a desirable feature is probably useless.
I will revisit this once test cases materialize or I have time to
think about a benchmark (not likely anytime soon).

Stephan

From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 17:12:16 2004
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4CA7516A4CE for ; Wed, 29 Sep 2004 17:12:16 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id B1B7E43D46 for ; Wed, 29 Sep 2004 17:12:15 +0000 (GMT) (envelope-from ups@tree.com)
Received: (qmail 7868 invoked by uid 89); 29 Sep 2004 17:12:14 -0000
Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 29 Sep 2004 17:12:14 -0000
Received: (qmail 7842 invoked by uid 89); 29 Sep 2004 17:12:14 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 29 Sep 2004 17:12:14 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8THCDmt015815; Wed, 29 Sep 2004 13:12:13 -0400 (EDT) (envelope-from ups@tree.com)
From: Stephan Uphoff
To: Peter Holm
In-Reply-To: <1096467843.3733.1145.camel@palm.tree.com>
References: <1095468747.31297.241.camel@palm.tree.com> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <20040926075218.GA85983@peter.osted.lan>
 <1096339936.3733.279.camel@palm.tree.com> <20040928074926.GA99957@peter.osted.lan> <1096383103.3733.312.camel@palm.tree.com> <20040929085748.GA19695@peter.osted.lan> <1096467843.3733.1145.camel@palm.tree.com>
Content-Type: text/plain
Message-Id: <1096477932.3733.1471.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6
Date: Wed, 29 Sep 2004 13:12:13 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer
cc: "freebsd-arch@freebsd.org"
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 29 Sep 2004 17:12:16 -0000

On Wed, 2004-09-29 at 10:24, Stephan Uphoff wrote:
> On Wed, 2004-09-29 at 04:57, Peter Holm wrote:
> > It's hard for me to tell if your patch has made any difference.
> > The freeze is still there. I'll try to make the same test once more
> > without your patches to see if I get the same pattern in freezes.
>
> I found some problems yesterday with mutex priority inheritance that
> could potentially cause your freeze patterns.
>
> I will try to roll a preliminary patch as soon as the caffeine does its
> magic.

OK - here is a crude patch to fix some problems with mutex priority
inheritance. My theory is that the clock thread gets stuck waiting on
GIANT.

During release/acquisition of a contested sleep mutex there are a few
windows where a task can be preempted while actions (waking up blocked
threads, transferring ownership of the mutex, ...) need to be atomic as
far as scheduling is concerned. Otherwise priority inheritance may fail.
The patch uses critical_enter/critical_exit to protect these regions
against preemption.

It would be great if you could run this in addition to the other patches.
Stephan From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 19:50:14 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0F45616A4CE for ; Wed, 29 Sep 2004 19:50:14 +0000 (GMT) Received: from mail.gmx.net (mail.gmx.de [213.165.64.20]) by mx1.FreeBSD.org (Postfix) with SMTP id 4C75243D46 for ; Wed, 29 Sep 2004 19:50:13 +0000 (GMT) (envelope-from idontknowmyself@gmx.net) Received: (qmail 12598 invoked by uid 65534); 29 Sep 2004 19:50:12 -0000 Received: from p5080C7E6.dip0.t-ipconnect.de (EHLO chaos) (80.128.199.230) by mail.gmx.net (mp019) with SMTP; 29 Sep 2004 21:50:12 +0200 X-Authenticated: #17701688 To: freebsd-arch@freebsd.org Date: Wed, 29 Sep 2004 21:53:01 +0200 From: idontknowmyself@gmx.net Content-Type: text/plain; format=flowed; delsp=yes; charset=iso-8859-15 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-ID: User-Agent: Opera M2/7.54 (Win32, build 3865) Subject: freebsd on mvme147 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 19:50:14 -0000 hi there could you tell me if there is any freebsd port for motorola mvme147 processor architectures? 
i would be very pleased to know that thank you oskar roeding From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 20:11:02 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7F85A16A4CE for ; Wed, 29 Sep 2004 20:11:02 +0000 (GMT) Received: from mail1.speakeasy.net (mail1.speakeasy.net [216.254.0.201]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4C78B43D48 for ; Wed, 29 Sep 2004 20:11:02 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 28384 invoked from network); 29 Sep 2004 20:11:02 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 29 Sep 2004 20:10:59 -0000 Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8TKAukO012092; Wed, 29 Sep 2004 16:10:56 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Wed, 29 Sep 2004 10:55:47 -0400 User-Agent: KMail/1.6.2 References: <1096133353.53798.17613.camel@palm.tree.com> <200409281056.00870.jhb@FreeBSD.org> <1096468734.3733.1177.camel@palm.tree.com> In-Reply-To: <1096468734.3733.1177.camel@palm.tree.com> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200409291055.48387.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: "freebsd-arch@freebsd.org" cc: Julian Elischer cc: Stephan Uphoff Subject: Re: sched_userret priority adjustment patch for sched_4bsd X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 20:11:02 -0000 On Wednesday 29 September 2004 10:38 
am, Stephan Uphoff wrote: > On Tue, 2004-09-28 at 10:56, John Baldwin wrote: > > If A has a priority boost from tsleep() this is intentional, however. > > The priroity boosts from tsleep() are _supposed_ to do this so as to > > favor interactive tasks. Note that if you add the code to always raise > > td_priority while in the kernel as below you may end up defeating this > > well-known feature of the 4BSD scheduler. > > OK - you and Julian convinced me that this is a feature that I should > have known about. Without test cases or interactivity benchmarks > discussions if this is still a desirable feature are probably useless. > I will revisit the this once test cases materialize or I have time to > think about a benchmark (Not likely anytime soon). That's ok. This discussion has been very fruitful on my end at least as talking this out has helped me get a much better grasp on how this stuff works on 4.x and should be done in 5.x to obtain at least somewhat similar behavior. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 20:26:09 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1856516A4CE for ; Wed, 29 Sep 2004 20:26:09 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 06CC743D53 for ; Wed, 29 Sep 2004 20:26:09 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id E17637A446; Wed, 29 Sep 2004 13:26:08 -0700 (PDT) Message-ID: <415B1A60.5040306@elischer.org> Date: Wed, 29 Sep 2004 13:26:08 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: idontknowmyself@gmx.net 
References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-arch@freebsd.org Subject: Re: freebsd on mvme147 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 20:26:09 -0000 idontknowmyself@gmx.net wrote: > hi there > could you tell me if there is any freebsd port for motorola mvme147 > processor architectures? NetBSD (www.netbsd.org) has it.. > > i would be very pleased to know that > thank you > oskar roeding > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 20:26:19 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 96B5B16A4CE for ; Wed, 29 Sep 2004 20:26:19 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id 2D3A443D2F for ; Wed, 29 Sep 2004 20:26:19 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 10052 invoked by uid 89); 29 Sep 2004 20:26:18 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 29 Sep 2004 20:26:18 -0000 Received: (qmail 10028 invoked by uid 89); 29 Sep 2004 20:26:17 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 29 Sep 2004 20:26:17 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8TKQGmt017003; Wed, 29 Sep 2004 16:26:16 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Peter 
Holm In-Reply-To: <1096477932.3733.1471.camel@palm.tree.com> References: <1095468747.31297.241.camel@palm.tree.com> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <20040926075218.GA85983@peter.osted.lan> <1096339936.3733.279.camel@palm.tree.com> <20040928074926.GA99957@peter.osted.lan> <1096383103.3733.312.camel@palm.tree.com> <20040929085748.GA19695@peter.osted.lan> <1096467843.3733.1145.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> Content-Type: multipart/mixed; boundary="=-MHoXvfgwm/AM79gvF4x9" Message-Id: <1096489576.3733.1868.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Wed, 29 Sep 2004 16:26:16 -0400 cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 20:26:19 -0000 --=-MHoXvfgwm/AM79gvF4x9 Content-Type: text/plain Content-Transfer-Encoding: 7bit Forgot to attach the patch ... Stephan On Wed, 2004-09-29 at 13:12, Stephan Uphoff wrote: > On Wed, 2004-09-29 at 10:24, Stephan Uphoff wrote: > > On Wed, 2004-09-29 at 04:57, Peter Holm wrote: > > > It's hard for me to tell if your patch has made any difference. > > > The freeze is still there. I'll try to make the same test once more > > > without your patches to see if I get the same pattern in freezes. > > > > I found some problems yesterday with mutex priority inheritance that > > could potentially cause your freeze patterns. > > > > I will try to roll a preliminary patch as soon as the caffeine does its > > magic. > > OK - here is a crude patch to fix some problems with mutex priority > inheritance. My theory is that the clock thread gets stuck waiting on > GIANT. 
>
> During release/acquisition of a contested sleep mutex there are a few
> windows where a task can be preempted when actions (waking up blocked
> threads, ownership of the mutex, ..) need to be atomic as far as
> scheduling is concerned. Otherwise priority inheritance may fail. The
> patch uses critical_enter/critical_exit to protect these regions against
> preemption.
>
> It would be great if you could run this in addition to the other patches.
>
> Stephan

--=-MHoXvfgwm/AM79gvF4x9
Content-Disposition: attachment; filename=mutex_patch
Content-Type: text/x-patch; name=mutex_patch; charset=ASCII
Content-Transfer-Encoding: 7bit

Index: kern_mutex.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_mutex.c,v
retrieving revision 1.149
diff -u -r1.149 kern_mutex.c
--- kern_mutex.c 2 Sep 2004 18:59:15 -0000 1.149
+++ kern_mutex.c 29 Sep 2004 16:50:36 -0000
@@ -492,7 +492,9 @@
 	if (v == MTX_CONTESTED) {
 		MPASS(ts != NULL);
 		m->mtx_lock = (uintptr_t)td | MTX_CONTESTED;
+		critical_enter();
 		turnstile_claim(ts);
+		critical_exit();
 		break;
 	}
 #endif
@@ -651,6 +653,9 @@
 #else
 	MPASS(ts != NULL);
 #endif
+
+	critical_enter();
+
 #ifndef PREEMPTION
 	/* XXX */
 	td1 = turnstile_head(ts);
@@ -671,6 +676,7 @@
 	}
 #endif
 	turnstile_unpend(ts);
+	critical_exit();
 
 #ifndef PREEMPTION
 	/*

--=-MHoXvfgwm/AM79gvF4x9--

From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 20:40:58 2004
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0FD7416A4CE; Wed, 29 Sep 2004 20:40:58 +0000 (GMT)
Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id E33D343D45; Wed, 29 Sep 2004 20:40:57 +0000 (GMT) (envelope-from julian@elischer.org)
Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id B7A877A446; Wed, 29 Sep 2004 13:40:57 -0700 (PDT)
Message-ID:
<415B1DD9.2050409@elischer.org> Date: Wed, 29 Sep 2004 13:40:57 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: John Baldwin References: <1096133353.53798.17613.camel@palm.tree.com> <200409281056.00870.jhb@FreeBSD.org> <1096468734.3733.1177.camel@palm.tree.com> <200409291055.48387.jhb@FreeBSD.org> In-Reply-To: <200409291055.48387.jhb@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: Stephan Uphoff cc: David Xu cc: freebsd-arch@FreeBSD.org Subject: Re: sched_userret priority adjustment patch for sched_4bsd X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 20:40:58 -0000 John Baldwin wrote: >On Wednesday 29 September 2004 10:38 am, Stephan Uphoff wrote: > > >>On Tue, 2004-09-28 at 10:56, John Baldwin wrote: >> >> >>>If A has a priority boost from tsleep() this is intentional, however. >>>The priroity boosts from tsleep() are _supposed_ to do this so as to >>>favor interactive tasks. Note that if you add the code to always raise >>>td_priority while in the kernel as below you may end up defeating this >>>well-known feature of the 4BSD scheduler. >>> >>> >>OK - you and Julian convinced me that this is a feature that I should >>have known about. Without test cases or interactivity benchmarks >>discussions if this is still a desirable feature are probably useless. >>I will revisit the this once test cases materialize or I have time to >>think about a benchmark (Not likely anytime soon). >> >> > >That's ok. 
This discussion has been very fruitful on my end at least as >talking this out has helped me get a much better grasp on how this stuff >works on 4.x and should be done in 5.x to obtain at least somewhat similar >behavior. > well if you've worked it out,.. do let the rest of us know :-) I do think that there are several points that need work.. 1/ kse threads are ephemeral, and so they don't gather any 'history'. therefore it needs to be gathered somewhere else.. (e.g. the ksegrp, but what does that actually mean?) 2/ what if the kg has both long-running and interactive threads? 3/ sibling thread affinity and how that affects priority and scheduling. We COULD store information in the mailbox.. but then we need to trust the user with it.. So then where do we store it? I have considered a store of 'cached' and "hashed" (like the buffer cache) sched-info structs that are recycled in a least-recently used manner.. when you get a thread with a mailbox you look for a sched-stats block corresponding with that mailbox address and use it.. if you don't find it then you know that thread has not run for a long time.. so you grab the least-recently used one and recycle it as that thread hasn't run for a while. Basically the kernel could keep stats on behalf of the most active KSE threads in an efficient manner. The small stats structs would need to be only about 8 words.. (4 for 2 x double links. one for mailbox addr/key, and 3 for sched stats.) In effect the kernel keeps tabs on the most active user threads without the UTS knowing about it.
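The hashed, LRU-recycled stats cache described above can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions: every name here (struct sched_stats, stats_init, stats_lookup, NSTATS, NBUCKETS) is hypothetical and not taken from the FreeBSD tree, the three stat words are placeholders, and locking is left out entirely. Each block is 8 words, as estimated: 4 for the two double links, one for the mailbox key, and 3 for stats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSTATS   4   /* pool size; tiny so recycling is easy to observe */
#define NBUCKETS 8   /* number of hash buckets; must be a power of two */

struct link { struct link *next, *prev; };

/* Hypothetical stats block: 2 double links + key + 3 stat words. */
struct sched_stats {
    struct link hash;        /* chain within one hash bucket */
    struct link lru;         /* position in the global LRU list */
    uintptr_t mbox;          /* user mailbox address, used as the key */
    unsigned long stat[3];   /* placeholder scheduler statistics */
};

#define STATS_OF(l, field) \
    ((struct sched_stats *)((char *)(l) - offsetof(struct sched_stats, field)))

static struct sched_stats pool[NSTATS];
static struct link buckets[NBUCKETS];
static struct link lru;      /* lru.next is most recent, lru.prev is oldest */

static void link_remove(struct link *l)
{
    l->prev->next = l->next;
    l->next->prev = l->prev;
}

static void link_insert_head(struct link *head, struct link *l)
{
    l->next = head->next;
    l->prev = head;
    head->next->prev = l;
    head->next = l;
}

static unsigned hashfn(uintptr_t mbox)
{
    return (unsigned)((mbox >> 4) & (NBUCKETS - 1));
}

void stats_init(void)
{
    int i;

    lru.next = lru.prev = &lru;
    for (i = 0; i < NBUCKETS; i++)
        buckets[i].next = buckets[i].prev = &buckets[i];
    for (i = 0; i < NSTATS; i++) {
        memset(&pool[i], 0, sizeof(pool[i]));
        /* Unused blocks are self-linked so link_remove() is harmless. */
        pool[i].hash.next = pool[i].hash.prev = &pool[i].hash;
        link_insert_head(&lru, &pool[i].lru);
    }
}

/*
 * Find the stats block for a mailbox address.  On a miss, the least
 * recently used block is recycled and *fresh is set to 1, which tells
 * the caller that this thread has no recent history.
 */
struct sched_stats *stats_lookup(uintptr_t mbox, int *fresh)
{
    struct link *head = &buckets[hashfn(mbox)], *l;
    struct sched_stats *ss = NULL;

    for (l = head->next; l != head; l = l->next) {
        if (STATS_OF(l, hash)->mbox == mbox) {
            ss = STATS_OF(l, hash);
            break;
        }
    }
    *fresh = (ss == NULL);
    if (ss == NULL) {
        ss = STATS_OF(lru.prev, lru);        /* oldest block */
        link_remove(&ss->hash);              /* drop its old key */
        memset(ss->stat, 0, sizeof(ss->stat));
        ss->mbox = mbox;
        link_insert_head(head, &ss->hash);
    }
    link_remove(&ss->lru);                   /* mark as most recently used */
    link_insert_head(&lru, &ss->lru);
    return ss;
}
```

A caller would run stats_init() once and then call stats_lookup() with the mailbox address each time a KSE thread enters the kernel; a set *fresh means the thread should be treated as one that has not run for a while.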
From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 20:58:32 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 66A3716A4CE for ; Wed, 29 Sep 2004 20:58:32 +0000 (GMT) Received: from mail4.speakeasy.net (mail4.speakeasy.net [216.254.0.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3797D43D39 for ; Wed, 29 Sep 2004 20:58:32 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 6873 invoked from network); 29 Sep 2004 20:58:31 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 29 Sep 2004 20:58:31 -0000 Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8TKwRWq012425; Wed, 29 Sep 2004 16:58:28 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Wed, 29 Sep 2004 16:52:29 -0400 User-Agent: KMail/1.6.2 References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> In-Reply-To: <1096489576.3733.1868.camel@palm.tree.com> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200409291652.29990.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: Peter Holm cc: "freebsd-arch@freebsd.org" cc: Julian Elischer cc: Stephan Uphoff Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 20:58:32 -0000 On Wednesday 29 September 2004 04:26 pm, Stephan Uphoff wrote: > Forgot to attach the patch ... 
> > Stephan > > On Wed, 2004-09-29 at 13:12, Stephan Uphoff wrote: > > On Wed, 2004-09-29 at 10:24, Stephan Uphoff wrote: > > > On Wed, 2004-09-29 at 04:57, Peter Holm wrote: > > > > It's hard for me to tell if your patch has made any difference. > > > > The freeze is still there. I'll try to make the same test once more > > > > without your patches to see if I get the same pattern in freezes. > > > > > > I found some problems yesterday with mutex priority inheritance that > > > could potentially cause your freeze patterns. > > > > > > I will try to roll a preliminary patch as soon as the caffeine does its > > > magic. > > > > OK - here is a crude patch to fix some problems with mutex priority > > inheritance. My theory is that the clock thread gets stuck waiting on > > GIANT. > > > > During release/acquisition of a contested sleep mutex there are a few > > windows where a task can be preempted when actions (waking up blocked > > threads, ownership of the mutex, ..) need to be atomic as far as > > scheduling is concerned. Otherwise priority inheritance may fail. The > > patch uses critical_enter/critical_exit to protect these regions against > > preemption. > > > > It would be great if could run this in addition to the other patches. turnstile_claim() doesn't make any threads runnable and thus can't preempt. The other place is supposed to preempt, and it should be ok to do so. Note that since the turnstile chain lock is held, that includes a nested critical section and any preemption will be deferred until the turnstile lock is released via turnstile_release which happens in the middle of turnstile_unpend() after it has finished building a list of all the threads to be made runnable so that the turnstile object can be re-used safely. I don't think this patch will make much of a difference (if any). Can you provide a description of a case where you think the priority inheritance can fail if turnstile_unpend() doesn't run in a nested critical section? 
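The deferral described above (a preemption requested while a spin lock is held only takes effect once the outermost critical section is left) can be modelled with a small user-space sketch. It is a toy under stated assumptions: struct toy_thread and all toy_* functions are made-up names for illustration, not the kernel's actual implementation, and a "switch" is just a counter increment.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-thread preemption state; all names are hypothetical. */
struct toy_thread {
    int  critnest;      /* nesting depth of critical sections */
    bool owe_preempt;   /* a preemption was requested while nested */
    int  switches;      /* number of times we actually switched away */
};

static void toy_switch(struct toy_thread *td)
{
    td->owe_preempt = false;
    td->switches++;
}

void toy_critical_enter(struct toy_thread *td)
{
    td->critnest++;
}

void toy_critical_exit(struct toy_thread *td)
{
    assert(td->critnest > 0);
    /* Only leaving the outermost section honours a deferred preemption. */
    if (--td->critnest == 0 && td->owe_preempt)
        toy_switch(td);
}

/* Called when a higher-priority thread has just been made runnable. */
void toy_maybe_preempt(struct toy_thread *td)
{
    if (td->critnest > 0)
        td->owe_preempt = true;   /* defer until critical_exit */
    else
        toy_switch(td);
}

/* In this model, holding a spin lock implies a critical section. */
void toy_spin_lock(struct toy_thread *td)   { toy_critical_enter(td); }
void toy_spin_unlock(struct toy_thread *td) { toy_critical_exit(td); }
```

In this model, wrapping a region in an extra toy_critical_enter()/toy_critical_exit() pair moves the switch from the spin unlock to the outer exit, which is the effect the proposed patch is after.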
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 21:00:44 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EAE4D16A4CE; Wed, 29 Sep 2004 21:00:44 +0000 (GMT) Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7B34F43D58; Wed, 29 Sep 2004 21:00:43 +0000 (GMT) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) i8TL0aNl017870; Wed, 29 Sep 2004 17:00:36 -0400 (EDT) Date: Wed, 29 Sep 2004 17:00:36 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net To: Julian Elischer In-Reply-To: <415B1DD9.2050409@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net) cc: freebsd-arch@freebsd.org cc: David Xu cc: Stephan Uphoff Subject: Re: sched_userret priority adjustment patch for sched_4bsd X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Daniel Eischen List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 21:00:45 -0000 On Wed, 29 Sep 2004, Julian Elischer wrote: > > > John Baldwin wrote: > > > > >That's ok. This discussion has been very fruitful on my end at least as > >talking this out has helped me get a much better grasp on how this stuff > >works on 4.x and should be done in 5.x to obtain at least somewhat similar > >behavior. > > > > > well if you've worked it out,.. do let the rest of us know :-) > > I do think that there are several points that need work.. > 1/ kse threads are ephemeral, and so they don't gather any 'history'. > therefore it needs to be gathered somewher eelse.. (e.g. 
the ksegrp, > but what does that actually mean?) > 2/ what if the kg has both long-running and interactive threads? > 3/ sibling thread affinity and how that affects priority and scheduling. > > > We COULD store information in the mailbox.. > but then we need to trust the user with it.. > So then where do we store it? > > I have considered a store of 'cached' and "hashed" (like the buffer > cache) sched-info structs that are recycled > in a least-recently used manner.. when you get a thread with a mailbox > you look for a sched-stats block > corresponding with that mailbox address and use it.. > if you don't find it then you know that thread has not run for a long time.. > so you grab the least-recently used one and recycle it as that thread > hasn't run for a while. > Basically the kernel could keep stats on behalf of the most active KSE > threads in an efficient manner. > The small stats structs would need to be only about 8 words.. > (4 for 2 x double links. one for mailbox addr/key, and 3 for sched stats.) > In effect the kernel keeps tabs on the most active user threads without > the UTS knowing about it. Remember that the UTS (IAW POSIX) should be in charge of which threads run _within_ a process. Across processes, and for system scope threads, that's another story. I think it would be cool if the UTS could store its version of priority in the thread mailbox, and the kernel would use this as a hint for which threads should get worked on when blocked in the kernel. For instance, if a thread is currently running with high priority and it makes a system call, that's a chance for the kernel to continue other blocked threads. But if the other blocked threads are all of lower (UTS) priority, you might not want to continue them (or upcall) when the currently running thread has a higher priority.
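The hint idea above could look something like the following sketch. Everything in it is an assumption for illustration: struct toy_mailbox, its uts_prio field, and toy_should_upcall() are hypothetical names, not part of the KSE mailbox layout, and larger numbers are assumed to mean more urgent (POSIX style).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice of a thread mailbox; only the hint matters here. */
struct toy_mailbox {
    int uts_prio;   /* priority as seen by the UTS; ASSUMED larger = more urgent */
};

/*
 * Decide whether continuing (or upcalling for) any of the threads that
 * were blocked in the kernel is worthwhile, given the UTS priority of
 * the thread that is currently running.  The value is treated purely
 * as a hint.
 */
bool toy_should_upcall(const struct toy_mailbox *running,
                       const struct toy_mailbox *ready, size_t nready)
{
    size_t i;

    for (i = 0; i < nready; i++)
        if (ready[i].uts_prio > running->uts_prio)
            return true;    /* someone more urgent is waiting */
    return false;           /* all waiters rank at or below the runner */
}
```

With a running thread at priority 10 and waiters at 3 and 7, the function returns false, so the kernel would leave the running thread alone; raising one waiter to 12 flips the answer.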
-- Dan Eischen From owner-freebsd-arch@FreeBSD.ORG Wed Sep 29 22:14:37 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5118C16A4CE for ; Wed, 29 Sep 2004 22:14:37 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id A96AC43D2F for ; Wed, 29 Sep 2004 22:14:36 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 22904 invoked by uid 89); 29 Sep 2004 22:14:34 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 29 Sep 2004 22:14:34 -0000 Received: (qmail 22231 invoked by uid 89); 29 Sep 2004 22:14:19 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 29 Sep 2004 22:14:19 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8TMEHmt017471; Wed, 29 Sep 2004 18:14:18 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: John Baldwin In-Reply-To: <200409291652.29990.jhb@FreeBSD.org> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> Content-Type: text/plain Message-Id: <1096496057.3733.2163.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Wed, 29 Sep 2004 18:14:17 -0400 Content-Transfer-Encoding: 7bit cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Sep 2004 22:14:37 -0000 On Wed, 2004-09-29 at 16:52, John Baldwin wrote: > > > OK - here is a crude 
patch to fix some problems with mutex priority > > > inheritance. My theory is that the clock thread gets stuck waiting on > > > GIANT. > > > > > > During release/acquisition of a contested sleep mutex there are a few > > > windows where a task can be preempted when actions (waking up blocked > > > threads, ownership of the mutex, ..) need to be atomic as far as > > > scheduling is concerned. Otherwise priority inheritance may fail. The > > > patch uses critical_enter/critical_exit to protect these regions against > > > preemption. > > > > > > It would be great if could run this in addition to the other patches. > > turnstile_claim() doesn't make any threads runnable and thus can't preempt. > The other place is supposed to preempt, and it should be ok to do so. Note > that since the turnstile chain lock is held, that includes a nested critical > section and any preemption will be deferred until the turnstile lock is > released via turnstile_release which happens in the middle of > turnstile_unpend() after it has finished building a list of all the threads > to be made runnable so that the turnstile object can be re-used safely. I > don't think this patch will make much of a difference (if any). Can you > provide a description of a case where you think the priority inheritance can > fail if turnstile_unpend() doesn't run in a nested critical section? This is a bit of a mind bender. I hope you have some aspirins close by ;-) Thread A holds a mutex x contested by Thread B and has priority pri(A). Thread B holds a mutex y. There is a thread C with priority pri(C) with pri(C) < pri(A). Thread A is in the process of releasing x. It removes thread B from the turnstile and holds a pointer to B in a private list. Thread A sets the owner of the turnstile to NULL and releases all spin locks. ( mtx_unlock_spin(&tc->tc_lock); line 148) This means interrupts are now enabled. 
An interrupt occurs (or is already pending) and the interrupt handler puts the associated interrupt thread I on the run queue. This causes a preemption from A to I. The interrupt thread I tries to acquire mutex y owned by B and blocks. I donates its priority to B - but inheritance stops at B. The next thread with the best priority is C and the cpu switches to C. However B needs A to run to make it to the run-queue. If y is GIANT and I is the clock thread C could run forever in userspace without being interrupted. There is another scenario that does not require an interrupt (preemption in setrunqueue(td, SRQ_BORING), two blocked threads ...). I was looking at the MUTEX_WAKE_ALL undefined case when I used the critical section for turnstile_claim(). However there are bigger problems with MUTEX_WAKE_ALL undefined so you are right - the critical section for turnstile_claim is pretty useless. Stephan From owner-freebsd-arch@FreeBSD.ORG Thu Sep 30 07:58:04 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 51A7916A4CE for ; Thu, 30 Sep 2004 07:58:04 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id DBB3743D4C for ; Thu, 30 Sep 2004 07:58:03 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 84475 invoked from network); 30 Sep 2004 07:58:01 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 30 Sep 2004 07:58:01 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8U7w0Cs052399; Thu, 30 Sep 2004 09:58:00 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i8U7w0Va052398; Thu, 30 Sep 2004 09:58:00 +0200 (CEST) (envelope-from pho) Date: Thu, 
30 Sep 2004 09:57:59 +0200 From: Peter Holm To: Stephan Uphoff Message-ID: <20040930075759.GA52233@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <20040926075218.GA85983@peter.osted.lan> <1096339936.3733.279.camel@palm.tree.com> <20040928074926.GA99957@peter.osted.lan> <1096383103.3733.312.camel@palm.tree.com> <20040929085748.GA19695@peter.osted.lan> <1096467843.3733.1145.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="Kj7319i9nmIyA2yE" Content-Disposition: inline In-Reply-To: <1096477932.3733.1471.camel@palm.tree.com> User-Agent: Mutt/1.4.1i cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2004 07:58:04 -0000 --Kj7319i9nmIyA2yE Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Sep 29, 2004 at 01:12:13PM -0400, Stephan Uphoff wrote: > On Wed, 2004-09-29 at 10:24, Stephan Uphoff wrote: > > On Wed, 2004-09-29 at 04:57, Peter Holm wrote: > > > It's hard for me to tell if your patch has made any difference. > > > The freeze is still there. I'll try to make the same test once more > > > without your patches to see if I get the same pattern in freezes. > > > > I found some problems yesterday with mutex priority inheritance that > > could potentially cause your freeze patterns. > > > > I will try to roll a preliminary patch as soon as the caffeine does its > > magic. > > OK - here is a crude patch to fix some problems with mutex priority > inheritance. My theory is that the clock thread gets stuck waiting on > GIANT. 
> > During release/acquisition of a contested sleep mutex there are a few > windows where a task can be preempted when actions (waking up blocked > threads, ownership of the mutex, ..) need to be atomic as far as > scheduling is concerned. Otherwise priority inheritance may fail. The > patch uses critical_enter/critical_exit to protect these regions against > preemption. > > It would be great if could run this in addition to the other patches. > > Stephan OK, did so. Doesn't seem to make any difference. In order to spot a freeze I have instrumented hardclock() to report if Giant is being held more than 60 seconds. I don't know if this is any help, but here are examples of two freezes, both unfrozen by ping: Mounted root from ufs:/dev/ad0s1a. Giant held for more than 60 sec by td 0xc1ad5180, pid 1100 ~KDB: enter: Line break on console [thread 100105] Stopped at kdb_enter+0x2b: nop db> where 1100 sched_switch(c1ad5180,0,1) at sched_switch+0x14f mi_switch(1,0) at mi_switch+0x264 turnstile_wait(c17ec700,c10429cc,c1ad9900,c10429cc,2,c07f9e66,219) at turnstile_wait+0x2ec _mtx_lock_sleep(c10429cc,c1ad5180,0,c0812243,88d) at _mtx_lock_sleep+0x167 _mtx_lock_flags(c10429cc,0,c0812243,88d) at _mtx_lock_flags+0x85 vm_map_entry_delete(c1a63708,c2071374,cf289a10,c074a75b,c1a63708) at vm_map_entry_delete+0x7e vm_map_delete(c1a63708,0,bfc00000,c1a63708,c1a63708) at vm_map_delete+0x18f vm_map_remove(c1a63708,0,bfc00000) at vm_map_remove+0x42 exec_new_vmspace(cf289b94,c0897da0,c07f81d2,31e,c17d6318) at exec_new_vmspace+0x175 exec_elf32_imgact(cf289b94,c08c4e58,c08c4ef8,0,0) at exec_elf32_imgact+0x1b3 kern_execve(c1ad5180,8067470,806739c,8067404,0) at kern_execve+0x30e execve(c1ad5180,cf289d14,3,0,286) at execve+0x18 syscall(2f,2f,2f,8067470,806739c) at syscall+0x213 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (59, FreeBSD ELF32, execve), eip = 0x2812b01f, esp = 0xbfbfe68c, ebp = 0xbfbfe6b8 --- db> show locks 1100 exclusive sx user map r = 0 (0xc1a6374c) locked @ 
vm/vm_map.c:2313 exclusive sleep mutex Giant r = 0 (0xc08bc2c0) locked @ vm/vm_object.c:453 db> c pid 1353: corrected slot count (2->1) Giant held for more than 60 sec by td 0xc1cba900, pid 1453 ~KDB: enter: Line break on console [thread 100134] Stopped at kdb_enter+0x2b: nop db> where 1453 sched_switch(c1cba900,0,1) at sched_switch+0x14f mi_switch(1,0) at mi_switch+0x264 turnstile_wait(c1af1e00,c08f5ea0,c1caa000,c08f5ea0,2,c07f9e66,219) at turnstile_wait+0x2ec _mtx_lock_sleep(c08f5ea0,c1cba900,0,c080174c,d99) at _mtx_lock_sleep+0x167 _mtx_lock_flags(c08f5ea0,0,c080174c,d99,c1a755ac) at _mtx_lock_flags+0x85 vfs_clean_pages(c66638e8,c66638e8,8048cd9,c7f16e82,1) at vfs_clean_pages+0x7c bdwrite(c66638e8) at bdwrite+0x2d0 ffs_write(cf407c14) at ffs_write+0x558 vn_write(c1d78ae4,cf407c88,c1a59880,0,c1cba900) at vn_write+0x1f8 dofilewrite(c1cba900,c1d78ae4,1,8048cd9,1) at dofilewrite+0xa8 write(c1cba900,cf407d14,3,45,292) at write+0x39 syscall(804002f,bfbf002f,bfbf002f,8049568,bfbfebc0) at syscall+0x213 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (4, FreeBSD ELF32, write), eip = 0x280c151f, esp = 0xbfbfeb3c, ebp = 0xbfbfeb78 --- db> show locks 1453 exclusive sleep mutex vm object (standard object) r = 0 (0xc1a755ac) locked @ kern/vfs_bio.c:3480 exclusive sleep mutex Giant r = 0 (0xc08bc2c0) locked @ kern/vfs_vnops.c:582 db> c -- Peter Holm --Kj7319i9nmIyA2yE Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="kern_clock.diff" cvs diff: Diffing . 
Index: kern_clock.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_clock.c,v retrieving revision 1.172 diff -u -r1.172 kern_clock.c --- kern_clock.c 10 Jul 2004 21:36:01 -0000 1.172 +++ kern_clock.c 30 Sep 2004 07:19:44 -0000 @@ -193,6 +193,14 @@ mtx_unlock_spin_flags(&sched_lock, MTX_QUIET); } +int pho = 0; +int pho_giant = 0; +struct thread *pho_giant_td = NULL; +#define mtx_unowned(m) ((m)->mtx_lock == MTX_UNOWNED) +#define mtx_owner(m) (mtx_unowned((m)) ? NULL \ + : (struct thread *)((m)->mtx_lock & MTX_FLAGMASK)) + + /* * The real-time timer, interrupting hz times per second. */ @@ -239,6 +247,23 @@ if (need_softclock) swi_sched(softclock_ih, 0); + if (pho > 0) + if (--pho == 0) panic("testing ..."); + + if (!(mtx_unowned(&Giant))) { + if (pho_giant_td != mtx_owner(&Giant)) { + pho_giant_td = mtx_owner(&Giant); + pho_giant = 0; + } + if (++pho_giant == 60*hz) { + printf("Giant held for more than %d sec by td %p, pid %d\n", + pho_giant / hz, pho_giant_td, + pho_giant_td->td_proc->p_pid); + pho_giant = 0; + } + } else + pho_giant = 0; + #ifdef SW_WATCHDOG if (watchdog_enabled > 0 && --watchdog_ticks <= 0) watchdog_fire(); --Kj7319i9nmIyA2yE-- From owner-freebsd-arch@FreeBSD.ORG Thu Sep 30 13:05:21 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B7E7216A4CE; Thu, 30 Sep 2004 13:05:21 +0000 (GMT) Received: from athena.softcardsystems.com (mail.softcardsystems.com [12.34.136.114]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4E0F643D3F; Thu, 30 Sep 2004 13:05:21 +0000 (GMT) (envelope-from sah@softcardsystems.com) Received: from athena (athena [12.34.136.114])i8UE4Ika006325; Thu, 30 Sep 2004 09:04:18 -0500 Date: Thu, 30 Sep 2004 09:04:18 -0500 (EST) From: Sam X-X-Sender: sah@athena To: Stephan Uphoff In-Reply-To: <1095976309.53798.8390.camel@palm.tree.com> Message-ID: References: 
<41508FEB.6030203@elischer.org><20040923191423.GE61631@FreeBSD.org> <41532FA0.6030405@elischer.org> <41533E0D.9000908@elischer.org> <1095976309.53798.8390.camel@palm.tree.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed cc: re@freebsd.org cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: AoE for 4.x X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2004 13:05:21 -0000 I haven't heard any major objections to my getting a major number -- can someone please step up and help me out? Sam On Thu, 23 Sep 2004, Stephan Uphoff wrote: > Since a complete disk operation in AoE is encapsulated in a single > Ethernet request/response pair - The data size of a read/write operation > is smaller than a single page. > > I don't think any existing framework can deal with this efficiently. > > Stephan > > On Thu, 2004-09-23 at 17:20, Julian Elischer wrote: >> you could look at the sbp driver that is part of the firewire code.. >> I think that may be the closest analog. >> >> >> Sam wrote: >> >>> On Thu, 23 Sep 2004, Julian Elischer wrote: >>> >>>> I think that if you have a working driver we can assign you a number. >>>> I do have some questions however.. >>>> >>>> this is AoE.. is it not possible at all to combne it with either the CAM >>>> framework (such as the atapicam stuff) or the existing ATA stuff.. >>>> Don't take this the wrong way.. it's just a question.. >>>> CAM is being used to talk to drives over firewire, usb, ata, scsi, >>>> fibrechannel. >>>> it would seem that to unify this would be something that we should >>>> look at.. >>>> Of course CAM itslef is showing its age in soem places and it could >>>> do with some work itself.. 
>>> >>> >>> It might be possible to plug into the CAM; I only briefly >>> glanced at it and it didn't appear appropriate. The ATA >>> layer definitely isn't as parts of ATA don't make sense >>> in this context (Read DMA, Read Multiple, eg) and AoE >>> devices don't conform to the simple hardware probe/attach >>> methodology (as I understand it). >>> >>> I would love to be proved wrong. I'm always willing to >>> try a new approach if it's demonstrably better. >>> >>> Sam >> >> >> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" >> >> > > From owner-freebsd-arch@FreeBSD.ORG Thu Sep 30 17:35:36 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 40F8516A4CE for ; Thu, 30 Sep 2004 17:35:36 +0000 (GMT) Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0A0A443D49 for ; Thu, 30 Sep 2004 17:35:34 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 26464 invoked from network); 30 Sep 2004 17:35:33 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 30 Sep 2004 17:35:32 -0000 Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8UHZREi019616; Thu, 30 Sep 2004 13:35:27 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Thu, 30 Sep 2004 10:17:54 -0400 User-Agent: KMail/1.6.2 References: <1095468747.31297.241.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> In-Reply-To: <1096496057.3733.2163.camel@palm.tree.com> MIME-Version: 1.0 
Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200409301017.54350.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: Peter Holm cc: "freebsd-arch@freebsd.org" cc: Julian Elischer cc: Stephan Uphoff Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2004 17:35:36 -0000 On Wednesday 29 September 2004 06:14 pm, Stephan Uphoff wrote: > On Wed, 2004-09-29 at 16:52, John Baldwin wrote: > > > > OK - here is a crude patch to fix some problems with mutex priority > > > > inheritance. My theory is that the clock thread gets stuck waiting on > > > > GIANT. > > > > > > > > During release/acquisition of a contested sleep mutex there are a few > > > > windows where a task can be preempted when actions (waking up blocked > > > > threads, ownership of the mutex, ..) need to be atomic as far as > > > > scheduling is concerned. Otherwise priority inheritance may fail. The > > > > patch uses critical_enter/critical_exit to protect these regions > > > > against preemption. > > > > > > > > It would be great if could run this in addition to the other patches. > > > > turnstile_claim() doesn't make any threads runnable and thus can't > > preempt. The other place is supposed to preempt, and it should be ok to > > do so. Note that since the turnstile chain lock is held, that includes a > > nested critical section and any preemption will be deferred until the > > turnstile lock is released via turnstile_release which happens in the > > middle of > > turnstile_unpend() after it has finished building a list of all the > > threads to be made runnable so that the turnstile object can be re-used > > safely. 
I don't think this patch will make much of a difference (if > > any). Can you provide a description of a case where you think the > > priority inheritance can fail if turnstile_unpend() doesn't run in a > > nested critical section? > > This is a bit of a mind bender. > I hope you have some aspirins close by ;-) > > Thread A holds a mutex x contested by Thread B and has priority pri(A). > Thread B holds a mutex y. > There is a thread C with priority pri(C) with pri(C) < pri(A). > > Thread A is in the process of releasing x. > It removes thread B from the turnstile and holds a pointer to B in a > private list. > Thread A sets the owner of the turnstile to NULL and releases all spin > locks. ( mtx_unlock_spin(&tc->tc_lock); line 148) > This means interrupts are now enabled. > > An interrupt occurs (or is already pending) and the interrupt handler > puts the associated interrupt thread I on the run queue. > This causes a preemption from A to I. > The interrupt thread I tries to acquire mutex y owned by B and blocks. > I donates its priority to B - but inheritance stops at B. > The next thread with the best priority is C and the cpu switches to C. > However B needs A to run to make it to the run-queue. > > If y is GIANT and I is the clock thread C could run forever in userspace > without being interrupted. Fair enough. The right place to fix this is in turnstile_unpend() though I think. I have had these patches that try to "clump" setrunqueue's before preempting lying around (but not thoroughly tested yet) that might fix this as well but in the turnstile code itself: --- //depot/projects/smpng/sys/kern/kern_thread.c 2004/09/22 15:31:15 +++ //depot/user/jhb/preemption/kern/kern_thread.c 2004/09/22 16:59:47 @@ -954,6 +954,7 @@ p->p_suspcount++; TD_SET_SUSPENDED(td); TAILQ_INSERT_TAIL(&p->p_suspended, td, td_runq); +#if 0 /* * Hack: If we are suspending but are on the sleep queue * then we are in msleep or the cv equivalent. 
We @@ -962,6 +963,7 @@ */ if (TD_ON_SLEEPQ(td)) TD_SET_SLEEPING(td); +#endif } void @@ -988,9 +990,11 @@ mtx_assert(&sched_lock, MA_OWNED); PROC_LOCK_ASSERT(p, MA_OWNED); if (!P_SHOULDSTOP(p)) { + critical_enter(); while ((td = TAILQ_FIRST(&p->p_suspended))) { thread_unsuspend_one(td); } + critical_exit(); } else if ((P_SHOULDSTOP(p) == P_STOPPED_SINGLE) && (p->p_numthreads == p->p_suspcount)) { /* @@ -1025,9 +1029,11 @@ * to continue however as this is a bad place to stop. */ if ((p->p_numthreads != 1) && (!P_SHOULDSTOP(p))) { - while (( td = TAILQ_FIRST(&p->p_suspended))) { + critical_enter(); + while ((td = TAILQ_FIRST(&p->p_suspended))) { thread_unsuspend_one(td); } + critical_exit(); } mtx_unlock_spin(&sched_lock); } --- //depot/projects/smpng/sys/kern/subr_sleepqueue.c 2004/08/20 17:10:02 +++ //depot/user/jhb/preemption/kern/subr_sleepqueue.c 2004/09/10 21:36:10 @@ -400,9 +400,10 @@ * just return. */ if (td->td_sleepqueue != NULL) { - MPASS(!TD_ON_SLEEPQ(td)); mtx_unlock_spin(&sc->sc_lock); mtx_lock_spin(&sched_lock); + MPASS(!TD_ON_SLEEPQ(td)); + MPASS(!TD_IS_SLEEPING(td)); return; } @@ -709,11 +710,13 @@ sleepq_release(wchan); /* Resume all the threads on the temporary list. */ + critical_enter(); while (!TAILQ_EMPTY(&list)) { td = TAILQ_FIRST(&list); TAILQ_REMOVE(&list, td, td_slpq); sleepq_resume_thread(td, pri); } + critical_exit(); } /* --- //depot/projects/smpng/sys/kern/subr_turnstile.c 2004/09/03 14:14:21 +++ //depot/user/jhb/preemption/kern/subr_turnstile.c 2004/09/10 21:36:10 @@ -727,6 +726,7 @@ * in turnstile_wait(). Set a flag to force it to try to acquire * the lock again instead of blocking. 
*/ + critical_enter(); while (!TAILQ_EMPTY(&pending_threads)) { td = TAILQ_FIRST(&pending_threads); TAILQ_REMOVE(&pending_threads, td, td_lockq); @@ -742,6 +742,7 @@ MPASS(TD_IS_RUNNING(td) || TD_ON_RUNQ(td)); } } + critical_exit(); mtx_unlock_spin(&sched_lock); } --- //depot/projects/smpng/sys/vm/vm_glue.c 2004/09/22 15:31:15 +++ //depot/user/jhb/preemption/vm/vm_glue.c 2004/09/22 16:59:47 @@ -753,6 +753,7 @@ vm_thread_swapin(td); PROC_LOCK(p); + critical_enter(); mtx_lock_spin(&sched_lock); p->p_sflag &= ~PS_SWAPPINGIN; p->p_sflag |= PS_INMEM; @@ -767,6 +768,7 @@ /* Allow other threads to swap p out now. */ --p->p_lock; + critical_exit(); } #endif /* NO_SWAPPING */ } I.e., you could just move the critical_enter() in subr_turnstile.c earlier so it is before the mtx_unlock_spin() of the turnstile chain lock. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Thu Sep 30 18:30:56 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1C29016A4CE for ; Thu, 30 Sep 2004 18:30:56 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id 7232643D48 for ; Thu, 30 Sep 2004 18:30:55 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 2097 invoked by uid 89); 30 Sep 2004 18:30:53 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 30 Sep 2004 18:30:53 -0000 Received: (qmail 2076 invoked by uid 89); 30 Sep 2004 18:30:53 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 30 Sep 2004 18:30:53 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8UIUpmt022172; Thu, 30 Sep 2004 14:30:52 -0400 (EDT) (envelope-from ups@tree.com) From: 
Stephan Uphoff To: John Baldwin In-Reply-To: <200409301017.54350.jhb@FreeBSD.org> References: <1095468747.31297.241.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <200409301017.54350.jhb@FreeBSD.org> Content-Type: text/plain Message-Id: <1096569051.21577.23.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Thu, 30 Sep 2004 14:30:51 -0400 Content-Transfer-Encoding: 7bit cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2004 18:30:56 -0000 On Thu, 2004-09-30 at 10:17, John Baldwin wrote: > Fair enough. The right place to fix this is in turnstile_unpend() though I > think. I have had these patches that try to "clump" setrunqueue's before > preempting lying around (but not thoroughly tested yet) that might fix this > as well but in the turnstile code itself: - snip - > --- //depot/projects/smpng/sys/kern/subr_turnstile.c 2004/09/03 14:14:21 > +++ //depot/user/jhb/preemption/kern/subr_turnstile.c 2004/09/10 21:36:10 > @@ -727,6 +726,7 @@ > * in turnstile_wait(). Set a flag to force it to try to acquire > * the lock again instead of blocking. > */ > + critical_enter(); > while (!TAILQ_EMPTY(&pending_threads)) { > td = TAILQ_FIRST(&pending_threads); > TAILQ_REMOVE(&pending_threads, td, td_lockq); > @@ -742,6 +742,7 @@ > MPASS(TD_IS_RUNNING(td) || TD_ON_RUNQ(td)); > } > } > + critical_exit(); > mtx_unlock_spin(&sched_lock); > } -snip - > > I.e., you could just move the critical_enter() in subr_turnstile.c earlier so > it is before the mtx_unlock_spin() of the turnstile chain lock. I agree - this would be the right place. 
I was originally planning to do some more work in kern_mutex and did not want to touch more than one file ;-) Can you check this in? Your other patches look like they are targeted to avoid senseless switching to improve performance - but should not have an impact on correct function. Right ? Hopefully I get some time to look at them more closely later on. Stephan From owner-freebsd-arch@FreeBSD.ORG Thu Sep 30 20:38:30 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E03A316A4D0 for ; Thu, 30 Sep 2004 20:38:30 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id 24EF443D1D for ; Thu, 30 Sep 2004 20:38:30 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 13021 invoked from network); 30 Sep 2004 20:38:27 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 30 Sep 2004 20:38:27 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8UKcRCs055209; Thu, 30 Sep 2004 22:38:27 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i8UKcQti055208; Thu, 30 Sep 2004 22:38:26 +0200 (CEST) (envelope-from pho) Date: Thu, 30 Sep 2004 22:38:26 +0200 From: Peter Holm To: John Baldwin Message-ID: <20040930203826.GA55153@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <200409301017.54350.jhb@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200409301017.54350.jhb@FreeBSD.org> User-Agent: Mutt/1.4.1i cc: Peter Holm cc: Stephan Uphoff cc: Julian Elischer cc: freebsd-arch@FreeBSD.org Subject: 
Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2004 20:38:31 -0000 On Thu, Sep 30, 2004 at 10:17:54AM -0400, John Baldwin wrote: > On Wednesday 29 September 2004 06:14 pm, Stephan Uphoff wrote: > > On Wed, 2004-09-29 at 16:52, John Baldwin wrote: > > > > > OK - here is a crude patch to fix some problems with mutex priority > > > > > inheritance. My theory is that the clock thread gets stuck waiting on > > > > > GIANT. > > > > > > > > > > During release/acquisition of a contested sleep mutex there are a few > > > > > windows where a task can be preempted when actions (waking up blocked > > > > > threads, ownership of the mutex, ..) need to be atomic as far as > > > > > scheduling is concerned. Otherwise priority inheritance may fail. The > > > > > patch uses critical_enter/critical_exit to protect these regions > > > > > against preemption. > > > > > > > > > > It would be great if could run this in addition to the other patches. > > > > > > turnstile_claim() doesn't make any threads runnable and thus can't > > > preempt. The other place is supposed to preempt, and it should be ok to > > > do so. Note that since the turnstile chain lock is held, that includes a > > > nested critical section and any preemption will be deferred until the > > > turnstile lock is released via turnstile_release which happens in the > > > middle of > > > turnstile_unpend() after it has finished building a list of all the > > > threads to be made runnable so that the turnstile object can be re-used > > > safely. I don't think this patch will make much of a difference (if > > > any). Can you provide a description of a case where you think the > > > priority inheritance can fail if turnstile_unpend() doesn't run in a > > > nested critical section? 
> > > > This is a bit of a mind bender. > > I hope you have some aspirins close by ;-) > > > > Thread A holds a mutex x contested by Thread B and has priority pri(A). > > Thread B holds a mutex y. > > There is a thread C with priority pri(C) with pri(C) < pri(A). > > > > Thread A is in the process of releasing x. > > It removes thread B from the turnstile and holds a pointer to B in a > > private list. > > Thread A sets the owner of the turnstile to NULL and releases all spin > > locks. ( mtx_unlock_spin(&tc->tc_lock); line 148) > > This means interrupts are now enabled. > > > > An interrupt occurs (or is already pending) and the interrupt handler > > puts the associated interrupt thread I on the run queue. > > This causes a preemption from A to I. > > The interrupt thread I tries to acquire mutex y owned by B and blocks. > > I donates its priority to B - but inheritance stops at B. > > The next thread with the best priority is C and the cpu switches to C. > > However B needs A to run to make it to the run-queue. > > > > If y is GIANT and I is the clock thread C could run forever in userspace > > without being interrupted. > > Fair enough. The right place to fix this is in turnstile_unpend() though I > think. I have had these patches that try to "clump" setrunqueue's before > preempting lying around (but not thoroughly tested yet) that might fix this > as well but in the turnstile code itself: > > --- //depot/projects/smpng/sys/kern/kern_thread.c 2004/09/22 15:31:15 > +++ //depot/user/jhb/preemption/kern/kern_thread.c 2004/09/22 16:59:47 > @@ -954,6 +954,7 @@ > p->p_suspcount++; > TD_SET_SUSPENDED(td); > TAILQ_INSERT_TAIL(&p->p_suspended, td, td_runq); > +#if 0 > /* > * Hack: If we are suspending but are on the sleep queue > * then we are in msleep or the cv equivalent. 
We > @@ -962,6 +963,7 @@ > */ > if (TD_ON_SLEEPQ(td)) > TD_SET_SLEEPING(td); > +#endif > } > > void > @@ -988,9 +990,11 @@ > mtx_assert(&sched_lock, MA_OWNED); > PROC_LOCK_ASSERT(p, MA_OWNED); > if (!P_SHOULDSTOP(p)) { > + critical_enter(); > while ((td = TAILQ_FIRST(&p->p_suspended))) { > thread_unsuspend_one(td); > } > + critical_exit(); > } else if ((P_SHOULDSTOP(p) == P_STOPPED_SINGLE) && > (p->p_numthreads == p->p_suspcount)) { > /* > @@ -1025,9 +1029,11 @@ > * to continue however as this is a bad place to stop. > */ > if ((p->p_numthreads != 1) && (!P_SHOULDSTOP(p))) { > - while (( td = TAILQ_FIRST(&p->p_suspended))) { > + critical_enter(); > + while ((td = TAILQ_FIRST(&p->p_suspended))) { > thread_unsuspend_one(td); > } > + critical_exit(); > } > mtx_unlock_spin(&sched_lock); > } > --- //depot/projects/smpng/sys/kern/subr_sleepqueue.c 2004/08/20 17:10:02 > +++ //depot/user/jhb/preemption/kern/subr_sleepqueue.c 2004/09/10 21:36:10 > @@ -400,9 +400,10 @@ > * just return. > */ > if (td->td_sleepqueue != NULL) { > - MPASS(!TD_ON_SLEEPQ(td)); > mtx_unlock_spin(&sc->sc_lock); > mtx_lock_spin(&sched_lock); > + MPASS(!TD_ON_SLEEPQ(td)); > + MPASS(!TD_IS_SLEEPING(td)); > return; > } > > @@ -709,11 +710,13 @@ > sleepq_release(wchan); > > /* Resume all the threads on the temporary list. */ > + critical_enter(); > while (!TAILQ_EMPTY(&list)) { > td = TAILQ_FIRST(&list); > TAILQ_REMOVE(&list, td, td_slpq); > sleepq_resume_thread(td, pri); > } > + critical_exit(); > } > > /* > --- //depot/projects/smpng/sys/kern/subr_turnstile.c 2004/09/03 14:14:21 > +++ //depot/user/jhb/preemption/kern/subr_turnstile.c 2004/09/10 21:36:10 > @@ -727,6 +726,7 @@ > * in turnstile_wait(). Set a flag to force it to try to acquire > * the lock again instead of blocking. 
> */ > + critical_enter(); > while (!TAILQ_EMPTY(&pending_threads)) { > td = TAILQ_FIRST(&pending_threads); > TAILQ_REMOVE(&pending_threads, td, td_lockq); > @@ -742,6 +742,7 @@ > MPASS(TD_IS_RUNNING(td) || TD_ON_RUNQ(td)); > } > } > + critical_exit(); > mtx_unlock_spin(&sched_lock); > } > > --- //depot/projects/smpng/sys/vm/vm_glue.c 2004/09/22 15:31:15 > +++ //depot/user/jhb/preemption/vm/vm_glue.c 2004/09/22 16:59:47 > @@ -753,6 +753,7 @@ > vm_thread_swapin(td); > > PROC_LOCK(p); > + critical_enter(); > mtx_lock_spin(&sched_lock); > p->p_sflag &= ~PS_SWAPPINGIN; > p->p_sflag |= PS_INMEM; > @@ -767,6 +768,7 @@ > > /* Allow other threads to swap p out now. */ > --p->p_lock; > + critical_exit(); > } > #endif /* NO_SWAPPING */ > } > > > I.e., you could just move the critical_enter() in subr_turnstile.c earlier so > it is before the mtx_unlock_spin() of the turnstile chain lock. > > -- > John Baldwin <>< http://www.FreeBSD.org/~jhb/ > "Power Users Use the Power to Serve" = http://www.FreeBSD.org This patch did not seem to make the freeze problem go away. 
-- Peter Holm From owner-freebsd-arch@FreeBSD.ORG Thu Sep 30 22:00:52 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3AF2916A4EC for ; Thu, 30 Sep 2004 22:00:52 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id AB05243D39 for ; Thu, 30 Sep 2004 22:00:51 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 13106 invoked by uid 89); 30 Sep 2004 22:00:50 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 30 Sep 2004 22:00:50 -0000 Received: (qmail 13091 invoked by uid 89); 30 Sep 2004 22:00:50 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 30 Sep 2004 22:00:50 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8UM0nmt023181; Thu, 30 Sep 2004 18:00:49 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Peter Holm In-Reply-To: <20040930075759.GA52233@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1095529353.31297.1192.camel@palm.tree.com> <1096135220.53798.17754.camel@palm.tree.com> <20040926075218.GA85983@peter.osted.lan> <1096339936.3733.279.camel@palm.tree.com> <20040928074926.GA99957@peter.osted.lan> <1096383103.3733.312.camel@palm.tree.com> <20040929085748.GA19695@peter.osted.lan> <1096467843.3733.1145.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <20040930075759.GA52233@peter.osted.lan> Content-Type: text/plain Message-Id: <1096581649.21577.88.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Thu, 30 Sep 2004 18:00:49 -0400 Content-Transfer-Encoding: 7bit cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org 
X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2004 22:00:52 -0000 On Thu, 2004-09-30 at 03:57, Peter Holm wrote: > OK, did so. Doesn't seem to make any difference. > In order to spot a freeze I have instrumented hardclock() to report if > Giant is being held more than 60 seconds. I don't know if this is any > help, but here are examples of two freezes, both unfrozen by ping: I will try to reproduce your environment here. Are you running your tests as root? Stephan From owner-freebsd-arch@FreeBSD.ORG Thu Sep 30 23:23:06 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0D99316A4CE for ; Thu, 30 Sep 2004 23:23:06 +0000 (GMT) Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75]) by mx1.FreeBSD.org (Postfix) with ESMTP id CD0B843D48 for ; Thu, 30 Sep 2004 23:23:05 +0000 (GMT) (envelope-from brdavis@odin.ac.hmc.edu) Received: from odin.ac.hmc.edu (localhost.localdomain [127.0.0.1]) by odin.ac.hmc.edu (8.13.0/8.13.0) with ESMTP id i8UNRAPt024585 for ; Thu, 30 Sep 2004 16:27:10 -0700 Received: (from brdavis@localhost) by odin.ac.hmc.edu (8.13.0/8.13.0/Submit) id i8UNRAjc024584 for arch@freebsd.org; Thu, 30 Sep 2004 16:27:10 -0700 Date: Thu, 30 Sep 2004 16:27:10 -0700 From: Brooks Davis To: arch@freebsd.org Message-ID: <20040930232710.GA19905@odin.ac.hmc.edu> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="h31gzZEtNLTqOjlF" Content-Disposition: inline User-Agent: Mutt/1.4.1i X-Virus-Scanned: by amavisd-new X-Spam-Status: No, hits=0.0 required=8.0 tests=none autolearn=no version=2.63 X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on odin.ac.hmc.edu Subject: mtree before mounting /usr X-BeenThere: freebsd-arch@freebsd.org 
X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Sep 2004 23:23:06 -0000

--h31gzZEtNLTqOjlF
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

I'm trying to remove the dependencies on tools in /usr from the
/etc/rc.d/var script. I've managed to eliminate touch and newsyslog,
but I would like some feedback on the best way to eliminate mtree.
Currently mtree lives in /usr/sbin. I need to populate the empty md(4)
based /var. There are two main approaches I can think of:

- Move mtree to /sbin. We'd probably have to leave a symlink behind,
  but it would work and be fairly easy.
- Add support to bsdtar for reading mtree files and use that
  functionality to create pax archives of BSD.var.dist and
  BSD.sendmail.dist.

Comments?

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4 --h31gzZEtNLTqOjlF Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFBXJZNXY6L6fI4GtQRAhp/AJ0W25BulM1hYb07fEGdSYOQAf4AcQCgmDdb L6y1p25NScA2q2ROqMhojX8= =jqrs -----END PGP SIGNATURE----- --h31gzZEtNLTqOjlF-- From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 04:18:27 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6EF9416A4CE for ; Fri, 1 Oct 2004 04:18:27 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id 13E2A43D5A for ; Fri, 1 Oct 2004 04:18:26 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 18303 invoked by uid 89); 1 Oct 2004 04:13:04 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 1 Oct 2004 04:13:04 -0000 Received: (qmail 18222 invoked by uid 89); 1 Oct 2004 04:13:02 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 1 Oct 2004 04:13:02 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i914D1mt024697; Fri, 1 Oct 2004 00:13:01 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: John Baldwin In-Reply-To: <1096496057.3733.2163.camel@palm.tree.com> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> Content-Type: text/plain Message-Id: <1096603981.21577.195.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Fri, 01 Oct 2004 00:13:01 -0400 Content-Transfer-Encoding: 7bit cc: Peter Holm cc: Julian Elischer cc: 
"freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 04:18:27 -0000 On Wed, 2004-09-29 at 18:14, Stephan Uphoff wrote: > I was looking at the MUTEX_WAKE_ALL undefined case when I used the > critical section for turnstile_claim(). > However there are bigger problems with MUTEX_WAKE_ALL undefined > so you are right - the critical section for turnstile_claim is pretty > useless. Arghhh !!! MUTEX_WAKE_ALL is NOT an option in GENERIC. I recall verifying that it is defined twice. Guess I must have looked at the wrong source tree :-( This means yes - we have bigger problems! Example: Thread A holds a mutex x contested by Thread B and C and has priority pri(A). Thread C holds a mutex y and pri(B) < pri(C) Thread A releases the lock wakes thread B but lets C on the turnstile wait queue. An interrupt thread I tries to lock mutex y owned by C. However priority inheritance does not work since B needs to run first to take ownership of the lock. I is blocked :-( This was found using Peter Holm's test and a slight modification of this giant hog detector. (kern_clock.diff) I definitely won't have time to fix kern_mutex.c for the next few days so please add the line: options MUTEX_WAKE_ALL # Needed do not remove to your configuration files. I also had overlooked http://www.holm.cc/stress/log/cons80.html Showing that my patch for kern_switch.c (switch_patch) has a bug. I will send an updated patch later today. Stephan PS: I love the firewire debugging speed! 
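
The stalled chain in the example above can be pictured with a small
user-space model. This is only a sketch under stated assumptions: the
pi_* types and functions are invented for illustration and are not the
kernel's turnstile code, and the priority numbers are arbitrary (a lower
value means more urgent). It walks the chain of lock owners the way
priority lending does. With only B woken, lending reaches C and then
stops, because x has no owner until B runs. If C had been woken along
with B, which is what the MUTEX_WAKE_ALL option is assumed to do here,
the chain ends at a runnable thread.

```c
#include <assert.h>
#include <stddef.h>

struct pi_mutex;

/* Toy thread; a lower prio value means more urgent. */
struct pi_thread {
	int prio;
	struct pi_mutex *blocked_on;	/* NULL when the thread can run */
};

/* Toy mutex; owner is NULL while ownership is being handed over. */
struct pi_mutex {
	struct pi_thread *owner;
};

/*
 * Lend the donor's priority along the chain of lock owners, starting
 * at mutex m. Returns the last thread reached, or NULL if m has no
 * owner at all.
 */
static struct pi_thread *
pi_lend(const struct pi_thread *donor, struct pi_mutex *m)
{
	struct pi_thread *last = NULL;

	assert(donor != NULL);
	while (m != NULL && m->owner != NULL) {
		last = m->owner;
		if (last->prio > donor->prio)
			last->prio = donor->prio;
		m = last->blocked_on;
	}
	return (last);
}

/*
 * Build the scenario: A has released x, B was woken but has not yet
 * claimed x, C owns y, and interrupt thread I wants y.
 * Returns 1 if the lent priority ends up on a runnable thread.
 */
int
pi_chain_can_progress(int wake_all)
{
	struct pi_mutex x, y;
	struct pi_thread c, i;
	struct pi_thread *last;

	x.owner = NULL;			/* ownership of x is in flight */
	c.prio = 130;
	c.blocked_on = wake_all ? NULL : &x;	/* left waiting unless woken */
	y.owner = &c;
	i.prio = 20;
	i.blocked_on = &y;

	last = pi_lend(&i, i.blocked_on);
	return (last != NULL && last->blocked_on == NULL &&
	    last->prio == i.prio);
}
```

Calling pi_chain_can_progress(0) models the wake-one case and
pi_chain_can_progress(1) the wake-all case; in the first, C holds I's
priority but is still parked behind an ownerless mutex.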
From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 05:23:25 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E32D516A6CA for ; Fri, 1 Oct 2004 05:23:25 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id 6424A43D41 for ; Fri, 1 Oct 2004 05:23:25 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 13830 invoked by uid 89); 1 Oct 2004 05:23:24 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 1 Oct 2004 05:23:24 -0000 Received: (qmail 13818 invoked by uid 89); 1 Oct 2004 05:23:24 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 1 Oct 2004 05:23:24 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i915NLmt025003; Fri, 1 Oct 2004 01:23:22 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: John Baldwin In-Reply-To: <1096603981.21577.195.camel@palm.tree.com> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> Content-Type: multipart/mixed; boundary="=-fjzIiysJoZMWtL83qsgh" Message-Id: <1096608201.21577.203.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Fri, 01 Oct 2004 01:23:21 -0400 cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 05:23:26 -0000 
--=-fjzIiysJoZMWtL83qsgh
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote:
> I also had overlooked
> http://www.holm.cc/stress/log/cons80.html
> Showing that my patch for kern_switch.c (switch_patch) has a bug.
> I will send an updated patch later today.

OK - here is the promised patch.

--=-fjzIiysJoZMWtL83qsgh
Content-Disposition: attachment; filename=switch_patch_v2
Content-Type: text/x-patch; name=switch_patch_v2; charset=ASCII
Content-Transfer-Encoding: 7bit

Index: kern_switch.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_switch.c,v
retrieving revision 1.95
diff -u -r1.95 kern_switch.c
--- kern_switch.c	19 Sep 2004 18:34:17 -0000	1.95
+++ kern_switch.c	1 Oct 2004 05:15:16 -0000
@@ -315,6 +315,106 @@
 	td->td_priority = newpri;
 	setrunqueue(td, SRQ_BORING);
 }
+
+
+/*
+ * This function is called when a thread is about to be put on a
+ * ksegrp run queue because it has been made runnable or its
+ * priority has been adjusted and the ksegrp does not have a
+ * free kse slot. It determines if a thread from the same ksegrp
+ * should be preempted. If so, it tries to switch threads
+ * if the thread is on the same cpu or notifies another cpu that
+ * it should switch threads.
+ */
+
+static void
+maybe_preempt_in_ksegrp(struct thread *td)
+{
+#if defined(SMP)
+	int highest_pri;
+	struct ksegrp *kg;
+	cpumask_t cpumask,dontuse;
+	struct pcpu *pc;
+	struct pcpu *highest_pcpu;
+	struct thread *running_thread;
+
+#ifndef FULL_PREEMPTION
+	int pri;
+
+	pri = td->td_priority;
+
+	if (!(pri >= PRI_MIN_ITHD && pri <= PRI_MAX_ITHD))
+		return;
+#endif
+
+	mtx_assert(&sched_lock, MA_OWNED);
+
+	running_thread = curthread;
+
+#if !defined(KSEG_PEEMPT_BEST_CPU)
+	if(running_thread->td_ksegrp != td->td_ksegrp)
+#endif
+	{
+		kg = td->td_ksegrp;
+
+		/* Anyone waiting in front ? */
+		if(td != TAILQ_FIRST(&kg->kg_runq)) {
+			return; /* Yes - wait your turn*/
+		}
+		highest_pri = td->td_priority;
+		highest_pcpu = NULL;
+		dontuse = stopped_cpus | idle_cpus_mask;
+
+		/* Find a cpu with the worst priority that runs at thread from the
+		 * same ksegrp - if multiple exist give first the last run cpu and then
+		 * the current cpu priority
+		 */
+
+		SLIST_FOREACH(pc, &cpuhead, pc_allcpu) {
+			cpumask = pc->pc_cpumask;
+			if ( (cpumask & dontuse) == 0 &&
+			    pc->pc_curthread->td_ksegrp == kg) {
+				if (pc->pc_curthread->td_priority > highest_pri) {
+					highest_pri = pc->pc_curthread->td_priority;
+					highest_pcpu = pc;
+				} else if (pc->pc_curthread->td_priority == highest_pri &&
+				    highest_pcpu != NULL) {
+					if (td->td_lastcpu == pc->pc_cpuid ||
+					    (PCPU_GET(cpumask) == cpumask &&
+					    td->td_lastcpu != highest_pcpu->pc_cpuid)) {
+						highest_pcpu = pc;
+					}
+				}
+			}
+		}
+
+		/* Check if we need to preempt someone */
+		if (highest_pcpu == NULL) return;
+
+		if (PCPU_GET(cpuid) != highest_pcpu->pc_cpuid) {
+			highest_pcpu->pc_curthread->td_flags |= TDF_NEEDRESCHED;
+			ipi_selected(highest_pcpu->pc_cpumask, IPI_AST);
+			return;
+		}
+	}
+#else
+	KASSERT(running_thread->td_ksegrp == td->td_ksegrp,("maybe_preempt_in_ksegrp: No chance to run thread"));
+#endif
+
+	if (td->td_priority > running_thread->td_priority)
+		return;
+#ifdef PREEMPTION
+	if (running_thread->td_critnest > 1) {
+		running_thread->td_pflags |= TDP_OWEPREEMPT;
+	} else {
+		mi_switch(SW_INVOL, NULL);
+	}
+#else
+	running_thread->td_flags |= TDF_NEEDRESCHED;
+#endif
+	return;
+}
+
 int limitcount;
 void
 setrunqueue(struct thread *td, int flags)
@@ -422,6 +522,7 @@
 		} else {
 			CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d",
 			    td, td->td_ksegrp, td->td_proc->p_pid);
+			maybe_preempt_in_ksegrp(td);
 		}
 	}

--=-fjzIiysJoZMWtL83qsgh--

From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 05:55:33 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org
(Postfix) with ESMTP id 4795616A4CF for ; Fri, 1 Oct 2004 05:55:33 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id D6D2443D5A for ; Fri, 1 Oct 2004 05:55:32 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 32529 invoked by uid 89); 1 Oct 2004 05:55:31 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 1 Oct 2004 05:55:31 -0000 Received: (qmail 32490 invoked by uid 89); 1 Oct 2004 05:55:31 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 1 Oct 2004 05:55:31 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i915tUmt025109; Fri, 1 Oct 2004 01:55:30 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: "freebsd-arch@freebsd.org" Content-Type: text/plain Message-Id: <1096610130.21577.219.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Fri, 01 Oct 2004 01:55:30 -0400 Content-Transfer-Encoding: 7bit cc: Peter Holm cc: Julian Elischer Subject: sched_switch (sched_4bsd) may be preempted X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 05:55:33 -0000 sched_switch (sched_4bsd) may be preempted in setrunqueue or slot_fill. This could be ugly. Wrapping it into a critical section and resetting TDP_OWEPREEMPT should work. 
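
The deferral that a critical section provides can be pictured with a
toy user-space model. This is a sketch only: every cs_* name is made up
for illustration, cs_switch() merely counts where the kernel would call
mi_switch(), and the nesting test uses "> 0" because this model has no
spin lock contributing a nesting level of its own. A preemption
requested while nested is recorded as owed and carried out once, when
the outermost section ends.

```c
#include <assert.h>

/* Toy stand-in for the per-thread state involved. */
struct cs_thread {
	int critnest;		/* critical section nesting depth */
	int owe_preempt;	/* preemption requested while nested */
	int switches;		/* number of simulated context switches */
};

/* Stands in for the real context switch; only counts. */
static void
cs_switch(struct cs_thread *td)
{
	td->switches++;
}

static void
cs_critical_enter(struct cs_thread *td)
{
	td->critnest++;
}

/* Called when a more urgent thread has become runnable. */
static void
cs_request_preempt(struct cs_thread *td)
{
	if (td->critnest > 0)
		td->owe_preempt = 1;	/* defer until the section ends */
	else
		cs_switch(td);
}

static void
cs_critical_exit(struct cs_thread *td)
{
	assert(td->critnest > 0);
	if (td->critnest == 1 && td->owe_preempt) {
		td->owe_preempt = 0;
		cs_switch(td);		/* pay the owed preemption once */
	}
	td->critnest--;
}

/*
 * Make two threads runnable inside one critical section and return the
 * number of switches seen, either while still inside the section
 * (leave_section == 0) or after leaving it (leave_section != 0).
 */
int
cs_demo(int leave_section)
{
	struct cs_thread td = { 0, 0, 0 };

	cs_critical_enter(&td);
	cs_request_preempt(&td);
	cs_request_preempt(&td);
	if (leave_section)
		cs_critical_exit(&td);
	return (td.switches);
}
```

In this model cs_demo(0) sees no switch while the section is held, and
cs_demo(1) sees exactly one after it ends, even though two wakeups were
requested.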
Hand trimmed patch:

RCS file: /cvsroot/src/sys/kern/sched_4bsd.c,v
retrieving revision 1.65
diff -u -r1.65 sched_4bsd.c
--- sys/kern/sched_4bsd.c	16 Sep 2004 07:12:59 -0000	1.65
+++ sys/kern/sched_4bsd.c	1 Oct 2004 05:35:28 -0000
@@ -823,6 +823,7 @@
 		TD_SET_CAN_RUN(td);
 	else {
 		td->td_ksegrp->kg_avail_opennings++;
+		critical_enter();
 		if (TD_IS_RUNNING(td)) {
 			/* Put us back on the run queue (kse and all). */
 			setrunqueue(td, SRQ_OURSELF|SRQ_YIELDING);
@@ -834,6 +835,8 @@
 			 */
 			slot_fill(td->td_ksegrp);
 		}
+		critical_exit();
+		td->td_pflags &= ~TDP_OWEPREEMPT;
 	}
 	if (newtd == NULL)
 		newtd = choosethread();

From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 06:04:22 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F143B16A4CE; Fri, 1 Oct 2004 06:04:22 +0000 (GMT) Received: from harmony.village.org (rover.village.org [168.103.84.182]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7A46843D45; Fri, 1 Oct 2004 06:04:22 +0000 (GMT) (envelope-from imp@bsdimp.com) Received: from localhost (harmony.village.org [10.0.0.6]) by harmony.village.org (8.13.1/8.13.1) with ESMTP id i9163SWw043087; Fri, 1 Oct 2004 00:03:28 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Fri, 01 Oct 2004 00:04:52 -0600 (MDT) Message-Id: <20041001.000452.99281901.imp@bsdimp.com> To: sah@softcardsystems.com From: "M.
Warner Losh" In-Reply-To: References: <41533E0D.9000908@elischer.org> <1095976309.53798.8390.camel@palm.tree.com> X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: freebsd-arch@freebsd.org cc: re@freebsd.org cc: julian@elischer.org cc: ups@tree.com Subject: Re: AoE for 4.x X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 06:04:23 -0000 In message: Sam writes: : I haven't heard any major objections to my getting a major : number -- can someone please step up and help me out? 187. Warner From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 07:57:48 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 21D0016A4CE for ; Fri, 1 Oct 2004 07:57:48 +0000 (GMT) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id CDDB043D53 for ; Fri, 1 Oct 2004 07:57:47 +0000 (GMT) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) i917vkvA017412; Fri, 1 Oct 2004 00:57:46 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.9p2/8.12.9/Submit) id i917vjac017409; Fri, 1 Oct 2004 00:57:45 -0700 (PDT) (envelope-from dillon) Date: Fri, 1 Oct 2004 00:57:45 -0700 (PDT) From: Matthew Dillon Message-Id: <200410010757.i917vjac017409@apollo.backplane.com> To: Stephan Uphoff References: <1096610130.21577.219.camel@palm.tree.com> cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: sched_switch (sched_4bsd) may be preempted X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 
Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 07:57:48 -0000

I would put the entire scheduler core in a critical section, not just
the part you think needs to be there. It's just too critical a subsystem
to be able to make operational assumptions of that nature in. It is
what I do in the DragonFly LWKT core, BTW, and crazy as I am I would
never even consider trying to move the critical section further in.

I would not reset TDP_OWEPREEMPT there. If I understand its function
correctly you need to leave it intact in order to detect preemption
request races against the scheduler. Since at that point newtd may
be non-NULL and thus not cause another scheduling queue check to be
made before the next switch, you cannot safely clear the flag where you
are clearing it.

If you want to optimize operation of the flag I recommend storing the
preempting entity's priority at the same point where TDP_OWEPREEMPT
is set and then do a quick priority comparison in critical_exit() to
avoid unnecessary mi_switch()'s.

I would also not put the TDP_OWEPREEMPT flag in the thread structure.
It really belongs in the globaldata structure so it remains properly
intact through the thread switch, else you have more potential races
even while *in* the critical section. Your TDP_OWEPREEMPT flag has
almost exactly the same function as DFly's gd_reqflags word and I spent
a long time thinking through where I would store it, and came to the
conclusion that the globaldata structure was the best place.

e.g. so FreeBSD's critical_exit() code would become this (note: I might
have the priority comparison backwards, I forget how FBsd does it):

    if (td->td_critnest == 1) {
#ifdef PREEMPTION
        mtx_assert(&sched_lock, MA_NOTOWNED);
        if (gd->gd_pflags & GDP_OWEPREEMPT) {                   <<< CHG TO gd
            gd->gd_pflags &= ~GDP_OWEPREEMPT;                   <<< CHG TO gd
            if (gd->gd_preempt_priority < td->td_priority) {    << ADD
                mtx_lock_spin(&sched_lock);
                mi_switch(SW_INVOL, NULL);
                mtx_unlock_spin(&sched_lock);
            }
        }
#endif
        td->td_critnest = 0;
        cpu_critical_exit(td);

And the code which sets GDP_OWEPREEMPT would become this:

    [checks whether preemption is desired]
    if (ctd->td_critnest > 1) {
        CTR1(KTR_PROC, "maybe_preempt: in critical section %d",
            ctd->td_critnest);
        if ((gd->gd_pflags & GDP_OWEPREEMPT) == 0 ||            << ADD (gd)
            pri < gd->gd_preempt_priority) {                    << ADD (gd)
            gd->gd_pflags |= GDP_OWEPREEMPT;                    << CHG (gd)
            gd->gd_preempt_priority = pri;                      << ADD (gd)
        }
        return (0);
    }

-Matt
Matthew Dillon

:sched_switch (sched_4bsd) may be preempted in setrunqueue or slot_fill.
:This could be ugly.
:Wrapping it into a critical section and resetting TDP_OWEPREEMPT should
:work.
:
:Hand trimmed patch:
:
:RCS file: /cvsroot/src/sys/kern/sched_4bsd.c,v
:retrieving revision 1.65
:diff -u -r1.65 sched_4bsd.c
:--- sys/kern/sched_4bsd.c 16 Sep 2004 07:12:59 -0000 1.65
:+++ sys/kern/sched_4bsd.c 1 Oct 2004 05:35:28 -0000
:@@ -823,6 +823,7 @@
:                 TD_SET_CAN_RUN(td);
:         else {
:                 td->td_ksegrp->kg_avail_opennings++;
:+                critical_enter();
:                 if (TD_IS_RUNNING(td)) {
:                         /* Put us back on the run queue (kse and all). */
:                         setrunqueue(td, SRQ_OURSELF|SRQ_YIELDING);
:@@ -834,6 +835,8 @@
:                          */
:                         slot_fill(td->td_ksegrp);
:                 }
:+                critical_exit();
:+                td->td_pflags &= ~TDP_OWEPREEMPT;
:         }
:         if (newtd == NULL)
:                 newtd = choosethread();

From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 11:08:21 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9E74A16A4CE for ; Fri, 1 Oct 2004 11:08:21 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id 0B9E743D46 for ; Fri, 1 Oct 2004 11:08:21 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 61626 invoked from network); 1 Oct 2004 11:08:19 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 1 Oct 2004 11:08:19 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i91B8ICs058177; Fri, 1 Oct 2004 13:08:18 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i91B8HOt058176; Fri, 1 Oct 2004 13:08:17 +0200 (CEST) (envelope-from pho) Date: Fri, 1 Oct 2004 13:08:17 +0200 From: Peter Holm To: Stephan Uphoff Message-ID: <20041001110817.GA58111@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1096603981.21577.195.camel@palm.tree.com> User-Agent: Mutt/1.4.1i cc: Peter Holm cc: Julian Elischer cc: John Baldwin cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere:
freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 11:08:21 -0000 On Fri, Oct 01, 2004 at 12:13:01AM -0400, Stephan Uphoff wrote: > On Wed, 2004-09-29 at 18:14, Stephan Uphoff wrote: > > I was looking at the MUTEX_WAKE_ALL undefined case when I used the > > critical section for turnstile_claim(). > > However there are bigger problems with MUTEX_WAKE_ALL undefined > > so you are right - the critical section for turnstile_claim is pretty > > useless. > > Arghhh !!! > > MUTEX_WAKE_ALL is NOT an option in GENERIC. > I recall verifying that it is defined twice. Guess I must have looked at > the wrong source tree :-( > This means yes - we have bigger problems! > > Example: > > Thread A holds a mutex x contested by Thread B and C and has priority > pri(A). > > Thread C holds a mutex y and pri(B) < pri(C) > > Thread A releases the lock wakes thread B but lets C on the turnstile > wait queue. > > An interrupt thread I tries to lock mutex y owned by C. > > However priority inheritance does not work since B needs to run first to > take ownership of the lock. > > I is blocked :-( > > This was found using Peter Holm's test and a slight modification of this > giant hog detector. (kern_clock.diff) > > I definitely won't have time to fix kern_mutex.c for the next few days > so please add the line: > > options MUTEX_WAKE_ALL # Needed do not remove > I like to test one thing at a time, so I added MUTEX_WAKE_ALL to HEAD from Sep 30 09:58 UTC. This did not seem to change any thing :-( I'll proceed with adding your switch_patch_v2 patch + your sched_4bsd.c patch, but without MUTEX_WAKE_ALL. - Peter > to your configuration files. > > I also had overlooked > http://www.holm.cc/stress/log/cons80.html > Showing that my patch for kern_switch.c (switch_patch) has a bug. > I will send an updated patch later today. 
> > Stephan > > PS: I love the firewire debugging speed! -- Peter Holm From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 12:52:52 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9218616A4CE; Fri, 1 Oct 2004 12:52:52 +0000 (GMT) Received: from athena.softcardsystems.com (mail.softcardsystems.com [12.34.136.114]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1224B43D31; Fri, 1 Oct 2004 12:52:52 +0000 (GMT) (envelope-from sah@softcardsystems.com) Received: from athena (athena [12.34.136.114])i91Dpelj014178; Fri, 1 Oct 2004 08:51:40 -0500 Date: Fri, 1 Oct 2004 08:51:40 -0500 (EST) From: Sam X-X-Sender: sah@athena To: "M. Warner Losh" In-Reply-To: <20041001.000452.99281901.imp@bsdimp.com> Message-ID: References: <41533E0D.9000908@elischer.org> <1095976309.53798.8390.camel@palm.tree.com> <20041001.000452.99281901.imp@bsdimp.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed cc: freebsd-arch@freebsd.org cc: re@freebsd.org cc: julian@elischer.org cc: ups@tree.com Subject: Re: AoE for 4.x X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 12:52:52 -0000 Is that block & char, or do i just need to specify the block? On Fri, 1 Oct 2004, M. Warner Losh wrote: > In message: > Sam writes: > : I haven't heard any major objections to my getting a major > : number -- can someone please step up and help me out? > > 187. 
> > Warner > From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 14:10:45 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2083016A4CE for ; Fri, 1 Oct 2004 14:10:45 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id 7877943D41 for ; Fri, 1 Oct 2004 14:10:44 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 83835 invoked from network); 1 Oct 2004 14:10:42 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 1 Oct 2004 14:10:42 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i91EAgV7001641; Fri, 1 Oct 2004 16:10:42 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i91EAfs4001640; Fri, 1 Oct 2004 16:10:41 +0200 (CEST) (envelope-from pho) Date: Fri, 1 Oct 2004 16:10:40 +0200 From: Peter Holm To: Stephan Uphoff Message-ID: <20041001141040.GA1556@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1096608201.21577.203.camel@palm.tree.com> User-Agent: Mutt/1.4.1i cc: Peter Holm cc: Julian Elischer cc: John Baldwin cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 14:10:45 -0000 On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote: > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote: > > > I also had overlooked > > http://www.holm.cc/stress/log/cons80.html > > Showing that my patch for kern_switch.c (switch_patch) has a bug. > > I will send an updated patch later today. > > OK - here is the promised patch. > For once I'm the bearer of good news. The switch_patch_v2 + the sched_4bsd patch ran the tests for more than one hour without any freeze. The sched_4bsd alone did not stop the freezes. I'm now testing the switch_patch_v2 alone and it's looking good for 55+ minutes of testing. -- Peter Holm From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 15:02:01 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9619516A4CE for ; Fri, 1 Oct 2004 15:02:01 +0000 (GMT) Received: from mail1.speakeasy.net (mail1.speakeasy.net [216.254.0.201]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4E97143D1D for ; Fri, 1 Oct 2004 15:02:01 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 4356 invoked from network); 1 Oct 2004 15:02:00 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 1 Oct 2004 15:02:00 -0000 Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i91F1sEG027282 for ; Fri, 1 Oct 2004 11:01:55 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: arch@FreeBSD.org Date: Fri, 1 Oct 2004 11:00:42 -0400 User-Agent: KMail/1.6.2 MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <200410011100.42302.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on 
server.baldwin.cx Subject: [PATCH] Rework how we store process times in the kernel and deferring calcru() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 15:02:01 -0000

I'll commit this soonish unless there are any objections. The basic idea is
to store process time resource usage as raw data (i.e. as bintimes and tick
counts) for both process usage and child usage and only calculate the
timeval style times if they are explicitly asked for. This lets us avoid
always calling calcru() to calculate the timeval values in exit1(), for
example. A more detailed listing of the changes follows:

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime. A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage. Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as
  updating the rux_[isu]u fields of a passed-in rusage_ext structure.
  calcru() uses a copy of the process' p_rux structure to compute the
  timevals after updating the runtime appropriately if any of the threads in
  that process are currently executing. This also includes an additional fix
  so that calcru() now correctly handles threads from the process that are
  executing on other CPUs. Also, calcru() now only locks sched_lock
  internally while doing the rux_runtime fixup. It now only requires the
  caller to hold the proc lock and calcru1() only requires the proc lock
  internally. calcru() also no longer allows callers to ask for an interrupt
  timeval since none of them actually did.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux. Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.

As a side effect of storing the raw values, the accuracy of the process
timing has been improved. This makes benchmarking somewhat tricky as the
appearance is that with this patch user times go way up but system times go
way down. Thus, the only benchmarks I did were to compare real times and to
also compare the sum of the user and system times to the real times. Thus,
here are the results on a kernel w/o debugging (when WITNESS + INVARIANTS
were on, the extra overhead resulted in no statistical difference in the
before and after).
For real times (100 runs of 10000 fork/wait loops):

x smpng.fast.real
+ proc.fast.real
[ministat ASCII distribution plot not reproduced]
    N           Min           Max        Median           Avg        Stddev
x 100          2.97          3.08          2.99        2.9959   0.018968075
+ 100          2.88          2.99          2.93        2.9362   0.017568337
Difference at 95.0% confidence
        -0.0597 +/- 0.0050674
        -1.99272% +/- 0.169145%
        (Student's t, pooled s = 0.0182816)

So, close to about a 2% improvement. As far as accuracy "improvements", the
numbers comparing sum of user + sys compared to "real" time is:

x smpng.fast.real
+ smpng.fast.total
    N           Min           Max        Median           Avg        Stddev
x 100          2.97          3.08          2.99        2.9959   0.018968075
+ 100          2.83          2.93          2.86        2.8601   0.016111668
Difference at 95.0% confidence
        -0.1358 +/- 0.0048779
        -4.53286% +/- 0.162819%
        (Student's t, pooled s = 0.0175979)

And for the kernel with the patch:

x proc.fast.real
+ proc.fast.total
    N           Min           Max        Median           Avg        Stddev
x 100          2.88          2.99          2.93        2.9362   0.017568337
+ 100          2.85          2.96          2.92        2.9201   0.017551943
Difference at 95.0% confidence
        -0.0161 +/- 0.00486742
        -0.548328% +/- 0.165773%
        (Student's t, pooled s = 0.0175601)

Thus, the total counts are closer to the real times with the patch than
without. The missing counts can be interrupt time or time for other
processes, of course.
Given that the box was idle and in the same situation for both tests and that these types of results were obtained across numerous repeated tests with several different benchmarks I think the difference in these last two is due to improved accuracy in the accounting. The patch is at http://www.freebsd.org/~jhb/patches/rusage_ext.patch and is largely based on a patch given to me by bde@. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 15:02:04 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2287116A4D8 for ; Fri, 1 Oct 2004 15:02:04 +0000 (GMT) Received: from mail2.speakeasy.net (mail2.speakeasy.net [216.254.0.202]) by mx1.FreeBSD.org (Postfix) with ESMTP id D1C8943D4C for ; Fri, 1 Oct 2004 15:02:03 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 27300 invoked from network); 1 Oct 2004 15:02:03 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 1 Oct 2004 15:02:02 -0000 Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i91F1sEH027282 for ; Fri, 1 Oct 2004 11:01:58 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: arch@FreeBSD.org Date: Fri, 1 Oct 2004 11:02:43 -0400 User-Agent: KMail/1.6.2 MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <200410011102.43394.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx Subject: [PATCH] Rework how we store process times in the kernel and deferring calcru() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to 
FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 15:02:04 -0000

I'll commit this soonish unless there are any objections. The basic idea is
to store process time resource usage as raw data (i.e. as bintimes and tick
counts) for both process usage and child usage and only calculate the
timeval style times if they are explicitly asked for. This lets us avoid
always calling calcru() to calculate the timeval values in exit1(), for
example. A more detailed listing of the changes follows:

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime. A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage. Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as
  updating the rux_[isu]u fields of a passed-in rusage_ext structure.
  calcru() uses a copy of the process' p_rux structure to compute the
  timevals after updating the runtime appropriately if any of the threads in
  that process are currently executing. This also includes an additional fix
  so that calcru() now correctly handles threads from the process that are
  executing on other CPUs. Also, calcru() now only locks sched_lock
  internally while doing the rux_runtime fixup. It now only requires the
  caller to hold the proc lock and calcru1() only requires the proc lock
  internally. calcru() also no longer allows callers to ask for an interrupt
  timeval since none of them actually did.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux. Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.

As a side effect of storing the raw values, the accuracy of the process
timing has been improved. This makes benchmarking somewhat tricky as the
appearance is that with this patch user times go way up but system times go
way down. Thus, the only benchmarks I did were to compare real times and to
also compare the sum of the user and system times to the real times. Thus,
here are the results on a kernel w/o debugging (when WITNESS + INVARIANTS
were on, the extra overhead resulted in no statistical difference in the
before and after).
For real times (100 runs of 10000 fork/wait loops):

x smpng.fast.real
+ proc.fast.real
[ministat ASCII distribution plot not reproduced]
    N           Min           Max        Median           Avg        Stddev
x 100          2.97          3.08          2.99        2.9959   0.018968075
+ 100          2.88          2.99          2.93        2.9362   0.017568337
Difference at 95.0% confidence
        -0.0597 +/- 0.0050674
        -1.99272% +/- 0.169145%
        (Student's t, pooled s = 0.0182816)

So, close to about a 2% improvement. As far as accuracy "improvements", the
numbers comparing sum of user + sys compared to "real" time is:

x smpng.fast.real
+ smpng.fast.total
    N           Min           Max        Median           Avg        Stddev
x 100          2.97          3.08          2.99        2.9959   0.018968075
+ 100          2.83          2.93          2.86        2.8601   0.016111668
Difference at 95.0% confidence
        -0.1358 +/- 0.0048779
        -4.53286% +/- 0.162819%
        (Student's t, pooled s = 0.0175979)

And for the kernel with the patch:

x proc.fast.real
+ proc.fast.total
    N           Min           Max        Median           Avg        Stddev
x 100          2.88          2.99          2.93        2.9362   0.017568337
+ 100          2.85          2.96          2.92        2.9201   0.017551943
Difference at 95.0% confidence
        -0.0161 +/- 0.00486742
        -0.548328% +/- 0.165773%
        (Student's t, pooled s = 0.0175601)

Thus, the total counts are closer to the real times with the patch than
without the patch. Given that these results were repeated numerous times
with different benchmarks on an idle box in the same state I feel that these
differences indicate an improvement in the accuracy of the accounting.
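The "Difference at 95.0% confidence" figures can be recomputed from the N, Avg and Stddev columns alone. The following is a minimal sketch in C of a pooled two-sample Student's t comparison; the names `struct summary`, `struct diff`, `compare()` and `approx_sqrt()` are hypothetical and are not taken from ministat. The critical value 1.960 passed by the caller is an assumption (the large-sample 95% value); it happens to reproduce the printed bound for the first table.

```c
#include <assert.h>

/* Summary statistics for one sample, as listed in the tables above. */
struct summary {
	int	n;	/* number of runs */
	double	avg;	/* sample mean, in seconds */
	double	stddev;	/* sample standard deviation, in seconds */
};

/* Result of comparing sample b against baseline a. */
struct diff {
	double	pooled_s;	/* pooled standard deviation */
	double	delta;		/* b.avg - a.avg */
	double	halfwidth;	/* +/- bound on delta */
	double	pct;		/* delta as a percentage of a.avg */
};

/*
 * Newton's method square root, used only so the sketch builds without
 * linking libm; sqrt() from <math.h> would normally be used instead.
 */
static double
approx_sqrt(double v)
{
	double r;
	int i;

	if (v <= 0.0)
		return (0.0);
	r = v > 1.0 ? v : 1.0;
	for (i = 0; i < 60; i++)
		r = 0.5 * (r + v / r);
	return (r);
}

/* Pooled-variance comparison; t_crit is supplied by the caller. */
static struct diff
compare(struct summary a, struct summary b, double t_crit)
{
	struct diff d;
	double dof;

	assert(a.n > 1 && b.n > 1);
	dof = a.n + b.n - 2;
	d.pooled_s = approx_sqrt(((a.n - 1) * a.stddev * a.stddev +
	    (b.n - 1) * b.stddev * b.stddev) / dof);
	d.delta = b.avg - a.avg;
	d.halfwidth = t_crit * d.pooled_s *
	    approx_sqrt(1.0 / a.n + 1.0 / b.n);
	d.pct = 100.0 * d.delta / a.avg;
	return (d);
}
```

Calling `compare()` with the smpng.fast.real row as `a`, the proc.fast.real row as `b`, and 1.960 as `t_crit` gives values that agree with the first table to the printed precision.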
The patch is at http://www.FreeBSD.org/~jhb/patches/rusage_ext.patch and is largely based on a patch originally submitted by bde@. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 15:14:39 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1C96B16A4CE for ; Fri, 1 Oct 2004 15:14:39 +0000 (GMT) Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205]) by mx1.FreeBSD.org (Postfix) with ESMTP id E025443D46 for ; Fri, 1 Oct 2004 15:14:38 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 6388 invoked from network); 1 Oct 2004 15:14:38 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 1 Oct 2004 15:14:38 -0000 Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i91FEYNi027393; Fri, 1 Oct 2004 11:14:35 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Fri, 1 Oct 2004 11:14:42 -0400 User-Agent: KMail/1.6.2 References: <200410011102.43394.jhb@FreeBSD.org> In-Reply-To: <200410011102.43394.jhb@FreeBSD.org> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200410011114.42446.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: arch@FreeBSD.org Subject: Re: [PATCH] Rework how we store process times in the kernel and deferring calcru() X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 
15:14:39 -0000 On Friday 01 October 2004 11:02 am, John Baldwin wrote: > I'll commit this soonish unless there are any objections. The basic idea > is to store process times resource usage as raw data (i.e. as bintimes and > tick counts) for both process usage and child usage and only calculate the > timeval style times if they are explicitly asked for. This lets us avoid > always calling calcru() to calculate the timeval values in exit1() for > example. A more detailed listing of the changes follows: Sorry for the dupe. kmail crashed and I wasn't sure the first one had made it, esp. given that it came back up with a partial version of this e-mail (hence the two different endings). -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org
From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 15:27:13 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3C93916A4CE for ; Fri, 1 Oct 2004 15:27:13 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id B962D43D39 for ; Fri, 1 Oct 2004 15:27:12 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 18323 invoked by uid 89); 1 Oct 2004 15:27:10 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 1 Oct 2004 15:27:10 -0000 Received: (qmail 18243 invoked by uid 89); 1 Oct 2004 15:27:09 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 1 Oct 2004 15:27:09 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i91FR8mt027692; Fri, 1 Oct 2004 11:27:08 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Matthew Dillon In-Reply-To: <200410010757.i917vjac017409@apollo.backplane.com> References: <1096610130.21577.219.camel@palm.tree.com> <200410010757.i917vjac017409@apollo.backplane.com> Content-Type: text/plain Message-Id: <1096644427.25800.26.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Fri, 01 Oct 2004 11:27:08 -0400 Content-Transfer-Encoding: 7bit cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: sched_switch (sched_4bsd) may be preempted X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 15:27:13 -0000 On Fri, 2004-10-01 at 03:57, Matthew
Dillon wrote: > I would put the entire scheduler core in a critical section, not just > the part you think needs to be there. It's just too critical a subsystem > to be able to make operational assumptions of that nature in. It is > what I do in the DragonFly LWKT core, BTW, and crazy as I am I would > never even consider trying to move the critical section further in. The core is wrapped in the sched_lock. And since it is a spin lock it is running in a critical section with interrupts disabled. The additional (recursive) critical_enter is just an abusive way to tell maybe_preempt* that it should not immediately switch. ( Yes - eventually there should be a better way to do this) > > I would not reset TDP_OWEPREEMPT there. If I understand its function > correctly you need to leave it intact in order to detect preemption > request races against the scheduler. Since at that point newtd may > be non-NULL and thus not cause another scheduling queue check to be > made before the next switch, you cannot safely clear the flag where you > are clearing it. This is all running in a critical section and we just decided to switch and either have or will pick the best thread. Interrupts are locked. The additional critical section just prevents recursion problems by delaying unwanted switches in maybe_preempt* . Resetting TDP_OWEPREEMPT is perfectly safe since we switch to the thread chosen while everything has been locked. 
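To make the mechanism described above concrete, here is a small user-space model, not kernel code. The struct and the sim_ functions are hypothetical; only the names td_critnest, TDP_OWEPREEMPT, critical_enter, critical_exit and maybe_preempt come from this thread. The model deliberately leaves out any action on the flag inside the exit routine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct thread. */
struct sim_thread {
    int  critnest;    /* models td_critnest */
    bool owepreempt;  /* models TDP_OWEPREEMPT in td_pflags */
    int  switches;    /* immediate switches taken, for inspection */
};

void
sim_critical_enter(struct sim_thread *td)
{
    td->critnest++;
}

void
sim_critical_exit(struct sim_thread *td)
{
    assert(td->critnest > 0);
    td->critnest--;
}

/*
 * Models the nesting test discussed above: with a count above one the
 * preemption is only recorded; otherwise the switch happens right away.
 * Returns true if it switched immediately.
 */
bool
sim_maybe_preempt(struct sim_thread *td)
{
    if (td->critnest > 1) {
        td->owepreempt = true;
        return false;
    }
    td->switches++;
    return true;
}

/*
 * Models the bracketed region in the switch path: the caller already
 * holds the scheduler spin lock (critnest == 1), and the extra enter
 * keeps a wakeup inside the region from switching recursively.
 */
void
sim_switch_region(struct sim_thread *td)
{
    assert(td->critnest == 1);
    sim_critical_enter(td);
    (void)sim_maybe_preempt(td);  /* e.g. reached while queueing a thread */
    sim_critical_exit(td);
    td->owepreempt = false;       /* a switch follows anyway */
}
```

The point of the model is only that the second nesting level turns an immediate switch into a recorded one; whether clearing the flag afterwards is safe is exactly what is being argued in this thread.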
Stephan From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 16:13:29 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 02E2A16A4CE for ; Fri, 1 Oct 2004 16:13:29 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id 791CE43D55 for ; Fri, 1 Oct 2004 16:13:28 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 1258 invoked by uid 89); 1 Oct 2004 16:13:26 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 1 Oct 2004 16:13:26 -0000 Received: (qmail 570 invoked by uid 89); 1 Oct 2004 16:13:14 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 1 Oct 2004 16:13:14 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i91GDEmt027945; Fri, 1 Oct 2004 12:13:14 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Peter Holm In-Reply-To: <20041001141040.GA1556@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> Content-Type: text/plain Message-Id: <1096647194.27811.12.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Fri, 01 Oct 2004 12:13:14 -0400 Content-Transfer-Encoding: 7bit cc: Julian Elischer cc: John Baldwin cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 16:13:29 -0000 On Fri, 2004-10-01 at 10:10, Peter Holm wrote: > On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote: > > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote: > > > > > I also had overlooked > > > http://www.holm.cc/stress/log/cons80.html > > > Showing that my patch for kern_switch.c (switch_patch) has a bug. > > > I will send an updated patch later today. > > > > OK - here is the promised patch. > > > > For once I'm the bearer of good news. The switch_patch_v2 + the > sched_4bsd patch ran the tests for more than one hour without > any freeze. The sched_4bsd alone did not stop the freezes. I'm > now testing the switch_patch_v2 alone and it's looking good for > 55+ minutes of testing. Great ! I guess I should roll a cleaned up cumulative patch soon. Stephan From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 16:41:48 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D0B8F16A4CE for ; Fri, 1 Oct 2004 16:41:48 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id 53ECD43D46 for ; Fri, 1 Oct 2004 16:41:48 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 19372 invoked from network); 1 Oct 2004 16:41:28 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 1 Oct 2004 16:41:28 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i91GfRV7002534; Fri, 1 Oct 2004 18:41:27 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i91GfR6i002533; Fri, 1 Oct 2004 18:41:27 +0200 (CEST) (envelope-from pho) Date: Fri, 1 Oct 2004 18:41:27 +0200 From: Peter Holm To: Stephan Uphoff 
Message-ID: <20041001164127.GA2468@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1096647194.27811.12.camel@palm.tree.com> User-Agent: Mutt/1.4.1i cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 16:41:48 -0000 On Fri, Oct 01, 2004 at 12:13:14PM -0400, Stephan Uphoff wrote: > On Fri, 2004-10-01 at 10:10, Peter Holm wrote: > > On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote: > > > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote: > > > > > > > I also had overlooked > > > > http://www.holm.cc/stress/log/cons80.html > > > > Showing that my patch for kern_switch.c (switch_patch) has a bug. > > > > I will send an updated patch later today. > > > > > > OK - here is the promised patch. > > > > > > > For once I'm the bearer of good news. The switch_patch_v2 + the > > sched_4bsd patch ran the tests for more than one hour without > > any freeze. The sched_4bsd alone did not stop the freezes. I'm > > now testing the switch_patch_v2 alone and it's looking good for > > 55+ minutes of testing. > > Great ! > I guess I should roll a cleaned up cumulative patch soon. > > Stephan With switch_patch_v2 alone a freeze occurred after more than one hour. So now I'm back to testing switch_patch_v2 + sched_4bsd. 
I'd let that run for a while, just to be sure. -- Peter Holm From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 16:52:01 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 28E9716A4CE; Fri, 1 Oct 2004 16:52:01 +0000 (GMT) Received: from green.homeunix.org (pcp04368961pcs.nrockv01.md.comcast.net [69.140.212.7]) by mx1.FreeBSD.org (Postfix) with ESMTP id 71E3243D39; Fri, 1 Oct 2004 16:52:00 +0000 (GMT) (envelope-from green@green.homeunix.org) Received: from green.homeunix.org (green@localhost [127.0.0.1]) by green.homeunix.org (8.13.1/8.13.1) with ESMTP id i91Gpqo3006657; Fri, 1 Oct 2004 12:51:52 -0400 (EDT) (envelope-from green@green.homeunix.org) Received: (from green@localhost) by green.homeunix.org (8.13.1/8.13.1/Submit) id i91Gpp3O006656; Fri, 1 Oct 2004 12:51:51 -0400 (EDT) (envelope-from green) Date: Fri, 1 Oct 2004 12:51:51 -0400 From: Brian Fundakowski Feldman To: Stephan Uphoff Message-ID: <20041001165151.GJ997@green.homeunix.org> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1096647194.27811.12.camel@palm.tree.com> User-Agent: Mutt/1.5.6i cc: Peter Holm cc: Julian Elischer cc: John Baldwin cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 
16:52:01 -0000 On Fri, Oct 01, 2004 at 12:13:14PM -0400, Stephan Uphoff wrote: > On Fri, 2004-10-01 at 10:10, Peter Holm wrote: > > On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote: > > > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote: > > > > > > > I also had overlooked > > > > http://www.holm.cc/stress/log/cons80.html > > > > Showing that my patch for kern_switch.c (switch_patch) has a bug. > > > > I will send an updated patch later today. > > > > > > OK - here is the promised patch. > > > > > > > For once I'm the bearer of good news. The switch_patch_v2 + the > > sched_4bsd patch ran the tests for more than one hour without > > any freeze. The sched_4bsd alone did not stop the freezes. I'm > > now testing the switch_patch_v2 alone and it's looking good for > > 55+ minutes of testing. > > Great ! > I guess I should roll a cleaned up cumulative patch soon. I suppose it might be a bit too hopeful, but is there any chance you're taking a look at SCHED_ULE problems, too? -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org \ The Power to Serve! \ Opinions expressed are my own. 
\,,,,,,,,,,,,,,,,,,,,,,\ From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 17:56:46 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C9D9316A4CE; Fri, 1 Oct 2004 17:56:46 +0000 (GMT) Received: from ylpvm43.prodigy.net (ylpvm43-ext.prodigy.net [207.115.57.74]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8561143D45; Fri, 1 Oct 2004 17:56:46 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (adsl-67-124-49-205.dsl.snfc21.pacbell.net [67.124.49.205])i91HurCE021701; Fri, 1 Oct 2004 13:56:54 -0400 Message-ID: <415D9A5B.10200@elischer.org> Date: Fri, 01 Oct 2004 10:56:43 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524 X-Accept-Language: en, hu MIME-Version: 1.0 To: Stephan Uphoff References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> In-Reply-To: <1096647194.27811.12.camel@palm.tree.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: Peter Holm cc: John Baldwin cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 17:56:46 -0000 Stephan Uphoff wrote: > On Fri, 2004-10-01 at 10:10, Peter Holm wrote: > >>On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote: >> >>>On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote: >>> >>> >>>>I also 
had overlooked >>>> http://www.holm.cc/stress/log/cons80.html >>>>Showing that my patch for kern_switch.c (switch_patch) has a bug. >>>>I will send an updated patch later today. >>> >>>OK - here is the promised patch. >>> >> >>For once I'm the bearer of good news. The switch_patch_v2 + the >>sched_4bsd patch ran the tests for more than one hour without >>any freeze. The sched_4bsd alone did not stop the freezes. I'm >>now testing the switch_patch_v2 alone and it's looking good for >>55+ minutes of testing. > > > Great ! > I guess I should roll a cleaned up cumulative patch soon. > > Stephan I'm on the sidelines cheering.. I'm just coming off a hmmm.. 28 hour day from work.. ** need sleep ** From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 18:15:36 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 06F6D16A4F3 for ; Fri, 1 Oct 2004 18:15:35 +0000 (GMT) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id DB4C843D45 for ; Fri, 1 Oct 2004 18:15:23 +0000 (GMT) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) i91IFJvA019973; Fri, 1 Oct 2004 11:15:19 -0700 (PDT) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.9p2/8.12.9/Submit) id i91IFJQq019971; Fri, 1 Oct 2004 11:15:19 -0700 (PDT) (envelope-from dillon) Date: Fri, 1 Oct 2004 11:15:19 -0700 (PDT) From: Matthew Dillon Message-Id: <200410011815.i91IFJQq019971@apollo.backplane.com> To: Stephan Uphoff References: <1096610130.21577.219.camel@palm.tree.com> <1096644427.25800.26.camel@palm.tree.com> cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: sched_switch (sched_4bsd) may be preempted X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD 
architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 18:15:36 -0000 :The core is wrapped in the sched_lock. And since it is a spin lock it is :running in a critical section with interrupts disabled. : :The additional (recursive) critical_enter is just an abusive way to tell :maybe_preempt* that it should not immediately switch. :( Yes - eventually there should be a better way to do this) Umm. So you are saying that the code is intentionally breaking the API that should otherwise be protecting it from a preemptive thread switch by making the assumption that the only critical section count will come from an intentional scheduler mutex obtained ONLY for the purpose of calling sched_add(), and thus only two or more critical section counts means that the originally interrupted code desires no preemption? No wonder the scheduler is broken! That sounds like a recipe for disaster! But the solution is simple enough... move the maybe_preempt() call out of sched_add(). That is, remove the flawed assumption instead of adding further hacks to work around the flawed assumption. Instead, just conditionally set TDP_OWEPREEMPT there, don't actually try to switch. Then simply check for TDP_OWEPREEMPT either just after the scheduler mutex is released, or just before it would otherwise be released. The recursion is happening because the original code was badly designed, not because it is an inevitable consequence of implementing preemption. But this problem looks *REALLY* easy to fix... NOT by adding more hacks, but by fixing the originally flawed code. :> I would not reset TDP_OWEPREEMPT there. If I understand its function :> correctly you need to leave it intact in order to detect preemption :> request races against the scheduler. 
Since at that point newtd may :> be non-NULL and thus not cause another scheduling queue check to be :> made before the next switch, you cannot safely clear the flag where you :> are clearing it. : :This is all running in a critical section and we just decided to switch :and either have or will pick the best thread. Interrupts are locked. The :additional critical section just prevents recursion problems by delaying :unwanted switches in maybe_preempt* . Resetting TDP_OWEPREEMPT is :perfectly safe since we switch to the thread chosen while everything has :been locked. : : Stephan I sorta see that, but then again newtd is already set so you are assuming that no side effects have occurred (from calling other scheduler related routines) since newtd was last chosen. But it is clear that there are a ton of opportunities for side effects either to occur now or to occur in the future as the code continues to be modified, which makes this sort of assumption very dangerous and makes the resulting code very fragile. For example, if someone ever wanted to avoid physically disabling interrupts with a 'cli' in critical_enter() (and this is something that could very well happen since neither the original 4.x code nor the DragonFly code disables interrupts in this case, as an optimization), that breaks all of your assumptions. In fact, the interrupt disablement is being done in the machine-dependent (MD) code, and you are assuming it in machine-independent (MI) code. This makes your assumption even MORE unsafe. Your goal, with all the problems that the scheduler is having now, should be to make the code more robust, NOT make it more fragile. 
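The alternative proposed above (only note the owed preemption while queueing, then act on it when the scheduler mutex is released) can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not FreeBSD code: struct cpu_model and the model_ functions are hypothetical, and a lower number is taken to mean a better priority.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU model; none of these names exist in the kernel. */
struct cpu_model {
    bool lock_held;   /* models holding the scheduler mutex */
    bool owepreempt;  /* models TDP_OWEPREEMPT */
    int  cur_pri;     /* priority of the running thread, lower is better */
    int  next_pri;    /* best priority queued while the lock was held */
    int  switches;    /* number of switches performed */
};

void
model_sched_lock(struct cpu_model *c)
{
    assert(!c->lock_held);
    c->lock_held = true;
}

/* Queue a thread: record that a preemption is owed, never switch here. */
void
model_sched_add(struct cpu_model *c, int new_pri)
{
    assert(c->lock_held);
    if (new_pri < c->cur_pri &&
        (!c->owepreempt || new_pri < c->next_pri)) {
        c->next_pri = new_pri;
        c->owepreempt = true;
    }
}

/* Release the lock, then honour an owed preemption exactly once. */
void
model_sched_unlock(struct cpu_model *c)
{
    assert(c->lock_held);
    c->lock_held = false;
    if (c->owepreempt) {
        c->owepreempt = false;
        c->cur_pri = c->next_pri;
        c->switches++;
    }
}
```

In this shape the queueing routine never depends on how many critical-section levels its caller holds; the only place a switch can happen is the unlock path.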
-Matt Matthew Dillon From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 18:31:56 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 16D6E16A4CE for ; Fri, 1 Oct 2004 18:31:56 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id A178343D46 for ; Fri, 1 Oct 2004 18:31:55 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 23978 invoked by uid 89); 1 Oct 2004 18:31:54 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 1 Oct 2004 18:31:54 -0000 Received: (qmail 23958 invoked by uid 89); 1 Oct 2004 18:31:54 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 1 Oct 2004 18:31:54 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i91IVrmt028646; Fri, 1 Oct 2004 14:31:53 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Brian Fundakowski Feldman In-Reply-To: <20041001165151.GJ997@green.homeunix.org> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> <20041001165151.GJ997@green.homeunix.org> Content-Type: text/plain Message-Id: <1096655513.27811.66.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Fri, 01 Oct 2004 14:31:53 -0400 Content-Transfer-Encoding: 7bit cc: Peter Holm cc: Julian Elischer cc: John Baldwin cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org 
X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 18:31:56 -0000 On Fri, 2004-10-01 at 12:51, Brian Fundakowski Feldman wrote: > On Fri, Oct 01, 2004 at 12:13:14PM -0400, Stephan Uphoff wrote: > > On Fri, 2004-10-01 at 10:10, Peter Holm wrote: > > > On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote: > > > > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote: > > > > > > > > > I also had overlooked > > > > > http://www.holm.cc/stress/log/cons80.html > > > > > Showing that my patch for kern_switch.c (switch_patch) has a bug. > > > > > I will send an updated patch later today. > > > > > > > > OK - here is the promised patch. > > > > > > > > > > For once I'm the bearer of good news. The switch_patch_v2 + the > > > sched_4bsd patch ran the tests for more than one hour without > > > any freeze. The sched_4bsd alone did not stop the freezes. I'm > > > now testing the switch_patch_v2 alone and it's looking good for > > > 55+ minutes of testing. > > > > Great ! > > I guess I should roll a cleaned up cumulative patch soon. > > I suppose it might be a bit too hopeful, but is there any chance you're > taking a look at SCHED_ULE problems, too? I have to get some work done on my own project before I run out of funding :-( This means I need to avoid looking at SCHED_ULE for at least the next week. 
Stephan From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 18:33:10 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B1F8016A4D2 for ; Fri, 1 Oct 2004 18:33:10 +0000 (GMT) Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3131443D31 for ; Fri, 1 Oct 2004 18:33:10 +0000 (GMT) (envelope-from scottl@FreeBSD.org) Received: from [192.168.254.11] (junior-wifi.samsco.home [192.168.254.11]) (authenticated bits=0) by pooker.samsco.org (8.12.11/8.12.10) with ESMTP id i91IXsve078970; Fri, 1 Oct 2004 12:33:54 -0600 (MDT) (envelope-from scottl@FreeBSD.org) Message-ID: <415DA2A2.5010309@FreeBSD.org> Date: Fri, 01 Oct 2004 12:32:02 -0600 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.2) Gecko/20040831 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Matthew Dillon References: <1096610130.21577.219.camel@palm.tree.com> <1096644427.25800.26.camel@palm.tree.com> <200410011815.i91IFJQq019971@apollo.backplane.com> In-Reply-To: <200410011815.i91IFJQq019971@apollo.backplane.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, hits=0.0 required=3.8 tests=none autolearn=no version=2.63 X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on pooker.samsco.org cc: Peter Holm cc: "freebsd-arch@freebsd.org" cc: Julian Elischer cc: Stephan Uphoff Subject: Re: sched_switch (sched_4bsd) may be preempted X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 18:33:10 -0000 Matthew Dillon wrote: > :The core is wrapped in the sched_lock. And since it is a spin lock it is > :running in a critical section with interrupts disabled. 
> : > :The additional (recursive) critical_enter is just an abusive way to tell > :maybe_preempt* that it should not immediately switch. > :( Yes - eventually there should be a better way to do this) > > Umm. So you are saying that the code is intentionally breaking the > API that should otherwise be protecting it from a preemptive thread > switch by making the assumption that the only critical section count > will come from an intentional scheduler mutex obtained ONLY for the > purpose of calling sched_add(), and thus only two or more critical section > counts means that the originally interrupted code desires no preemption? > No wonder the scheduler is broken! That sounds like a recipe for disaster! > > But the solution is simple enough... move the maybe_preempt() call out > of sched_add(). That is, remove the flawed assumption instead of adding > further hacks to work around the flawed assumption. Instead, > just conditionally set TDP_OWEPREEMPT there, don't actually try to switch. > Then simply check for TDP_OWEPREEMPT either just after the scheduler > mutex is released, or just before it would otherwise be released. The > recursion is happening because the original code was badly designed, > not because it is an inevitable consequence of implementing preemption. This is a pretty bold assumption to make. I agree that when I first looked at this code a while back I was quite confused by the semantics, but after discussing it with John it makes a whole lot more sense. The whole design is based on being allowed to be switched away from if curthread->td_critnest is less than one. Simply holding a single spinlock or critical section will not prevent this, but this is only a problem from within the scheduler. If a thread enters the scheduler with a spinlock or critical section held, the act of the scheduler picking up sched_lock will bump up td_critnest and prevent preemption. 
This of course leaves a hole where the scheduler is entered without a spinlock held, and Stephan looks like he's cleaning up this hole, and doing a pretty reasonable job at it. The easy hack was to just wrap setrunqueue() in a critical section, but there were still a few problems with that. I agree that there are better ways to deal with this in the long run, but please don't distract us from making it work correctly right now. > But this problem looks *REALLY* easy to fix... NOT by adding more hacks, > but by fixing the originally flawed code. > > :> I would not reset TDP_OWEPREEMPT there. If I understand its function > :> correctly you need to leave it intact in order to detect preemption > :> request races against the scheduler. Since at that point newtd may > :> be non-NULL and thus not cause another scheduling queue check to be > :> made before the next switch, you cannot safely clear the flag where you > :> are clearing it. > : > :This is all running in a critical section and we just decided to switch > :and either have or will pick the best thread. Interrupts are locked. The > :additional critical section just prevents recursion problems by delaying > :unwanted switches in maybe_preempt* . Resetting TDP_OWEPREEMPT is > :perfectly safe since we switch to the thread chosen while everything has > :been locked. > : > : Stephan > > I sorta see that, but then again newtd is already set so you are > assuming that no side effects have occurred (from calling other scheduler > related routines) since newtd was last chosen. But it is clear that > there are a ton of opportunities for side effects either to occur or > to occur in the future as the code continues to be modified, which makes > this sort of assumption very dangerous and makes the resulting code very > fragile. 
For example, if someone ever wanted to avoid physically > disabling interrupts with a 'cli' in critical_enter() (and this is > something that could very well happen since neither the original 4.x code > or the DragonFly code disables interrupts in this case, as an > optimization), that breaks all of your assumptions. In fact, the > interrupt disablement is being done in the machine-dependant (MD) code, > and you are assuming it in machine-independant (MI) code. This makes > your assumption even MORE unsafe. > > Your goal, with all the problems that the scheduler is having now, should > be to make the code more robust, NOT make it more fragile. Testing is showing that it is becoming more robust and following the original design goals. The real problem here is that the design wasn't well documented, and what was documented wasn't being read by most people. Scott From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 19:25:55 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id ACAA516A4CF for ; Fri, 1 Oct 2004 19:25:55 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id 0200C43D48 for ; Fri, 1 Oct 2004 19:25:55 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 57936 invoked from network); 1 Oct 2004 19:25:53 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 1 Oct 2004 19:25:53 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i91JPqV7003465; Fri, 1 Oct 2004 21:25:52 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i91JPpuh003464; Fri, 1 Oct 2004 21:25:51 +0200 (CEST) (envelope-from pho) Date: Fri, 1 Oct 2004 21:25:51 +0200 From: Peter Holm 
To: Stephan Uphoff Message-ID: <20041001192551.GA3381@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="IS0zKkzwUGydFO0o" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1096647194.27811.12.camel@palm.tree.com> User-Agent: Mutt/1.4.1i cc: Peter Holm cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 19:25:55 -0000 --IS0zKkzwUGydFO0o Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit On Fri, Oct 01, 2004 at 12:13:14PM -0400, Stephan Uphoff wrote: > On Fri, 2004-10-01 at 10:10, Peter Holm wrote: > > On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote: > > > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote: > > > > > > > I also had overlooked > > > > http://www.holm.cc/stress/log/cons80.html > > > > Showing that my patch for kern_switch.c (switch_patch) has a bug. > > > > I will send an updated patch later today. > > > > > > OK - here is the promised patch. > > > > > > > For once I'm the bearer of good news. The switch_patch_v2 + the > > sched_4bsd patch ran the tests for more than one hour without > > any freeze. The sched_4bsd alone did not stop the freezes. I'm > > now testing the switch_patch_v2 alone and it's looking good for > > 55+ minutes of testing. > > Great ! 
> I guess I should roll a cleaned up cumulative patch soon. > > Stephan I have now been running the stress test for more than 3½ hours, without any freezes. I have included the two of your changes I have been using. - Peter --IS0zKkzwUGydFO0o Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="stephan.combined.diff"
Index: sys/kern/kern_switch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_switch.c,v
retrieving revision 1.95
diff -u -r1.95 kern_switch.c
--- sys/kern/kern_switch.c	19 Sep 2004 18:34:17 -0000	1.95
+++ sys/kern/kern_switch.c	1 Oct 2004 19:06:03 -0000
@@ -315,6 +315,106 @@
 	td->td_priority = newpri;
 	setrunqueue(td, SRQ_BORING);
 }
+
+
+/*
+ * This function is called when a thread is about to be put on a
+ * ksegrp run queue because it has been made runnable or its
+ * priority has been adjusted and the ksegrp does not have a
+ * free kse slot. It determines if a thread from the same ksegrp
+ * should be preempted. If so, it tries to switch threads
+ * if the thread is on the same cpu or notifies another cpu that
+ * it should switch threads.
+ */
+
+static void
+maybe_preempt_in_ksegrp(struct thread *td)
+{
+#if defined(SMP)
+	int highest_pri;
+	struct ksegrp *kg;
+	cpumask_t cpumask,dontuse;
+	struct pcpu *pc;
+	struct pcpu *highest_pcpu;
+	struct thread *running_thread;
+
+#ifndef FULL_PREEMPTION
+	int pri;
+
+	pri = td->td_priority;
+
+	if (!(pri >= PRI_MIN_ITHD && pri <= PRI_MAX_ITHD))
+		return;
+#endif
+
+	mtx_assert(&sched_lock, MA_OWNED);
+
+	running_thread = curthread;
+
+#if !defined(KSEG_PEEMPT_BEST_CPU)
+	if(running_thread->td_ksegrp != td->td_ksegrp)
+#endif
+	{
+		kg = td->td_ksegrp;
+
+		/* Anyone waiting in front ? */
+		if(td != TAILQ_FIRST(&kg->kg_runq)) {
+			return; /* Yes - wait your turn*/
+		}
+		highest_pri = td->td_priority;
+		highest_pcpu = NULL;
+		dontuse = stopped_cpus | idle_cpus_mask;
+
+		/* Find a cpu with the worst priority that runs at thread from the
+		 * same ksegrp - if multiple exist give first the last run cpu and then
+		 * the current cpu priority
+		 */
+
+		SLIST_FOREACH(pc, &cpuhead, pc_allcpu) {
+			cpumask = pc->pc_cpumask;
+			if ( (cpumask & dontuse) == 0 &&
+			    pc->pc_curthread->td_ksegrp == kg) {
+				if (pc->pc_curthread->td_priority > highest_pri) {
+					highest_pri = pc->pc_curthread->td_priority;
+					highest_pcpu = pc;
+				} else if (pc->pc_curthread->td_priority == highest_pri &&
+				    highest_pcpu != NULL) {
+					if (td->td_lastcpu == pc->pc_cpuid ||
+					    (PCPU_GET(cpumask) == cpumask &&
+					    td->td_lastcpu != highest_pcpu->pc_cpuid)) {
+						highest_pcpu = pc;
+					}
+				}
+			}
+		}
+
+		/* Check if we need to preempt someone */
+		if (highest_pcpu == NULL) return;
+
+		if (PCPU_GET(cpuid) != highest_pcpu->pc_cpuid) {
+			highest_pcpu->pc_curthread->td_flags |= TDF_NEEDRESCHED;
+			ipi_selected(highest_pcpu->pc_cpumask, IPI_AST);
+			return;
+		}
+	}
+#else
+	KASSERT(running_thread->td_ksegrp == td->td_ksegrp,("maybe_preempt_in_ksegrp: No chance to run thread"));
+#endif
+
+	if (td->td_priority > running_thread->td_priority)
+		return;
+#ifdef PREEMPTION
+	if (running_thread->td_critnest > 1) {
+		running_thread->td_pflags |= TDP_OWEPREEMPT;
+	} else {
+		mi_switch(SW_INVOL, NULL);
+	}
+#else
+	running_thread->td_flags |= TDF_NEEDRESCHED;
+#endif
+	return;
+}
+
 int limitcount;
 void
 setrunqueue(struct thread *td, int flags)
@@ -422,6 +522,7 @@
 	} else {
 		CTR3(KTR_RUNQ,
 		    "setrunqueue: held: td%p kg%p pid%d",
 		    td, td->td_ksegrp, td->td_proc->p_pid);
+		maybe_preempt_in_ksegrp(td);
 	}
 }
Index: sys/kern/sched_4bsd.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/sched_4bsd.c,v
retrieving revision 1.65
diff -u -r1.65 sched_4bsd.c
--- sys/kern/sched_4bsd.c	16 Sep 2004 07:12:59 -0000	1.65
+++ sys/kern/sched_4bsd.c	1 Oct 2004 19:06:03 -0000
@@ -823,6 +823,7 @@
 		TD_SET_CAN_RUN(td);
 	else {
 		td->td_ksegrp->kg_avail_opennings++;
+		critical_enter();
 		if (TD_IS_RUNNING(td)) {
 			/* Put us back on the run queue (kse and all). */
 			setrunqueue(td, SRQ_OURSELF|SRQ_YIELDING);
@@ -834,6 +835,8 @@
 			 */
 			slot_fill(td->td_ksegrp);
 		}
+		critical_exit();
+		td->td_pflags &= ~TDP_OWEPREEMPT;
 	}
 	if (newtd == NULL)
 		newtd = choosethread();
--IS0zKkzwUGydFO0o-- From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 21:22:41 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4780616A4CE; Fri, 1 Oct 2004 21:22:41 +0000 (GMT) Received: from harmony.village.org (rover.village.org [168.103.84.182]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0678C43D1D; Fri, 1 Oct 2004 21:22:37 +0000 (GMT) (envelope-from imp@bsdimp.com) Received: from localhost (harmony.village.org [10.0.0.6]) by harmony.village.org (8.13.1/8.13.1) with ESMTP id i91LJMVr053371; Fri, 1 Oct 2004 15:19:22 -0600 (MDT) (envelope-from imp@bsdimp.com) Date: Fri, 01 Oct 2004 15:20:46 -0600 (MDT) Message-Id: <20041001.152046.32721253.imp@bsdimp.com> To: sah@softcardsystems.com From: "M. Warner Losh" In-Reply-To: References: <20041001.000452.99281901.imp@bsdimp.com> X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: freebsd-arch@freebsd.org cc: re@freebsd.org cc: julian@elischer.org cc: ups@tree.com Subject: Re: AoE for 4.x X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 21:22:41 -0000 In message: Sam writes: : Is that block & char, or do i just need to specify the block?
There's just one major number type in 4.x. I believe it is the character device, but every time I say it is foo, someone else proves me wrong. Warner From owner-freebsd-arch@FreeBSD.ORG Fri Oct 1 21:29:58 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D136E16A4CE; Fri, 1 Oct 2004 21:29:58 +0000 (GMT) Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7C00F43D1F; Fri, 1 Oct 2004 21:29:58 +0000 (GMT) (envelope-from scottl@FreeBSD.org) Received: from [192.168.254.11] (junior-wifi.samsco.home [192.168.254.11]) (authenticated bits=0) by pooker.samsco.org (8.12.11/8.12.10) with ESMTP id i91LUqvn079587; Fri, 1 Oct 2004 15:30:52 -0600 (MDT) (envelope-from scottl@FreeBSD.org) Message-ID: <415DCC1A.1030305@FreeBSD.org> Date: Fri, 01 Oct 2004 15:28:58 -0600 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.2) Gecko/20040831 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "M. Warner Losh" References: <20041001.000452.99281901.imp@bsdimp.com> <20041001.152046.32721253.imp@bsdimp.com> In-Reply-To: <20041001.152046.32721253.imp@bsdimp.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, hits=0.0 required=3.8 tests=none autolearn=no version=2.63 X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on pooker.samsco.org cc: sah@softcardsystems.com cc: ups@tree.com cc: re@FreeBSD.org cc: julian@elischer.org cc: freebsd-arch@FreeBSD.org Subject: Re: AoE for 4.x X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Oct 2004 21:29:59 -0000 M. 
Warner Losh wrote: > In message: > Sam writes: > : Is that block & char, or do i just need to specify the block? > > There's just one major number type in 4.x. I believe it is the > character device, but every time I say it is foo, someone else proves > me wrong. > > Warner > Correct, only char majors for 4.x. The bdevmaj field is usually given '-1'. Scott From owner-freebsd-arch@FreeBSD.ORG Sat Oct 2 05:33:55 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4E72616A4CE for ; Sat, 2 Oct 2004 05:33:55 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id D1AF143D1F for ; Sat, 2 Oct 2004 05:33:54 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 93856 invoked from network); 2 Oct 2004 05:33:53 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 2 Oct 2004 05:33:53 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i925XqV7006388; Sat, 2 Oct 2004 07:33:52 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i925XpqN006387; Sat, 2 Oct 2004 07:33:51 +0200 (CEST) (envelope-from pho) Date: Sat, 2 Oct 2004 07:33:51 +0200 From: Peter Holm To: Peter Holm Message-ID: <20041002053351.GA6259@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> <20041001192551.GA3381@peter.osted.lan> Mime-Version: 1.0 
Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20041001192551.GA3381@peter.osted.lan> User-Agent: Mutt/1.4.1i cc: "freebsd-arch@freebsd.org" cc: Julian Elischer cc: Stephan Uphoff Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2004 05:33:55 -0000 On Fri, Oct 01, 2004 at 09:25:51PM +0200, Peter Holm wrote: > > > > > > For once I'm the bearer of good news. The switch_patch_v2 + the > > > sched_4bsd patch ran the tests for more than one hour without > > > any freeze. The sched_4bsd alone did not stop the freezes. I'm > > > now testing the switch_patch_v2 alone and it's looking good for > > > 55+ minutes of testing. > > > > Great ! > > I guess I should roll a cleaned up cumulative patch soon. > > > > Stephan > > I have now been running the stress test for more than 3½ hours, without > any freezes. I have included the two of your changes I have been using. > > - Peter After more testing, I'm sad to report that the freeze is still there. 
The patch has, however, decreased the number of freezes dramatically: During 14 hours of testing 3 separate freezes have been seen: 24 Giant held for more than 60 sec by td 0xc244e900, pid 27683 31 Giant held for more than 60 sec by td 0xc1b7b600, pid 12098 79 Giant held for more than 60 sec by td 0xc25f3180, pid 75531 - Peter From owner-freebsd-arch@FreeBSD.ORG Sat Oct 2 07:11:25 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1FB8F16A4CE; Sat, 2 Oct 2004 07:11:25 +0000 (GMT) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 61D4F43D2F; Sat, 2 Oct 2004 07:11:24 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.13.1/8.13.1) with ESMTP id i927B8ab082444; Sat, 2 Oct 2004 09:11:09 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: "M. Warner Losh" From: "Poul-Henning Kamp" In-Reply-To: Your message of "Fri, 01 Oct 2004 15:20:46 MDT." <20041001.152046.32721253.imp@bsdimp.com> Date: Sat, 02 Oct 2004 09:11:08 +0200 Message-ID: <82443.1096701068@critter.freebsd.dk> Sender: phk@critter.freebsd.dk cc: sah@softcardsystems.com cc: ups@tree.com cc: re@freebsd.org cc: julian@elischer.org cc: freebsd-arch@freebsd.org Subject: Re: AoE for 4.x X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2004 07:11:25 -0000 In message <20041001.152046.32721253.imp@bsdimp.com>, "M. Warner Losh" writes: >In message: > Sam writes: >: Is that block & char, or do i just need to specify the block? > >There's just one major number type in 4.x. I believe it is the >character device, but every time I say it is foo, someone else proves >me wrong. 
It is char. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Sat Oct 2 13:55:58 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 472E016A4CE for ; Sat, 2 Oct 2004 13:55:58 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id C66F343D3F for ; Sat, 2 Oct 2004 13:55:57 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 16261 invoked by uid 89); 2 Oct 2004 13:55:56 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 2 Oct 2004 13:55:56 -0000 Received: (qmail 16251 invoked by uid 89); 2 Oct 2004 13:55:56 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 2 Oct 2004 13:55:56 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i92Dtrmt032748; Sat, 2 Oct 2004 09:55:54 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Peter Holm In-Reply-To: <20041002053351.GA6259@peter.osted.lan> References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> <20041001192551.GA3381@peter.osted.lan> <20041002053351.GA6259@peter.osted.lan> Content-Type: text/plain; charset=ISO-8859-1 Message-Id: <1096725353.27811.836.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Sat, 02 
Oct 2004 09:55:53 -0400 Content-Transfer-Encoding: 8bit cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2004 13:55:58 -0000 On Sat, 2004-10-02 at 01:33, Peter Holm wrote: > On Fri, Oct 01, 2004 at 09:25:51PM +0200, Peter Holm wrote: > > > > > > > > For once I'm the bearer of good news. The switch_patch_v2 + the > > > > sched_4bsd patch ran the tests for more than one hour without > > > > any freeze. The sched_4bsd alone did not stop the freezes. I'm > > > > now testing the switch_patch_v2 alone and it's looking good for > > > > 55+ minutes of testing. > > > > > > Great ! > > > I guess I should roll a cleaned up cumulative patch soon. > > > > > > Stephan > > > > I have now been running the stress test for more than 3½ hours, without > > any freezes. I have included the two of your changes I have been using. > > > > - Peter > > After more testing, I'm sad to report that the freeze is still there. > The patch has however decreased the number of freezes dramatically: > > During 14 hours of testing 3 separate freezes has been seen: > > 24 Giant held for more than 60 sec by td 0xc244e900, pid 27683 > 31 Giant held for more than 60 sec by td 0xc1b7b600, pid 12098 > 79 Giant held for more than 60 sec by td 0xc25f3180, pid 75531 You should also run with MUTEX_WAKE_ALL in your config file AND the mutex patch. I think this is it but will verify later today. Sorry -have to run - will roll the cumulative patch tonight (EST). 
Stephan From owner-freebsd-arch@FreeBSD.ORG Sat Oct 2 18:14:25 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 34A0D16A4CF for ; Sat, 2 Oct 2004 18:14:25 +0000 (GMT) Received: from ylpvm29.prodigy.net (ylpvm29-ext.prodigy.net [207.115.57.60]) by mx1.FreeBSD.org (Postfix) with ESMTP id 92A6343D3F for ; Sat, 2 Oct 2004 18:14:24 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (adsl-67-124-49-205.dsl.snfc21.pacbell.net [67.124.49.205])i92IE5IV016379; Sat, 2 Oct 2004 14:14:06 -0400 Message-ID: <415EEFFE.5080309@elischer.org> Date: Sat, 02 Oct 2004 11:14:22 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524 X-Accept-Language: en, hu MIME-Version: 1.0 To: Peter Holm References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> <20041001192551.GA3381@peter.osted.lan> <20041002053351.GA6259@peter.osted.lan> In-Reply-To: <20041002053351.GA6259@peter.osted.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit cc: Stephan Uphoff cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2004 18:14:25 -0000 Peter Holm wrote: > On Fri, Oct 01, 2004 at 09:25:51PM +0200, Peter Holm wrote: > >>>>For once I'm the bearer of good news. 
The switch_patch_v2 + the >>>>sched_4bsd patch ran the tests for more than one hour without >>>>any freeze. The sched_4bsd alone did not stop the freezes. I'm >>>>now testing the switch_patch_v2 alone and it's looking good for >>>>55+ minutes of testing. >>> >>>Great ! >>>I guess I should roll a cleaned up cumulative patch soon. >>> >>> Stephan >> >>I have now been running the stress test for more than 3½ hours, without >>any freezes. I have included the two of your changes I have been using. >> >>- Peter > > > After more testing, I'm sad to report that the freeze is still there. > The patch has however decreased the number of freezes dramatically: > > During 14 hours of testing 3 separate freezes has been seen: > > 24 Giant held for more than 60 sec by td 0xc244e900, pid 27683 > 31 Giant held for more than 60 sec by td 0xc1b7b600, pid 12098 > 79 Giant held for more than 60 sec by td 0xc25f3180, pid 75531 > > - Peter > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" When this happens, drop to the debugger using: kdb_enter("Giant too long"); and dump out the thread backtrace, and the output of show ktr iff you have ktr enabled.. 
(as we discussed before) From owner-freebsd-arch@FreeBSD.ORG Sat Oct 2 18:14:53 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 640FC16A4CE for ; Sat, 2 Oct 2004 18:14:53 +0000 (GMT) Received: from ylpvm29.prodigy.net (ylpvm29-ext.prodigy.net [207.115.57.60]) by mx1.FreeBSD.org (Postfix) with ESMTP id 13A3243D41 for ; Sat, 2 Oct 2004 18:14:53 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (adsl-67-124-49-205.dsl.snfc21.pacbell.net [67.124.49.205])i92IEYIV016914; Sat, 2 Oct 2004 14:14:34 -0400 Message-ID: <415EF01B.7000800@elischer.org> Date: Sat, 02 Oct 2004 11:14:51 -0700 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524 X-Accept-Language: en, hu MIME-Version: 1.0 To: Peter Holm References: <1095468747.31297.241.camel@palm.tree.com> <1096477932.3733.1471.camel@palm.tree.com> <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> <20041001192551.GA3381@peter.osted.lan> <20041002053351.GA6259@peter.osted.lan> In-Reply-To: <20041002053351.GA6259@peter.osted.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit cc: Stephan Uphoff cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2004 18:14:53 -0000 Peter Holm wrote: > On Fri, Oct 01, 2004 at 09:25:51PM +0200, Peter Holm wrote: > >>>>For once I'm the bearer of good news. 
The switch_patch_v2 + the >>>>sched_4bsd patch ran the tests for more than one hour without >>>>any freeze. The sched_4bsd alone did not stop the freezes. I'm >>>>now testing the switch_patch_v2 alone and it's looking good for >>>>55+ minutes of testing. >>> >>>Great ! >>>I guess I should roll a cleaned up cumulative patch soon. >>> >>> Stephan >> >>I have now been running the stress test for more than 3½ hours, without >>any freezes. I have included the two of your changes I have been using. >> >>- Peter > > > After more testing, I'm sad to report that the freeze is still there. > The patch has however decreased the number of freezes dramatically: > > During 14 hours of testing 3 separate freezes has been seen: > > 24 Giant held for more than 60 sec by td 0xc244e900, pid 27683 > 31 Giant held for more than 60 sec by td 0xc1b7b600, pid 12098 > 79 Giant held for more than 60 sec by td 0xc25f3180, pid 75531 oh yeah.... output of show locks too. > > - Peter > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Sat Oct 2 18:31:24 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4A9A816A4CE for ; Sat, 2 Oct 2004 18:31:24 +0000 (GMT) Received: from relay.pair.com (relay.pair.com [209.68.1.20]) by mx1.FreeBSD.org (Postfix) with SMTP id C696E43D39 for ; Sat, 2 Oct 2004 18:31:23 +0000 (GMT) (envelope-from pho@holm.cc) Received: (qmail 12625 invoked from network); 2 Oct 2004 18:31:21 -0000 Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan) (80.164.63.199) by relay.pair.com with SMTP; 2 Oct 2004 18:31:21 -0000 X-pair-Authenticated: 80.164.63.199 Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1]) by peter.osted.lan 
(8.12.10/8.12.10) with ESMTP id i92IVKXh001253; Sat, 2 Oct 2004 20:31:20 +0200 (CEST) (envelope-from pho@peter.osted.lan) Received: (from pho@localhost) by peter.osted.lan (8.12.10/8.12.10/Submit) id i92IVKVC001252; Sat, 2 Oct 2004 20:31:20 +0200 (CEST) (envelope-from pho) Date: Sat, 2 Oct 2004 20:31:20 +0200 From: Peter Holm To: Julian Elischer Message-ID: <20041002183120.GA1202@peter.osted.lan> References: <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> <20041001192551.GA3381@peter.osted.lan> <20041002053351.GA6259@peter.osted.lan> <415EEFFE.5080309@elischer.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="T4sUOijqQbZv57TR" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <415EEFFE.5080309@elischer.org> User-Agent: Mutt/1.4.1i cc: Peter Holm cc: Stephan Uphoff cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2004 18:31:24 -0000 --T4sUOijqQbZv57TR Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit On Sat, Oct 02, 2004 at 11:14:22AM -0700, Julian Elischer wrote: > Peter Holm wrote: > >On Fri, Oct 01, 2004 at 09:25:51PM +0200, Peter Holm wrote: > > > >>>>For once I'm the bearer of good news. The switch_patch_v2 + the > >>>>sched_4bsd patch ran the tests for more than one hour without > >>>>any freeze. The sched_4bsd alone did not stop the freezes. I'm > >>>>now testing the switch_patch_v2 alone and it's looking good for > >>>>55+ minutes of testing. > >>> > >>>Great ! 
> >>>I guess I should roll a cleaned up cumulative patch soon. > >>> > >>> Stephan > >> > >>I have now been running the stress test for more than 3½ hours, without > >>any freezes. I have included the two of your changes I have been using. > >> > >>- Peter > > > > > >After more testing, I'm sad to report that the freeze is still there. > >The patch has however decreased the number of freezes dramatically: > > > >During 14 hours of testing 3 separate freezes has been seen: > > > >24 Giant held for more than 60 sec by td 0xc244e900, pid 27683 > >31 Giant held for more than 60 sec by td 0xc1b7b600, pid 12098 > >79 Giant held for more than 60 sec by td 0xc25f3180, pid 75531 > > > >- Peter > >_______________________________________________ > >freebsd-arch@freebsd.org mailing list > >http://lists.freebsd.org/mailman/listinfo/freebsd-arch > >To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > When this happes, drop to debugger.. > > using: > > kdb_enter("Giant too long"); > > and dump out teh thread backtrace, and the output of show ktr > iff you have ktr enabled.. (as we discussed before) OK, right now I'm testing with all of Stephan's patches + the MUTEX_WAKE_ALL flag. Uptime is 3 3/4 hour and looking good. 
-- Peter Holm --T4sUOijqQbZv57TR Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="changes.diff" Index: sys/kern/kern_mutex.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_mutex.c,v retrieving revision 1.149 diff -u -r1.149 kern_mutex.c --- sys/kern/kern_mutex.c 2 Sep 2004 18:59:15 -0000 1.149 +++ sys/kern/kern_mutex.c 2 Oct 2004 14:46:26 -0000 @@ -492,7 +492,9 @@ if (v == MTX_CONTESTED) { MPASS(ts != NULL); m->mtx_lock = (uintptr_t)td | MTX_CONTESTED; + critical_enter(); turnstile_claim(ts); + critical_exit(); break; } #endif @@ -651,6 +653,9 @@ #else MPASS(ts != NULL); #endif + + critical_enter(); + #ifndef PREEMPTION /* XXX */ td1 = turnstile_head(ts); @@ -671,6 +676,7 @@ } #endif turnstile_unpend(ts); + critical_exit(); #ifndef PREEMPTION /* Index: sys/kern/kern_switch.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_switch.c,v retrieving revision 1.95 diff -u -r1.95 kern_switch.c --- sys/kern/kern_switch.c 19 Sep 2004 18:34:17 -0000 1.95 +++ sys/kern/kern_switch.c 2 Oct 2004 14:46:27 -0000 @@ -315,6 +315,106 @@ td->td_priority = newpri; setrunqueue(td, SRQ_BORING); } + + +/* + * This function is called when a thread is about to be put on a + * ksegrp run queue because it has been made runnable or its + * priority has been adjusted and the ksegrp does not have a + * free kse slot. It determines if a thread from the same ksegrp + * should be preempted. If so, it tries to switch threads + * if the thread is on the same cpu or notifies another cpu that + * it should switch threads. 
+ */ + +static void +maybe_preempt_in_ksegrp(struct thread *td) +{ +#if defined(SMP) + int highest_pri; + struct ksegrp *kg; + cpumask_t cpumask,dontuse; + struct pcpu *pc; + struct pcpu *highest_pcpu; + struct thread *running_thread; + +#ifndef FULL_PREEMPTION + int pri; + + pri = td->td_priority; + + if (!(pri >= PRI_MIN_ITHD && pri <= PRI_MAX_ITHD)) + return; +#endif + + mtx_assert(&sched_lock, MA_OWNED); + + running_thread = curthread; + +#if !defined(KSEG_PEEMPT_BEST_CPU) + if(running_thread->td_ksegrp != td->td_ksegrp) +#endif + { + kg = td->td_ksegrp; + + /* Anyone waiting in front ? */ + if(td != TAILQ_FIRST(&kg->kg_runq)) { + return; /* Yes - wait your turn*/ + } + highest_pri = td->td_priority; + highest_pcpu = NULL; + dontuse = stopped_cpus | idle_cpus_mask; + + /* Find a cpu with the worst priority that runs at thread from the + * same ksegrp - if multiple exist give first the last run cpu and then + * the current cpu priority + */ + + SLIST_FOREACH(pc, &cpuhead, pc_allcpu) { + cpumask = pc->pc_cpumask; + if ( (cpumask & dontuse) == 0 && + pc->pc_curthread->td_ksegrp == kg) { + if (pc->pc_curthread->td_priority > highest_pri) { + highest_pri = pc->pc_curthread->td_priority; + highest_pcpu = pc; + } else if (pc->pc_curthread->td_priority == highest_pri && + highest_pcpu != NULL) { + if (td->td_lastcpu == pc->pc_cpuid || + (PCPU_GET(cpumask) == cpumask && + td->td_lastcpu != highest_pcpu->pc_cpuid)) { + highest_pcpu = pc; + } + } + } + } + + /* Check if we need to preempt someone */ + if (highest_pcpu == NULL) return; + + if (PCPU_GET(cpuid) != highest_pcpu->pc_cpuid) { + highest_pcpu->pc_curthread->td_flags |= TDF_NEEDRESCHED; + ipi_selected(highest_pcpu->pc_cpumask, IPI_AST); + return; + } + } +#else + KASSERT(running_thread->td_ksegrp == td->td_ksegrp,("maybe_preempt_in_ksegrp: No chance to run thread")); +#endif + + if (td->td_priority > running_thread->td_priority) + return; +#ifdef PREEMPTION + if (running_thread->td_critnest > 1) { + 
running_thread->td_pflags |= TDP_OWEPREEMPT; + } else { + mi_switch(SW_INVOL, NULL); + } +#else + running_thread->td_flags |= TDF_NEEDRESCHED; +#endif + return; +} + int limitcount; void setrunqueue(struct thread *td, int flags) @@ -422,6 +522,7 @@ } else { CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d", td, td->td_ksegrp, td->td_proc->p_pid); + maybe_preempt_in_ksegrp(td); } } Index: sys/kern/sched_4bsd.c =================================================================== RCS file: /home/ncvs/src/sys/kern/sched_4bsd.c,v retrieving revision 1.65 diff -u -r1.65 sched_4bsd.c --- sys/kern/sched_4bsd.c 16 Sep 2004 07:12:59 -0000 1.65 +++ sys/kern/sched_4bsd.c 2 Oct 2004 14:46:29 -0000 @@ -823,6 +823,7 @@ TD_SET_CAN_RUN(td); else { td->td_ksegrp->kg_avail_opennings++; + critical_enter(); if (TD_IS_RUNNING(td)) { /* Put us back on the run queue (kse and all). */ setrunqueue(td, SRQ_OURSELF|SRQ_YIELDING); @@ -834,6 +835,8 @@ */ slot_fill(td->td_ksegrp); } + critical_exit(); + td->td_pflags &= ~TDP_OWEPREEMPT; } if (newtd == NULL) newtd = choosethread(); --- sys/i386/conf/GENERIC Sun Sep 19 02:52:22 2004 +++ sys/i386/conf/PHO Sat Oct 2 16:06:19 2004 @@ -66,6 +66,7 @@ options KDB # Enable kernel debugger support. options DDB # Support DDB. options GDB # Support remote GDB. +options BREAK_TO_DEBUGGER options INVARIANTS # Enable calls of extra sanity checking options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS options WITNESS # Enable checks to detect deadlocks and cycles @@ -285,3 +286,4 @@ device firewire # FireWire bus code device sbp # SCSI over FireWire (Requires scbus and da) device fwe # Ethernet over FireWire (non-standard!) 
+options MUTEX_WAKE_ALL # Needed do not remove --T4sUOijqQbZv57TR-- From owner-freebsd-arch@FreeBSD.ORG Sat Oct 2 23:37:41 2004 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5812E16A4CE for ; Sat, 2 Oct 2004 23:37:41 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id D465C43D1F for ; Sat, 2 Oct 2004 23:37:40 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 26509 invoked by uid 89); 2 Oct 2004 23:37:39 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 2 Oct 2004 23:37:39 -0000 Received: (qmail 26487 invoked by uid 89); 2 Oct 2004 23:37:39 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 2 Oct 2004 23:37:39 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id i92Nbcmt034963; Sat, 2 Oct 2004 19:37:38 -0400 (EDT) (envelope-from ups@tree.com) From: Stephan Uphoff To: Peter Holm In-Reply-To: <20041002183120.GA1202@peter.osted.lan> References: <1096489576.3733.1868.camel@palm.tree.com> <200409291652.29990.jhb@FreeBSD.org> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <1096608201.21577.203.camel@palm.tree.com> <20041001141040.GA1556@peter.osted.lan> <1096647194.27811.12.camel@palm.tree.com> <20041001192551.GA3381@peter.osted.lan> <415EEFFE.5080309@elischer.org> <20041002183120.GA1202@peter.osted.lan> Content-Type: text/plain Message-Id: <1096760257.34527.14.camel@palm.tree.com> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Sat, 02 Oct 2004 19:37:37 -0400 Content-Transfer-Encoding: 7bit cc: Julian Elischer cc: "freebsd-arch@freebsd.org" Subject: Re: scheduler (sched_4bsd) questions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 
Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Oct 2004 23:37:41 -0000

On Sat, 2004-10-02 at 14:31, Peter Holm wrote:
> OK, right now I'm testing with all of Stephan's patches + the
> MUTEX_WAKE_ALL flag. Uptime is 3 3/4 hour and looking good.

Great.
Your attached diff contained all the fixes needed, and I don't see the
need to post a cumulative patch.
The only thing left to do is to migrate the critical sections from
kern_mutex.c to subr_turnstile.c for readability (no functional changes).

Maybe it would also be better to just force MUTEX_WAKE_ALL in
kern_mutex.c (#ifndef MUTEX_WAKE_ALL \n#define MUTEX_WAKE_ALL\n#endif)
to avoid temporary configuration file pollution?

Stephan
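
For illustration, the preprocessor guard proposed in the last paragraph can be written out as below. This is only a minimal sketch: the thread does not say where in kern_mutex.c the guard would go, and mutex_wake_all_enabled() is a hypothetical helper added here just to make the effect visible, not an existing kernel function.

```c
#include <stdbool.h>

/*
 * Force the option on regardless of the kernel configuration file.
 * The guard is the one quoted in the message above; its placement
 * inside kern_mutex.c is an assumption.
 */
#ifndef MUTEX_WAKE_ALL
#define MUTEX_WAKE_ALL
#endif

/*
 * Hypothetical helper, for illustration only: reports at run time
 * whether MUTEX_WAKE_ALL was defined when this file was compiled.
 */
static bool
mutex_wake_all_enabled(void)
{
#ifdef MUTEX_WAKE_ALL
	return (true);
#else
	return (false);
#endif
}
```

With the guard in place, code that is conditional on MUTEX_WAKE_ALL is compiled in whether or not "options MUTEX_WAKE_ALL" appears in the kernel config, so the option line would no longer be needed there.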