From owner-freebsd-arch@FreeBSD.ORG  Sun Sep 26 03:50:48 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8D6A216A4CE
	for <freebsd-arch@freebsd.org>; Sun, 26 Sep 2004 03:50:47 +0000 (GMT)
Received: from ylpvm43.prodigy.net (ylpvm43-ext.prodigy.net [207.115.57.74])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7A08543D39
	for <freebsd-arch@freebsd.org>; Sun, 26 Sep 2004 03:50:47 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (adsl-67-124-49-205.dsl.snfc21.pacbell.net
	[67.124.49.205])i8Q3orCE026142;	Sat, 25 Sep 2004 23:50:54 -0400
Message-ID: <41563C95.2020501@elischer.org>
Date: Sat, 25 Sep 2004 20:50:45 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Stephan Uphoff <ups@tree.com>
References: <1095468747.31297.241.camel@palm.tree.com>
	<414B8D5E.7000700@elischer.org> <1095529353.31297.1192.camel@palm.tree.com>
	<1096135220.53798.17754.camel@palm.tree.com>
In-Reply-To: <1096135220.53798.17754.camel@palm.tree.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: peter@holm.cc
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Sep 2004 03:50:48 -0000

Stephan Uphoff wrote:

>>Maybe something brutal like:
>>	if ((curthread->td_ksegrp == kg) &&
>>	   (td->td_priority > curthread->td_priority))
>>		curthread->td_flags |= TDF_NEEDRESCHED;
>>
>>in setrunqueue for
>>the else case of "if (kg->kg_avail_opennings > 0)"
>>would do the trick (without preemption) for the easy but probably more
>>common cases?
>>
>>Maybe I can find some time next week to think about a clean
>>fix. I find it always helpful having a small task in mind while reading
>>source code.
> 
> 
> I wrote a fix that should cover all cases.
> However I would like to test it a little bit before posting the patch.
> Is there any multi-threaded kernel torture program that you can
> recommend?


Peter Holm (CC'd) has a really cool set of torture tests.
He has also seen all sorts of failures others have not (yet) triggered. :-)

I'm "busy" for the next couple of weeks, so you may want to communicate directly
with him and see if you and he together can figure out some of the things he's
been seeing :-)

His tests are at:
http://www.holm.cc/stress/src/stress.tgz

> 
> Thanks
> 
> 	Stephan
> 


From owner-freebsd-arch@FreeBSD.ORG  Sun Sep 26 07:52:28 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C631F16A4CE
	for <freebsd-arch@freebsd.org>; Sun, 26 Sep 2004 07:52:28 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 4DB1B43D58
	for <freebsd-arch@freebsd.org>; Sun, 26 Sep 2004 07:52:26 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 57081 invoked from network); 26 Sep 2004 07:52:24 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 26 Sep 2004 07:52:24 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8Q7qLCs086060;
	Sun, 26 Sep 2004 09:52:22 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i8Q7qIuj086059;
	Sun, 26 Sep 2004 09:52:18 +0200 (CEST)
	(envelope-from pho)
Date: Sun, 26 Sep 2004 09:52:18 +0200
From: Peter Holm <peter@holm.cc>
To: Julian Elischer <julian@elischer.org>
Message-ID: <20040926075218.GA85983@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<414B8D5E.7000700@elischer.org> <1095529353.31297.1192.camel@palm.tree.com>
	<1096135220.53798.17754.camel@palm.tree.com> <41563C95.2020501@elischer.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <41563C95.2020501@elischer.org>
User-Agent: Mutt/1.4.1i
cc: peter@holm.cc
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
cc: Stephan Uphoff <ups@tree.com>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Sep 2004 07:52:28 -0000

On Sat, Sep 25, 2004 at 08:50:45PM -0700, Julian Elischer wrote:
> Stephan Uphoff wrote:
> 
> >>Maybe something brutal like:
> >>	if ((curthread->td_ksegrp == kg) &&
> >>	   (td->td_priority > curthread->td_priority))
> >>		curthread->td_flags |= TDF_NEEDRESCHED;
> >>
> >>in setrunqueue for
> >>the else case of "if (kg->kg_avail_opennings > 0)"
> >>would do the trick (without preemption) for the easy but probably more
> >>common cases?
> >>
> >>Maybe I can find some time next week to think about a clean
> >>fix. I find it always helpful having a small task in mind while reading
> >>source code.
> >
> >
> >I wrote a fix that should cover all cases.
> >However I would like to test it a little bit before posting the patch.
> >Is there any multi-threaded kernel torture program that you can
> >recommend?
> 
> 
> Peter Holm (CC'd) has a really cool set of torture tests.
> he has also seen all sorts of failures others have not (yet) triggered. :-)
> 
> I'm 'busy" for the next couple of weeks so you may want to communicate 
> directly with him and see if you and he together can figure out some of the 
> things he's
> been seeing :-)
> 
> his tests are at:
> http://www.holm.cc/stress/src/stress.tgz
> 
> >
> >Thanks
> >
> >	Stephan
> >

I'll be glad to test any patches.
-- 
Peter Holm

From owner-freebsd-arch@FreeBSD.ORG  Sun Sep 26 17:26:35 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DAEC116A4CE
	for <arch@freebsd.org>; Sun, 26 Sep 2004 17:26:35 +0000 (GMT)
Received: from harmony.village.org (rover.village.org [168.103.84.182])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8229643D48
	for <arch@freebsd.org>; Sun, 26 Sep 2004 17:26:35 +0000 (GMT)
	(envelope-from imp@bsdimp.com)
Received: from localhost (harmony.village.org [10.0.0.6])
	by harmony.village.org (8.13.1/8.13.1) with ESMTP id i8QHNYRr067513;
	Sun, 26 Sep 2004 11:23:35 -0600 (MDT)
	(envelope-from imp@bsdimp.com)
Date: Sun, 26 Sep 2004 11:24:43 -0600 (MDT)
Message-Id: <20040926.112443.96451447.imp@bsdimp.com>
To: phk@phk.freebsd.dk
From: "M. Warner Losh" <imp@bsdimp.com>
In-Reply-To: <41458.1096016465@critter.freebsd.dk>
References: <41458.1096016465@critter.freebsd.dk>
X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: arch@freebsd.org
Subject: Re: I'm counting my threads, one, two, three, four, five... [1]
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Sep 2004 17:26:36 -0000

In message: <41458.1096016465@critter.freebsd.dk>
            "Poul-Henning Kamp" <phk@phk.freebsd.dk> writes:
: I believe this gives us the handle we need to unload drivers and remove
: hardware without panicking in the lower layers of the kernel.  The
: higher layers may still have a thing or two to learn in this respect.

I've been extremely worried about the dev interface into the driver
for a long time.  This proposal looks excellent, and I can't think of
anything else to add to it.  It is good to see that all the concerns in
this area that you and I have talked about over the years appear to be
addressed by this.

The biggest problem now is that I need to address the device_t level
locking.  I think with network layer locking and dev_t locking being
under control, it is close to time to tackle it.

The other big problem may be that the device detach routines of
bus drivers are not happy with new-found sleeps.

However, these two problems always existed :-(

Well done!

Warner

From owner-freebsd-arch@FreeBSD.ORG  Mon Sep 27 13:05:08 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5862916A4CE
	for <freebsd-arch@freebsd.org>; Mon, 27 Sep 2004 13:05:08 +0000 (GMT)
Received: from green.homeunix.org (pcp04368961pcs.nrockv01.md.comcast.net
	[69.140.212.7])	by mx1.FreeBSD.org (Postfix) with ESMTP id C417B43D39
	for <freebsd-arch@freebsd.org>; Mon, 27 Sep 2004 13:05:07 +0000 (GMT)
	(envelope-from green@green.homeunix.org)
Received: from green.homeunix.org (green@localhost [127.0.0.1])
	by green.homeunix.org (8.13.1/8.13.1) with ESMTP id i8RD54Bw022815;
	Mon, 27 Sep 2004 09:05:04 -0400 (EDT)
	(envelope-from green@green.homeunix.org)
Received: (from green@localhost)
	by green.homeunix.org (8.13.1/8.13.1/Submit) id i8RD54U5022814;
	Mon, 27 Sep 2004 09:05:04 -0400 (EDT)
	(envelope-from green)
Date: Mon, 27 Sep 2004 09:05:03 -0400
From: Brian Fundakowski Feldman <green@freebsd.org>
To: Stephan Uphoff <ups@tree.com>
Message-ID: <20040927130503.GD1164@green.homeunix.org>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1095529353.31297.1192.camel@palm.tree.com>
	<1096135220.53798.17754.camel@palm.tree.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1096135220.53798.17754.camel@palm.tree.com>
User-Agent: Mutt/1.5.6i
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2004 13:05:08 -0000

On Sat, Sep 25, 2004 at 02:00:20PM -0400, Stephan Uphoff wrote:
> On Sat, 2004-09-18 at 13:42, Stephan Uphoff wrote:
> > On Fri, 2004-09-17 at 21:20, Julian Elischer wrote:
> > > Stephan Uphoff wrote:
> > > > >I am also stumped by the special case of adding a thread X with better
> > > >priority than the current thread to the runqueue if they belong to the
> > > >same ksegroup. In this case both kg_last_assigned and kg_avail_opennings
> > > >might be zero and setrunqueue() will not call sched_add().
> > > >Because of this it looks like the current thread will neither be
> > > > >preempted nor will TDF_NEEDRESCHED be set to force rescheduling at the
> > > >kernel boundary.
> > > >This situation should resolve itself at the next sched_switch - however
> > > >this might take a long time. (Especially if essential interrupt threads
> > > >are blocked by mutexes held by thread X)
> > > >
> > >
> > > > You are correct. I am not yet preempting a running thread with a lesser
> > > > priority if they are siblings
> > > > (unless there is a slot available). This is not because I don't want to
> > > > do it, but simply because it has not been done yet.
> > > > We did have NO preemption, so having "some" preemption is still better
> > > > than where we were.
> > > > Special case code to check curthread for a preemption could be done but
> > > > at the moment the decision code for
> > > > whether to preempt or not is in maybe_preempt() and I don't want to
> > > > duplicate that. It is on the drawing board though.
> > > > The other thing is that even if we should be able to preempt a running
> > > > thread, there is no guarantee that it is on THIS
> > > > CPU.  It may be on another CPU and that gets nasty in a hurry.
> > 
> > Yes .. this could get nasty.
> > This happens when the thread is bound to another cpu or someone changed
> > thr_concurrency - otherwise the current thread must be a sibling right ?
> > 
> > Maybe something brutal like:
> > 	if ((curthread->td_ksegrp == kg) &&
> > 	   (td->td_priority > curthread->td_priority))
> > 		curthread->td_flags |= TDF_NEEDRESCHED;
> > 
> > in setrunqueue for
> > the else case of "if (kg->kg_avail_opennings > 0)"
> > would do the trick (without preemption) for the easy but probably more
> > common cases?
> > 
> > Maybe I can find some time next week to think about a clean
> > fix. I find it always helpful having a small task in mind while reading
> > source code.
> 
> I wrote a fix that should cover all cases.
> However I would like to test it a little bit before posting the patch.
> Is there any multi-threaded kernel torture program that you can
> recommend?

It wasn't particularly designed as such but the utility in the
src/tools/regression/gaithrstress/ directory is very quick at
provoking thread/SMP/scheduler bugs if you give it a high thread
count (and use a pretty fast DNS, I suppose).
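
For reference, the TDF_NEEDRESCHED idea quoted above can be modelled in user
space roughly as follows. This is only a sketch: the struct names, the
needresched field and flag_sibling_for_resched() are hypothetical stand-ins,
not the kernel's definitions. The quoted snippet writes the comparison with
">"; the sketch uses "<" on the assumption that a numerically lower
td_priority is the better one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct ksegrp_model {
	int kg_avail_opennings;		/* free slots in the group */
};

struct thread_model {
	int td_priority;		/* assumed: lower value = better */
	struct ksegrp_model *td_ksegrp;
	bool needresched;		/* models TDF_NEEDRESCHED */
};

/*
 * Models the "else" case of "if (kg->kg_avail_opennings > 0)":
 * td was queued without a slot.  If the running thread is a sibling
 * with worse priority, flag it so it reschedules at the kernel
 * boundary.  Returns true when the flag was set.
 */
bool
flag_sibling_for_resched(struct thread_model *cur,
    const struct thread_model *td)
{
	assert(cur != NULL && td != NULL);
	if (td->td_ksegrp->kg_avail_opennings > 0)
		return (false);		/* a slot is free; not this case */
	if (cur->td_ksegrp == td->td_ksegrp &&
	    td->td_priority < cur->td_priority) {
		cur->needresched = true;
		return (true);
	}
	return (false);
}
```

The sketch does nothing about a sibling running on another CPU, which is the
harder case mentioned in the quoted discussion.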

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\

From owner-freebsd-arch@FreeBSD.ORG  Mon Sep 27 14:15:57 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8E9E316A4CE
	for <freebsd-arch@FreeBSD.org>; Mon, 27 Sep 2004 14:15:57 +0000 (GMT)
Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 519BC43D1F
	for <freebsd-arch@FreeBSD.org>; Mon, 27 Sep 2004 14:15:57 +0000 (GMT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 27172 invoked from network); 27 Sep 2004 14:15:56 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <julian@elischer.org>; 27 Sep 2004 14:15:56 -0000
Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8REFoCE012318;
	Mon, 27 Sep 2004 10:15:52 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
Date: Mon, 27 Sep 2004 10:16:13 -0400
User-Agent: KMail/1.6.2
References: <1096133353.53798.17613.camel@palm.tree.com>
In-Reply-To: <1096133353.53798.17613.camel@palm.tree.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200409271016.13345.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
cc: Julian Elischer <julian@elischer.org>
cc: Stephan Uphoff <ups@tree.com>
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2004 14:15:57 -0000

On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote:
> When a thread is about to return to user space it resets its priority to
> the user level priority.
> However after lowering its priority it needs to check if
> its priority is still better than all other runnable threads.
> This is currently not implemented.
> Without the check the thread can block kernel or user threads with
> better priority until a switch is forced by an interrupt.
>
> The attached patch checks the relevant runqueues and threads without
> slots in the same ksegrp and forces a thread switch if the currently
> running thread is no longer the best thread to run after it changed its
> priority.
>
> The patch should improve interactive response under heavy load somewhat.
> It needs a lot of testing.

Perhaps the better fix is to teach the schedulers to set TDF_NEEDRESCHED based
on a comparison against user_pri rather than td_priority inside of
sched_add()?  Having the flag set by sched_add() is supposed to make this
sort of check unnecessary.  Even 4.x has the same bug, I think, as a process
can make another process runnable after its priority has been boosted by a
tsleep() and need_resched() is only called based on a comparison of p_pri.  
Ah, 4.x doesn't have the bug because it caches the priority of curproc when 
it enters the kernel and compares against that.  Thus, I think the correct 
fix is more like this:

Index: sched_4bsd.c
===================================================================
RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v
retrieving revision 1.63
diff -u -r1.63 sched_4bsd.c
--- sched_4bsd.c        11 Sep 2004 10:07:22 -0000      1.63
+++ sched_4bsd.c        27 Sep 2004 14:12:03 -0000
@@ -272,7 +272,7 @@
 {

        mtx_assert(&sched_lock, MA_OWNED);
-       if (td->td_priority < curthread->td_priority)
+       if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
                curthread->td_flags |= TDF_NEEDRESCHED;
 }

Index: sched_ule.c
===================================================================
RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v
retrieving revision 1.129
diff -u -r1.129 sched_ule.c
--- sched_ule.c 11 Sep 2004 10:07:22 -0000      1.129
+++ sched_ule.c 27 Sep 2004 14:13:01 -0000
@@ -723,7 +723,7 @@
         */
        pcpu = pcpu_find(cpu);
        td = pcpu->pc_curthread;
-       if (ke->ke_thread->td_priority < td->td_priority ||
+       if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri ||
            td == pcpu->pc_idlethread) {
                td->td_flags |= TDF_NEEDRESCHED;
                ipi_selected(1 << cpu, IPI_AST);

An even better fix might be to fix td_base_pri by having it be set on kernel 
entry similar to how 4.x sets curpriority.  The above fix should be 
sufficient for now, however.
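
To illustrate the difference between the two comparisons, here is a small
user-space sketch.  The struct and function names are made up for the
example and the priority numbers are arbitrary; only the two conditions
mirror the diff above.  It assumes the running thread has had its
td_priority boosted while in the kernel, so it is numerically lower (better)
than its kg_user_pri.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct ksegrp_model {
	int kg_user_pri;		/* priority used in user mode */
};

struct thread_model {
	int td_priority;		/* current, possibly boosted */
	struct ksegrp_model *td_ksegrp;
};

/* Old condition: compare against the running thread's current priority. */
bool
resched_old(const struct thread_model *td, const struct thread_model *cur)
{
	assert(td != NULL && cur != NULL);
	return (td->td_priority < cur->td_priority);
}

/* New condition: compare against what cur will drop to in user mode. */
bool
resched_new(const struct thread_model *td, const struct thread_model *cur)
{
	assert(td != NULL && cur != NULL && cur->td_ksegrp != NULL);
	return (td->td_priority < cur->td_ksegrp->kg_user_pri);
}
```

With cur at td_priority 80 and kg_user_pri 180, a newly runnable thread at
120 is ignored by the old condition but requests a reschedule under the new
one.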

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Mon Sep 27 17:20:30 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8407216A4CF
	for <freebsd-arch@FreeBSD.org>; Mon, 27 Sep 2004 17:20:30 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id B3DCC43D3F
	for <freebsd-arch@FreeBSD.org>; Mon, 27 Sep 2004 17:20:29 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 18258 invoked by uid 89); 27 Sep 2004 17:20:28 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 27 Sep 2004 17:20:28 -0000
Received: (qmail 18241 invoked by uid 89); 27 Sep 2004 17:20:28 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 27 Sep 2004 17:20:28 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8RHKQmt003500;
	Mon, 27 Sep 2004 13:20:27 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <200409271016.13345.jhb@FreeBSD.org>
References: <1096133353.53798.17613.camel@palm.tree.com>
	 <200409271016.13345.jhb@FreeBSD.org>
Content-Type: text/plain
Message-Id: <1096305626.95152.163.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Mon, 27 Sep 2004 13:20:26 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2004 17:20:30 -0000

On Mon, 2004-09-27 at 10:16, John Baldwin wrote:
> On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote:
> > When a thread is about to return to user space it resets its priority to
> > the user level priority.
> > However after lowering its priority it needs to check if
> > its priority is still better than all other runnable threads.
> > This is currently not implemented.
> > Without the check the thread can block kernel or user threads with
> > better priority until a switch is forced by an interrupt.
> >
> > The attached patch checks the relevant runqueues and threads without
> > slots in the same ksegrp and forces a thread switch if the currently
> > running thread is no longer the best thread to run after it changed its
> > priority.
> >
> > The patch should improve interactive response under heavy load somewhat.
> > It needs a lot of testing.
> 
> Perhaps the better fix is to teach the schedulers to set TDF_NEEDRESCHED based 
> on a comparison against user_pri rather than td_priority inside of
> sched_add()?  Having the flag set by sched_add() is supposed to make this
> sort of check unnecessary.  Even 4.x has the same bug, I think, as a process
> can make another process runnable after its priority has been boosted by a
> tsleep() and need_resched() is only called based on a comparison of p_pri.  
> Ah, 4.x doesn't have the bug because it caches the priority of curproc when 
> it enters the kernel and compares against that.  Thus, I think the correct 
> fix is more like this:
> 
> Index: sched_4bsd.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v
> retrieving revision 1.63
> diff -u -r1.63 sched_4bsd.c
> --- sched_4bsd.c        11 Sep 2004 10:07:22 -0000      1.63
> +++ sched_4bsd.c        27 Sep 2004 14:12:03 -0000
> @@ -272,7 +272,7 @@
>  {
> 
>         mtx_assert(&sched_lock, MA_OWNED);
> -       if (td->td_priority < curthread->td_priority)
> +       if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
>                 curthread->td_flags |= TDF_NEEDRESCHED;
>  }
> 
> Index: sched_ule.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v
> retrieving revision 1.129
> diff -u -r1.129 sched_ule.c
> --- sched_ule.c 11 Sep 2004 10:07:22 -0000      1.129
> +++ sched_ule.c 27 Sep 2004 14:13:01 -0000
> @@ -723,7 +723,7 @@
>          */
>         pcpu = pcpu_find(cpu);
>         td = pcpu->pc_curthread;
> -       if (ke->ke_thread->td_priority < td->td_priority ||
> +       if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri ||
>             td == pcpu->pc_idlethread) {
>                 td->td_flags |= TDF_NEEDRESCHED;
>                 ipi_selected(1 << cpu, IPI_AST);
> 
> An even better fix might be to fix td_base_pri by having it be set on kernel 
> entry similar to how 4.x sets curpriority.  The above fix should be 
> sufficient for now, however.

I don't think that this is enough since TDF_NEEDRESCHED is thread
specific and not cpu specific.
However the thread marked with TDF_NEEDRESCHED might not be the next
thread leaving the kernel.
( Can't really talk about ULE since I am trying to avoid looking at
another shiny irresistible time sink this week ;-)

I think we agree that td_priority should be set to td_base_pri on
kernel entry. Since td_base_pri is changed by sleep and condvar
functions it should also be reset on kernel entry. (Probably from a new
ksegrp field). Condvar waits should currently not cause the base
priority to change to the current priority of the thread - otherwise
td_base_pri could get stuck at a really bad user priority.
( td->td_base_pri might end up being worse than
td->td_ksegrp->kg_user_pri when the ksegrp priority improves)
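
A rough user-space model of the thread-specific versus cpu-specific point
(all names here are hypothetical, including the per-CPU pc_needresched flag,
and this models only the claim, not the real switch path):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
struct thread_model {
	bool td_needresched;		/* models per-thread TDF_NEEDRESCHED */
};

struct cpu_model {
	struct thread_model *curthread;
	bool pc_needresched;		/* hypothetical per-CPU flag */
};

/* Record a reschedule request on both the running thread and the CPU. */
void
request_resched(struct cpu_model *cpu)
{
	assert(cpu != NULL && cpu->curthread != NULL);
	cpu->curthread->td_needresched = true;
	cpu->pc_needresched = true;
}

/* Another thread becomes the one running on this CPU. */
void
switch_to(struct cpu_model *cpu, struct thread_model *next)
{
	assert(cpu != NULL && next != NULL);
	cpu->curthread = next;
}

/* What the thread now leaving the kernel would see via its own flag. */
bool
seen_by_thread_flag(const struct cpu_model *cpu)
{
	return (cpu->curthread->td_needresched);
}

/* What it would see via the per-CPU flag. */
bool
seen_by_cpu_flag(const struct cpu_model *cpu)
{
	return (cpu->pc_needresched);
}
```

If thread A is flagged and thread B is then the one leaving the kernel on
that CPU, B's own flag is clear, while a per-CPU flag would still be set.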

	Stephan


From owner-freebsd-arch@FreeBSD.ORG  Mon Sep 27 18:56:56 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DCEF016A4CE
	for <freebsd-arch@FreeBSD.org>; Mon, 27 Sep 2004 18:56:56 +0000 (GMT)
Received: from mail3.speakeasy.net (mail3.speakeasy.net [216.254.0.203])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 96A1E43D1D
	for <freebsd-arch@FreeBSD.org>; Mon, 27 Sep 2004 18:56:56 +0000 (GMT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 7086 invoked from network); 27 Sep 2004 18:56:56 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <julian@elischer.org>; 27 Sep 2004 18:56:42 -0000
Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8RIsVY8014057;
	Mon, 27 Sep 2004 14:56:34 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: Stephan Uphoff <ups@tree.com>
Date: Mon, 27 Sep 2004 14:43:22 -0400
User-Agent: KMail/1.6.2
References: <1096133353.53798.17613.camel@palm.tree.com>
	<200409271016.13345.jhb@FreeBSD.org>
	<1096305626.95152.163.camel@palm.tree.com>
In-Reply-To: <1096305626.95152.163.camel@palm.tree.com>
MIME-Version: 1.0
Content-Disposition: inline
Message-Id: <200409271443.22667.jhb@FreeBSD.org>
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2004 18:56:57 -0000

On Monday 27 September 2004 01:20 pm, Stephan Uphoff wrote:
> On Mon, 2004-09-27 at 10:16, John Baldwin wrote:
> > On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote:
> > > When a thread is about to return to user space it resets its priority
> > > to the user level priority.
> > > However after lowering its priority it needs to check if
> > > its priority is still better than all other runnable threads.
> > > This is currently not implemented.
> > > Without the check the thread can block kernel or user threads with
> > > better priority until a switch is forced by an interrupt.
> > >
> > > The attached patch checks the relevant runqueues and threads without
> > > slots in the same ksegrp and forces a thread switch if the currently
> > > running thread is no longer the best thread to run after it changed its
> > > priority.
> > >
> > > The patch should improve interactive response under heavy load
> > > somewhat. It needs a lot of testing.
> >
> > Perhaps the better fix is to teach the schedulers to set TDF_NEEDRESCHED
> > based on a comparison against user_pri rather than td_priority inside
> > of sched_add()?  Having the flag set by sched_add() is supposed to make
> > this sort of check unnecessary.  Even 4.x has the same bug, I think, as a
> > process can make another process runnable after its priority has been
> > boosted by a tsleep() and need_resched() is only called based on a
> > comparison of p_pri. Ah, 4.x doesn't have the bug because it caches the
> > priority of curproc when it enters the kernel and compares against that. 
> > Thus, I think the correct fix is more like this:
> >
> > Index: sched_4bsd.c
> > ===================================================================
> > RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v
> > retrieving revision 1.63
> > diff -u -r1.63 sched_4bsd.c
> > --- sched_4bsd.c        11 Sep 2004 10:07:22 -0000      1.63
> > +++ sched_4bsd.c        27 Sep 2004 14:12:03 -0000
> > @@ -272,7 +272,7 @@
> >  {
> >
> >         mtx_assert(&sched_lock, MA_OWNED);
> > -       if (td->td_priority < curthread->td_priority)
> > +       if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
> >                 curthread->td_flags |= TDF_NEEDRESCHED;
> >  }
> >
> > Index: sched_ule.c
> > ===================================================================
> > RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v
> > retrieving revision 1.129
> > diff -u -r1.129 sched_ule.c
> > --- sched_ule.c 11 Sep 2004 10:07:22 -0000      1.129
> > +++ sched_ule.c 27 Sep 2004 14:13:01 -0000
> > @@ -723,7 +723,7 @@
> >          */
> >         pcpu = pcpu_find(cpu);
> >         td = pcpu->pc_curthread;
> > -       if (ke->ke_thread->td_priority < td->td_priority ||
> > +       if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri ||
> >             td == pcpu->pc_idlethread) {
> >                 td->td_flags |= TDF_NEEDRESCHED;
> >                 ipi_selected(1 << cpu, IPI_AST);
> >
> > An even better fix might be to fix td_base_pri by having it be set on
> > kernel entry similar to how 4.x sets curpriority.  The above fix should
> > be sufficient for now, however.
>
> I don't think that this is enough since TDF_NEEDRESCHED is thread
> specific and not cpu specific.

Hmm, it is CPU specific in 4.x.  It could be changed back to being a per-cpu 
flag easily.

> However the thread marked with TDF_NEEDRESCHED might not be the next
> thread leaving the kernel.
> ( Can't really talk about ULE since I am trying to avoid looking at
> another shiny irresistible time sink this week ;-)
>
> I think we agree that that td_priority should be set to td_base_pri on
> kernel entry. Since td_base_pri is changed by sleep and condvar
> functions it should also be reset on kernel entry. (Probably from a new
> ksegrp field). Condvar waits should currently non cause the base
> priority to change to the current priority of the thread - otherwise
> td_base_pri could get stuck at a really bad user priority.
> ( td->td_base_pri might end up being worse than
> td->td_ksegrp->kg_user_pri when the ksegrp priority improves)

Well, I think instead that td_base_pri should be set to td_priority on kernel
entry (rather than the other way around).  td_priority should not change
just because the thread enters the kernel.  I think the sleep functions could
then leave td_base_pri alone.  (I think setting it there is wrong because
td_base_pri is not quite the same as curpriority in 4.x.)  What td_base_pri
is really supposed to provide, btw, is the priority that the thread should go
back to once it has unlocked a mutex after having its priority boosted while
it held the mutex.  Arguably it should just be using kg_user_pri for this, but
then you lose priority "boosts" from tsleep(), which is why td_base_pri is
set in msleep().  I guess what should happen is something more like this:

kernel_entry()
{
	KASSERT(td->td_priority == td->td_ksegrp->kg_user_pri);
	td->td_base_pri = td->td_priority;
}

msleep()
{
	sched_prio(...);
	td->td_base_pri = td->td_priority;
}

The TDF_NEEDRESCHED checks should be using kg_user_pri as in my previous
e-mail.  Also, in sched_prio(), if our priority is ever raised numerically
(i.e. made logically less important), we should set TDF_NEEDRESCHED since we
may need to switch (4.x does this in maybe_needresched()).  Then,
TDF_NEEDRESCHED could become a per-cpu flag that is not cleared in
mi_switch() but only in userret().  Hmm, I think all of the TDF_NEEDRESCHED
handling actually belongs in sched_userret() btw.
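
To make the difference between the two checks concrete, here is a minimal user-space sketch. It is only a model under stated assumptions: `struct model_thread`, `resched_check_current()` and `resched_check_user()` are hypothetical stand-ins, not the kernel's `struct thread` or `sched_add()`, and lower numbers mean better priority as in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields discussed above; not the kernel struct. */
struct model_thread {
	int td_priority;	/* current, possibly boosted, priority */
	int kg_user_pri;	/* priority the thread drops to in user mode */
	bool needresched;	/* models TDF_NEEDRESCHED */
};

/* Existing check: compare against the running thread's current priority. */
static void
resched_check_current(struct model_thread *cur, const struct model_thread *newtd)
{
	assert(cur != NULL && newtd != NULL);
	if (newtd->td_priority < cur->td_priority)
		cur->needresched = true;
}

/* Proposed check: compare against the priority it will have in user mode. */
static void
resched_check_user(struct model_thread *cur, const struct model_thread *newtd)
{
	assert(cur != NULL && newtd != NULL);
	if (newtd->td_priority < cur->kg_user_pri)
		cur->needresched = true;
}
```

With a running thread boosted to 80 whose user priority is 180, a newly runnable thread at 120 leaves the flag clear under the first check and sets it under the second.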

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Mon Sep 27 21:28:11 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4FDBE16A4CE
	for <freebsd-arch@FreeBSD.org>; Mon, 27 Sep 2004 21:28:11 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id A451D43D46
	for <freebsd-arch@FreeBSD.org>; Mon, 27 Sep 2004 21:28:10 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 18284 invoked by uid 89); 27 Sep 2004 21:28:09 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 27 Sep 2004 21:28:09 -0000
Received: (qmail 18262 invoked by uid 89); 27 Sep 2004 21:28:09 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 27 Sep 2004 21:28:09 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8RLS7mt004724;
	Mon, 27 Sep 2004 17:28:07 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <200409271443.22667.jhb@FreeBSD.org>
References: <1096133353.53798.17613.camel@palm.tree.com>
	 <200409271016.13345.jhb@FreeBSD.org>
	 <1096305626.95152.163.camel@palm.tree.com>
	 <200409271443.22667.jhb@FreeBSD.org>
Content-Type: text/plain
Message-Id: <1096320486.3733.58.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Mon, 27 Sep 2004 17:28:07 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2004 21:28:11 -0000

On Mon, 2004-09-27 at 14:43, John Baldwin wrote:
> On Monday 27 September 2004 01:20 pm, Stephan Uphoff wrote:
> > On Mon, 2004-09-27 at 10:16, John Baldwin wrote:
> > > On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote:
> > > > When a thread is about to return to user space it resets its priority
> > > > to the user level priority.
> > > > However after lowering the permission its priority it needs to check if
> > > > its priority is still better than all other runable threads.
> > > > This is currently not implemented.
> > > > Without the check the thread can block kernel or user threads with
> > > > better priority until a switch is forced by by an interrupt.
> > > >
> > > > The attached patch checks the relevant runqueues and threads without
> > > > slots in the same ksegrp and forces a thread switch if the currently
> > > > running thread is no longer the best thread to run after it changed its
> > > > priority.
> > > >
> > > > The patch should improve interactive response under heavy load
> > > > somewhat. It needs a lot of testing.
> > >
> > > Perhaps the better fix is to teach the schedulers to set TDF_NEEDRESCHED
> > > based on on a comparison against user_pri rather than td_priority inside
> > > of sched_add()?  Having the flag set by sched_add() is supposed to make
> > > this sort of check unnecessary.  Even 4.x has the same bug I think as a
> > > process can make another process runnable after it's priority has been
> > > boosted by a tsleep() and need_resched() is only called based on a
> > > comparison of p_pri. Ah, 4.x doesn't have the bug because it caches the
> > > priority of curproc when it enters the kernel and compares against that. 
> > > Thus, I think the correct fix is more like this:
> > >
> > > Index: sched_4bsd.c
> > > ===================================================================
> > > RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v
> > > retrieving revision 1.63
> > > diff -u -r1.63 sched_4bsd.c
> > > --- sched_4bsd.c        11 Sep 2004 10:07:22 -0000      1.63
> > > +++ sched_4bsd.c        27 Sep 2004 14:12:03 -0000
> > > @@ -272,7 +272,7 @@
> > >  {
> > >
> > >         mtx_assert(&sched_lock, MA_OWNED);
> > > -       if (td->td_priority < curthread->td_priority)
> > > +       if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
> > >                 curthread->td_flags |= TDF_NEEDRESCHED;
> > >  }
> > >
> > > Index: sched_ule.c
> > > ===================================================================
> > > RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v
> > > retrieving revision 1.129
> > > diff -u -r1.129 sched_ule.c
> > > --- sched_ule.c 11 Sep 2004 10:07:22 -0000      1.129
> > > +++ sched_ule.c 27 Sep 2004 14:13:01 -0000
> > > @@ -723,7 +723,7 @@
> > >          */
> > >         pcpu = pcpu_find(cpu);
> > >         td = pcpu->pc_curthread;
> > > -       if (ke->ke_thread->td_priority < td->td_priority ||
> > > +       if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri ||
> > >             td == pcpu->pc_idlethread) {
> > >                 td->td_flags |= TDF_NEEDRESCHED;
> > >                 ipi_selected(1 << cpu, IPI_AST);
> > >
> > > An even better fix might be to fix td_base_pri by having it be set on
> > > kernel entry similar to how 4.x sets curpriority.  The above fix should
> > > be sufficient for now, however.
> >
> > I don't think that this is enough since TDF_NEEDRESCHED is thread
> > specific and not cpu specific.
> 
> Hmm, it is CPU specific in 4.x.  It could be changed back to being a per-cpu 
> flag easily.

But this might not help :-(.
Example:

Thread A is running in the kernel and is preempted by an interrupt
thread I. Thread I wakes up thread B.
If I->td_ksegrp->kg_user_pri <= B->td_priority, TDF_NEEDRESCHED will not
be set.
If A->td_priority < B->td_priority, thread A will run once I has finished
servicing interrupts.
Thread A can now leave the kernel even though
A->td_ksegrp->kg_user_pri > B->td_priority may be true.
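
The scenario can be replayed in a small user-space model. This is a sketch only: `struct model_thread`, `model_wakeup()` and `missed_resched()` are hypothetical names, the priority values are made up, and the interrupt thread's user priority is assumed to equal its running priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields used in the example; lower is better. */
struct model_thread {
	int td_priority;	/* current priority */
	int kg_user_pri;	/* priority after returning to user mode */
	bool needresched;	/* models a per-thread TDF_NEEDRESCHED */
};

/* Wakeup as proposed: flag the thread running at the time of the wakeup. */
static void
model_wakeup(struct model_thread *running, const struct model_thread *woken)
{
	assert(running != NULL && woken != NULL);
	if (woken->td_priority < running->kg_user_pri)
		running->needresched = true;
}

/*
 * True if 'leaving' would return to user mode without rescheduling although
 * 'runnable' would then have the better priority.
 */
static bool
missed_resched(const struct model_thread *leaving,
    const struct model_thread *runnable)
{
	assert(leaving != NULL && runnable != NULL);
	return (!leaving->needresched &&
	    runnable->td_priority < leaving->kg_user_pri);
}
```

For example, with A at kernel priority 90 and user priority 200, I at 20, and B at 120, the wakeup performed while I is running sets no flag anywhere, and `missed_resched(&A, &B)` reports the missed switch.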

> > However the thread marked with TDF_NEEDRESCHED might not be the next
> > thread leaving the kernel.
> > ( Can't really talk about ULE since I am trying to avoid looking at
> > another shiny irresistible time sink this week ;-)
> >
> > I think we agree that that td_priority should be set to td_base_pri on
> > kernel entry. Since td_base_pri is changed by sleep and condvar
> > functions it should also be reset on kernel entry. (Probably from a new
> > ksegrp field). Condvar waits should currently non cause the base
> > priority to change to the current priority of the thread - otherwise
> > td_base_pri could get stuck at a really bad user priority.
> > ( td->td_base_pri might end up being worse than
> > td->td_ksegrp->kg_user_pri when the ksegrp priority improves)
> 
> Well, I think instead that td_base_pri should be set to td_priority on kernel 
> entry (rather than the other way around).  td_priority should be unchanged 
> just because it enters the kernel.  

I guess we disagree here.
There are just too many resource dependencies in the kernel that can lead
to priority inversion (vnode locks, disk buffer ownership, etc).
It would be nice to delay the priority boost until a thread acquires
such a resource (or even trace resource dependencies and implement
priority inheritance) ... but this would be a huge task.
Boosting the priority on kernel entry is easy and less error prone. I
guess we had this discussion last week and we just disagree on the
issue. 

> I think the sleep functions could then 
> leave td_base_pri alone.  (I think setting it there is wrong because 
> td_base_pri is not quite the same as curpriority in 4.x.)  What td_base_pri 
> is really supposed to provide, btw, is the priority that the thread should go 
> back to once it has unlocked a mutex and had its priorty boosted while it 
> held the mutex.  Arguably it should just be using kg_user_pri for this, but 
> then you loose priority "boosts" from tsleep(), which is why td_base_pri is 
> set in msleep().  I guess what should happen is something more like this:
> 
> kernel_entry()
> {
> 	KASSERT(td->td_priority == td->td_ksegrp->kg_user_pri);
> 	td->td_base_pri = td->td_priority;
> }
> 
> msleep()
> {
> 	sched_prio(...);
> 	td_base_pri = td->td_priority;
> }
> 
> The TDF_NEEDRESCHED checks should be using kg_user_pri as in my previous 
> e-mail.  Also, in sched_prio(), if our priority is ever raised (numerically, 
> logically less important), we should set TDF_NEEDRESCHED since we may need to 
> switch (4.x does this in maybe_needresched()).  Then, TDF_NEEDRESCHED could 
> become a per-cpu flag and have it not be cleared in mi_switch() but be 
> cleared only in userret().  Hmm, I think all of the TDF_NEEDRESCHED handling 
> actually beings in sched_userret() btw.

Wouldn't this lead to unnecessary round-robin switches between threads
with the same priority on sched_4bsd?

	Stephan

From owner-freebsd-arch@FreeBSD.ORG  Mon Sep 27 22:10:43 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 42EBD16A4E1; Mon, 27 Sep 2004 22:10:43 +0000 (GMT)
Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id CD74143D2D; Mon, 27 Sep 2004 22:10:13 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (julian.vicor-nb.com [208.206.78.97])
	by mail.vicor-nb.com (Postfix) with ESMTP
	id A86637A41E; Mon, 27 Sep 2004 15:10:13 -0700 (PDT)
Message-ID: <41588FC5.6090203@elischer.org>
Date: Mon, 27 Sep 2004 15:10:13 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Stephan Uphoff <ups@tree.com>
References: <1096133353.53798.17613.camel@palm.tree.com>
	<200409271016.13345.jhb@FreeBSD.org>
	<1096305626.95152.163.camel@palm.tree.com>
	<200409271443.22667.jhb@FreeBSD.org> <1096320486.3733.58.camel@palm.tree.com>
In-Reply-To: <1096320486.3733.58.camel@palm.tree.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2004 22:10:43 -0000


Stephan Uphoff wrote:

>On Mon, 2004-09-27 at 14:43, John Baldwin wrote:
>  
>
>>>>@@ -272,7 +272,7 @@
>>>> {
>>>>
>>>>        mtx_assert(&sched_lock, MA_OWNED);
>>>>-       if (td->td_priority < curthread->td_priority)
>>>>+       if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
>>>>                curthread->td_flags |= TDF_NEEDRESCHED;
>>>> }
>>>>        
>>>>

in sched_userret() we do:
        kg = td->td_ksegrp;
        if (td->td_priority != kg->kg_user_pri) {
                mtx_lock_spin(&sched_lock);
                td->td_priority = kg->kg_user_pri;
                mtx_unlock_spin(&sched_lock);
        }

but we don't actually take any action in the case where the thread is
heading out to userland with a priority of less importance than a waiting
thread. That happens in AST(), where we also set it down, but only in the
case of TDF_NEEDRESCHED being set.

It would make more sense to ALWAYS do the TDF_NEEDRESCHED clause in
userret(), based on the user priority, i.e. the priority would be reduced
going to userland.
Unfortunately this would defeat one of the reasons for priority
raising in BSD.

The priority of a thread that waits for IO is raised not only to make it
start again in the kernel as an interactive thread, but also so that it can
run into userland too and get some priority for actually USING the new
data/input.

It may be that we should consider the priority as a number of different
components that should not be added together until needed. The "interactive
IO priority boost" that comes from having done an msleep() should wear off
very quickly. Maybe we knock it down again at the first or second clock
tick, but to do that we need to track that "interactive boost" separately
from the general priority.
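
A rough sketch of what tracking the boost separately could look like, written as a user-space model. Everything here is an assumption for illustration: `struct model_pri` and the three helper functions are hypothetical names, and the boost size and tick count are left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical split of a priority into components; lower number is better. */
struct model_pri {
	int base_pri;		/* general priority */
	int boost;		/* interactive bonus, subtracted when combining */
	int boost_ticks;	/* clock ticks left before the bonus expires */
};

/* Combine the components only when a single number is needed. */
static int
model_effective_pri(const struct model_pri *p)
{
	int pri;

	assert(p != NULL);
	pri = p->base_pri - p->boost;
	return (pri < 0 ? 0 : pri);
}

/* Called when the thread wakes from a sleep: grant a short-lived bonus. */
static void
model_sleep_boost(struct model_pri *p, int boost, int ticks)
{
	assert(p != NULL && boost >= 0 && ticks > 0);
	p->boost = boost;
	p->boost_ticks = ticks;
}

/* Called once per clock tick: drop the bonus when its time runs out. */
static void
model_clock_tick(struct model_pri *p)
{
	assert(p != NULL);
	if (p->boost_ticks > 0 && --p->boost_ticks == 0)
		p->boost = 0;
}
```

With a base priority of 160 and a boost of 40 lasting two ticks, the effective priority is 120 until the second tick and 160 afterwards, while the general priority is never modified.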


>>>>Index: sched_ule.c
>>>>===================================================================
>>>>RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v
>>>>retrieving revision 1.129
>>>>diff -u -r1.129 sched_ule.c
>>>>--- sched_ule.c 11 Sep 2004 10:07:22 -0000      1.129
>>>>+++ sched_ule.c 27 Sep 2004 14:13:01 -0000
>>>>@@ -723,7 +723,7 @@
>>>>         */
>>>>        pcpu = pcpu_find(cpu);
>>>>        td = pcpu->pc_curthread;
>>>>-       if (ke->ke_thread->td_priority < td->td_priority ||
>>>>+       if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri ||
>>>>            td == pcpu->pc_idlethread) {
>>>>                td->td_flags |= TDF_NEEDRESCHED;
>>>>                ipi_selected(1 << cpu, IPI_AST);
>>>>
>>>>An even better fix might be to fix td_base_pri by having it be set on
>>>>kernel entry similar to how 4.x sets curpriority.  The above fix should
>>>>be sufficient for now, however.
>>>>        
>>>>
>>>I don't think that this is enough since TDF_NEEDRESCHED is thread
>>>specific and not cpu specific.
>>>      
>>>
>>Hmm, it is CPU specific in 4.x.  It could be changed back to being a per-cpu 
>>flag easily.
>>    
>>
>
>I guess we disagree here.
>There are just to many resource dependencies in the kernel that can lead
>to priority inversion (vnode locks, disk buffer ownership, etc).
>It would be nice to delay the priority boost until a thread acquires
>such a resource (or even trace resource dependencies and implement
>priority inheritance) ... but this would be a huge task.
>Boosting the priority on kernel entry is easy and less error prone. I
>guess we had this discussion last week and we just disagree on the
>issue. 
>
>  
>
>>I think the sleep functions could then 
>>leave td_base_pri alone.  (I think setting it there is wrong because 
>>td_base_pri is not quite the same as curpriority in 4.x.)  What td_base_pri 
>>is really supposed to provide, btw, is the priority that the thread should go 
>>back to once it has unlocked a mutex and had its priorty boosted while it 
>>held the mutex.  Arguably it should just be using kg_user_pri for this, but 
>>then you loose priority "boosts" from tsleep(), which is why td_base_pri is 
>>set in msleep().  I guess what should happen is something more like this:
>>
>>kernel_entry()
>>{
>>	KASSERT(td->td_priority == td->td_ksegrp->kg_user_pri);
>>	td->td_base_pri = td->td_priority;
>>}
>>
>>msleep()
>>{
>>	sched_prio(...);
>>	td_base_pri = td->td_priority;
>>}
>>
>>The TDF_NEEDRESCHED checks should be using kg_user_pri as in my previous 
>>e-mail.  Also, in sched_prio(), if our priority is ever raised (numerically, 
>>logically less important), we should set TDF_NEEDRESCHED since we may need to 
>>switch (4.x does this in maybe_needresched()).  Then, TDF_NEEDRESCHED could 
>>become a per-cpu flag and have it not be cleared in mi_switch() but be 
>>cleared only in userret().  Hmm, I think all of the TDF_NEEDRESCHED handling 
>>actually beings in sched_userret() btw.
>>    
>>
>
>Wouldn't this lead to unnecessary round-robin switches between threads
>with the same priority on sched_4bsd?
>

Maybe.
I added code to be "kind" to preempted threads (by putting them back on
the head of their queue), but overall the 4bsd scheduler doesn't translate
very well into a multithreaded and multiprocessor world.
If you have multiple threads in a process, how much of the interactive
boost do you assign to the thread and how much to the process, which may be
doing work derived from the IO but in another thread?
It gets even more difficult when you realise that user threads can
switch between kernel threads without the kernel being aware.  I have
considered whether the kernel SHOULD be aware of user threads. By which I
mean that while we might have only a few FULL kernel threads (with stacks
etc.), it might be worthwhile to keep a small structure in the kernel
corresponding to each user thread, holding a couple of basic
parameters.  Sort of a hybrid between the current "Only the UTS knows about
all threads" and "full kernel knowledge of all threads".  The kernel knows
about them and their history, but doesn't need to supply full running
resources for them, just a small amount of info (maybe only 8 ints' worth
would be enough) that can be looked up on kernel entry using the mailbox.
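
As a very rough illustration of such a record, here is a user-space sketch. The names `struct uthread_info`, `struct uthread_table` and `uthread_lookup()` are hypothetical, the fixed-size table and linear search are assumptions made to keep the example short, and the mailbox address is stored in addition to the 8 ints so that it can serve as the lookup key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTHREAD_SLOTS	16	/* arbitrary table size for the sketch */
#define UTHREAD_PARAMS	8	/* "8 ints worth" of per-thread state */

/* Hypothetical per-user-thread record; no stack or other run resources. */
struct uthread_info {
	uintptr_t mailbox;		/* user mailbox address; 0 means free */
	int params[UTHREAD_PARAMS];	/* history, e.g. boost bookkeeping */
};

struct uthread_table {
	struct uthread_info slots[UTHREAD_SLOTS];
};

/*
 * Find the record for a mailbox, claiming a free slot on first use.
 * Returns NULL if the table is full.
 */
static struct uthread_info *
uthread_lookup(struct uthread_table *t, uintptr_t mailbox)
{
	struct uthread_info *free_slot = NULL;
	size_t i, j;

	assert(t != NULL && mailbox != 0);
	for (i = 0; i < UTHREAD_SLOTS; i++) {
		if (t->slots[i].mailbox == mailbox)
			return (&t->slots[i]);
		if (t->slots[i].mailbox == 0 && free_slot == NULL)
			free_slot = &t->slots[i];
	}
	if (free_slot != NULL) {
		free_slot->mailbox = mailbox;
		for (j = 0; j < UTHREAD_PARAMS; j++)
			free_slot->params[j] = 0;
	}
	return (free_slot);
}
```

A real implementation would presumably hang such a table off the process or ksegrp and need locking and reclamation, none of which is modelled here.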


>
>	Stephan
>

From owner-freebsd-arch@FreeBSD.ORG  Tue Sep 28 02:27:54 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id AD8BF16A4CE
	for <freebsd-arch@freebsd.org>; Tue, 28 Sep 2004 02:27:54 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id 358A543D3F
	for <freebsd-arch@freebsd.org>; Tue, 28 Sep 2004 02:27:54 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 17397 invoked by uid 89); 28 Sep 2004 02:27:52 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 28 Sep 2004 02:27:52 -0000
Received: (qmail 17357 invoked by uid 89); 28 Sep 2004 02:27:52 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 28 Sep 2004 02:27:52 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8S2Rqmt005983;
	Mon, 27 Sep 2004 22:27:52 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Julian Elischer <julian@elischer.org>
In-Reply-To: <41588FC5.6090203@elischer.org>
References: <1096133353.53798.17613.camel@palm.tree.com>
	<200409271016.13345.jhb@FreeBSD.org>
	<1096305626.95152.163.camel@palm.tree.com>
	<200409271443.22667.jhb@FreeBSD.org><41588FC5.6090203@elischer.org>
Content-Type: text/plain
Message-Id: <1096338471.3733.254.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Mon, 27 Sep 2004 22:27:51 -0400
Content-Transfer-Encoding: 7bit
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2004 02:27:54 -0000

On Mon, 2004-09-27 at 18:10, Julian Elischer wrote:
> Stephan Uphoff wrote:
> 
> >On Mon, 2004-09-27 at 14:43, John Baldwin wrote:
> >  
> >
> >>>>@@ -272,7 +272,7 @@
> >>>> {
> >>>>
> >>>>        mtx_assert(&sched_lock, MA_OWNED);
> >>>>-       if (td->td_priority < curthread->td_priority)
> >>>>+       if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
> >>>>                curthread->td_flags |= TDF_NEEDRESCHED;
> >>>> }
> >>>>        
> >>>>
> 
> in sched_userret() we do:
>         kg = td->td_ksegrp;
>         if (td->td_priority != kg->kg_user_pri) {
>                 mtx_lock_spin(&sched_lock);
>                 td->td_priority = kg->kg_user_pri;
>                 mtx_unlock_spin(&sched_lock);
>         }
> 
> but we don't actually take any action in the case where the thread is 
> heading out to userland with
> a priority of less importance than a waiting thread. That happens in 
> AST() where we also set it down
> but only in the  case of TDF_NEEDRESCHED being set.
> 
> it would make more sense to ALWAYS to the TDF_NEEDRESCHED clause, in 
> userret()
> based on the user priority... i.e. the priority would be reduced going 
> to userland.
> Unfortunatly this would stop one of the reasons to for priorityu 
> raisning in BSD.
> 
> The priority of a thread that waits for IO is raised not only to make it 
> start again in the kernel
> as an interactive thread, but also so that it can run into userland too 
> and get some priority
> for actually USING the new data/input..

Thanks - I wasn't aware of this.
Isn't there a high potential for abuse?
A client/server program constantly refreshing its priority by waiting for
requests/replies comes to mind. If a client/server pair constantly talk
to each other they could eat a lot of CPU time.
I have to think about this some more.

	Stephan

From owner-freebsd-arch@FreeBSD.ORG  Tue Sep 28 02:52:19 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B4C4F16A4CE
	for <freebsd-arch@freebsd.org>; Tue, 28 Sep 2004 02:52:19 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id 58C2A43D49
	for <freebsd-arch@freebsd.org>; Tue, 28 Sep 2004 02:52:19 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 18056 invoked by uid 89); 28 Sep 2004 02:52:18 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 28 Sep 2004 02:52:18 -0000
Received: (qmail 18036 invoked by uid 89); 28 Sep 2004 02:52:18 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 28 Sep 2004 02:52:18 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8S2qGmt006097;
	Mon, 27 Sep 2004 22:52:17 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Peter Holm <peter@holm.cc>
In-Reply-To: <20040926075218.GA85983@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1095529353.31297.1192.camel@palm.tree.com>
	<1096135220.53798.17754.camel@palm.tree.com>
	<20040926075218.GA85983@peter.osted.lan>
Content-Type: multipart/mixed; boundary="=-rj4c4DOGe8hfA+4+RE6p"
Message-Id: <1096339936.3733.279.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Mon, 27 Sep 2004 22:52:16 -0400
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2004 02:52:19 -0000


--=-rj4c4DOGe8hfA+4+RE6p
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Sun, 2004-09-26 at 03:52, Peter Holm wrote:
> On Sat, Sep 25, 2004 at 08:50:45PM -0700, Julian Elischer wrote:
> > Stephan Uphoff wrote:
> > 
> > >>Maybe something brutal like:
> > >>	if ((curthread->td_ksegrp == kg) &&
> > >>	   (td->td_priority > curthread->td_priority))
> > >>		curthread->td_flags |= TDF_NEEDRESCHED;
> > >>
> > >>in setrunqueue for
> > >>the else case of "if (kg->kg_avail_opennings > 0)"
> > >>would do the trick (without preemption) for the easy but probably more
> > >>common cases?
> > >>
> > >>Maybe I can find some time next week to think about a clean
> > >>fix. I find it always helpful having a small task in mind while reading
> > >>source code.
> > >
> > >
> > >I wrote a fix that should cover all cases.
> > >However I would like to test it a little bit before posting the patch.
> > >Is there any multi-threaded kernel torture program that you can
> > >recommend?
> > 
> > 
> > Peter Holm (CC'd) has a really cool set of torture tests.
> > he has also seen all sorts of failures others have not (yet) triggered. :-)
> > 
> > I'm 'busy" for the next couple of weeks so you may want to communicate 
> > directly with him and see if you and he together can figure out some of the 
> > things he's
> > been seeing :-)
> > 
> > his tests are at:
> > http://www.holm.cc/stress/src/stress.tgz
> > 
> > >
> > >Thanks
> > >
> > >	Stephan
> > >
> 
> I'll be glad to test any patches.

Great.
Can you try the attached patch to see if it changes any of your
previously observed behaviour?

Thanks
	Stephan


--=-rj4c4DOGe8hfA+4+RE6p
Content-Disposition: attachment; filename=switch_patch
Content-Type: text/x-patch; name=switch_patch; charset=ASCII
Content-Transfer-Encoding: 7bit

Index: sys/kern/kern_switch.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_switch.c,v
retrieving revision 1.95
diff -u -r1.95 kern_switch.c
--- sys/kern/kern_switch.c	19 Sep 2004 18:34:17 -0000	1.95
+++ sys/kern/kern_switch.c	28 Sep 2004 02:48:43 -0000
@@ -315,6 +315,94 @@
 	td->td_priority = newpri;
 	setrunqueue(td, SRQ_BORING);
 }
+
+
+/*
+ * This function is called when a thread is about to be put on a
+ * ksegrp run queue because it has been made runnable or its 
+ * priority has been adjusted and the ksegrp does not have a 
+ * free kse slot.  It determines if a thread from the same ksegrp
+ * should be preempted.  If so, it tries to switch threads
+ * if the thread is on the same cpu or notifies another cpu that
+ * it should switch threads. 
+ */
+
+static void
+maybe_preempt_in_ksegrp(struct thread *td)
+{
+#if  defined(SMP)
+	int highest_pri;
+	struct ksegrp *kg;
+	cpumask_t cpumask,dontuse;
+	struct pcpu *pc;
+	struct pcpu *highest_pcpu;
+
+  	mtx_assert(&sched_lock, MA_OWNED);
+
+#if !defined(KSEG_PEEMPT_BEST_CPU)
+	if(curthread->td_ksegrp != td->td_ksegrp)
+#endif
+		{
+			kg = td->td_ksegrp;
+
+			/* Anyone waiting in front ? */
+			if(td != TAILQ_FIRST(&kg->kg_runq))  {
+				return; /* Yes - wait your turn*/
+			}
+			highest_pri  = td->td_priority;
+			highest_pcpu = NULL;
+			dontuse      = stopped_cpus | idle_cpus_mask;
+
+			/* Find a cpu running a thread from the same ksegrp with the
+			 * worst priority - if multiple exist, prefer first the cpu
+			 * this thread last ran on and then the current cpu
+			 */
+
+			SLIST_FOREACH(pc, &cpuhead, pc_allcpu) {
+				cpumask = pc->pc_cpumask;
+				if ( (cpumask & dontuse) == 0 && 
+				     pc->pc_curthread->td_ksegrp == kg) {
+					if (pc->pc_curthread->td_priority > highest_pri) {
+						highest_pri  = pc->pc_curthread->td_priority;
+						highest_pcpu = pc;
+					} else if (pc->pc_curthread->td_priority == highest_pri &&
+						   highest_pcpu != NULL) {
+						if (td->td_lastcpu == pc->pc_cpuid ||
+						    (PCPU_GET(cpumask) == cpumask &&
+						     td->td_lastcpu != highest_pcpu->pc_cpuid)) {
+							highest_pcpu = pc;
+						}
+					}
+				}
+			}
+			
+			/* Check if we need to preempt someone */
+			if (highest_pcpu == NULL) return;
+
+			if (PCPU_GET(cpuid) != highest_pcpu->pc_cpuid) {
+				highest_pcpu->pc_curthread->td_flags |= TDF_NEEDRESCHED;
+				ipi_selected(highest_pcpu->pc_cpumask, IPI_AST);
+				return;
+			}
+		}
+#else
+	KASSERT(curthread->td_ksegrp == td->td_ksegrp,("maybe_preempt_in_ksegrp: No chance to run thread"));
+#endif
+
+	if  (td->td_priority >= curthread->td_priority)
+		return;
+#ifdef PREEMPTION
+	if (curthread->td_critnest > 1) {
+		curthread->td_pflags |= TDP_OWEPREEMPT;
+	} else {
+		mi_switch(SW_INVOL, NULL);
+	}
+#else
+	curthread->td_flags |= TDF_NEEDRESCHED;
+#endif
+	return;
+}
+
 int limitcount;
 void
 setrunqueue(struct thread *td, int flags)
@@ -422,6 +510,7 @@
 	} else {
 		CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d",
 			td, td->td_ksegrp, td->td_proc->p_pid);
+		maybe_preempt_in_ksegrp(td);
 	}
 }
 

--=-rj4c4DOGe8hfA+4+RE6p--

From owner-freebsd-arch@FreeBSD.ORG  Tue Sep 28 07:49:29 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id CA26716A4CE
	for <freebsd-arch@freebsd.org>; Tue, 28 Sep 2004 07:49:29 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 571C543D31
	for <freebsd-arch@freebsd.org>; Tue, 28 Sep 2004 07:49:29 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 44107 invoked from network); 28 Sep 2004 07:49:27 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 28 Sep 2004 07:49:27 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8S7nRCs099983;
	Tue, 28 Sep 2004 09:49:27 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i8S7nQxS099982;
	Tue, 28 Sep 2004 09:49:26 +0200 (CEST)
	(envelope-from pho)
Date: Tue, 28 Sep 2004 09:49:26 +0200
From: Peter Holm <peter@holm.cc>
To: Stephan Uphoff <ups@tree.com>
Message-ID: <20040928074926.GA99957@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<414B8D5E.7000700@elischer.org> <1095529353.31297.1192.camel@palm.tree.com>
	<1096135220.53798.17754.camel@palm.tree.com> <41563C95.2020501@elischer.org>
	<20040926075218.GA85983@peter.osted.lan>
	<1096339936.3733.279.camel@palm.tree.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1096339936.3733.279.camel@palm.tree.com>
User-Agent: Mutt/1.4.1i
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2004 07:49:29 -0000

On Mon, Sep 27, 2004 at 10:52:16PM -0400, Stephan Uphoff wrote:
> On Sun, 2004-09-26 at 03:52, Peter Holm wrote:
> > On Sat, Sep 25, 2004 at 08:50:45PM -0700, Julian Elischer wrote:
> > > Stephan Uphoff wrote:
> > > 
> > > >>Maybe something brutal like:
> > > >>	if ((curthread->td_ksegrp == kg) &&
> > > >>	   (td->td_priority > curthread->td_priority))
> > > >>		curthread->td_flags |= TDF_NEEDRESCHED;
> > > >>
> > > >>in setrunqueue for
> > > >>the else case of "if (kg->kg_avail_opennings > 0)"
> > > >>would do the trick (without preemption) for the easy but probably more
> > > >>common cases?
> > > >>
> > > >>Maybe I can find some time next week to think about a clean
> > > >>fix. I find it always helpful having a small task in mind while reading
> > > >>source code.
> > > >
> > > >
> > > >I wrote a fix that should cover all cases.
> > > >However I would like to test it a little bit before posting the patch.
> > > >Is there any multi-threaded kernel torture program that you can
> > > >recommend?
> > > 
> > > 
> > > > Peter Holm (CC'd) has a really cool set of torture tests.
> > > > He has also seen all sorts of failures others have not (yet) triggered. :-)
> > > > 
> > > > I'm "busy" for the next couple of weeks so you may want to communicate
> > > > directly with him and see if you and he together can figure out some of
> > > > the things he's been seeing :-)
> > > 
> > > his tests are at:
> > > http://www.holm.cc/stress/src/stress.tgz
> > > 
> > > >
> > > >Thanks
> > > >
> > > >	Stephan
> > > >
> > 
> > I'll be glad to test any patches.
> 
> Great.
> Can you try the attached patch to see if it changes any of your
> previously observed behaviour?
> 

The system still freezes and can be unfrozen by a ping:
http://www.holm.cc/stress/log/stephan.html

-- 
Peter Holm

From owner-freebsd-arch@FreeBSD.ORG  Tue Sep 28 14:52:02 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id AF7D916A4CE
	for <freebsd-arch@freebsd.org>; Tue, 28 Sep 2004 14:52:02 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id 31A2843D55
	for <freebsd-arch@freebsd.org>; Tue, 28 Sep 2004 14:52:02 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 18039 invoked by uid 89); 28 Sep 2004 14:51:53 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 28 Sep 2004 14:51:53 -0000
Received: (qmail 17736 invoked by uid 89); 28 Sep 2004 14:51:45 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 28 Sep 2004 14:51:45 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8SEphmt009097;
	Tue, 28 Sep 2004 10:51:43 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Peter Holm <peter@holm.cc>
In-Reply-To: <20040928074926.GA99957@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1095529353.31297.1192.camel@palm.tree.com>
	<1096135220.53798.17754.camel@palm.tree.com>
	<20040926075218.GA85983@peter.osted.lan>
	<1096339936.3733.279.camel@palm.tree.com>
	<20040928074926.GA99957@peter.osted.lan>
Content-Type: text/plain
Message-Id: <1096383103.3733.312.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Tue, 28 Sep 2004 10:51:43 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2004 14:52:02 -0000

On Tue, 2004-09-28 at 03:49, Peter Holm wrote:
> The system still freezes and can be unfrozen by a ping:
> http://www.holm.cc/stress/log/stephan.html

Could you try the sched_userret_patch in addition to the switch_patch?
(The patch is in the email to arch from 25 September with the subject
"sched_userret priority adjustment patch for sched_4bsd".)

Are you running with the current GENERIC configuration?
(If not, could you send me your config file?)

My debug target is currently diskless - I will reconfigure it later this
week and hopefully will be able to reproduce your freezes.

Thanks

	Stephan 

From owner-freebsd-arch@FreeBSD.ORG  Tue Sep 28 15:43:21 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 692CA16A4CE
	for <freebsd-arch@FreeBSD.org>; Tue, 28 Sep 2004 15:43:21 +0000 (GMT)
Received: from mail4.speakeasy.net (mail4.speakeasy.net [216.254.0.204])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2B10B43D54
	for <freebsd-arch@FreeBSD.org>; Tue, 28 Sep 2004 15:43:21 +0000 (GMT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 27103 invoked from network); 28 Sep 2004 15:43:20 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <julian@elischer.org>; 28 Sep 2004 15:43:19 -0000
Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8SFhF2M002165;
	Tue, 28 Sep 2004 11:43:15 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
Date: Tue, 28 Sep 2004 10:56:00 -0400
User-Agent: KMail/1.6.2
References: <1096133353.53798.17613.camel@palm.tree.com>
	<200409271443.22667.jhb@FreeBSD.org> <1096320486.3733.58.camel@palm.tree.com>
In-Reply-To: <1096320486.3733.58.camel@palm.tree.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200409281056.00870.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
cc: Julian Elischer <julian@elischer.org>
cc: Stephan Uphoff <ups@tree.com>
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Sep 2004 15:43:21 -0000

On Monday 27 September 2004 05:28 pm, Stephan Uphoff wrote:
> On Mon, 2004-09-27 at 14:43, John Baldwin wrote:
> > On Monday 27 September 2004 01:20 pm, Stephan Uphoff wrote:
> > > On Mon, 2004-09-27 at 10:16, John Baldwin wrote:
> > > > On Saturday 25 September 2004 01:29 pm, Stephan Uphoff wrote:
> > > > > When a thread is about to return to user space it resets its
> > > > > priority to the user level priority.
> > > > > > However, after lowering its priority it needs to check whether
> > > > > > its priority is still better than that of all other runnable
> > > > > > threads. This is currently not implemented.
> > > > > > Without the check the thread can block kernel or user threads with
> > > > > > better priority until a switch is forced by an interrupt.
> > > > >
> > > > > The attached patch checks the relevant runqueues and threads
> > > > > without slots in the same ksegrp and forces a thread switch if the
> > > > > currently running thread is no longer the best thread to run after
> > > > > it changed its priority.
> > > > >
> > > > > The patch should improve interactive response under heavy load
> > > > > somewhat. It needs a lot of testing.
> > > >
> > > > Perhaps the better fix is to teach the schedulers to set
> > > > > TDF_NEEDRESCHED based on a comparison against user_pri rather than
> > > > > td_priority inside of sched_add()?  Having the flag set by
> > > > > sched_add() is supposed to make this sort of check unnecessary.  Even
> > > > > 4.x has the same bug I think as a process can make another process
> > > > > runnable after its priority has been boosted by a tsleep() and
> > > > need_resched() is only called based on a comparison of p_pri. Ah, 4.x
> > > > doesn't have the bug because it caches the priority of curproc when
> > > > it enters the kernel and compares against that. Thus, I think the
> > > > correct fix is more like this:
> > > >
> > > > Index: sched_4bsd.c
> > > > ===================================================================
> > > > RCS file: /usr/cvs/src/sys/kern/sched_4bsd.c,v
> > > > retrieving revision 1.63
> > > > diff -u -r1.63 sched_4bsd.c
> > > > --- sched_4bsd.c        11 Sep 2004 10:07:22 -0000      1.63
> > > > +++ sched_4bsd.c        27 Sep 2004 14:12:03 -0000
> > > > @@ -272,7 +272,7 @@
> > > >  {
> > > >
> > > >         mtx_assert(&sched_lock, MA_OWNED);
> > > > -       if (td->td_priority < curthread->td_priority)
> > > > +       if (td->td_priority < curthread->td_ksegrp->kg_user_pri)
> > > >                 curthread->td_flags |= TDF_NEEDRESCHED;
> > > >  }
> > > >
> > > > Index: sched_ule.c
> > > > ===================================================================
> > > > RCS file: /usr/cvs/src/sys/kern/sched_ule.c,v
> > > > retrieving revision 1.129
> > > > diff -u -r1.129 sched_ule.c
> > > > --- sched_ule.c 11 Sep 2004 10:07:22 -0000      1.129
> > > > +++ sched_ule.c 27 Sep 2004 14:13:01 -0000
> > > > @@ -723,7 +723,7 @@
> > > >          */
> > > >         pcpu = pcpu_find(cpu);
> > > >         td = pcpu->pc_curthread;
> > > > -       if (ke->ke_thread->td_priority < td->td_priority ||
> > > > +       if (ke->ke_thread->td_priority < td->td_ksegrp->kg_user_pri
> > > > || td == pcpu->pc_idlethread) {
> > > >                 td->td_flags |= TDF_NEEDRESCHED;
> > > >                 ipi_selected(1 << cpu, IPI_AST);
> > > >
> > > > An even better fix might be to fix td_base_pri by having it be set on
> > > > kernel entry similar to how 4.x sets curpriority.  The above fix
> > > > should be sufficient for now, however.
> > >
> > > I don't think that this is enough since TDF_NEEDRESCHED is thread
> > > specific and not cpu specific.
> >
> > Hmm, it is CPU specific in 4.x.  It could be changed back to being a
> > per-cpu flag easily.
>
> But this might not help :-(.
> Example:
>
> Thread A is running in the kernel and is preempted by an interrupt
> Thread I. Thread I wakes up thread B.
> If I->td_ksegrp->kg_user_pri <= B->td_priority TDF_NEEDRESCHED will not
> be set.
> If A->td_priority < B->td_priority thread A will run once I is finished
> serving interrupts.
> Thread A can now leave the kernel even though A->td_ksegrp->kg_user_pri >
> B->td_priority may be true.

If A has a priority boost from tsleep() this is intentional, however.  The
priority boosts from tsleep() are _supposed_ to do this so as to favor
interactive tasks.  Note that if you add the code to always raise td_priority
while in the kernel as below you may end up defeating this well-known feature
of the 4BSD scheduler.
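
The two TDF_NEEDRESCHED checks discussed in this thread can be compared in a small user-space model. This is only a sketch: struct toy_thread, both function names and all priority values are made up for illustration and are not kernel code. Lower numbers mean better priority, as in the kernel.

```c
#include <stdbool.h>

/* Toy stand-in for struct thread; lower numbers mean better priority. */
struct toy_thread {
	int  priority;     /* plays the role of td_priority */
	int  user_pri;     /* plays the role of td_ksegrp->kg_user_pri */
	bool needresched;  /* plays the role of TDF_NEEDRESCHED */
};

/* Check as in rev 1.63: compare against the running thread's current priority. */
static void
toy_resched_by_priority(struct toy_thread *cur, const struct toy_thread *td)
{
	if (td->priority < cur->priority)
		cur->needresched = true;
}

/* Check as in the proposed diff: compare against the running thread's user priority. */
static void
toy_resched_by_user_pri(struct toy_thread *cur, const struct toy_thread *td)
{
	if (td->priority < cur->user_pri)
		cur->needresched = true;
}
```

With a running thread boosted to 80 whose user priority is 200, and a newly runnable thread at 150, the first check leaves the flag clear and the second sets it. If an interrupt thread is the running thread when the wakeup happens, neither check touches the preempted thread A, which is the objection raised in the quoted example.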

> > > However the thread marked with TDF_NEEDRESCHED might not be the next
> > > thread leaving the kernel.
> > > ( Can't really talk about ULE since I am trying to avoid looking at
> > > another shiny irresistible time sink this week ;-)
> > >
> > > I think we agree that that td_priority should be set to td_base_pri on
> > > kernel entry. Since td_base_pri is changed by sleep and condvar
> > > functions it should also be reset on kernel entry. (Probably from a new
> > > ksegrp field). Condvar waits should currently non cause the base
> > > priority to change to the current priority of the thread - otherwise
> > > td_base_pri could get stuck at a really bad user priority.
> > > ( td->td_base_pri might end up being worse than
> > > td->td_ksegrp->kg_user_pri when the ksegrp priority improves)
> >
> > Well, I think instead that td_base_pri should be set to td_priority on
> > kernel entry (rather than the other way around).  td_priority should be
> > unchanged just because it enters the kernel.
>
> I guess we disagree here.
> There are just too many resource dependencies in the kernel that can lead
> to priority inversion (vnode locks, disk buffer ownership, etc).
> It would be nice to delay the priority boost until a thread acquires
> such a resource (or even trace resource dependencies and implement
> priority inheritance) ... but this would be a huge task.
> Boosting the priority on kernel entry is easy and less error prone. I
> guess we had this discussion last week and we just disagree on the
> issue.

Well, I think you don't understand exactly what td_base_pri is supposed to
do. :)  If you want to boost td_priority on kernel entry that is fine, but it is
completely orthogonal to this discussion.  If you wanted to do that then you
might do something like this:

kernel_entry()
{
	if (td->td_priority > PRI_KERN_MAX)
		sched_prio(td, PRI_KERN_MAX);
	td->td_base_pri = td->td_priority;
}

i.e., just add the boost to the kernel_entry function.
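
For anyone who wants to play with the idea outside the kernel, here is a compilable user-space sketch of the same pseudocode. The struct, the toy_ names and the value 100 used for the kernel priority limit are assumptions made for illustration; only the shape of the logic comes from the pseudocode above.

```c
#include <assert.h>

/* Hypothetical value standing in for the PRI_KERN_MAX of the pseudocode. */
#define TOY_PRI_KERN_MAX 100

/* Toy stand-in for struct thread; lower numbers mean better priority. */
struct toy_kthread {
	int td_priority;
	int td_base_pri;
};

/* Minimal stand-in for sched_prio(): just records the new priority. */
static void
toy_sched_prio(struct toy_kthread *td, int prio)
{
	td->td_priority = prio;
}

/* Boost a user-priority thread on kernel entry, then record the base. */
static void
toy_kernel_entry(struct toy_kthread *td)
{
	if (td->td_priority > TOY_PRI_KERN_MAX)
		toy_sched_prio(td, TOY_PRI_KERN_MAX);
	td->td_base_pri = td->td_priority;
	/* After entry the base priority is never worse than the limit. */
	assert(td->td_base_pri <= TOY_PRI_KERN_MAX);
}
```

A thread entering with priority 180 leaves toy_kernel_entry() with both fields at 100, while a thread already at 50 is left alone and only has its base priority recorded.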

> > I think the sleep functions could then
> > leave td_base_pri alone.  (I think setting it there is wrong because
> > td_base_pri is not quite the same as curpriority in 4.x.)  What
> > td_base_pri is really supposed to provide, btw, is the priority that the
> > thread should go back to once it has unlocked a mutex and had its priority
> > boosted while it held the mutex.  Arguably it should just be using
> > kg_user_pri for this, but then you lose priority "boosts" from tsleep(),
> > which is why td_base_pri is set in msleep().  I guess what should happen
> > is something more like this:
> >
> > kernel_entry()
> > {
> > 	KASSERT(td->td_priority == td->td_ksegrp->kg_user_pri);
> > 	td->td_base_pri = td->td_priority;
> > }
> >
> > msleep()
> > {
> > 	sched_prio(...);
> > 	td->td_base_pri = td->td_priority;
> > }
> >
> > The TDF_NEEDRESCHED checks should be using kg_user_pri as in my previous
> > e-mail.  Also, in sched_prio(), if our priority is ever raised
> > (numerically, logically less important), we should set TDF_NEEDRESCHED
> > since we may need to switch (4.x does this in maybe_needresched()). 
> > Then, TDF_NEEDRESCHED could become a per-cpu flag and have it not be
> > cleared in mi_switch() but be cleared only in userret().  Hmm, I think
> > all of the TDF_NEEDRESCHED handling actually belongs in sched_userret()
> > btw.
>
> Wouldn't this lead to unnecessary round-robin switches between threads
> with the same priority on sched_4bsd?

It can, but then it can on 4.x as well.  The idea is that the occasional spurious context switch 
is cheaper than doing a lot of work on each kernel exit.
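
The scheme quoted above (set the flag in sched_prio() whenever the priority gets numerically worse, keep it per-cpu, clear it only in userret()) can be modelled in a few lines. This is a sketch under assumptions: toy_cpu, toy_pthread and the toy_ functions are invented for illustration, and the "switch" is just a counter.

```c
#include <stdbool.h>

/* Toy per-cpu state: one need-resched flag and a count of switches taken. */
struct toy_cpu {
	bool needresched;
	int  switches;
};

/* Toy stand-in for struct thread; lower numbers mean better priority. */
struct toy_pthread {
	int td_priority;
};

/* Set the per-cpu flag whenever the priority becomes numerically larger. */
static void
toy_sched_prio(struct toy_cpu *cpu, struct toy_pthread *td, int prio)
{
	if (prio > td->td_priority)
		cpu->needresched = true;
	td->td_priority = prio;
}

/* The flag is consumed only here; no run queue is inspected. */
static void
toy_userret(struct toy_cpu *cpu)
{
	if (cpu->needresched) {
		cpu->needresched = false;
		cpu->switches++;
	}
}
```

Note that toy_userret() switches whenever the flag is set, even if no better thread is actually waiting. That is the spurious switch being traded against scanning run queues on every kernel exit.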

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 08:57:52 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 566BC16A4CE
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 08:57:52 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id DE83643D2F
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 08:57:51 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 93406 invoked from network); 29 Sep 2004 08:57:50 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 29 Sep 2004 08:57:50 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8T8vnCs019885;
	Wed, 29 Sep 2004 10:57:49 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i8T8vmjA019884;
	Wed, 29 Sep 2004 10:57:48 +0200 (CEST)
	(envelope-from pho)
Date: Wed, 29 Sep 2004 10:57:48 +0200
From: Peter Holm <peter@holm.cc>
To: Stephan Uphoff <ups@tree.com>
Message-ID: <20040929085748.GA19695@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<414B8D5E.7000700@elischer.org> <1095529353.31297.1192.camel@palm.tree.com>
	<1096135220.53798.17754.camel@palm.tree.com> <41563C95.2020501@elischer.org>
	<20040926075218.GA85983@peter.osted.lan>
	<1096339936.3733.279.camel@palm.tree.com>
	<20040928074926.GA99957@peter.osted.lan>
	<1096383103.3733.312.camel@palm.tree.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1096383103.3733.312.camel@palm.tree.com>
User-Agent: Mutt/1.4.1i
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 08:57:52 -0000

On Tue, Sep 28, 2004 at 10:51:43AM -0400, Stephan Uphoff wrote:
> On Tue, 2004-09-28 at 03:49, Peter Holm wrote:
> > The system still freezes and can be unfrozen by a ping:
> > http://www.holm.cc/stress/log/stephan.html
> 
> Could you try the sched_userret_patch in addition to the switch_patch?
> ( Patch is in email to arch on 25 September with the subject
> "sched_userret priority adjustment patch for sched_4bsd" 
> 

Done.

> Are you running with the current GENERIC configuration?
> ( If not could you send me your config file?)
> 

Yes. GENERIC + BREAK_TO_DEBUGGER.  

> My debug target is currently diskless - I will reconfigure it later this
> week and hopefully will be able to reproduce your freezes.
> 
> Thanks
> 
> 	Stephan 

Sorry for the late reply, but I ran into a few other problems along the way:

http://www.holm.cc/stress/log/cons79.html
http://www.holm.cc/stress/log/cons80.html

It's hard for me to tell if your patch has made any difference.
The freeze is still there.  I'll try to make the same test once more
without your patches to see if I get the same pattern in freezes.

- Peter

From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 14:24:10 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6007D16A4CE
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 14:24:10 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id EA70843D46
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 14:24:09 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 27716 invoked by uid 89); 29 Sep 2004 14:24:06 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 29 Sep 2004 14:24:06 -0000
Received: (qmail 27698 invoked by uid 89); 29 Sep 2004 14:24:05 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 29 Sep 2004 14:24:05 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8TEO3mt015217;
	Wed, 29 Sep 2004 10:24:03 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Peter Holm <peter@holm.cc>
In-Reply-To: <20040929085748.GA19695@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1095529353.31297.1192.camel@palm.tree.com>
	<1096135220.53798.17754.camel@palm.tree.com>
	<20040926075218.GA85983@peter.osted.lan>
	<1096339936.3733.279.camel@palm.tree.com>
	<20040928074926.GA99957@peter.osted.lan>
	<1096383103.3733.312.camel@palm.tree.com>
	<20040929085748.GA19695@peter.osted.lan>
Content-Type: text/plain
Message-Id: <1096467843.3733.1145.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Wed, 29 Sep 2004 10:24:03 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 14:24:10 -0000

On Wed, 2004-09-29 at 04:57, Peter Holm wrote:
> On Tue, Sep 28, 2004 at 10:51:43AM -0400, Stephan Uphoff wrote:
> > On Tue, 2004-09-28 at 03:49, Peter Holm wrote:
> > > The system still freezes and can be unfrozen by a ping:
> > > http://www.holm.cc/stress/log/stephan.html
> > 
> > Could you try the sched_userret_patch in addition to the switch_patch?
> > ( Patch is in email to arch on 25 September with the subject
> > "sched_userret priority adjustment patch for sched_4bsd" 
> > 
> 
> Done.
> 
> > Are you running with the current GENERIC configuration?
> > ( If not could you send me your config file?)
> > 
> 
> Yes. GENERIC + BREAK_TO_DEBUGGER.  
> 
> > My debug target is currently diskless - I will reconfigure it later this
> > week and hopefully will be able to reproduce your freezes.
> > 
> > Thanks
> > 
> > 	Stephan 
> 
> Sorry for the late reply, but I ran into a few other problems along the way:

Late reply? !!!
What late reply?
Are you trying to scare me? ;-)

> http://www.holm.cc/stress/log/cons79.html
> http://www.holm.cc/stress/log/cons80.html
> 
> It's hard for me to tell if your patch has made any difference.
> The freeze is still there.  I'll try to make the same test once more
> without your patches to see if I get the same pattern in freezes.

I found some problems yesterday with mutex priority inheritance that
could potentially cause your freeze patterns.

I will try to roll a preliminary patch as soon as the caffeine does its
magic.


	Stephan

From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 14:39:01 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 839DA16A4CF
	for <freebsd-arch@FreeBSD.org>; Wed, 29 Sep 2004 14:39:01 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id E793643D48
	for <freebsd-arch@FreeBSD.org>; Wed, 29 Sep 2004 14:39:00 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 5008 invoked by uid 89); 29 Sep 2004 14:38:56 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 29 Sep 2004 14:38:56 -0000
Received: (qmail 4937 invoked by uid 89); 29 Sep 2004 14:38:55 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 29 Sep 2004 14:38:55 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8TEcsmt015285;
	Wed, 29 Sep 2004 10:38:54 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <200409281056.00870.jhb@FreeBSD.org>
References: <1096133353.53798.17613.camel@palm.tree.com>
	 <200409271443.22667.jhb@FreeBSD.org>
	 <1096320486.3733.58.camel@palm.tree.com>
	 <200409281056.00870.jhb@FreeBSD.org>
Content-Type: text/plain
Message-Id: <1096468734.3733.1177.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Wed, 29 Sep 2004 10:38:54 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 14:39:01 -0000

On Tue, 2004-09-28 at 10:56, John Baldwin wrote:
> If A has a priority boost from tsleep() this is intentional, however.  The 
> priority boosts from tsleep() are _supposed_ to do this so as to favor 
> interactive tasks.  Note that if you add the code to always raise td_priority 
> while in the kernel as below you may end up defeating this well-known feature 
> of the 4BSD scheduler.

OK - you and Julian convinced me that this is a feature that I should
have known about. Without test cases or interactivity benchmarks,
discussions of whether this is still a desirable feature are probably useless.
I will revisit this once test cases materialize or I have time to
think about a benchmark (not likely anytime soon).

	Stephan 


From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 17:12:16 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4CA7516A4CE
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 17:12:16 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id B1B7E43D46
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 17:12:15 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 7868 invoked by uid 89); 29 Sep 2004 17:12:14 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 29 Sep 2004 17:12:14 -0000
Received: (qmail 7842 invoked by uid 89); 29 Sep 2004 17:12:14 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 29 Sep 2004 17:12:14 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8THCDmt015815;
	Wed, 29 Sep 2004 13:12:13 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Peter Holm <peter@holm.cc>
In-Reply-To: <1096467843.3733.1145.camel@palm.tree.com>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <1095529353.31297.1192.camel@palm.tree.com>
	 <1096135220.53798.17754.camel@palm.tree.com>
	 <20040926075218.GA85983@peter.osted.lan>
	 <1096339936.3733.279.camel@palm.tree.com>
	 <20040928074926.GA99957@peter.osted.lan>
	 <1096383103.3733.312.camel@palm.tree.com>
	 <20040929085748.GA19695@peter.osted.lan>
	 <1096467843.3733.1145.camel@palm.tree.com>
Content-Type: text/plain
Message-Id: <1096477932.3733.1471.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Wed, 29 Sep 2004 13:12:13 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 17:12:16 -0000

On Wed, 2004-09-29 at 10:24, Stephan Uphoff wrote:
> On Wed, 2004-09-29 at 04:57, Peter Holm wrote:
> > It's hard for me to tell if your patch has made any difference.
> > The freeze is still there.  I'll try to make the same test once more
> > without your patches to see if I get the same pattern in freezes.
> 
> I found some problems yesterday with mutex priority inheritance that
> could potentially cause your freeze patterns.
> 
> I will try to roll a preliminary patch as soon as the caffeine does its
> magic.

OK - here is a crude patch to fix some problems with mutex priority
inheritance. My theory is that the clock thread gets stuck waiting on
GIANT.

During release/acquisition of a contested sleep mutex there are a few
windows where a task can be preempted when actions (waking up blocked
threads, ownership of the mutex, ..) need to be atomic as far as
scheduling is concerned. Otherwise priority inheritance may fail. The
patch uses critical_enter/critical_exit to protect these regions against
preemption.

It would be great if you could run this in addition to the other patches.

	Stephan

From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 19:50:14 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0F45616A4CE
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 19:50:14 +0000 (GMT)
Received: from mail.gmx.net (mail.gmx.de [213.165.64.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 4C75243D46
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 19:50:13 +0000 (GMT)
	(envelope-from idontknowmyself@gmx.net)
Received: (qmail 12598 invoked by uid 65534); 29 Sep 2004 19:50:12 -0000
Received: from p5080C7E6.dip0.t-ipconnect.de (EHLO chaos) (80.128.199.230)
  by mail.gmx.net (mp019) with SMTP; 29 Sep 2004 21:50:12 +0200
X-Authenticated: #17701688
To: freebsd-arch@freebsd.org
Date: Wed, 29 Sep 2004 21:53:01 +0200
From: idontknowmyself@gmx.net
Content-Type: text/plain; format=flowed; delsp=yes; charset=iso-8859-15
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Message-ID: <opse3tinmlvaixhr@chaos>
User-Agent: Opera M2/7.54 (Win32, build 3865)
Subject: freebsd on mvme147
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 19:50:14 -0000

hi there
could you tell me if there is any freebsd port for motorola mvme147  
processor architectures?
i would be very pleased to know that
thank you
oskar roeding

From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 20:11:02 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7F85A16A4CE
	for <freebsd-arch@FreeBSD.org>; Wed, 29 Sep 2004 20:11:02 +0000 (GMT)
Received: from mail1.speakeasy.net (mail1.speakeasy.net [216.254.0.201])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4C78B43D48
	for <freebsd-arch@FreeBSD.org>; Wed, 29 Sep 2004 20:11:02 +0000 (GMT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 28384 invoked from network); 29 Sep 2004 20:11:02 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <julian@elischer.org>; 29 Sep 2004 20:10:59 -0000
Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8TKAukO012092;
	Wed, 29 Sep 2004 16:10:56 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
Date: Wed, 29 Sep 2004 10:55:47 -0400
User-Agent: KMail/1.6.2
References: <1096133353.53798.17613.camel@palm.tree.com>
	<200409281056.00870.jhb@FreeBSD.org>
	<1096468734.3733.1177.camel@palm.tree.com>
In-Reply-To: <1096468734.3733.1177.camel@palm.tree.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200409291055.48387.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
cc: Julian Elischer <julian@elischer.org>
cc: Stephan Uphoff <ups@tree.com>
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 20:11:02 -0000

On Wednesday 29 September 2004 10:38 am, Stephan Uphoff wrote:
> On Tue, 2004-09-28 at 10:56, John Baldwin wrote:
> > If A has a priority boost from tsleep() this is intentional, however. 
> > The priority boosts from tsleep() are _supposed_ to do this so as to
> > favor interactive tasks.  Note that if you add the code to always raise
> > td_priority while in the kernel as below you may end up defeating this
> > well-known feature of the 4BSD scheduler.
>
> OK - you and Julian convinced me that this is a feature that I should
> have known about. Without test cases or interactivity benchmarks
> discussions if this is still a desirable feature are probably useless.
> I will revisit this once test cases materialize or I have time to
> think about a benchmark (Not likely anytime soon).

That's ok.  This discussion has been very fruitful on my end at least as 
talking this out has helped me get a much better grasp on how this stuff 
works on 4.x and should be done in 5.x to obtain at least somewhat similar 
behavior.

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 20:26:09 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1856516A4CE
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 20:26:09 +0000 (GMT)
Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 06CC743D53
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 20:26:09 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (julian.vicor-nb.com [208.206.78.97])
	by mail.vicor-nb.com (Postfix) with ESMTP
	id E17637A446; Wed, 29 Sep 2004 13:26:08 -0700 (PDT)
Message-ID: <415B1A60.5040306@elischer.org>
Date: Wed, 29 Sep 2004 13:26:08 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516
X-Accept-Language: en, hu
MIME-Version: 1.0
To: idontknowmyself@gmx.net
References: <opse3tinmlvaixhr@chaos>
In-Reply-To: <opse3tinmlvaixhr@chaos>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-arch@freebsd.org
Subject: Re: freebsd on mvme147
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 20:26:09 -0000


idontknowmyself@gmx.net wrote:

> hi there
> could you tell me if there is any freebsd port for motorola mvme147  
> processor architectures? 


NetBSD (www.netbsd.org) has it..

>
> i would be very pleased to know that
> thank you
> oskar roeding


From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 20:26:19 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 96B5B16A4CE
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 20:26:19 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id 2D3A443D2F
	for <freebsd-arch@freebsd.org>; Wed, 29 Sep 2004 20:26:19 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 10052 invoked by uid 89); 29 Sep 2004 20:26:18 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 29 Sep 2004 20:26:18 -0000
Received: (qmail 10028 invoked by uid 89); 29 Sep 2004 20:26:17 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 29 Sep 2004 20:26:17 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8TKQGmt017003;
	Wed, 29 Sep 2004 16:26:16 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Peter Holm <peter@holm.cc>
In-Reply-To: <1096477932.3733.1471.camel@palm.tree.com>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <1095529353.31297.1192.camel@palm.tree.com>
	 <1096135220.53798.17754.camel@palm.tree.com>
	 <20040926075218.GA85983@peter.osted.lan>
	 <1096339936.3733.279.camel@palm.tree.com>
	 <20040928074926.GA99957@peter.osted.lan>
	 <1096383103.3733.312.camel@palm.tree.com>
	 <20040929085748.GA19695@peter.osted.lan>
	 <1096467843.3733.1145.camel@palm.tree.com>
	 <1096477932.3733.1471.camel@palm.tree.com>
Content-Type: multipart/mixed; boundary="=-MHoXvfgwm/AM79gvF4x9"
Message-Id: <1096489576.3733.1868.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Wed, 29 Sep 2004 16:26:16 -0400
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 20:26:19 -0000


--=-MHoXvfgwm/AM79gvF4x9
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Forgot to attach the patch ...

	Stephan

On Wed, 2004-09-29 at 13:12, Stephan Uphoff wrote:
> On Wed, 2004-09-29 at 10:24, Stephan Uphoff wrote:
> > On Wed, 2004-09-29 at 04:57, Peter Holm wrote:
> > > It's hard for me to tell if your patch has made any difference.
> > > The freeze is still there.  I'll try to make the same test once more
> > > without your patches to see if I get the same pattern in freezes.
> > 
> > I found some problems yesterday with mutex priority inheritance that
> > could potentially cause your freeze patterns.
> > 
> > I will try to roll a preliminary patch as soon as the caffeine does its
> > magic.
> 
> OK - here is a crude patch to fix some problems with mutex priority
> inheritance. My theory is that the clock thread gets stuck waiting on
> GIANT.
> 
> During release/acquisition of a contested sleep mutex there are a few
> windows where a task can be preempted when actions (waking up blocked
> threads, ownership of the mutex, ..) need to be atomic as far as
> scheduling is concerned. Otherwise priority inheritance may fail. The
> patch uses critical_enter/critical_exit to protect these regions against
> preemption.
> 
> It would be great if you could run this in addition to the other patches.
> 
> 	Stephan


--=-MHoXvfgwm/AM79gvF4x9
Content-Disposition: attachment; filename=mutex_patch
Content-Type: text/x-patch; name=mutex_patch; charset=ASCII
Content-Transfer-Encoding: 7bit

Index: kern_mutex.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_mutex.c,v
retrieving revision 1.149
diff -u -r1.149 kern_mutex.c
--- kern_mutex.c	2 Sep 2004 18:59:15 -0000	1.149
+++ kern_mutex.c	29 Sep 2004 16:50:36 -0000
@@ -492,7 +492,9 @@
 		if (v == MTX_CONTESTED) {
 			MPASS(ts != NULL);
 			m->mtx_lock = (uintptr_t)td | MTX_CONTESTED;
+			critical_enter();
 			turnstile_claim(ts);
+			critical_exit();
 			break;
 		}
 #endif
@@ -651,6 +653,9 @@
 #else
 	MPASS(ts != NULL);
 #endif
+
+	critical_enter();
+
 #ifndef PREEMPTION
 	/* XXX */
 	td1 = turnstile_head(ts);
@@ -671,6 +676,7 @@
 	}
 #endif
 	turnstile_unpend(ts);
+	critical_exit();
 
 #ifndef PREEMPTION
 	/*

--=-MHoXvfgwm/AM79gvF4x9--

From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 20:40:58 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 0FD7416A4CE; Wed, 29 Sep 2004 20:40:58 +0000 (GMT)
Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id E33D343D45; Wed, 29 Sep 2004 20:40:57 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (julian.vicor-nb.com [208.206.78.97])
	by mail.vicor-nb.com (Postfix) with ESMTP
	id B7A877A446; Wed, 29 Sep 2004 13:40:57 -0700 (PDT)
Message-ID: <415B1DD9.2050409@elischer.org>
Date: Wed, 29 Sep 2004 13:40:57 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516
X-Accept-Language: en, hu
MIME-Version: 1.0
To: John Baldwin <jhb@FreeBSD.org>
References: <1096133353.53798.17613.camel@palm.tree.com>
	<200409281056.00870.jhb@FreeBSD.org>
	<1096468734.3733.1177.camel@palm.tree.com>
	<200409291055.48387.jhb@FreeBSD.org>
In-Reply-To: <200409291055.48387.jhb@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
cc: Stephan Uphoff <ups@tree.com>
cc: David Xu <davidxu@freebsd.org>
cc: freebsd-arch@FreeBSD.org
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 20:40:58 -0000


John Baldwin wrote:

>On Wednesday 29 September 2004 10:38 am, Stephan Uphoff wrote:
>>On Tue, 2004-09-28 at 10:56, John Baldwin wrote:
>>>If A has a priority boost from tsleep() this is intentional, however.
>>>The priority boosts from tsleep() are _supposed_ to do this so as to
>>>favor interactive tasks.  Note that if you add the code to always raise
>>>td_priority while in the kernel as below you may end up defeating this
>>>well-known feature of the 4BSD scheduler.
>>
>>OK - you and Julian convinced me that this is a feature that I should
>>have known about. Without test cases or interactivity benchmarks
>>discussions if this is still a desirable feature are probably useless.
>>I will revisit this once test cases materialize or I have time to
>>think about a benchmark (Not likely anytime soon).
>
>That's ok.  This discussion has been very fruitful on my end at least as
>talking this out has helped me get a much better grasp on how this stuff
>works on 4.x and should be done in 5.x to obtain at least somewhat similar
>behavior.
>


well if you've worked it out,.. do let the rest of us know :-)

I do think that there are several points that need work..
1/ kse threads are ephemeral, and so they don't gather any 'history'.
   therefore it needs to be gathered somewhere else.. (e.g. the ksegrp,
   but what does that actually mean?)
2/ what if the kg has both long-running and interactive threads?
3/ sibling thread affinity and how that affects priority and scheduling.


We COULD store information in the mailbox..
but then we need to trust the user with it..
So then where do we store it?

I have considered a store of 'cached' and "hashed" (like the buffer
cache) sched-info structs that are recycled in a least-recently-used
manner.. when you get a thread with a mailbox you look for a sched-stats
block corresponding with that mailbox address and use it..
if you don't find it then you know that thread has not run for a long
time.. so you grab the least-recently-used one and recycle it, as that
thread hasn't run for a while.
Basically the kernel could keep stats on behalf of the most active KSE
threads in an efficient manner.
The small stats structs would need to be only about 8 words..
(4 for 2 x double links, one for mailbox addr/key, and 3 for sched stats.)
In effect the kernel keeps tabs on the most active user threads without
the UTS knowing about it.
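
A minimal userspace sketch of such a cache, in C, might look like the
following. Everything in it is an assumption made for illustration: the
names (toy_stats, toy_lookup), the pool and bucket counts, and the hash
function are hypothetical, and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NSTATS 4	/* entries in the cache (made-up size) */
#define TOY_NHASH  8	/* hash buckets (made-up size) */

/* About 8 words: 2 x double links, the key, and 3 words of stats. */
struct toy_stats {
	struct toy_stats *lru_prev, *lru_next;		/* LRU list links */
	struct toy_stats *hash_prev, *hash_next;	/* hash chain links */
	uintptr_t mbox;					/* mailbox address (key) */
	unsigned long stats[3];				/* scheduler history */
};

/* Circular list heads; toy_lru.lru_next is most recent, lru_prev least. */
static struct toy_stats toy_pool[TOY_NSTATS], toy_lru, toy_hash[TOY_NHASH];

/* Unlink s from its LRU position and reinsert it at the head. */
static void
toy_lru_touch(struct toy_stats *s)
{
	s->lru_prev->lru_next = s->lru_next;
	s->lru_next->lru_prev = s->lru_prev;
	s->lru_next = toy_lru.lru_next;
	s->lru_prev = &toy_lru;
	toy_lru.lru_next->lru_prev = s;
	toy_lru.lru_next = s;
}

/* Unlink s from its old hash chain and insert it on the chain at head. */
static void
toy_hash_move(struct toy_stats *s, struct toy_stats *head)
{
	s->hash_prev->hash_next = s->hash_next;
	s->hash_next->hash_prev = s->hash_prev;
	s->hash_next = head->hash_next;
	s->hash_prev = head;
	head->hash_next->hash_prev = s;
	head->hash_next = s;
}

static void
toy_init(void)
{
	int i;

	toy_lru.lru_prev = toy_lru.lru_next = &toy_lru;
	for (i = 0; i < TOY_NHASH; i++)
		toy_hash[i].hash_prev = toy_hash[i].hash_next = &toy_hash[i];
	for (i = 0; i < TOY_NSTATS; i++) {
		struct toy_stats *s = &toy_pool[i];

		memset(s, 0, sizeof(*s));
		s->lru_prev = s->lru_next = s;		/* self-linked, so the */
		s->hash_prev = s->hash_next = s;	/* unlink steps are safe */
		toy_lru_touch(s);
	}
}

/* Find the stats block for a mailbox, recycling the LRU one on a miss. */
static struct toy_stats *
toy_lookup(uintptr_t mbox, bool *hit)
{
	struct toy_stats *head = &toy_hash[(mbox >> 4) % TOY_NHASH], *s;

	assert(mbox != 0);	/* 0 marks an unused entry in this sketch */
	*hit = false;
	for (s = head->hash_next; s != head; s = s->hash_next)
		if (s->mbox == mbox) {
			*hit = true;
			break;
		}
	if (!*hit) {
		/* Not cached: treat the thread as not having run for a while. */
		s = toy_lru.lru_prev;
		assert(s != &toy_lru);
		memset(s->stats, 0, sizeof(s->stats));
		s->mbox = mbox;
		toy_hash_move(s, head);
	}
	toy_lru_touch(s);
	return (s);
}
```

A lookup that misses hands back the recycled least-recently-used entry
with zeroed stats, which matches the idea that a thread not found in the
cache has not run for a long time.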


From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 20:58:32 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 66A3716A4CE
	for <freebsd-arch@FreeBSD.org>; Wed, 29 Sep 2004 20:58:32 +0000 (GMT)
Received: from mail4.speakeasy.net (mail4.speakeasy.net [216.254.0.204])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3797D43D39
	for <freebsd-arch@FreeBSD.org>; Wed, 29 Sep 2004 20:58:32 +0000 (GMT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 6873 invoked from network); 29 Sep 2004 20:58:31 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <julian@elischer.org>; 29 Sep 2004 20:58:31 -0000
Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8TKwRWq012425;
	Wed, 29 Sep 2004 16:58:28 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
Date: Wed, 29 Sep 2004 16:52:29 -0400
User-Agent: KMail/1.6.2
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
	<1096489576.3733.1868.camel@palm.tree.com>
In-Reply-To: <1096489576.3733.1868.camel@palm.tree.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200409291652.29990.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: Peter Holm <peter@holm.cc>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
cc: Julian Elischer <julian@elischer.org>
cc: Stephan Uphoff <ups@tree.com>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 20:58:32 -0000

On Wednesday 29 September 2004 04:26 pm, Stephan Uphoff wrote:
> Forgot to attach the patch ...
>
> 	Stephan
>
> On Wed, 2004-09-29 at 13:12, Stephan Uphoff wrote:
> > On Wed, 2004-09-29 at 10:24, Stephan Uphoff wrote:
> > > On Wed, 2004-09-29 at 04:57, Peter Holm wrote:
> > > > It's hard for me to tell if your patch has made any difference.
> > > > The freeze is still there.  I'll try to make the same test once more
> > > > without your patches to see if I get the same pattern in freezes.
> > >
> > > I found some problems yesterday with mutex priority inheritance that
> > > could potentially cause your freeze patterns.
> > >
> > > I will try to roll a preliminary patch as soon as the caffeine does its
> > > magic.
> >
> > OK - here is a crude patch to fix some problems with mutex priority
> > inheritance. My theory is that the clock thread gets stuck waiting on
> > GIANT.
> >
> > During release/acquisition of a contested sleep mutex there are a few
> > windows where a task can be preempted when actions (waking up blocked
> > threads, ownership of the mutex, ..) need to be atomic as far as
> > scheduling is concerned. Otherwise priority inheritance may fail. The
> > patch uses critical_enter/critical_exit to protect these regions against
> > preemption.
> >
> > It would be great if you could run this in addition to the other patches.

turnstile_claim() doesn't make any threads runnable and thus can't preempt.
The other place is supposed to preempt, and it should be ok to do so.  Note 
that since the turnstile chain lock is held, that includes a nested critical 
section and any preemption will be deferred until the turnstile lock is 
released via turnstile_release which happens in the middle of 
turnstile_unpend() after it has finished building a list of all the threads 
to be made runnable so that the turnstile object can be re-used safely.  I 
don't think this patch will make much of a difference (if any).  Can you 
provide a description of a case where you think the priority inheritance can 
fail if turnstile_unpend() doesn't run in a nested critical section?
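
The deferral described above can be modeled in userspace roughly as
follows. This is only a toy sketch, not kernel code: the struct, its
fields and every toy_* function are hypothetical stand-ins, and the
"preemption" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-thread state; not the kernel's struct thread. */
struct toy_thread {
	int  critnest;		/* nesting depth of critical sections */
	bool owepreempt;	/* a preemption was requested while nested */
	int  preemptions;	/* how many preemptions actually happened */
};

static void
toy_critical_enter(struct toy_thread *td)
{
	td->critnest++;
}

static void
toy_critical_exit(struct toy_thread *td)
{
	assert(td->critnest > 0);
	/* Only leaving the outermost section runs a deferred preemption. */
	if (--td->critnest == 0 && td->owepreempt) {
		td->owepreempt = false;
		td->preemptions++;	/* stands in for the context switch */
	}
}

/* Called when a better-priority thread has just been made runnable. */
static void
toy_maybe_preempt(struct toy_thread *td)
{
	if (td->critnest > 0)
		td->owepreempt = true;	/* defer until critnest drops to 0 */
	else
		td->preemptions++;
}

/* In this model, holding a spin lock implies a critical section. */
static void
toy_spin_lock(struct toy_thread *td)
{
	toy_critical_enter(td);
}

static void
toy_spin_unlock(struct toy_thread *td)
{
	toy_critical_exit(td);
}
```

With only the spin lock held, a requested preemption fires at
toy_spin_unlock(); wrapping the region in an extra
toy_critical_enter()/toy_critical_exit() pair pushes it out to the outer
exit, which is the only difference the extra nesting makes in this model.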

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 21:00:44 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id EAE4D16A4CE; Wed, 29 Sep 2004 21:00:44 +0000 (GMT)
Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7B34F43D58; Wed, 29 Sep 2004 21:00:43 +0000 (GMT)
	(envelope-from deischen@freebsd.org)
Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11])
	i8TL0aNl017870;	Wed, 29 Sep 2004 17:00:36 -0400 (EDT)
Date: Wed, 29 Sep 2004 17:00:36 -0400 (EDT)
From: Daniel Eischen <deischen@freebsd.org>
X-X-Sender: eischen@sea.ntplx.net
To: Julian Elischer <julian@elischer.org>
In-Reply-To: <415B1DD9.2050409@elischer.org>
Message-ID: <Pine.GSO.4.43.0409291646040.15102-100000@sea.ntplx.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net)
cc: freebsd-arch@freebsd.org
cc: David Xu <davidxu@freebsd.org>
cc: Stephan Uphoff <ups@tree.com>
Subject: Re: sched_userret priority adjustment patch for sched_4bsd
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Daniel Eischen <deischen@freebsd.org>
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 21:00:45 -0000

On Wed, 29 Sep 2004, Julian Elischer wrote:

>
>
> John Baldwin wrote:
>
> >
> >That's ok.  This discussion has been very fruitful on my end at least as
> >talking this out has helped me get a much better grasp on how this stuff
> >works on 4.x and should be done in 5.x to obtain at least somewhat similar
> >behavior.
> >
>
>
> well if you've worked it out,.. do let the rest of us know :-)
>
> I do think that there are several points that need work..
> 1/ kse threads are ephemeral, and so they don't gather any 'history'.
>   therefore it needs to be gathered somewhere else.. (e.g. the ksegrp,
> but what does that actually mean?)
> 2/  what if the kg has both long-running and interactive threads?
> 3/  sibling thread affinity and how that affects priority and scheduling.
>
>
> We COULD store information in the mailbox..
> but then we need to trust the user with it..
> So then where do we store it?
>
> I have considered a store of 'cached' and "hashed"  (like the buffer
> cache) sched-info structs that are recycled
> in a least-recently used manner.. when you get a thread with a mailbox
> you look for a sched-stats block
> corresponding with that mailbox address and use it..
> if you don't find it then you know that thread has not run for a long time..
>  so you grab the least-recently used one and recycle it as that thread
> hasn't run for a while.
> Basically the kernel could keep stats on behalf of the most active KSE
> threads in an efficient manner.
> The small stats structs would need to be only about 8 words..
> (4 for 2 x double links. one for mailbox addr/key, and 3 for sched stats.)
> In effect the kernel keeps tabs on the most active user threads without
> the UTS knowing about it.

Remember that the UTS (IAW POSIX) should be in charge of which
threads run _within_ a process.  Across processes, and for system
scope threads, that's another story.

I think it would be cool if the UTS could store its version
of priority in the thread mailbox, and the kernel would use
this as a hint for which threads should get worked on when
blocked in the kernel.  For instance, if a thread is currently
running with high priority and it makes a system call, that's
a chance for the kernel to continue other blocked threads.
But if the other blocked threads are all of lower (UTS)
priority, you might not want to continue them (or upcall)
when the currently running thread has a higher priority.
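
As a rough sketch of how the kernel might consult such a hint (assuming,
purely for illustration, a mailbox field called uts_prio where a larger
value means more important; neither the field nor the function below is
an existing interface):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox field: larger value = more important to the UTS. */
struct toy_mailbox {
	int uts_prio;
};

/*
 * Decide whether to continue blocked threads (or upcall) while the
 * running thread is in a system call.  Skip it when every blocked
 * thread has a lower UTS priority than the running one.
 */
static bool
toy_should_continue(const struct toy_mailbox *running,
    const struct toy_mailbox *blocked, size_t nblocked)
{
	size_t i;

	assert(running != NULL);
	for (i = 0; i < nblocked; i++)
		if (blocked[i].uts_prio >= running->uts_prio)
			return (true);
	return (false);
}
```

The kernel would treat the value strictly as a hint: it only influences
whether blocked threads get continued, and the UTS stays in charge of
which thread actually runs within the process.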

-- 
Dan Eischen

From owner-freebsd-arch@FreeBSD.ORG  Wed Sep 29 22:14:37 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5118C16A4CE
	for <freebsd-arch@FreeBSD.org>; Wed, 29 Sep 2004 22:14:37 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id A96AC43D2F
	for <freebsd-arch@FreeBSD.org>; Wed, 29 Sep 2004 22:14:36 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 22904 invoked by uid 89); 29 Sep 2004 22:14:34 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 29 Sep 2004 22:14:34 -0000
Received: (qmail 22231 invoked by uid 89); 29 Sep 2004 22:14:19 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 29 Sep 2004 22:14:19 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8TMEHmt017471;
	Wed, 29 Sep 2004 18:14:18 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <200409291652.29990.jhb@FreeBSD.org>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <1096477932.3733.1471.camel@palm.tree.com>
	 <1096489576.3733.1868.camel@palm.tree.com>
	 <200409291652.29990.jhb@FreeBSD.org>
Content-Type: text/plain
Message-Id: <1096496057.3733.2163.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Wed, 29 Sep 2004 18:14:17 -0400
Content-Transfer-Encoding: 7bit
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Sep 2004 22:14:37 -0000

On Wed, 2004-09-29 at 16:52, John Baldwin wrote:
> > > OK - here is a crude patch to fix some problems with mutex priority
> > > inheritance. My theory is that the clock thread gets stuck waiting on
> > > GIANT.
> > >
> > > During release/acquisition of a contested sleep mutex there are a few
> > > windows where a task can be preempted when actions (waking up blocked
> > > threads, ownership of the mutex, ..) need to be atomic as far as
> > > scheduling is concerned. Otherwise priority inheritance may fail. The
> > > patch uses critical_enter/critical_exit to protect these regions against
> > > preemption.
> > >
> > > It would be great if you could run this in addition to the other patches.
> 
> turnstile_claim() doesn't make any threads runnable and thus can't preempt.
> The other place is supposed to preempt, and it should be ok to do so.  Note 
> that since the turnstile chain lock is held, that includes a nested critical 
> section and any preemption will be deferred until the turnstile lock is 
> released via turnstile_release which happens in the middle of 
> turnstile_unpend() after it has finished building a list of all the threads 
> to be made runnable so that the turnstile object can be re-used safely.  I 
> don't think this patch will make much of a difference (if any).  Can you 
> provide a description of a case where you think the priority inheritance can 
> fail if turnstile_unpend() doesn't run in a nested critical section?

This is a bit of a mind bender.
I hope you have some aspirins close by ;-)

Thread A holds a mutex x contested by Thread B and has priority pri(A).
Thread B holds a mutex y.
There is a thread C with priority pri(C) with pri(C) < pri(A).

Thread A is in the process of releasing x.
It removes thread B from the turnstile and holds a pointer to B in a
private list.
Thread A sets the owner of the turnstile to NULL and releases all spin
locks. ( mtx_unlock_spin(&tc->tc_lock); line 148)
This means interrupts are now enabled.

An interrupt occurs (or is already pending) and the interrupt handler
puts the associated interrupt thread I on the run queue.
This causes a preemption from A to I.
The interrupt thread I tries to acquire mutex y owned by B and blocks.
I donates its priority to B - but inheritance stops at B.
The next thread with the best priority is C and the cpu switches to C.
However B needs A to run to make it to the run-queue.

If y is GIANT and I is the clock thread, C could run forever in userspace
without being interrupted.
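
The scenario above can be replayed with a small user-space toy model.
This is only a sketch under stated assumptions: the toy_* names, the
state enum and the priority values are invented for illustration and
are not kernel code. As in the description, a numerically smaller
priority is the better one, so pri(C) < pri(A) means C wins over A.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
enum toy_state { TOY_RUNNABLE, TOY_BLOCKED, TOY_PENDING_WAKEUP };

struct toy_thread {
	char name;		/* 'A', 'B', 'C' or 'I' */
	int pri;		/* smaller value = better priority */
	enum toy_state st;
};

/* Pick the runnable thread with the best (smallest) priority. */
static const struct toy_thread *
toy_choose(const struct toy_thread *t, size_t n)
{
	const struct toy_thread *best = NULL;

	for (size_t k = 0; k < n; k++)
		if (t[k].st == TOY_RUNNABLE &&
		    (best == NULL || t[k].pri < best->pri))
			best = &t[k];
	return (best);
}

/* Lend the waiter's priority to the lock owner; stops at the owner. */
static void
toy_lend(struct toy_thread *owner, const struct toy_thread *waiter)
{
	if (waiter->pri < owner->pri)
		owner->pri = waiter->pri;
}

/*
 * Replay the scenario and return the name of the thread that gets the
 * CPU once I has blocked on y.  protect_window != 0 models A finishing
 * the wakeup of B before I is allowed to preempt it.
 */
static char
toy_replay(int protect_window)
{
	struct toy_thread t[] = {
		{ 'A', 150, TOY_RUNNABLE },	  /* releasing mutex x */
		{ 'B', 120, TOY_PENDING_WAKEUP }, /* off x's turnstile, owns y */
		{ 'C', 100, TOY_RUNNABLE },	  /* pri(C) < pri(A) */
		{ 'I',  10, TOY_BLOCKED },	  /* waiting for its interrupt */
	};
	struct toy_thread *b = &t[1], *ithd = &t[3];

	if (protect_window)
		b->st = TOY_RUNNABLE;	/* A put B on the run queue first */

	/* Interrupt: I runs, wants y, lends its priority to B, blocks. */
	ithd->st = TOY_RUNNABLE;
	toy_lend(b, ithd);
	ithd->st = TOY_BLOCKED;

	return (toy_choose(t, sizeof(t) / sizeof(t[0]))->name);
}
```

Without protection the model hands the CPU to C, although B now has
the best priority, because B is not on the run queue and only A can
put it there. With the window closed, B is picked instead.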

There is another scenario that does not require an interrupt (preemption
in setrunqueue(td, SRQ_BORING), two blocked threads ...).

I was looking at the MUTEX_WAKE_ALL undefined case when I used the
critical section for turnstile_claim().
However there are bigger problems with MUTEX_WAKE_ALL undefined
so you are right - the critical section for turnstile_claim is pretty
useless.

	Stephan

From owner-freebsd-arch@FreeBSD.ORG  Thu Sep 30 07:58:04 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 51A7916A4CE
	for <freebsd-arch@freebsd.org>; Thu, 30 Sep 2004 07:58:04 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id DBB3743D4C
	for <freebsd-arch@freebsd.org>; Thu, 30 Sep 2004 07:58:03 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 84475 invoked from network); 30 Sep 2004 07:58:01 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 30 Sep 2004 07:58:01 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8U7w0Cs052399;
	Thu, 30 Sep 2004 09:58:00 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i8U7w0Va052398;
	Thu, 30 Sep 2004 09:58:00 +0200 (CEST)
	(envelope-from pho)
Date: Thu, 30 Sep 2004 09:57:59 +0200
From: Peter Holm <peter@holm.cc>
To: Stephan Uphoff <ups@tree.com>
Message-ID: <20040930075759.GA52233@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1095529353.31297.1192.camel@palm.tree.com>
	<1096135220.53798.17754.camel@palm.tree.com>
	<20040926075218.GA85983@peter.osted.lan>
	<1096339936.3733.279.camel@palm.tree.com>
	<20040928074926.GA99957@peter.osted.lan>
	<1096383103.3733.312.camel@palm.tree.com>
	<20040929085748.GA19695@peter.osted.lan>
	<1096467843.3733.1145.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="Kj7319i9nmIyA2yE"
Content-Disposition: inline
In-Reply-To: <1096477932.3733.1471.camel@palm.tree.com>
User-Agent: Mutt/1.4.1i
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-List-Received-Date: Thu, 30 Sep 2004 07:58:04 -0000


--Kj7319i9nmIyA2yE
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Sep 29, 2004 at 01:12:13PM -0400, Stephan Uphoff wrote:
> On Wed, 2004-09-29 at 10:24, Stephan Uphoff wrote:
> > On Wed, 2004-09-29 at 04:57, Peter Holm wrote:
> > > It's hard for me to tell if your patch has made any difference.
> > > The freeze is still there.  I'll try to make the same test once more
> > > without your patches to see if I get the same pattern in freezes.
> > 
> > I found some problems yesterday with mutex priority inheritance that
> > could potentially cause your freeze patterns.
> > 
> > I will try to roll a preliminary patch as soon as the caffeine does its
> > magic.
> 
> OK - here is a crude patch to fix some problems with mutex priority
> inheritance. My theory is that the clock thread gets stuck waiting on
> GIANT.
> 
> During release/acquisition of a contested sleep mutex there are a few
> windows where a task can be preempted when actions (waking up blocked
> threads, ownership of the mutex, ..) need to be atomic as far as
> scheduling is concerned. Otherwise priority inheritance may fail. The
> patch uses critical_enter/critical_exit to protect these regions against
> preemption.
> 
> It would be great if you could run this in addition to the other patches.
> 
> 	Stephan

OK, did so. Doesn't seem to make any difference.
In order to spot a freeze I have instrumented hardclock() to report when
Giant has been held for more than 60 seconds. I don't know if this is any
help, but here are examples of two freezes, both unfrozen by ping:

Mounted root from ufs:/dev/ad0s1a.
Giant held for more than 60 sec by td 0xc1ad5180, pid 1100
~KDB: enter: Line break on console
[thread 100105]
Stopped at      kdb_enter+0x2b: nop
db> where 1100
sched_switch(c1ad5180,0,1) at sched_switch+0x14f
mi_switch(1,0) at mi_switch+0x264
turnstile_wait(c17ec700,c10429cc,c1ad9900,c10429cc,2,c07f9e66,219) at turnstile_wait+0x2ec
_mtx_lock_sleep(c10429cc,c1ad5180,0,c0812243,88d) at _mtx_lock_sleep+0x167
_mtx_lock_flags(c10429cc,0,c0812243,88d) at _mtx_lock_flags+0x85
vm_map_entry_delete(c1a63708,c2071374,cf289a10,c074a75b,c1a63708) at vm_map_entry_delete+0x7e
vm_map_delete(c1a63708,0,bfc00000,c1a63708,c1a63708) at vm_map_delete+0x18f
vm_map_remove(c1a63708,0,bfc00000) at vm_map_remove+0x42
exec_new_vmspace(cf289b94,c0897da0,c07f81d2,31e,c17d6318) at exec_new_vmspace+0x175
exec_elf32_imgact(cf289b94,c08c4e58,c08c4ef8,0,0) at exec_elf32_imgact+0x1b3
kern_execve(c1ad5180,8067470,806739c,8067404,0) at kern_execve+0x30e
execve(c1ad5180,cf289d14,3,0,286) at execve+0x18
syscall(2f,2f,2f,8067470,806739c) at syscall+0x213
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (59, FreeBSD ELF32, execve), eip = 0x2812b01f, esp = 0xbfbfe68c, ebp = 0xbfbfe6b8 ---
db> show locks 1100
exclusive sx user map r = 0 (0xc1a6374c) locked @ vm/vm_map.c:2313
exclusive sleep mutex Giant r = 0 (0xc08bc2c0) locked @ vm/vm_object.c:453
db> c
pid 1353: corrected slot count (2->1)
Giant held for more than 60 sec by td 0xc1cba900, pid 1453
~KDB: enter: Line break on console
[thread 100134]
Stopped at      kdb_enter+0x2b: nop
db> where 1453
sched_switch(c1cba900,0,1) at sched_switch+0x14f
mi_switch(1,0) at mi_switch+0x264
turnstile_wait(c1af1e00,c08f5ea0,c1caa000,c08f5ea0,2,c07f9e66,219) at turnstile_wait+0x2ec
_mtx_lock_sleep(c08f5ea0,c1cba900,0,c080174c,d99) at _mtx_lock_sleep+0x167
_mtx_lock_flags(c08f5ea0,0,c080174c,d99,c1a755ac) at _mtx_lock_flags+0x85
vfs_clean_pages(c66638e8,c66638e8,8048cd9,c7f16e82,1) at vfs_clean_pages+0x7c
bdwrite(c66638e8) at bdwrite+0x2d0
ffs_write(cf407c14) at ffs_write+0x558
vn_write(c1d78ae4,cf407c88,c1a59880,0,c1cba900) at vn_write+0x1f8
dofilewrite(c1cba900,c1d78ae4,1,8048cd9,1) at dofilewrite+0xa8
write(c1cba900,cf407d14,3,45,292) at write+0x39
syscall(804002f,bfbf002f,bfbf002f,8049568,bfbfebc0) at syscall+0x213
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (4, FreeBSD ELF32, write), eip = 0x280c151f, esp = 0xbfbfeb3c, ebp = 0xbfbfeb78 ---
db> show locks 1453
exclusive sleep mutex vm object (standard object) r = 0 (0xc1a755ac) locked @ kern/vfs_bio.c:3480
exclusive sleep mutex Giant r = 0 (0xc08bc2c0) locked @ kern/vfs_vnops.c:582
db> c

-- 
Peter Holm

--Kj7319i9nmIyA2yE
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="kern_clock.diff"

cvs diff: Diffing .
Index: kern_clock.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.172
diff -u -r1.172 kern_clock.c
--- kern_clock.c	10 Jul 2004 21:36:01 -0000	1.172
+++ kern_clock.c	30 Sep 2004 07:19:44 -0000
@@ -193,6 +193,14 @@
 	mtx_unlock_spin_flags(&sched_lock, MTX_QUIET);
 }
 
+int pho = 0;
+int pho_giant = 0;
+struct thread *pho_giant_td = NULL;
+#define mtx_unowned(m)     ((m)->mtx_lock == MTX_UNOWNED)
+#define mtx_owner(m)    (mtx_unowned((m)) ? NULL \
+        : (struct thread *)((m)->mtx_lock & MTX_FLAGMASK))
+
+
 /*
  * The real-time timer, interrupting hz times per second.
  */
@@ -239,6 +247,23 @@
 	if (need_softclock)
 		swi_sched(softclock_ih, 0);
 
+	if (pho > 0) 
+		if (--pho == 0) panic("testing ...");
+
+	if (!(mtx_unowned(&Giant))) {
+		if (pho_giant_td != mtx_owner(&Giant)) {
+			pho_giant_td = mtx_owner(&Giant);
+			pho_giant = 0;
+		}
+		if (++pho_giant == 60*hz) {
+			printf("Giant held for more than %d sec by td %p, pid %d\n",
+				pho_giant / hz, pho_giant_td,
+				pho_giant_td->td_proc->p_pid);
+			pho_giant = 0;
+		}
+	} else
+		pho_giant = 0;
+
 #ifdef SW_WATCHDOG
 	if (watchdog_enabled > 0 && --watchdog_ticks <= 0)
 		watchdog_fire();

--Kj7319i9nmIyA2yE--

From owner-freebsd-arch@FreeBSD.ORG  Thu Sep 30 13:05:21 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id B7E7216A4CE; Thu, 30 Sep 2004 13:05:21 +0000 (GMT)
Received: from athena.softcardsystems.com (mail.softcardsystems.com
	[12.34.136.114])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 4E0F643D3F; Thu, 30 Sep 2004 13:05:21 +0000 (GMT)
	(envelope-from sah@softcardsystems.com)
Received: from athena (athena [12.34.136.114])i8UE4Ika006325;
	Thu, 30 Sep 2004 09:04:18 -0500
Date: Thu, 30 Sep 2004 09:04:18 -0500 (EST)
From: Sam <sah@softcardsystems.com>
X-X-Sender: sah@athena
To: Stephan Uphoff <ups@tree.com>
In-Reply-To: <1095976309.53798.8390.camel@palm.tree.com>
Message-ID: <Pine.LNX.4.60.0409300903070.6230@athena>
References: <Pine.LNX.4.60.0409211531450.32120@athena> 
 <41508FEB.6030203@elischer.org><20040923191423.GE61631@FreeBSD.org> 
 <Pine.LNX.4.60.0409231519030.19882@athena> <41532FA0.6030405@elischer.org>
  <Pine.LNX.4.60.0409231620240.19882@athena>  <41533E0D.9000908@elischer.org>
 <1095976309.53798.8390.camel@palm.tree.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
cc: re@freebsd.org
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: AoE for 4.x
X-List-Received-Date: Thu, 30 Sep 2004 13:05:21 -0000

I haven't heard any major objections to my getting a major
number -- can someone please step up and help me out?

Sam

On Thu, 23 Sep 2004, Stephan Uphoff wrote:

> Since a complete disk operation in AoE is encapsulated in a single
> Ethernet request/response pair, the data size of a read/write operation
> is smaller than a single page.
>
> I don't think any existing framework can deal with this efficiently.
>
> 	Stephan
>
> On Thu, 2004-09-23 at 17:20, Julian Elischer wrote:
>> you could look at the sbp driver that is part of the firewire code..
>> I think that may be the closest analog.
>>
>>
>> Sam wrote:
>>
>>> On Thu, 23 Sep 2004, Julian Elischer wrote:
>>>
>>>> I think that if you have a working driver we can assign you a number.
>>>> I do have some questions however..
>>>>
>>>> This is AoE. Is it not possible at all to combine it with either the CAM
>>>> framework (such as the atapicam stuff) or the existing ATA stuff?
>>>> Don't take this the wrong way; it's just a question.
>>>> CAM is being used to talk to drives over firewire, usb, ata, scsi, and
>>>> fibrechannel.
>>>> It would seem that unifying this is something that we should
>>>> look at.
>>>> Of course CAM itself is showing its age in some places and it could
>>>> do with some work itself.
>>>
>>>
>>> It might be possible to plug into the CAM; I only briefly
>>> glanced at it and it didn't appear appropriate.  The ATA
>>> layer definitely isn't as parts of ATA don't make sense
>>> in this context (Read DMA, Read Multiple, eg) and AoE
>>> devices don't conform to the simple hardware probe/attach
>>> methodology (as I understand it).
>>>
>>> I would love to be proved wrong.  I'm always willing to
>>> try a new approach if it's demonstrably better.
>>>
>>> Sam

From owner-freebsd-arch@FreeBSD.ORG  Thu Sep 30 17:35:36 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 40F8516A4CE
	for <freebsd-arch@FreeBSD.org>; Thu, 30 Sep 2004 17:35:36 +0000 (GMT)
Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0A0A443D49
	for <freebsd-arch@FreeBSD.org>; Thu, 30 Sep 2004 17:35:34 +0000 (GMT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 26464 invoked from network); 30 Sep 2004 17:35:33 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <julian@elischer.org>; 30 Sep 2004 17:35:32 -0000
Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i8UHZREi019616;
	Thu, 30 Sep 2004 13:35:27 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
Date: Thu, 30 Sep 2004 10:17:54 -0400
User-Agent: KMail/1.6.2
References: <1095468747.31297.241.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
In-Reply-To: <1096496057.3733.2163.camel@palm.tree.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200409301017.54350.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: Peter Holm <peter@holm.cc>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
cc: Julian Elischer <julian@elischer.org>
cc: Stephan Uphoff <ups@tree.com>
Subject: Re: scheduler (sched_4bsd) questions
X-List-Received-Date: Thu, 30 Sep 2004 17:35:36 -0000

On Wednesday 29 September 2004 06:14 pm, Stephan Uphoff wrote:
> On Wed, 2004-09-29 at 16:52, John Baldwin wrote:
> > > > OK - here is a crude patch to fix some problems with mutex priority
> > > > inheritance. My theory is that the clock thread gets stuck waiting on
> > > > GIANT.
> > > >
> > > > During release/acquisition of a contested sleep mutex there are a few
> > > > windows where a task can be preempted when actions (waking up blocked
> > > > threads, ownership of the mutex, ..) need to be atomic as far as
> > > > scheduling is concerned. Otherwise priority inheritance may fail. The
> > > > patch uses critical_enter/critical_exit to protect these regions
> > > > against preemption.
> > > >
> > > > It would be great if you could run this in addition to the other patches.
> >
> > turnstile_claim() doesn't make any threads runnable and thus can't
> > preempt. The other place is supposed to preempt, and it should be ok to
> > do so.  Note that since the turnstile chain lock is held, that includes a
> > nested critical section and any preemption will be deferred until the
> > turnstile lock is released via turnstile_release which happens in the
> > middle of
> > turnstile_unpend() after it has finished building a list of all the
> > threads to be made runnable so that the turnstile object can be re-used
> > safely.  I don't think this patch will make much of a difference (if
> > any).  Can you provide a description of a case where you think the
> > priority inheritance can fail if turnstile_unpend() doesn't run in a
> > nested critical section?
>
> This is a bit of a mind bender.
> I hope you have some aspirins close by ;-)
>
> Thread A holds a mutex x contested by Thread B and has priority pri(A).
> Thread B holds a mutex y.
> There is a thread C with priority pri(C) with pri(C) < pri(A).
>
> Thread A is in the process of releasing x.
> It removes thread B from the turnstile and holds a pointer to B in a
> private list.
> Thread A sets the owner of the turnstile to NULL and releases all spin
> locks. ( mtx_unlock_spin(&tc->tc_lock); line 148)
> This means interrupts are now enabled.
>
> An interrupt occurs (or is already pending) and the interrupt handler
> puts the associated interrupt thread I on the run queue.
> This causes a preemption from A to I.
> The interrupt thread I tries to acquire mutex y owned by B and blocks.
> I donates its priority to B - but inheritance stops at B.
> The next thread with the best priority is C and the cpu switches to C.
> However B needs A to run to make it to the run-queue.
>
> If y is GIANT and I is the clock thread, C could run forever in userspace
> without being interrupted.

Fair enough.  The right place to fix this is in turnstile_unpend(), though,
I think.  I have had some patches lying around (not thoroughly tested yet)
that try to "clump" setrunqueue() calls before preempting.  They might fix
this as well, but in the turnstile code itself:

--- //depot/projects/smpng/sys/kern/kern_thread.c	2004/09/22 15:31:15
+++ //depot/user/jhb/preemption/kern/kern_thread.c	2004/09/22 16:59:47
@@ -954,6 +954,7 @@
 	p->p_suspcount++;
 	TD_SET_SUSPENDED(td);
 	TAILQ_INSERT_TAIL(&p->p_suspended, td, td_runq);
+#if 0
 	/*
 	 * Hack: If we are suspending but are on the sleep queue
 	 * then we are in msleep or the cv equivalent. We
@@ -962,6 +963,7 @@
 	 */
 	if (TD_ON_SLEEPQ(td))
 		TD_SET_SLEEPING(td);
+#endif
 }
 
 void
@@ -988,9 +990,11 @@
 	mtx_assert(&sched_lock, MA_OWNED);
 	PROC_LOCK_ASSERT(p, MA_OWNED);
 	if (!P_SHOULDSTOP(p)) {
+		critical_enter();
 		while ((td = TAILQ_FIRST(&p->p_suspended))) {
 			thread_unsuspend_one(td);
 		}
+		critical_exit();
 	} else if ((P_SHOULDSTOP(p) == P_STOPPED_SINGLE) &&
 	    (p->p_numthreads == p->p_suspcount)) {
 		/*
@@ -1025,9 +1029,11 @@
 	 * to continue however as this is a bad place to stop.
 	 */
 	if ((p->p_numthreads != 1) && (!P_SHOULDSTOP(p))) {
-		while (( td = TAILQ_FIRST(&p->p_suspended))) {
+		critical_enter();
+		while ((td = TAILQ_FIRST(&p->p_suspended))) {
 			thread_unsuspend_one(td);
 		}
+		critical_exit();
 	}
 	mtx_unlock_spin(&sched_lock);
 }
--- //depot/projects/smpng/sys/kern/subr_sleepqueue.c	2004/08/20 17:10:02
+++ //depot/user/jhb/preemption/kern/subr_sleepqueue.c	2004/09/10 21:36:10
@@ -400,9 +400,10 @@
 	 * just return.
 	 */
 	if (td->td_sleepqueue != NULL) {
-		MPASS(!TD_ON_SLEEPQ(td));
 		mtx_unlock_spin(&sc->sc_lock);
 		mtx_lock_spin(&sched_lock);
+		MPASS(!TD_ON_SLEEPQ(td));
+		MPASS(!TD_IS_SLEEPING(td));
 		return;
 	}
 
@@ -709,11 +710,13 @@
 	sleepq_release(wchan);
 
 	/* Resume all the threads on the temporary list. */
+	critical_enter();
 	while (!TAILQ_EMPTY(&list)) {
 		td = TAILQ_FIRST(&list);
 		TAILQ_REMOVE(&list, td, td_slpq);
 		sleepq_resume_thread(td, pri);
 	}
+	critical_exit();
 }
 
 /*
--- //depot/projects/smpng/sys/kern/subr_turnstile.c	2004/09/03 14:14:21
+++ //depot/user/jhb/preemption/kern/subr_turnstile.c	2004/09/10 21:36:10
@@ -727,6 +726,7 @@
 	 * in turnstile_wait().  Set a flag to force it to try to acquire
 	 * the lock again instead of blocking.
 	 */
+	critical_enter();
 	while (!TAILQ_EMPTY(&pending_threads)) {
 		td = TAILQ_FIRST(&pending_threads);
 		TAILQ_REMOVE(&pending_threads, td, td_lockq);
@@ -742,6 +742,7 @@
 			MPASS(TD_IS_RUNNING(td) || TD_ON_RUNQ(td));
 		}
 	}
+	critical_exit();
 	mtx_unlock_spin(&sched_lock);
 }
 
--- //depot/projects/smpng/sys/vm/vm_glue.c	2004/09/22 15:31:15
+++ //depot/user/jhb/preemption/vm/vm_glue.c	2004/09/22 16:59:47
@@ -753,6 +753,7 @@
 			vm_thread_swapin(td);
 
 		PROC_LOCK(p);
+		critical_enter();
 		mtx_lock_spin(&sched_lock);
 		p->p_sflag &= ~PS_SWAPPINGIN;
 		p->p_sflag |= PS_INMEM;
@@ -767,6 +768,7 @@
 
 		/* Allow other threads to swap p out now. */
 		--p->p_lock;
+		critical_exit();
 	}
 #endif /* NO_SWAPPING */
 }


I.e., you could just move the critical_enter() in subr_turnstile.c earlier so 
it is before the mtx_unlock_spin() of the turnstile chain lock.
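
To illustrate why the ordering matters, here is a user-space toy model
of nested critical sections with deferred preemption.  It is a sketch
only: the toy_* names and counters are invented stand-ins, a spin lock
is modelled as nothing more than one level of critical section, and the
interrupt is assumed to arrive right after the chain lock is dropped.

```c
#include <assert.h>

/* Hypothetical stand-ins for per-thread state; not the real fields. */
static int toy_critnest;	/* critical section nesting depth */
static int toy_owe_preempt;	/* preemption requested while nested */
static int toy_made_runnable;	/* threads put on the run queue so far */
static int toy_seen_at_switch;	/* the above, sampled at the first switch */

/* Record how far the wakeup loop had got when we lost the CPU. */
static void
toy_switch(void)
{
	if (toy_seen_at_switch < 0)
		toy_seen_at_switch = toy_made_runnable;
}

static void
toy_critical_enter(void)
{
	toy_critnest++;
}

/* Leaving the outermost level runs any deferred preemption. */
static void
toy_critical_exit(void)
{
	assert(toy_critnest > 0);
	if (--toy_critnest == 0 && toy_owe_preempt) {
		toy_owe_preempt = 0;
		toy_switch();
	}
}

/* An interrupt wants to preempt: defer it while inside a section. */
static void
toy_interrupt(void)
{
	if (toy_critnest > 0)
		toy_owe_preempt = 1;
	else
		toy_switch();
}

/*
 * Model of the unpend path.  Returns how many of the npending threads
 * were already runnable when the first context switch happened.
 */
static int
toy_unpend(int enter_before_unlock, int npending)
{
	toy_critnest = toy_owe_preempt = toy_made_runnable = 0;
	toy_seen_at_switch = -1;

	toy_critical_enter();		/* chain lock is held */
	if (enter_before_unlock)
		toy_critical_enter();	/* critical_enter() moved earlier */
	toy_critical_exit();		/* chain lock released */
	toy_interrupt();		/* interrupt hits the window */
	if (!enter_before_unlock)
		toy_critical_enter();	/* critical_enter() as in the diff */
	for (int k = 0; k < npending; k++)
		toy_made_runnable++;	/* stands in for setrunqueue() */
	toy_critical_exit();
	return (toy_seen_at_switch);
}
```

In this model toy_unpend(0, 3) switches away with none of the pending
threads runnable, while toy_unpend(1, 3) defers the switch until all
three are on the run queue.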

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Thu Sep 30 18:30:56 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1C29016A4CE
	for <freebsd-arch@FreeBSD.org>; Thu, 30 Sep 2004 18:30:56 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id 7232643D48
	for <freebsd-arch@FreeBSD.org>; Thu, 30 Sep 2004 18:30:55 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 2097 invoked by uid 89); 30 Sep 2004 18:30:53 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 30 Sep 2004 18:30:53 -0000
Received: (qmail 2076 invoked by uid 89); 30 Sep 2004 18:30:53 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 30 Sep 2004 18:30:53 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8UIUpmt022172;
	Thu, 30 Sep 2004 14:30:52 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <200409301017.54350.jhb@FreeBSD.org>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <200409291652.29990.jhb@FreeBSD.org>
	 <1096496057.3733.2163.camel@palm.tree.com>
	 <200409301017.54350.jhb@FreeBSD.org>
Content-Type: text/plain
Message-Id: <1096569051.21577.23.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Thu, 30 Sep 2004 14:30:51 -0400
Content-Transfer-Encoding: 7bit
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions
X-List-Received-Date: Thu, 30 Sep 2004 18:30:56 -0000

On Thu, 2004-09-30 at 10:17, John Baldwin wrote:
> Fair enough.  The right place to fix this is in turnstile_unpend() though I 
> think.  I have had these patches that try to "clump" setrunqueue's before 
> preempting lying around (but not thoroughly tested yet) that might fix this 
> as well but in the turnstile code itself:
- snip -
> --- //depot/projects/smpng/sys/kern/subr_turnstile.c	2004/09/03 14:14:21
> +++ //depot/user/jhb/preemption/kern/subr_turnstile.c	2004/09/10 21:36:10
> @@ -727,6 +726,7 @@
>  	 * in turnstile_wait().  Set a flag to force it to try to acquire
>  	 * the lock again instead of blocking.
>  	 */
> +	critical_enter();
>  	while (!TAILQ_EMPTY(&pending_threads)) {
>  		td = TAILQ_FIRST(&pending_threads);
>  		TAILQ_REMOVE(&pending_threads, td, td_lockq);
> @@ -742,6 +742,7 @@
>  			MPASS(TD_IS_RUNNING(td) || TD_ON_RUNQ(td));
>  		}
>  	}
> +	critical_exit();
>  	mtx_unlock_spin(&sched_lock);
>  }
-snip -
> 
> I.e., you could just move the critical_enter() in subr_turnstile.c earlier so 
> it is before the mtx_unlock_spin() of the turnstile chain lock.

I agree - this would be the right place.
I was originally planning to do some more work in kern_mutex and did not
want to touch more than one file ;-)
Can you check this in?

Your other patches look like they are meant to avoid senseless
switching and so improve performance, but should not have an impact on
correct function. Right?
Hopefully I get some time to look at them more closely later on.


	Stephan


From owner-freebsd-arch@FreeBSD.ORG  Thu Sep 30 20:38:30 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E03A316A4D0
	for <freebsd-arch@FreeBSD.org>; Thu, 30 Sep 2004 20:38:30 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 24EF443D1D
	for <freebsd-arch@FreeBSD.org>; Thu, 30 Sep 2004 20:38:30 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 13021 invoked from network); 30 Sep 2004 20:38:27 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 30 Sep 2004 20:38:27 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i8UKcRCs055209;
	Thu, 30 Sep 2004 22:38:27 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i8UKcQti055208;
	Thu, 30 Sep 2004 22:38:26 +0200 (CEST)
	(envelope-from pho)
Date: Thu, 30 Sep 2004 22:38:26 +0200
From: Peter Holm <peter@holm.cc>
To: John Baldwin <jhb@FreeBSD.org>
Message-ID: <20040930203826.GA55153@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<200409301017.54350.jhb@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200409301017.54350.jhb@FreeBSD.org>
User-Agent: Mutt/1.4.1i
cc: Peter Holm <peter@holm.cc>
cc: Stephan Uphoff <ups@tree.com>
cc: Julian Elischer <julian@elischer.org>
cc: freebsd-arch@FreeBSD.org
Subject: Re: scheduler (sched_4bsd) questions
X-List-Received-Date: Thu, 30 Sep 2004 20:38:31 -0000

On Thu, Sep 30, 2004 at 10:17:54AM -0400, John Baldwin wrote:
> On Wednesday 29 September 2004 06:14 pm, Stephan Uphoff wrote:
> > On Wed, 2004-09-29 at 16:52, John Baldwin wrote:
> > > > > OK - here is a crude patch to fix some problems with mutex priority
> > > > > inheritance. My theory is that the clock thread gets stuck waiting on
> > > > > GIANT.
> > > > >
> > > > > During release/acquisition of a contested sleep mutex there are a few
> > > > > windows where a task can be preempted when actions (waking up blocked
> > > > > threads, ownership of the mutex, ..) need to be atomic as far as
> > > > > scheduling is concerned. Otherwise priority inheritance may fail. The
> > > > > patch uses critical_enter/critical_exit to protect these regions
> > > > > against preemption.
> > > > >
> > > > > It would be great if you could run this in addition to the other patches.
> > >
> > > turnstile_claim() doesn't make any threads runnable and thus can't
> > > preempt. The other place is supposed to preempt, and it should be ok to
> > > do so.  Note that since the turnstile chain lock is held, that includes a
> > > nested critical section and any preemption will be deferred until the
> > > turnstile lock is released via turnstile_release which happens in the
> > > middle of
> > > turnstile_unpend() after it has finished building a list of all the
> > > threads to be made runnable so that the turnstile object can be re-used
> > > safely.  I don't think this patch will make much of a difference (if
> > > any).  Can you provide a description of a case where you think the
> > > priority inheritance can fail if turnstile_unpend() doesn't run in a
> > > nested critical section?
> >
> > This is a bit of a mind bender.
> > I hope you have some aspirins close by ;-)
> >
> > Thread A holds a mutex x contested by Thread B and has priority pri(A).
> > Thread B holds a mutex y.
> > There is a thread C with priority pri(C) with pri(C) < pri(A).
> >
> > Thread A is in the process of releasing x.
> > It removes thread B from the turnstile and holds a pointer to B in a
> > private list.
> > Thread A sets the owner of the turnstile to NULL and releases all spin
> > locks. ( mtx_unlock_spin(&tc->tc_lock); line 148)
> > This means interrupts are now enabled.
> >
> > An interrupt occurs (or is already pending) and the interrupt handler
> > puts the associated interrupt thread I on the run queue.
> > This causes a preemption from A to I.
> > The interrupt thread I tries to acquire mutex y owned by B and blocks.
> > I donates its priority to B - but inheritance stops at B.
> > The next thread with the best priority is C and the cpu switches to C.
> > However B needs A to run to make it to the run-queue.
> >
> > If y is GIANT and I is the clock thread C could run forever in userspace
> > without being interrupted.
> 
> Fair enough.  The right place to fix this is in turnstile_unpend() though I 
> think.  I have had these patches that try to "clump" setrunqueue's before 
> preempting lying around (but not thoroughly tested yet) that might fix this 
> as well but in the turnstile code itself:
> 
> --- //depot/projects/smpng/sys/kern/kern_thread.c	2004/09/22 15:31:15
> +++ //depot/user/jhb/preemption/kern/kern_thread.c	2004/09/22 16:59:47
> @@ -954,6 +954,7 @@
>  	p->p_suspcount++;
>  	TD_SET_SUSPENDED(td);
>  	TAILQ_INSERT_TAIL(&p->p_suspended, td, td_runq);
> +#if 0
>  	/*
>  	 * Hack: If we are suspending but are on the sleep queue
>  	 * then we are in msleep or the cv equivalent. We
> @@ -962,6 +963,7 @@
>  	 */
>  	if (TD_ON_SLEEPQ(td))
>  		TD_SET_SLEEPING(td);
> +#endif
>  }
>  
>  void
> @@ -988,9 +990,11 @@
>  	mtx_assert(&sched_lock, MA_OWNED);
>  	PROC_LOCK_ASSERT(p, MA_OWNED);
>  	if (!P_SHOULDSTOP(p)) {
> +		critical_enter();
>  		while ((td = TAILQ_FIRST(&p->p_suspended))) {
>  			thread_unsuspend_one(td);
>  		}
> +		critical_exit();
>  	} else if ((P_SHOULDSTOP(p) == P_STOPPED_SINGLE) &&
>  	    (p->p_numthreads == p->p_suspcount)) {
>  		/*
> @@ -1025,9 +1029,11 @@
>  	 * to continue however as this is a bad place to stop.
>  	 */
>  	if ((p->p_numthreads != 1) && (!P_SHOULDSTOP(p))) {
> -		while (( td = TAILQ_FIRST(&p->p_suspended))) {
> +		critical_enter();
> +		while ((td = TAILQ_FIRST(&p->p_suspended))) {
>  			thread_unsuspend_one(td);
>  		}
> +		critical_exit();
>  	}
>  	mtx_unlock_spin(&sched_lock);
>  }
> --- //depot/projects/smpng/sys/kern/subr_sleepqueue.c	2004/08/20 17:10:02
> +++ //depot/user/jhb/preemption/kern/subr_sleepqueue.c	2004/09/10 21:36:10
> @@ -400,9 +400,10 @@
>  	 * just return.
>  	 */
>  	if (td->td_sleepqueue != NULL) {
> -		MPASS(!TD_ON_SLEEPQ(td));
>  		mtx_unlock_spin(&sc->sc_lock);
>  		mtx_lock_spin(&sched_lock);
> +		MPASS(!TD_ON_SLEEPQ(td));
> +		MPASS(!TD_IS_SLEEPING(td));
>  		return;
>  	}
>  
> @@ -709,11 +710,13 @@
>  	sleepq_release(wchan);
>  
>  	/* Resume all the threads on the temporary list. */
> +	critical_enter();
>  	while (!TAILQ_EMPTY(&list)) {
>  		td = TAILQ_FIRST(&list);
>  		TAILQ_REMOVE(&list, td, td_slpq);
>  		sleepq_resume_thread(td, pri);
>  	}
> +	critical_exit();
>  }
>  
>  /*
> --- //depot/projects/smpng/sys/kern/subr_turnstile.c	2004/09/03 14:14:21
> +++ //depot/user/jhb/preemption/kern/subr_turnstile.c	2004/09/10 21:36:10
> @@ -727,6 +726,7 @@
>  	 * in turnstile_wait().  Set a flag to force it to try to acquire
>  	 * the lock again instead of blocking.
>  	 */
> +	critical_enter();
>  	while (!TAILQ_EMPTY(&pending_threads)) {
>  		td = TAILQ_FIRST(&pending_threads);
>  		TAILQ_REMOVE(&pending_threads, td, td_lockq);
> @@ -742,6 +742,7 @@
>  			MPASS(TD_IS_RUNNING(td) || TD_ON_RUNQ(td));
>  		}
>  	}
> +	critical_exit();
>  	mtx_unlock_spin(&sched_lock);
>  }
>  
> --- //depot/projects/smpng/sys/vm/vm_glue.c	2004/09/22 15:31:15
> +++ //depot/user/jhb/preemption/vm/vm_glue.c	2004/09/22 16:59:47
> @@ -753,6 +753,7 @@
>  			vm_thread_swapin(td);
>  
>  		PROC_LOCK(p);
> +		critical_enter();
>  		mtx_lock_spin(&sched_lock);
>  		p->p_sflag &= ~PS_SWAPPINGIN;
>  		p->p_sflag |= PS_INMEM;
> @@ -767,6 +768,7 @@
>  
>  		/* Allow other threads to swap p out now. */
>  		--p->p_lock;
> +		critical_exit();
>  	}
>  #endif /* NO_SWAPPING */
>  }
> 
> 
> I.e., you could just move the critical_enter() in subr_turnstile.c earlier so 
> it is before the mtx_unlock_spin() of the turnstile chain lock.
> 
> -- 
> John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

This patch did not seem to make the freeze problem go away.

-- 
Peter Holm

From owner-freebsd-arch@FreeBSD.ORG  Thu Sep 30 22:00:52 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3AF2916A4EC
	for <freebsd-arch@freebsd.org>; Thu, 30 Sep 2004 22:00:52 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id AB05243D39
	for <freebsd-arch@freebsd.org>; Thu, 30 Sep 2004 22:00:51 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 13106 invoked by uid 89); 30 Sep 2004 22:00:50 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 30 Sep 2004 22:00:50 -0000
Received: (qmail 13091 invoked by uid 89); 30 Sep 2004 22:00:50 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 30 Sep 2004 22:00:50 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i8UM0nmt023181;
	Thu, 30 Sep 2004 18:00:49 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Peter Holm <peter@holm.cc>
In-Reply-To: <20040930075759.GA52233@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <1095529353.31297.1192.camel@palm.tree.com>
	 <1096135220.53798.17754.camel@palm.tree.com>
	 <20040926075218.GA85983@peter.osted.lan>
	 <1096339936.3733.279.camel@palm.tree.com>
	 <20040928074926.GA99957@peter.osted.lan>
	 <1096383103.3733.312.camel@palm.tree.com>
	 <20040929085748.GA19695@peter.osted.lan>
	 <1096467843.3733.1145.camel@palm.tree.com>
	 <1096477932.3733.1471.camel@palm.tree.com>
	 <20040930075759.GA52233@peter.osted.lan>
Content-Type: text/plain
Message-Id: <1096581649.21577.88.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Thu, 30 Sep 2004 18:00:49 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2004 22:00:52 -0000

On Thu, 2004-09-30 at 03:57, Peter Holm wrote:
> OK, did so. Doesn't seem to make any difference.
> In order to spot a freeze I have instrumented hardclock() to report if
> Giant is being held more than 60 seconds. I don't know if this is any
> help, but here are examples of two freezes, both unfrozen by ping:

I will try to reproduce your environment here.
Are you running your tests as root?

	Stephan 

From owner-freebsd-arch@FreeBSD.ORG  Thu Sep 30 23:23:06 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0D99316A4CE
	for <arch@freebsd.org>; Thu, 30 Sep 2004 23:23:06 +0000 (GMT)
Received: from odin.ac.hmc.edu (Odin.AC.HMC.Edu [134.173.32.75])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CD0B843D48
	for <arch@freebsd.org>; Thu, 30 Sep 2004 23:23:05 +0000 (GMT)
	(envelope-from brdavis@odin.ac.hmc.edu)
Received: from odin.ac.hmc.edu (localhost.localdomain [127.0.0.1])
	by odin.ac.hmc.edu (8.13.0/8.13.0) with ESMTP id i8UNRAPt024585
	for <arch@freebsd.org>; Thu, 30 Sep 2004 16:27:10 -0700
Received: (from brdavis@localhost)
	by odin.ac.hmc.edu (8.13.0/8.13.0/Submit) id i8UNRAjc024584
	for arch@freebsd.org; Thu, 30 Sep 2004 16:27:10 -0700
Date: Thu, 30 Sep 2004 16:27:10 -0700
From: Brooks Davis <brooks@one-eyed-alien.net>
To: arch@freebsd.org
Message-ID: <20040930232710.GA19905@odin.ac.hmc.edu>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="h31gzZEtNLTqOjlF"
Content-Disposition: inline
User-Agent: Mutt/1.4.1i
X-Virus-Scanned: by amavisd-new
X-Spam-Status: No, hits=0.0 required=8.0 tests=none autolearn=no version=2.63
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on odin.ac.hmc.edu
Subject: mtree before mounting /usr
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Sep 2004 23:23:06 -0000


--h31gzZEtNLTqOjlF
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

I'm trying to remove the dependencies on tools in /usr from the
/etc/rc.d/var script.  I've managed to eliminate touch and newsyslog,
but I would like some feedback on the best way to eliminate mtree.
Currently mtree lives in /usr/sbin.  I need to populate the empty md(4)
based /var.  There are two main approaches I can think of:

 - Move mtree to /sbin.  We'd probably have to leave a symlink behind,
   but it would work and be fairly easy.
 - Add support to bsdtar for reading mtree files and use that
   functionality to create pax archives of BSD.var.dist and
   BSD.sendmail.dist.

Comments?

-- Brooks

--=20
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4

--h31gzZEtNLTqOjlF
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFBXJZNXY6L6fI4GtQRAhp/AJ0W25BulM1hYb07fEGdSYOQAf4AcQCgmDdb
L6y1p25NScA2q2ROqMhojX8=
=jqrs
-----END PGP SIGNATURE-----

--h31gzZEtNLTqOjlF--

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 04:18:27 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6EF9416A4CE
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 04:18:27 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id 13E2A43D5A
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 04:18:26 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 18303 invoked by uid 89); 1 Oct 2004 04:13:04 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 04:13:04 -0000
Received: (qmail 18222 invoked by uid 89); 1 Oct 2004 04:13:02 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 04:13:02 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i914D1mt024697;
	Fri, 1 Oct 2004 00:13:01 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <1096496057.3733.2163.camel@palm.tree.com>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <1096477932.3733.1471.camel@palm.tree.com>
	 <1096489576.3733.1868.camel@palm.tree.com>
	 <200409291652.29990.jhb@FreeBSD.org>
	 <1096496057.3733.2163.camel@palm.tree.com>
Content-Type: text/plain
Message-Id: <1096603981.21577.195.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Fri, 01 Oct 2004 00:13:01 -0400
Content-Transfer-Encoding: 7bit
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 04:18:27 -0000

On Wed, 2004-09-29 at 18:14, Stephan Uphoff wrote:
> I was looking at the MUTEX_WAKE_ALL undefined case when I used the
> critical section for turnstile_claim().
> However there are bigger problems with MUTEX_WAKE_ALL undefined
> so you are right - the critical section for turnstile_claim is pretty
> useless.

Arghhh !!!

MUTEX_WAKE_ALL is NOT an option in GENERIC.
I recall verifying twice that it is defined. Guess I must have looked at
the wrong source tree :-(
This means yes - we have bigger problems!

Example:

Thread A holds a mutex x contested by Thread B and C and has priority
pri(A).

Thread C holds a mutex y and pri(B) < pri(C)

Thread A releases the lock and wakes thread B, but leaves C on the
turnstile wait queue.

An interrupt thread I tries to lock mutex y owned by C.

However priority inheritance does not work since B needs to run first to
take ownership of the lock.

I is blocked :-(
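
A rough user-space sketch of this failure mode may help. It is not kernel
code: model_mutex, model_thread, propagate_priority and wake_one_scenario
are made-up names, and the priorities 20/120/130 are arbitrary values chosen
so that a lower number means a more urgent thread. The only point it models
is that a lent priority follows the chain of lock owners and stops at a
mutex whose owner is still NULL during a handoff.

```c
#include <stddef.h>

/* Illustrative model only; lower numeric value means higher priority. */
struct model_thread;

struct model_mutex {
	struct model_thread *owner;	/* NULL while a handoff is in flight */
};

struct model_thread {
	int prio;			/* effective priority */
	struct model_mutex *blocked_on;	/* mutex this thread waits for, or NULL */
};

/*
 * Lend the priority of 'waiter' along the chain of lock owners.
 * Returns the number of threads that were boosted.
 */
static int
propagate_priority(struct model_thread *waiter)
{
	struct model_mutex *m = waiter->blocked_on;
	int boosted = 0;

	while (m != NULL && m->owner != NULL) {
		struct model_thread *owner = m->owner;

		if (owner->prio <= waiter->prio)
			break;		/* owner is already at least as urgent */
		owner->prio = waiter->prio;
		boosted++;
		m = owner->blocked_on;	/* follow the chain to the next lock */
	}
	return (boosted);
}

/*
 * The scenario above after A has released x: C holds y and still waits
 * for x, I blocks on y.  If B has not yet claimed x, x has no owner.
 * Returns B's priority after I lends its priority.
 */
static int
wake_one_scenario(int b_claimed_x)
{
	struct model_mutex x = { NULL }, y = { NULL };
	struct model_thread b = { 120, NULL };
	struct model_thread c = { 130, &x };
	struct model_thread i = { 20, &y };

	y.owner = &c;
	if (b_claimed_x)
		x.owner = &b;
	propagate_priority(&i);
	return (b.prio);
}
```

With b_claimed_x set to 0 the boost reaches C but not B, so B keeps its
original priority and nothing pushes it to claim x.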

This was found using Peter Holm's test and a slight modification of this
giant hog detector. (kern_clock.diff)
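
For readers without kern_clock.diff at hand, the idea of such a detector
can be sketched in user space as follows. This is an assumption about the
general technique, not the contents of the actual diff: hog_detector and
hog_tick are hypothetical names, and the sketch only samples the lock owner
once per tick, so it cannot tell a long hold from a release followed by an
immediate reacquire by the same owner.

```c
#include <stddef.h>

/* Illustrative model only; not taken from kern_clock.diff. */
struct hog_detector {
	const void *last_owner;	/* owner seen on the previous tick */
	unsigned held_ticks;	/* consecutive later ticks with that owner */
	unsigned limit_ticks;	/* report threshold, e.g. 60 * hz */
};

/*
 * Call once per clock tick with the current lock owner (NULL if free).
 * Returns 1 exactly once per hold, on the tick where the observed hold
 * time first exceeds the limit, and 0 otherwise.
 */
static int
hog_tick(struct hog_detector *d, const void *owner)
{
	if (owner == NULL || owner != d->last_owner) {
		/* Lock is free or changed hands: restart the count. */
		d->last_owner = owner;
		d->held_ticks = 0;
		return (0);
	}
	d->held_ticks++;
	return (d->held_ticks == d->limit_ticks + 1);
}
```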

I definitely won't have time to fix kern_mutex.c for the next few days
so please add the line:

options		MUTEX_WAKE_ALL		# Needed, do not remove

to your configuration files.

I also had overlooked 
	 	http://www.holm.cc/stress/log/cons80.html
Showing that my patch for kern_switch.c (switch_patch) has a bug.
I will send an updated patch later today.

	Stephan

PS: I love the firewire debugging speed!

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 05:23:25 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E32D516A6CA
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 05:23:25 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id 6424A43D41
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 05:23:25 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 13830 invoked by uid 89); 1 Oct 2004 05:23:24 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 05:23:24 -0000
Received: (qmail 13818 invoked by uid 89); 1 Oct 2004 05:23:24 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 05:23:24 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i915NLmt025003;
	Fri, 1 Oct 2004 01:23:22 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: John Baldwin <jhb@FreeBSD.org>
In-Reply-To: <1096603981.21577.195.camel@palm.tree.com>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <1096477932.3733.1471.camel@palm.tree.com>
	 <1096489576.3733.1868.camel@palm.tree.com>
	 <200409291652.29990.jhb@FreeBSD.org>
	 <1096496057.3733.2163.camel@palm.tree.com>
	 <1096603981.21577.195.camel@palm.tree.com>
Content-Type: multipart/mixed; boundary="=-fjzIiysJoZMWtL83qsgh"
Message-Id: <1096608201.21577.203.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Fri, 01 Oct 2004 01:23:21 -0400
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 05:23:26 -0000


--=-fjzIiysJoZMWtL83qsgh
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote:

> I also had overlooked 
> 	 	http://www.holm.cc/stress/log/cons80.html
> Showing that my patch for kern_switch.c (switch_patch) has a bug.
> I will send an updated patch later today.

OK - here is the promised patch.


--=-fjzIiysJoZMWtL83qsgh
Content-Disposition: attachment; filename=switch_patch_v2
Content-Type: text/x-patch; name=switch_patch_v2; charset=ASCII
Content-Transfer-Encoding: 7bit

Index: kern_switch.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_switch.c,v
retrieving revision 1.95
diff -u -r1.95 kern_switch.c
--- kern_switch.c	19 Sep 2004 18:34:17 -0000	1.95
+++ kern_switch.c	1 Oct 2004 05:15:16 -0000
@@ -315,6 +315,106 @@
 	td->td_priority = newpri;
 	setrunqueue(td, SRQ_BORING);
 }
+
+
+/*
+ * This function is called when a thread is about to be put on a
+ * ksegrp run queue because it has been made runnable or its 
+ * priority has been adjusted and the ksegrp does not have a 
+ * free kse slot.  It determines if a thread from the same ksegrp
+ * should be preempted.  If so, it tries to switch threads
+ * if the thread is on the same cpu or notifies another cpu that
+ * it should switch threads. 
+ */
+
+static void
+maybe_preempt_in_ksegrp(struct thread *td)
+{
+#if  defined(SMP)
+	int highest_pri;
+	struct ksegrp *kg;
+	cpumask_t cpumask,dontuse;
+	struct pcpu *pc;
+	struct pcpu *highest_pcpu;
+	struct thread *running_thread;
+
+#ifndef FULL_PREEMPTION
+	int pri;
+
+	pri = td->td_priority;
+
+	if (!(pri >= PRI_MIN_ITHD && pri <= PRI_MAX_ITHD))
+	  return;
+#endif
+
+	mtx_assert(&sched_lock, MA_OWNED);
+
+	running_thread = curthread;
+
+#if !defined(KSEG_PEEMPT_BEST_CPU)
+	if(running_thread->td_ksegrp != td->td_ksegrp)
+#endif
+		{
+			kg = td->td_ksegrp;
+
+			/* Anyone waiting in front ? */
+			if(td != TAILQ_FIRST(&kg->kg_runq))  {
+				return; /* Yes - wait your turn*/
+			}
+			highest_pri  = td->td_priority;
+			highest_pcpu = NULL;
+			dontuse      = stopped_cpus | idle_cpus_mask;
+
+			/* Find a cpu with the worst priority that runs a thread from the
+			 * same ksegrp - if multiple exist, prefer first the last run cpu and then
+			 * the current cpu
+			 */
+
+			SLIST_FOREACH(pc, &cpuhead, pc_allcpu) {
+				cpumask = pc->pc_cpumask;
+				if ( (cpumask & dontuse) == 0 && 
+				     pc->pc_curthread->td_ksegrp == kg) {
+					if (pc->pc_curthread->td_priority > highest_pri) {
+						highest_pri  = pc->pc_curthread->td_priority;
+						highest_pcpu = pc;
+					} else if (pc->pc_curthread->td_priority == highest_pri &&
+						   highest_pcpu != NULL) {
+						if (td->td_lastcpu == pc->pc_cpuid ||
+						    (PCPU_GET(cpumask) == cpumask &&
+						     td->td_lastcpu != highest_pcpu->pc_cpuid)) {
+							highest_pcpu = pc;
+						}
+					}
+				}
+			}
+			
+			/* Check if we need to preempt someone */
+			if (highest_pcpu == NULL) return;
+
+			if (PCPU_GET(cpuid) != highest_pcpu->pc_cpuid) {
+				highest_pcpu->pc_curthread->td_flags |= TDF_NEEDRESCHED;
+				ipi_selected(highest_pcpu->pc_cpumask, IPI_AST);
+				return;
+			}
+		}
+#else
+	KASSERT(running_thread->td_ksegrp == td->td_ksegrp,("maybe_preempt_in_ksegrp: No chance to run thread"));
+#endif
+
+	if  (td->td_priority > running_thread->td_priority)
+		return;
+#ifdef PREEMPTION
+	if (running_thread->td_critnest > 1) {
+		running_thread->td_pflags |= TDP_OWEPREEMPT;
+	} else {
+		mi_switch(SW_INVOL, NULL);
+	}
+#else
+	running_thread->td_flags |= TDF_NEEDRESCHED;
+#endif
+	return;
+}
+
 int limitcount;
 void
 setrunqueue(struct thread *td, int flags)
@@ -422,6 +522,7 @@
 	} else {
 		CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d",
 			td, td->td_ksegrp, td->td_proc->p_pid);
+		maybe_preempt_in_ksegrp(td);
 	}
 }
 

--=-fjzIiysJoZMWtL83qsgh--

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 05:55:33 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4795616A4CF
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 05:55:33 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id D6D2443D5A
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 05:55:32 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 32529 invoked by uid 89); 1 Oct 2004 05:55:31 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 05:55:31 -0000
Received: (qmail 32490 invoked by uid 89); 1 Oct 2004 05:55:31 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 05:55:31 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i915tUmt025109;
	Fri, 1 Oct 2004 01:55:30 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Content-Type: text/plain
Message-Id: <1096610130.21577.219.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Fri, 01 Oct 2004 01:55:30 -0400
Content-Transfer-Encoding: 7bit
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
Subject: sched_switch (sched_4bsd) may be preempted
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 05:55:33 -0000

sched_switch (sched_4bsd) may be preempted in setrunqueue or slot_fill.
This could be ugly.
Wrapping it into a critical section and resetting TDP_OWEPREEMPT should
work.

Hand trimmed patch:

RCS file: /cvsroot/src/sys/kern/sched_4bsd.c,v
retrieving revision 1.65
diff -u -r1.65 sched_4bsd.c
--- sys/kern/sched_4bsd.c       16 Sep 2004 07:12:59 -0000      1.65
+++ sys/kern/sched_4bsd.c       1 Oct 2004 05:35:28 -0000
@@ -823,6 +823,7 @@
                TD_SET_CAN_RUN(td);
        else {
                td->td_ksegrp->kg_avail_opennings++;
+               critical_enter();
                if (TD_IS_RUNNING(td)) {
                        /* Put us back on the run queue (kse and all). */
                        setrunqueue(td, SRQ_OURSELF|SRQ_YIELDING);
@@ -834,6 +835,8 @@
                         */
                        slot_fill(td->td_ksegrp);
                }
+               critical_exit();
+               td->td_pflags &= ~TDP_OWEPREEMPT;
        }
        if (newtd == NULL)
                newtd = choosethread();


From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 06:04:22 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id F143B16A4CE; Fri,  1 Oct 2004 06:04:22 +0000 (GMT)
Received: from harmony.village.org (rover.village.org [168.103.84.182])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7A46843D45; Fri,  1 Oct 2004 06:04:22 +0000 (GMT)
	(envelope-from imp@bsdimp.com)
Received: from localhost (harmony.village.org [10.0.0.6])
	by harmony.village.org (8.13.1/8.13.1) with ESMTP id i9163SWw043087;
	Fri, 1 Oct 2004 00:03:28 -0600 (MDT)
	(envelope-from imp@bsdimp.com)
Date: Fri, 01 Oct 2004 00:04:52 -0600 (MDT)
Message-Id: <20041001.000452.99281901.imp@bsdimp.com>
To: sah@softcardsystems.com
From: "M. Warner Losh" <imp@bsdimp.com>
In-Reply-To: <Pine.LNX.4.60.0409300903070.6230@athena>
References: <41533E0D.9000908@elischer.org>
	<1095976309.53798.8390.camel@palm.tree.com>
	<Pine.LNX.4.60.0409300903070.6230@athena>
X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-arch@freebsd.org
cc: re@freebsd.org
cc: julian@elischer.org
cc: ups@tree.com
Subject: Re: AoE for 4.x
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 06:04:23 -0000

In message: <Pine.LNX.4.60.0409300903070.6230@athena>
            Sam <sah@softcardsystems.com> writes:
: I haven't heard any major objections to my getting a major
: number -- can someone please step up and help me out?

187.

Warner

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 07:57:48 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 21D0016A4CE
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 07:57:48 +0000 (GMT)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CDDB043D53
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 07:57:47 +0000 (GMT)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	i917vkvA017412;	Fri, 1 Oct 2004 00:57:46 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.12.9p2/8.12.9/Submit) id i917vjac017409;
	Fri, 1 Oct 2004 00:57:45 -0700 (PDT)
	(envelope-from dillon)
Date: Fri, 1 Oct 2004 00:57:45 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200410010757.i917vjac017409@apollo.backplane.com>
To: Stephan Uphoff <ups@tree.com>
References: <1096610130.21577.219.camel@palm.tree.com>
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: sched_switch (sched_4bsd) may be preempted
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 07:57:48 -0000

    I would put the entire scheduler core in a critical section, not just
    the part you think needs to be there.  It's just too critical a subsystem
    to make operational assumptions of that nature about.  It is
    what I do in the DragonFly LWKT core, BTW, and crazy as I am I would
    never even consider trying to move the critical section further in.

    I would not reset TDP_OWEPREEMPT there.  If I understand its function
    correctly you need to leave it intact in order to detect preemption
    request races against the scheduler.  Since at that point newtd may
    be non-NULL and thus not cause another scheduling queue check to be
    made before the next switch, you cannot safely clear the flag where you
    are clearing it.

    If you want to optimize operation of the flag I recommend storing the
    preempting entity's priority at the same point where TDP_OWEPREEMPT is
    set and then doing a quick priority comparison in critical_exit() to avoid
    unnecessary mi_switch() calls.  I would also not put the TDP_OWEPREEMPT flag
    in the thread structure.  It really belongs in the globaldata structure
    so it remains properly intact through the thread switch, else you have
    more potential races even while *in* the critical section.  Your
    TDP_OWEPREEMPT flag has almost exactly the same function as DFly's
    gd_reqflags word and I spent a long time thinking through where I would
    store it, and came to the conclusion that the globaldata structure was
    the best place.

    e.g. so FreeBSD's critical_exit() code would become this (note: I might
    have the priority comparison backwards, I forget how FBsd does it):

        if (td->td_critnest == 1) {
#ifdef PREEMPTION
                mtx_assert(&sched_lock, MA_NOTOWNED);
                if (gd->gd_pflags & GDP_OWEPREEMPT) {		<<< CHG TO gd
			gd->gd_pflags &= ~GDP_OWEPREEMPT;	<<< CHG TO gd
			if (gd->gd_preempt_priority < td->td_priority) { << ADD
				mtx_lock_spin(&sched_lock);
				mi_switch(SW_INVOL, NULL);
				mtx_unlock_spin(&sched_lock);
			}
                }
#endif
                td->td_critnest = 0;
                cpu_critical_exit(td);

    And the code which sets GDP_OWEPREEMPT would become this:

	[checks whether preemption is desired]
        if (ctd->td_critnest > 1) {
                CTR1(KTR_PROC, "maybe_preempt: in critical section %d",
                    ctd->td_critnest);
		if ((gd->gd_pflags & GDP_OWEPREEMPT) == 0 ||	<< ADD (gd)
		    pri < gd->gd_preempt_priority) {		<< ADD (gd)
			gd->gd_pflags |= GDP_OWEPREEMPT;	<< CHG (gd)
			gd->gd_preempt_priority = pri;		<< ADD (gd)
		}
                return (0);
        }
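
    The two fragments above can be tried out together in a small user-space
    model.  This is only a sketch under stated assumptions: model_cpu,
    model_td and the model_* functions are hypothetical names, the return
    value 1 stands in for the place where the kernel would call mi_switch(),
    and the model defers on critnest > 0 rather than > 1 because it has no
    sched_lock contributing a nesting level.  Lower numbers are treated as
    more urgent priorities.

```c
/* Illustrative model only; not kernel code. */
struct model_cpu {
	int owe_preempt;	/* stands in for the proposed GDP_OWEPREEMPT */
	int preempt_prio;	/* most urgent priority that asked for the cpu */
};

struct model_td {
	int critnest;		/* critical section nesting depth */
	int prio;		/* lower value means more urgent */
};

static void
model_critical_enter(struct model_td *td)
{
	td->critnest++;
}

/*
 * A thread of priority 'pri' became runnable.  Returns 1 if the current
 * thread would be switched out right away, 0 if nothing happens now
 * (either no preemption is wanted or it is deferred).
 */
static int
model_maybe_preempt(struct model_cpu *cpu, struct model_td *cur, int pri)
{
	if (pri >= cur->prio)
		return (0);		/* not more urgent than the current thread */
	if (cur->critnest > 0) {
		/* Defer, remembering only the most urgent request. */
		if (!cpu->owe_preempt || pri < cpu->preempt_prio) {
			cpu->owe_preempt = 1;
			cpu->preempt_prio = pri;
		}
		return (0);
	}
	return (1);
}

/*
 * Leave one nesting level.  Returns 1 if leaving the outermost level
 * would trigger the deferred switch, 0 otherwise.
 */
static int
model_critical_exit(struct model_cpu *cpu, struct model_td *td)
{
	int do_switch = 0;

	if (td->critnest == 1 && cpu->owe_preempt) {
		cpu->owe_preempt = 0;
		/* Skip the switch if the request is no longer more urgent. */
		if (cpu->preempt_prio < td->prio)
			do_switch = 1;
	}
	td->critnest--;
	return (do_switch);
}
```

    The priority check in model_critical_exit() is what avoids a needless
    switch when the current thread's priority was raised while the request
    was pending.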

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


:sched_switch (sched_4bsd) may be preempted in setrunqueue or slot_fill.
:This could be ugly.
:Wrapping it into a critical section and resetting TDP_OWEPREEMPT should
:work.
:
:Hand trimmed patch:
:
:RCS file: /cvsroot/src/sys/kern/sched_4bsd.c,v
:retrieving revision 1.65
:diff -u -r1.65 sched_4bsd.c
:--- sys/kern/sched_4bsd.c       16 Sep 2004 07:12:59 -0000      1.65
:+++ sys/kern/sched_4bsd.c       1 Oct 2004 05:35:28 -0000
:@@ -823,6 +823,7 @@
:                TD_SET_CAN_RUN(td);
:        else {
:                td->td_ksegrp->kg_avail_opennings++;
:+               critical_enter();
:                if (TD_IS_RUNNING(td)) {
:                        /* Put us back on the run queue (kse and all). */
:                        setrunqueue(td, SRQ_OURSELF|SRQ_YIELDING);
:@@ -834,6 +835,8 @@
:                         */
:                        slot_fill(td->td_ksegrp);
:                }
:+               critical_exit();
:+               td->td_pflags &= ~TDP_OWEPREEMPT;
:        }
:        if (newtd == NULL)
:                newtd = choosethread();

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 11:08:21 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9E74A16A4CE
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 11:08:21 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 0B9E743D46
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 11:08:21 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 61626 invoked from network); 1 Oct 2004 11:08:19 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 1 Oct 2004 11:08:19 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i91B8ICs058177;
	Fri, 1 Oct 2004 13:08:18 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i91B8HOt058176;
	Fri, 1 Oct 2004 13:08:17 +0200 (CEST)
	(envelope-from pho)
Date: Fri, 1 Oct 2004 13:08:17 +0200
From: Peter Holm <peter@holm.cc>
To: Stephan Uphoff <ups@tree.com>
Message-ID: <20041001110817.GA58111@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
	<1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1096603981.21577.195.camel@palm.tree.com>
User-Agent: Mutt/1.4.1i
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: John Baldwin <jhb@FreeBSD.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions
X-List-Received-Date: Fri, 01 Oct 2004 11:08:21 -0000

On Fri, Oct 01, 2004 at 12:13:01AM -0400, Stephan Uphoff wrote:
> On Wed, 2004-09-29 at 18:14, Stephan Uphoff wrote:
> > I was looking at the MUTEX_WAKE_ALL undefined case when I used the
> > critical section for turnstile_claim().
> > However there are bigger problems with MUTEX_WAKE_ALL undefined
> > so you are right - the critical section for turnstile_claim is pretty
> > useless.
> 
> Arghhh !!!
> 
> MUTEX_WAKE_ALL is NOT an option in GENERIC.
> I recall verifying that it is defined twice. Guess I must have looked at
> the wrong source tree :-(
> This means yes - we have bigger problems!
> 
> Example:
> 
> Thread A holds a mutex x contested by Thread B and C and has priority
> pri(A).
> 
> Thread C holds a mutex y and pri(B) < pri(C)
> 
> Thread A releases the lock and wakes thread B, but leaves C on the
> turnstile wait queue.
> 
> An interrupt thread I tries to lock mutex y owned by C.
> 
> However priority inheritance does not work since B needs to run first to
> take ownership of the lock.
> 
> I is blocked :-(
> 
> This was found using Peter Holm's test and a slight modification of this
> giant hog detector. (kern_clock.diff)
> 
> I definitely won't have time to fix kern_mutex.c for the next few days
> so please add the line:
> 
> options		MUTEX_WAKE_ALL		# Needed do not remove
> 

I like to test one thing at a time, so I added MUTEX_WAKE_ALL to HEAD from
Sep 30 09:58 UTC. This did not seem to change anything :-(
I'll proceed with adding your switch_patch_v2 patch + your sched_4bsd.c patch,
but without MUTEX_WAKE_ALL.

- Peter

> to your configuration files.
> 
> I also had overlooked 
> 	 	http://www.holm.cc/stress/log/cons80.html
> Showing that my patch for kern_switch.c (switch_patch) has a bug.
> I will send an updated patch later today.
> 
> 	Stephan
> 
> PS: I love the firewire debugging speed!
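
The wake-one scenario quoted above (threads A, B, C and interrupt thread I)
can be modelled as a walk along the chain of lock owners.  This is only a
sketch under stated assumptions: all names are hypothetical, a lower priority
value is taken to mean a more urgent thread, and MUTEX_WAKE_ALL is assumed to
wake every waiter on the turnstile.  It is not the kern_mutex.c code.

```c
#include <assert.h>
#include <stddef.h>

struct pi_lock;

struct pi_thread {
	int pri;			/* lower value = more urgent (assumed) */
	struct pi_lock *blocked_on;	/* lock slept on, NULL if runnable */
};

struct pi_lock {
	struct pi_thread *owner;	/* NULL while nobody has claimed it */
};

/*
 * Lend `pri` to `td` and onward to the owner of whatever `td` sleeps on.
 * Returns the last thread reached; the walk stops at a runnable thread
 * or at a lock that currently has no owner.
 */
static struct pi_thread *
lend_priority(struct pi_thread *td, int pri)
{
	assert(td != NULL);
	for (;;) {
		if (pri < td->pri)
			td->pri = pri;
		if (td->blocked_on == NULL || td->blocked_on->owner == NULL)
			return (td);
		td = td->blocked_on->owner;
	}
}

/*
 * Replays the example after A has released x.  Returns 1 if I's
 * priority ends up on a runnable thread, 0 if it is stranded on a
 * sleeping one.  B's final priority is stored in *b_pri_out.
 */
static int
interrupt_can_progress(int wake_all, int *b_pri_out)
{
	struct pi_lock x = { NULL };		/* released, not yet claimed */
	struct pi_lock y = { NULL };
	struct pi_thread b = { 100, NULL };	/* woken, has not run yet */
	struct pi_thread c = { 120, &x };	/* still on x's wait queue */
	struct pi_thread i = { 10, NULL };	/* interrupt thread */
	struct pi_thread *last;

	y.owner = &c;
	if (wake_all)
		c.blocked_on = NULL;		/* every waiter was woken */
	i.blocked_on = &y;
	last = lend_priority(y.owner, i.pri);
	*b_pri_out = b.pri;
	return (last->blocked_on == NULL);
}
```

In the wake-one case the walk ends at C, which is still asleep on an ownerless
x, and B keeps its own priority, so I can only proceed once B happens to run.
In the wake-all case C is runnable and receives I's priority directly.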

-- 
Peter Holm

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 12:52:52 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9218616A4CE; Fri,  1 Oct 2004 12:52:52 +0000 (GMT)
Received: from athena.softcardsystems.com (mail.softcardsystems.com
	[12.34.136.114])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 1224B43D31; Fri,  1 Oct 2004 12:52:52 +0000 (GMT)
	(envelope-from sah@softcardsystems.com)
Received: from athena (athena [12.34.136.114])i91Dpelj014178;
	Fri, 1 Oct 2004 08:51:40 -0500
Date: Fri, 1 Oct 2004 08:51:40 -0500 (EST)
From: Sam <sah@softcardsystems.com>
X-X-Sender: sah@athena
To: "M. Warner Losh" <imp@bsdimp.com>
In-Reply-To: <20041001.000452.99281901.imp@bsdimp.com>
Message-ID: <Pine.LNX.4.60.0410010851190.14170@athena>
References: <41533E0D.9000908@elischer.org>
	<1095976309.53798.8390.camel@palm.tree.com>
	<20041001.000452.99281901.imp@bsdimp.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
cc: freebsd-arch@freebsd.org
cc: re@freebsd.org
cc: julian@elischer.org
cc: ups@tree.com
Subject: Re: AoE for 4.x
X-List-Received-Date: Fri, 01 Oct 2004 12:52:52 -0000

Is that block & char, or do I just need to specify the block?

On Fri, 1 Oct 2004, M. Warner Losh wrote:

> In message: <Pine.LNX.4.60.0409300903070.6230@athena>
>            Sam <sah@softcardsystems.com> writes:
> : I haven't heard any major objections to my getting a major
> : number -- can someone please step up and help me out?
>
> 187.
>
> Warner
>

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 14:10:45 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2083016A4CE
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 14:10:45 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 7877943D41
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 14:10:44 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 83835 invoked from network); 1 Oct 2004 14:10:42 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 1 Oct 2004 14:10:42 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i91EAgV7001641;
	Fri, 1 Oct 2004 16:10:42 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i91EAfs4001640;
	Fri, 1 Oct 2004 16:10:41 +0200 (CEST)
	(envelope-from pho)
Date: Fri, 1 Oct 2004 16:10:40 +0200
From: Peter Holm <peter@holm.cc>
To: Stephan Uphoff <ups@tree.com>
Message-ID: <20041001141040.GA1556@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
	<1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<1096608201.21577.203.camel@palm.tree.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1096608201.21577.203.camel@palm.tree.com>
User-Agent: Mutt/1.4.1i
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: John Baldwin <jhb@FreeBSD.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions
X-List-Received-Date: Fri, 01 Oct 2004 14:10:45 -0000

On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote:
> On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote:
> 
> > I also had overlooked 
> > 	 	http://www.holm.cc/stress/log/cons80.html
> > Showing that my patch for kern_switch.c (switch_patch) has a bug.
> > I will send an updated patch later today.
> 
> OK - here is the promised patch.
> 

For once I'm the bearer of good news. The switch_patch_v2 + the
sched_4bsd patch ran the tests for more than one hour without
any freeze. The sched_4bsd alone did not stop the freezes. I'm
now testing the switch_patch_v2 alone and it's looking good for
55+ minutes of testing.

-- 
Peter Holm

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 15:02:01 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9619516A4CE
	for <arch@FreeBSD.org>; Fri,  1 Oct 2004 15:02:01 +0000 (GMT)
Received: from mail1.speakeasy.net (mail1.speakeasy.net [216.254.0.201])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4E97143D1D
	for <arch@FreeBSD.org>; Fri,  1 Oct 2004 15:02:01 +0000 (GMT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 4356 invoked from network); 1 Oct 2004 15:02:00 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <arch@FreeBSD.org>; 1 Oct 2004 15:02:00 -0000
Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i91F1sEG027282
	for <arch@FreeBSD.org>; Fri, 1 Oct 2004 11:01:55 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: arch@FreeBSD.org
Date: Fri, 1 Oct 2004 11:00:42 -0400
User-Agent: KMail/1.6.2
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Message-Id: <200410011100.42302.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
Subject: [PATCH] Rework how we store process times in the kernel and
	deferring calcru()
X-List-Received-Date: Fri, 01 Oct 2004 15:02:01 -0000

I'll commit this soonish unless there are any objections.  The basic idea is 
to store process times resource usage as raw data (i.e. as bintimes and tick 
counts) for both process usage and child usage and only calculate the timeval 
style times if they are explicitly asked for.  This lets us avoid always 
calling calcru() to calculate the timeval values in exit1() for example.  A 
more detailed listing of the changes follows:

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  This also includes an additional fix so that calcru()
  now correctly handles threads from the process that are executing on other
  CPUs.  Also, calcru() now only locks sched_lock internally while doing
  the rux_runtime fixup.  It now only requires the caller to hold the proc
  lock and calcru1() only requires the proc lock internally.  calcru() also no
  longer allows callers to ask for an interrupt timeval since none of them
  actually did.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.

As a side effect of storing the raw values, the accuracy of the process timing 
has been improved.  This makes benchmarking somewhat tricky as the appearance 
is that with this patch user times go way up but system times go way down.  
Thus, the only benchmarks I did were to compare real times and to also 
compare the sum of the user and system times to the real times.  Thus, here 
are the results on a kernel w/o debugging (when WITNESS + INVARIANTS were on, 
the extra overhead resulted in no statistical difference in the before and 
after).  For real times (100 runs of 10000 fork/wait loops):

x smpng.fast.real
+ proc.fast.real
+--------------------------------------------------------------------------+
|                  +                                                       |
|                  +                                                       |
|                  +   +                                                   |
|                  +   +                                                   |
|                  +   +                                                   |
|                  +   +                                                   |
|                  +   +                                                   |
|                  +   +                 x   x                             |
|                  +   +                 x   x                             |
|                  +   +                 x   x                             |
|                  +   +                 x   x                             |
|                  +   +              x  x   x                             |
|                  +   +              x  x   x                             |
|                  +   +              x  x   x                             |
|                  +   +              x  x   x                             |
|                  +   +              x  x   x  x                          |
|               +  +   +              x  x   x  x                          |
|               +  +   +              x  x   x  x                          |
|               +  +   +              x  x   x  x                          |
|               +  +   +              x  x   x  x                          |
|               +  +   +          x   x  x   x  x                          |
|               +  +   +          x   x  x   x  x                          |
|               +  +   +   +      x   x  x   x  x                          |
|               +  +   +   +      x   x  x   x  x                          |
|               +  +   +   +      x   x  x   x  x   x                      |
|               +  +   +   +  +   *   x  x   x  x   x                      |
|           +   +  +   +   +  +   *   x  x   x  x   x                      |
|           +   +  +   +   +  +   *   x  x   x  x   x                      |
|       +   +   +  +   +   +  +   *   *  x   x  x   x              x       |
|+      +   +   +  +   +   +  +   *   *  *   x  x   x   x          x      x|
|              |___M__A_____|       |____M_A______|                        |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 100          2.97          3.08          2.99        2.9959   0.018968075
+ 100          2.88          2.99          2.93        2.9362   0.017568337
Difference at 95.0% confidence
        -0.0597 +/- 0.0050674
        -1.99272% +/- 0.169145%
        (Student's t, pooled s = 0.0182816)
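
For anyone wanting to check the arithmetic, the pooled standard deviation and
the width of the interval can be recomputed from the summary rows above.  A
minimal sketch: the names are hypothetical, sqrt_newton() is a small helper
used only to avoid depending on libm, and the critical value passed in
(1.960, the large-sample 95% value) is an assumption that happens to be
consistent with the interval printed above.

```c
#include <assert.h>

struct sample_sketch {
	int n;
	double avg;
	double stddev;
};

struct diff_sketch {
	double diff;		/* avg(b) - avg(a) */
	double half_width;	/* +/- part of the confidence interval */
	double pooled_s;
};

/* Square root by Newton iteration; adequate for these magnitudes. */
static double
sqrt_newton(double x)
{
	double g = 1.0;
	int i;

	assert(x > 0.0);
	for (i = 0; i < 64; i++)
		g = 0.5 * (g + x / g);
	return (g);
}

/* Two-sample difference with a pooled standard deviation. */
static struct diff_sketch
pooled_difference(struct sample_sketch a, struct sample_sketch b,
    double t_crit)
{
	struct diff_sketch out;
	double var;

	var = ((a.n - 1) * a.stddev * a.stddev +
	    (b.n - 1) * b.stddev * b.stddev) / (a.n + b.n - 2);
	out.pooled_s = sqrt_newton(var);
	out.diff = b.avg - a.avg;
	out.half_width = t_crit * out.pooled_s *
	    sqrt_newton(1.0 / a.n + 1.0 / b.n);
	return (out);
}
```

Feeding in the two rows for the real times (N = 100 each, averages 2.9959 and
2.9362) gives a difference of about -0.0597 with a pooled s of about 0.01828.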

So, about a 2% improvement.  As far as accuracy "improvements" go, the 
numbers comparing the sum of user + sys to "real" time are:

x smpng.fast.real
+ smpng.fast.total
    N           Min           Max        Median           Avg        Stddev
x 100          2.97          3.08          2.99        2.9959   0.018968075
+ 100          2.83          2.93          2.86        2.8601   0.016111668
Difference at 95.0% confidence
        -0.1358 +/- 0.0048779
        -4.53286% +/- 0.162819%
        (Student's t, pooled s = 0.0175979)

And for the kernel with the patch:

x proc.fast.real
+ proc.fast.total
    N           Min           Max        Median           Avg        Stddev
x 100          2.88          2.99          2.93        2.9362   0.017568337
+ 100          2.85          2.96          2.92        2.9201   0.017551943
Difference at 95.0% confidence
        -0.0161 +/- 0.00486742
        -0.548328% +/- 0.165773%
        (Student's t, pooled s = 0.0175601)

Thus, the total counts are closer to the real times with the patch than 
without.  The missing counts can be interrupt time or time for other 
processes, of course.  Given that the box was idle and in the same situation 
for both tests and that these types of results were obtained across numerous 
repeated tests with several different benchmarks I think the difference in 
these last two is due to improved accuracy in the accounting.

The patch is at http://www.freebsd.org/~jhb/patches/rusage_ext.patch and is 
largely based on a patch given to me by bde@.

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 15:02:04 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2287116A4D8
	for <arch@FreeBSD.org>; Fri,  1 Oct 2004 15:02:04 +0000 (GMT)
Received: from mail2.speakeasy.net (mail2.speakeasy.net [216.254.0.202])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D1C8943D4C
	for <arch@FreeBSD.org>; Fri,  1 Oct 2004 15:02:03 +0000 (GMT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 27300 invoked from network); 1 Oct 2004 15:02:03 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <arch@FreeBSD.org>; 1 Oct 2004 15:02:02 -0000
Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i91F1sEH027282
	for <arch@FreeBSD.org>; Fri, 1 Oct 2004 11:01:58 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: arch@FreeBSD.org
Date: Fri, 1 Oct 2004 11:02:43 -0400
User-Agent: KMail/1.6.2
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Message-Id: <200410011102.43394.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
Subject: [PATCH] Rework how we store process times in the kernel and
	deferring calcru()
X-List-Received-Date: Fri, 01 Oct 2004 15:02:04 -0000

I'll commit this soonish unless there are any objections.  The basic idea is 
to store process times resource usage as raw data (i.e. as bintimes and tick 
counts) for both process usage and child usage and only calculate the timeval 
style times if they are explicitly asked for.  This lets us avoid always 
calling calcru() to calculate the timeval values in exit1() for example.  A 
more detailed listing of the changes follows:

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  This also includes an additional fix so that calcru()
  now correctly handles threads from the process that are executing on other
  CPUs.  Also, calcru() now only locks sched_lock internally while doing
  the rux_runtime fixup.  It now only requires the caller to hold the proc
  lock and calcru1() only requires the proc lock internally.  calcru() also no
  longer allows callers to ask for an interrupt timeval since none of them
  actually did.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.

As a side effect of storing the raw values, the accuracy of the process timing 
has been improved.  This makes benchmarking somewhat tricky as the appearance 
is that with this patch user times go way up but system times go way down.  
Thus, the only benchmarks I did were to compare real times and to also 
compare the sum of the user and system times to the real times.  Thus, here 
are the results on a kernel w/o debugging (when WITNESS + INVARIANTS were on, 
the extra overhead resulted in no statistical difference in the before and 
after).  For real times (100 runs of 10000 fork/wait loops):

x smpng.fast.real
+ proc.fast.real
+--------------------------------------------------------------------------+
|                  +                                                       |
|                  +                                                       |
|                  +   +                                                   |
|                  +   +                                                   |
|                  +   +                                                   |
|                  +   +                                                   |
|                  +   +                                                   |
|                  +   +                 x   x                             |
|                  +   +                 x   x                             |
|                  +   +                 x   x                             |
|                  +   +                 x   x                             |
|                  +   +              x  x   x                             |
|                  +   +              x  x   x                             |
|                  +   +              x  x   x                             |
|                  +   +              x  x   x                             |
|                  +   +              x  x   x  x                          |
|               +  +   +              x  x   x  x                          |
|               +  +   +              x  x   x  x                          |
|               +  +   +              x  x   x  x                          |
|               +  +   +              x  x   x  x                          |
|               +  +   +          x   x  x   x  x                          |
|               +  +   +          x   x  x   x  x                          |
|               +  +   +   +      x   x  x   x  x                          |
|               +  +   +   +      x   x  x   x  x                          |
|               +  +   +   +      x   x  x   x  x   x                      |
|               +  +   +   +  +   *   x  x   x  x   x                      |
|           +   +  +   +   +  +   *   x  x   x  x   x                      |
|           +   +  +   +   +  +   *   x  x   x  x   x                      |
|       +   +   +  +   +   +  +   *   *  x   x  x   x              x       |
|+      +   +   +  +   +   +  +   *   *  *   x  x   x   x          x      x|
|              |___M__A_____|       |____M_A______|                        |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 100          2.97          3.08          2.99        2.9959   0.018968075
+ 100          2.88          2.99          2.93        2.9362   0.017568337
Difference at 95.0% confidence
        -0.0597 +/- 0.0050674
        -1.99272% +/- 0.169145%
        (Student's t, pooled s = 0.0182816)

So, about a 2% improvement.  As far as accuracy "improvements" go, the 
numbers comparing the sum of user + sys to "real" time are:

x smpng.fast.real
+ smpng.fast.total
    N           Min           Max        Median           Avg        Stddev
x 100          2.97          3.08          2.99        2.9959   0.018968075
+ 100          2.83          2.93          2.86        2.8601   0.016111668
Difference at 95.0% confidence
        -0.1358 +/- 0.0048779
        -4.53286% +/- 0.162819%
        (Student's t, pooled s = 0.0175979)

And for the kernel with the patch:

x proc.fast.real
+ proc.fast.total
    N           Min           Max        Median           Avg        Stddev
x 100          2.88          2.99          2.93        2.9362   0.017568337
+ 100          2.85          2.96          2.92        2.9201   0.017551943
Difference at 95.0% confidence
        -0.0161 +/- 0.00486742
        -0.548328% +/- 0.165773%
        (Student's t, pooled s = 0.0175601)

Thus, the total counts are closer to the real times with the patch than 
without the patch.  Given that these results were repeated numerous times 
with different benchmarks on an idle box in the same state I feel that these 
differences indicate an improvement in the accuracy of the accounting.

The patch is at http://www.FreeBSD.org/~jhb/patches/rusage_ext.patch and is 
largely based on a patch originally submitted by bde@.

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 15:14:39 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1C96B16A4CE
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 15:14:39 +0000 (GMT)
Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E025443D46
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 15:14:38 +0000 (GMT)
	(envelope-from jhb@FreeBSD.org)
Received: (qmail 6388 invoked from network); 1 Oct 2004 15:14:38 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx)
	([216.27.160.63])          (envelope-sender <jhb@FreeBSD.org>)
	encrypted SMTP
	for <arch@FreeBSD.org>; 1 Oct 2004 15:14:38 -0000
Received: from [10.50.40.210] (gw1.twc.weather.com [216.133.140.1])
	(authenticated bits=0)
	by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i91FEYNi027393;
	Fri, 1 Oct 2004 11:14:35 -0400 (EDT)
	(envelope-from jhb@FreeBSD.org)
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-arch@FreeBSD.org
Date: Fri, 1 Oct 2004 11:14:42 -0400
User-Agent: KMail/1.6.2
References: <200410011102.43394.jhb@FreeBSD.org>
In-Reply-To: <200410011102.43394.jhb@FreeBSD.org>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200410011114.42446.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: arch@FreeBSD.org
Subject: Re: [PATCH] Rework how we store process times in the kernel and
	deferring calcru()
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 15:14:39 -0000

On Friday 01 October 2004 11:02 am, John Baldwin wrote:
> I'll commit this soonish unless there are any objections.  The basic idea
> is to store process times resource usage as raw data (i.e. as bintimes and
> tick counts) for both process usage and child usage and only calculate the
> timeval style times if they are explicitly asked for.  This lets us avoid
> always calling calcru() to calculate the timeval values in exit1() for
> example.  A more detailed listing of the changes follows:

Sorry for the dupe.  kmail crashed and I wasn't sure the first one had made 
it, esp. given that it came back up with a partial version of this e-mail 
(hence the two different endings).
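
To make the idea in the quoted summary concrete, here is a rough userland
sketch of "store raw, convert on demand".  It is not the patch: the struct
layout, the field names, and the use of plain microseconds in place of
bintimes are all simplifying assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical raw accounting record: total run time plus the number
 * of clock ticks sampled in each mode.  Names are illustrative only.
 */
struct raw_usage {
	uint64_t runtime_us;	/* total run time, microseconds */
	uint64_t uticks;	/* ticks charged to user mode */
	uint64_t sticks;	/* ticks charged to system mode */
	uint64_t iticks;	/* ticks charged to interrupt mode */
};

/* Converted values, computed only when somebody asks for them. */
struct split_usage {
	uint64_t user_us;
	uint64_t sys_us;
	uint64_t intr_us;
};

/* Cheap step for the exit path: fold a child's raw counters in. */
static void
raw_usage_add(struct raw_usage *dst, const struct raw_usage *src)
{
	dst->runtime_us += src->runtime_us;
	dst->uticks += src->uticks;
	dst->sticks += src->sticks;
	dst->iticks += src->iticks;
}

/* Deferred step: split the run time in proportion to the ticks. */
static struct split_usage
raw_usage_split(const struct raw_usage *ru)
{
	struct split_usage out = { 0, 0, 0 };
	uint64_t total = ru->uticks + ru->sticks + ru->iticks;

	if (total == 0) {
		/* No ticks sampled; this sketch charges system time. */
		out.sys_us = ru->runtime_us;
		return (out);
	}
	out.user_us = ru->runtime_us * ru->uticks / total;
	out.sys_us = ru->runtime_us * ru->sticks / total;
	/* The remainder goes to interrupt time so the parts sum up. */
	out.intr_us = ru->runtime_us - out.user_us - out.sys_us;
	return (out);
}
```

In this model the exit path only calls raw_usage_add(), which is four
additions, and the division-heavy raw_usage_split() runs only when a
caller actually wants the timeval-style numbers.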

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 15:27:13 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3C93916A4CE
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 15:27:13 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id B962D43D39
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 15:27:12 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 18323 invoked by uid 89); 1 Oct 2004 15:27:10 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 15:27:10 -0000
Received: (qmail 18243 invoked by uid 89); 1 Oct 2004 15:27:09 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 15:27:09 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i91FR8mt027692;
	Fri, 1 Oct 2004 11:27:08 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
In-Reply-To: <200410010757.i917vjac017409@apollo.backplane.com>
References: <1096610130.21577.219.camel@palm.tree.com>
	 <200410010757.i917vjac017409@apollo.backplane.com>
Content-Type: text/plain
Message-Id: <1096644427.25800.26.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Fri, 01 Oct 2004 11:27:08 -0400
Content-Transfer-Encoding: 7bit
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: sched_switch (sched_4bsd) may be preempted
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 15:27:13 -0000

On Fri, 2004-10-01 at 03:57, Matthew Dillon wrote:
>     I would put the entire scheduler core in a critical section, not just
>     the part you think needs to be there.  It's just too critical a subsystem
>     to be able to make operational assumptions of that nature in.  It is
>     what I do in the DragonFly LWKT core, BTW, and crazy as I am I would
>     never even consider trying to move the critical section further in.

The core is wrapped in the sched_lock. And since it is a spin lock it is
running in a critical section with interrupts disabled.

The additional (recursive) critical_enter is just an abusive way to tell
maybe_preempt* that it should not immediately switch.
( Yes - eventually there should be a better way to do this)
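
The mechanism described above can be modelled in a few lines of userland
C.  This is only a sketch under stated assumptions: the model_* names are
invented, MODEL_OWEPREEMPT merely stands in for TDP_OWEPREEMPT, the
priority comparison is left out, and a counter replaces the real context
switch.

```c
#include <assert.h>
#include <stdbool.h>

#define	MODEL_OWEPREEMPT	0x1	/* stands in for TDP_OWEPREEMPT */

struct model_thread {
	int	critnest;	/* critical section nesting depth */
	int	pflags;		/* private flags */
	int	switches;	/* switches performed, for inspection */
};

/* Placeholder for the context switch; also clears the owed flag. */
static void
model_switch(struct model_thread *td)
{
	td->pflags &= ~MODEL_OWEPREEMPT;
	td->switches++;
}

static void
model_critical_enter(struct model_thread *td)
{
	td->critnest++;
}

/* Leaving the outermost section performs any deferred switch. */
static void
model_critical_exit(struct model_thread *td)
{
	assert(td->critnest > 0);
	if (td->critnest == 1 && (td->pflags & MODEL_OWEPREEMPT) != 0)
		model_switch(td);
	td->critnest--;
}

/*
 * Called with the modelled scheduler spin lock held, so the depth is
 * at least 1.  Any depth above 1 is read as "do not switch now".
 */
static bool
model_maybe_preempt(struct model_thread *td)
{
	assert(td->critnest >= 1);
	if (td->critnest > 1) {
		td->pflags |= MODEL_OWEPREEMPT;
		return (false);
	}
	model_switch(td);
	return (true);
}
```

With one level of nesting (the spin lock alone) model_maybe_preempt()
switches at once.  After a second model_critical_enter() it only sets the
flag, and the switch happens when the depth drops back through 1.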

> 
>     I would not reset TDP_OWEPREEMPT there.  If I understand its function
>     correctly you need to leave it intact in order to detect preemption
>     request races against the scheduler.  Since at that point newtd may
>     be non-NULL and thus not cause another scheduling queue check to be
>     made before the next switch, you cannot safely clear the flag where you
>     are clearing it.

This is all running in a critical section and we just decided to switch
and either have picked or will pick the best thread. Interrupts are locked.
The additional critical section just prevents recursion problems by delaying
unwanted switches in maybe_preempt* . Resetting TDP_OWEPREEMPT is
perfectly safe since we switch to the thread chosen while everything has
been locked.

	Stephan

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 16:13:29 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 02E2A16A4CE
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 16:13:29 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id 791CE43D55
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 16:13:28 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 1258 invoked by uid 89); 1 Oct 2004 16:13:26 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 16:13:26 -0000
Received: (qmail 570 invoked by uid 89); 1 Oct 2004 16:13:14 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 16:13:14 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i91GDEmt027945;
	Fri, 1 Oct 2004 12:13:14 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Peter Holm <peter@holm.cc>
In-Reply-To: <20041001141040.GA1556@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <1096477932.3733.1471.camel@palm.tree.com>
	 <1096489576.3733.1868.camel@palm.tree.com>
	 <200409291652.29990.jhb@FreeBSD.org>
	 <1096496057.3733.2163.camel@palm.tree.com>
	 <1096603981.21577.195.camel@palm.tree.com>
	 <1096608201.21577.203.camel@palm.tree.com>
	 <20041001141040.GA1556@peter.osted.lan>
Content-Type: text/plain
Message-Id: <1096647194.27811.12.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Fri, 01 Oct 2004 12:13:14 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer <julian@elischer.org>
cc: John Baldwin <jhb@FreeBSD.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 16:13:29 -0000

On Fri, 2004-10-01 at 10:10, Peter Holm wrote:
> On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote:
> > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote:
> > 
> > > I also had overlooked 
> > > 	 	http://www.holm.cc/stress/log/cons80.html
> > > Showing that my patch for kern_switch.c (switch_patch) has a bug.
> > > I will send an updated patch later today.
> > 
> > OK - here is the promised patch.
> > 
> 
> For once I'm the bearer of good news. The switch_patch_v2 + the
> sched_4bsd patch ran the tests for more than one hour without
> any freeze. The sched_4bsd alone did not stop the freezes. I'm
> now testing the switch_patch_v2 alone and it's looking good for
> 55+ minutes of testing.

Great !
I guess I should roll a cleaned up cumulative patch soon.

	Stephan

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 16:41:48 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D0B8F16A4CE
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 16:41:48 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 53ECD43D46
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 16:41:48 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 19372 invoked from network); 1 Oct 2004 16:41:28 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 1 Oct 2004 16:41:28 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i91GfRV7002534;
	Fri, 1 Oct 2004 18:41:27 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i91GfR6i002533;
	Fri, 1 Oct 2004 18:41:27 +0200 (CEST)
	(envelope-from pho)
Date: Fri, 1 Oct 2004 18:41:27 +0200
From: Peter Holm <peter@holm.cc>
To: Stephan Uphoff <ups@tree.com>
Message-ID: <20041001164127.GA2468@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
	<1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<1096608201.21577.203.camel@palm.tree.com>
	<20041001141040.GA1556@peter.osted.lan>
	<1096647194.27811.12.camel@palm.tree.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1096647194.27811.12.camel@palm.tree.com>
User-Agent: Mutt/1.4.1i
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 16:41:48 -0000

On Fri, Oct 01, 2004 at 12:13:14PM -0400, Stephan Uphoff wrote:
> On Fri, 2004-10-01 at 10:10, Peter Holm wrote:
> > On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote:
> > > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote:
> > > 
> > > > I also had overlooked 
> > > > 	 	http://www.holm.cc/stress/log/cons80.html
> > > > Showing that my patch for kern_switch.c (switch_patch) has a bug.
> > > > I will send an updated patch later today.
> > > 
> > > OK - here is the promised patch.
> > > 
> > 
> > For once I'm the bearer of good news. The switch_patch_v2 + the
> > sched_4bsd patch ran the tests for more than one hour without
> > any freeze. The sched_4bsd alone did not stop the freezes. I'm
> > now testing the switch_patch_v2 alone and it's looking good for
> > 55+ minutes of testing.
> 
> Great !
> I guess I should roll a cleaned up cumulative patch soon.
> 
> 	Stephan

With switch_patch_v2 alone a freeze occurred after more than one hour. So
now I'm back to testing switch_patch_v2 + sched_4bsd. I'll let that run for 
a while, just to be sure.

-- 
Peter Holm

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 16:52:01 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 28E9716A4CE; Fri,  1 Oct 2004 16:52:01 +0000 (GMT)
Received: from green.homeunix.org (pcp04368961pcs.nrockv01.md.comcast.net
	[69.140.212.7])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 71E3243D39; Fri,  1 Oct 2004 16:52:00 +0000 (GMT)
	(envelope-from green@green.homeunix.org)
Received: from green.homeunix.org (green@localhost [127.0.0.1])
	by green.homeunix.org (8.13.1/8.13.1) with ESMTP id i91Gpqo3006657;
	Fri, 1 Oct 2004 12:51:52 -0400 (EDT)
	(envelope-from green@green.homeunix.org)
Received: (from green@localhost)
	by green.homeunix.org (8.13.1/8.13.1/Submit) id i91Gpp3O006656;
	Fri, 1 Oct 2004 12:51:51 -0400 (EDT)
	(envelope-from green)
Date: Fri, 1 Oct 2004 12:51:51 -0400
From: Brian Fundakowski Feldman <green@FreeBSD.org>
To: Stephan Uphoff <ups@tree.com>
Message-ID: <20041001165151.GJ997@green.homeunix.org>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
	<1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<1096608201.21577.203.camel@palm.tree.com>
	<20041001141040.GA1556@peter.osted.lan>
	<1096647194.27811.12.camel@palm.tree.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1096647194.27811.12.camel@palm.tree.com>
User-Agent: Mutt/1.5.6i
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: John Baldwin <jhb@FreeBSD.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 16:52:01 -0000

On Fri, Oct 01, 2004 at 12:13:14PM -0400, Stephan Uphoff wrote:
> On Fri, 2004-10-01 at 10:10, Peter Holm wrote:
> > On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote:
> > > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote:
> > > 
> > > > I also had overlooked 
> > > > 	 	http://www.holm.cc/stress/log/cons80.html
> > > > Showing that my patch for kern_switch.c (switch_patch) has a bug.
> > > > I will send an updated patch later today.
> > > 
> > > OK - here is the promised patch.
> > > 
> > 
> > For once I'm the bearer of good news. The switch_patch_v2 + the
> > sched_4bsd patch ran the tests for more than one hour without
> > any freeze. The sched_4bsd alone did not stop the freezes. I'm
> > now testing the switch_patch_v2 alone and it's looking good for
> > 55+ minutes of testing.
> 
> Great !
> I guess I should roll a cleaned up cumulative patch soon.

I suppose it might be a bit too hopeful, but is there any chance you're
taking a look at SCHED_ULE problems, too?

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 17:56:46 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id C9D9316A4CE; Fri,  1 Oct 2004 17:56:46 +0000 (GMT)
Received: from ylpvm43.prodigy.net (ylpvm43-ext.prodigy.net [207.115.57.74])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 8561143D45; Fri,  1 Oct 2004 17:56:46 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (adsl-67-124-49-205.dsl.snfc21.pacbell.net
	[67.124.49.205])i91HurCE021701;	Fri, 1 Oct 2004 13:56:54 -0400
Message-ID: <415D9A5B.10200@elischer.org>
Date: Fri, 01 Oct 2004 10:56:43 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Stephan Uphoff <ups@tree.com>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
	<1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<1096608201.21577.203.camel@palm.tree.com>
	<20041001141040.GA1556@peter.osted.lan>
	<1096647194.27811.12.camel@palm.tree.com>
In-Reply-To: <1096647194.27811.12.camel@palm.tree.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: Peter Holm <peter@holm.cc>
cc: John Baldwin <jhb@FreeBSD.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 17:56:46 -0000

Stephan Uphoff wrote:
> On Fri, 2004-10-01 at 10:10, Peter Holm wrote:
> 
>>On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote:
>>
>>>On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote:
>>>
>>>
>>>>I also had overlooked 
>>>>	 	http://www.holm.cc/stress/log/cons80.html
>>>>Showing that my patch for kern_switch.c (switch_patch) has a bug.
>>>>I will send an updated patch later today.
>>>
>>>OK - here is the promised patch.
>>>
>>
>>For once I'm the bearer of good news. The switch_patch_v2 + the
>>sched_4bsd patch ran the tests for more than one hour without
>>any freeze. The sched_4bsd alone did not stop the freezes. I'm
>>now testing the switch_patch_v2 alone and it's looking good for
>>55+ minutes of testing.
> 
> 
> Great !
> I guess I should roll a cleaned up cumulative patch soon.
> 
> 	Stephan

I'm on the sidelines cheering..
I'm just coming off a hmmm.. 28 hour day from work.. ** need sleep **


From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 18:15:36 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 06F6D16A4F3
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 18:15:35 +0000 (GMT)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DB4C843D45
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 18:15:23 +0000 (GMT)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	i91IFJvA019973;	Fri, 1 Oct 2004 11:15:19 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.12.9p2/8.12.9/Submit) id i91IFJQq019971;
	Fri, 1 Oct 2004 11:15:19 -0700 (PDT)
	(envelope-from dillon)
Date: Fri, 1 Oct 2004 11:15:19 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200410011815.i91IFJQq019971@apollo.backplane.com>
To: Stephan Uphoff <ups@tree.com>
References: <1096610130.21577.219.camel@palm.tree.com>
	<1096644427.25800.26.camel@palm.tree.com>
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: sched_switch (sched_4bsd) may be preempted
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 18:15:36 -0000


:The core is wrapped in the sched_lock. And since it is a spin lock it is
:running in a critical section with interrupts disabled.
:
:The additional (recursive) critical_enter is just an abusive way to tell
:maybe_preempt* that it should not immediately switch.
:( Yes - eventually there should be a better way to do this)

    Umm.  So you are saying that the code is intentionally breaking the
    API that should otherwise be protecting it from a preemptive thread
    switch by making the assumption that the only critical section count
    will come from an intentional scheduler mutex obtained ONLY for the
    purpose of calling sched_add(), and thus only two or more critical section
    counts means that the originally interrupted code desires no preemption?
    No wonder the scheduler is broken!  That sounds like a recipe for disaster!

    But the solution is simple enough... move the maybe_preempt() call out
    of sched_add().  That is, remove the flawed assumption instead of adding
    further hacks to work around the flawed assumption.  Instead, 
    just conditionally set TDP_OWEPREEMPT there, don't actually try to switch.
    Then simply check for TDP_OWEPREEMPT either just after the scheduler 
    mutex is released, or just before it would otherwise be released.  The
    recursion is happening because the original code was badly designed,
    not because it is an inevitable consequence of implementing preemption.
    But this problem looks *REALLY* easy to fix... NOT by adding more hacks,
    but by fixing the originally flawed code.
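
    One way to read the suggestion above, as a minimal userland sketch.
    The sketch_* names and SKETCH_OWEPREEMPT are invented for this
    illustration, run-queue handling and locking are reduced to comments,
    and a counter stands in for the real switch, so treat it as an outline
    of the control flow rather than kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define	SKETCH_OWEPREEMPT	0x1	/* stands in for TDP_OWEPREEMPT */

struct sketch_thread {
	int	priority;	/* lower value means more urgent */
	int	pflags;		/* private flags */
	int	switches;	/* switches performed, for inspection */
};

/* Enqueue step: note that a switch is owed, but never switch here. */
static void
sketch_sched_add(struct sketch_thread *curtd,
    const struct sketch_thread *newtd)
{
	/* ... insert newtd into a run queue (omitted) ... */
	if (newtd->priority < curtd->priority)
		curtd->pflags |= SKETCH_OWEPREEMPT;
}

/*
 * Unlock step: the only place that acts on the flag.  Returns true
 * if a switch was performed.
 */
static bool
sketch_sched_unlock(struct sketch_thread *curtd)
{
	/* ... release the scheduler lock (omitted) ... */
	if ((curtd->pflags & SKETCH_OWEPREEMPT) == 0)
		return (false);
	curtd->pflags &= ~SKETCH_OWEPREEMPT;
	curtd->switches++;	/* stands in for the context switch */
	return (true);
}
```

    Because the enqueue step can no longer switch, nothing in it needs to
    guess why the nesting count has a particular value.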

:>     I would not reset TDP_OWEPREEMPT there.  If I understand its function
:>     correctly you need to leave it intact in order to detect preemption
:>     request races against the scheduler.  Since at that point newtd may
:>     be non-NULL and thus not cause another scheduling queue check to be
:>     made before the next switch, you cannot safely clear the flag where you
:>     are clearing it.
:
:This is all running in a critical section and we just decided to switch
:and either have picked or will pick the best thread. Interrupts are locked.
:The additional critical section just prevents recursion problems by delaying
:unwanted switches in maybe_preempt* . Resetting TDP_OWEPREEMPT is
:perfectly safe since we switch to the thread chosen while everything has
:been locked.
:
:	Stephan

    I sorta see that, but then again newtd is already set so you are
    assuming that no side effects have occurred (from calling other scheduler
    related routines) since newtd was last chosen.  But it is clear that
    there are a ton of opportunities for side effects either to occur or
    to occur in the future as the code continues to be modified, which makes
    this sort of assumption very dangerous and makes the resulting code very
    fragile.  For example, if someone ever wanted to avoid physically 
    disabling interrupts with a 'cli' in critical_enter() (and this is 
    something that could very well happen since neither the original 4.x code
    nor the DragonFly code disables interrupts in this case, as an 
    optimization), that breaks all of your assumptions.   In fact, the
    interrupt disablement is being done in the machine-dependent (MD) code,
    and you are assuming it in machine-independent (MI) code.  This makes
    your assumption even MORE unsafe.

    Your goal, with all the problems that the scheduler is having now, should
    be to make the code more robust, NOT make it more fragile.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 18:31:56 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 16D6E16A4CE
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 18:31:56 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id A178343D46
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 18:31:55 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 23978 invoked by uid 89); 1 Oct 2004 18:31:54 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 18:31:54 -0000
Received: (qmail 23958 invoked by uid 89); 1 Oct 2004 18:31:54 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 1 Oct 2004 18:31:54 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i91IVrmt028646;
	Fri, 1 Oct 2004 14:31:53 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Brian Fundakowski Feldman <green@FreeBSD.org>
In-Reply-To: <20041001165151.GJ997@green.homeunix.org>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <1096477932.3733.1471.camel@palm.tree.com>
	 <1096489576.3733.1868.camel@palm.tree.com>
	 <200409291652.29990.jhb@FreeBSD.org>
	 <1096496057.3733.2163.camel@palm.tree.com>
	 <1096603981.21577.195.camel@palm.tree.com>
	 <1096608201.21577.203.camel@palm.tree.com>
	 <20041001141040.GA1556@peter.osted.lan>
	 <1096647194.27811.12.camel@palm.tree.com>
	 <20041001165151.GJ997@green.homeunix.org>
Content-Type: text/plain
Message-Id: <1096655513.27811.66.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Fri, 01 Oct 2004 14:31:53 -0400
Content-Transfer-Encoding: 7bit
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: John Baldwin <jhb@FreeBSD.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 18:31:56 -0000

On Fri, 2004-10-01 at 12:51, Brian Fundakowski Feldman wrote:
> On Fri, Oct 01, 2004 at 12:13:14PM -0400, Stephan Uphoff wrote:
> > On Fri, 2004-10-01 at 10:10, Peter Holm wrote:
> > > On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote:
> > > > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote:
> > > > 
> > > > > I also had overlooked 
> > > > > 	 	http://www.holm.cc/stress/log/cons80.html
> > > > > Showing that my patch for kern_switch.c (switch_patch) has a bug.
> > > > > I will send an updated patch later today.
> > > > 
> > > > OK - here is the promised patch.
> > > > 
> > > 
> > > For once I'm the bearer of good news. The switch_patch_v2 + the
> > > sched_4bsd patch ran the tests for more than one hour without
> > > any freeze. The sched_4bsd alone did not stop the freezes. I'm
> > > now testing the switch_patch_v2 alone and it's looking good for
> > > 55+ minutes of testing.
> > 
> > Great !
> > I guess I should roll a cleaned up cumulative patch soon.
> 
> I suppose it might be a bit too hopeful, but is there any chance you're
> taking a look at SCHED_ULE problems, too?

I have to get some work done on my own project before I run out of
funding :-(
This means I need to avoid looking at SCHED_ULE for at least the next
week.

	Stephan

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 18:33:10 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B1F8016A4D2
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 18:33:10 +0000 (GMT)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3131443D31
	for <freebsd-arch@FreeBSD.org>; Fri,  1 Oct 2004 18:33:10 +0000 (GMT)
	(envelope-from scottl@FreeBSD.org)
Received: from [192.168.254.11] (junior-wifi.samsco.home [192.168.254.11])
	(authenticated bits=0)
	by pooker.samsco.org (8.12.11/8.12.10) with ESMTP id i91IXsve078970;
	Fri, 1 Oct 2004 12:33:54 -0600 (MDT)
	(envelope-from scottl@FreeBSD.org)
Message-ID: <415DA2A2.5010309@FreeBSD.org>
Date: Fri, 01 Oct 2004 12:32:02 -0600
From: Scott Long <scottl@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.2) Gecko/20040831
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Matthew Dillon <dillon@apollo.backplane.com>
References: <1096610130.21577.219.camel@palm.tree.com>
	<1096644427.25800.26.camel@palm.tree.com>
	<200410011815.i91IFJQq019971@apollo.backplane.com>
In-Reply-To: <200410011815.i91IFJQq019971@apollo.backplane.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, hits=0.0 required=3.8 tests=none autolearn=no version=2.63
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on pooker.samsco.org
cc: Peter Holm <peter@holm.cc>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@FreeBSD.org>
cc: Julian Elischer <julian@elischer.org>
cc: Stephan Uphoff <ups@tree.com>
Subject: Re: sched_switch (sched_4bsd) may be preempted
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 18:33:10 -0000

Matthew Dillon wrote:
> :The core is wrapped in the sched_lock. And since it is a spin lock it is
> :running in a critical section with interrupts disabled.
> :
> :The additional (recursive) critical_enter is just an abusive way to tell
> :maybe_preempt* that it should not immediately switch.
> :( Yes - eventually there should be a better way to do this)
> 
>     Umm.  So you are saying that the code is intentionally breaking the
>     API that should otherwise be protecting it from a preemptive thread
>     switch by making the assumption that the only critical section count
>     will come from an intentional scheduler mutex obtained ONLY for the
>     purpose of calling sched_add(), and thus only two or more critical section
>     counts means that the originally interrupted code desires no preemption?
>     No wonder the scheduler is broken!  That sounds like a recipe for disaster!
> 
>     But the solution is simple enough... move the maybe_preempt() call out
>     of sched_add().  That is, remove the flawed assumption instead of adding
>     further hacks to work around the flawed assumption.  Instead, 
>     just conditionally set TDP_OWEPREEMPT there, don't actually try to switch.
>     Then simply check for TDP_OWEPREEMPT either just after the scheduler 
>     mutex is released, or just before it would otherwise be released.  The
>     recursion is happening because the original code was badly designed,
>     not because it is an inevitable consequence of implementing preemption.

This is a pretty bold assumption to make.  I agree that when I first 
looked at this code a while back I was quite confused by the semantics, 
but after discussing it with John it makes a whole lot more sense.  The
whole design is based on being allowed to be switched away from if
curthread->td_critnest is less than one.  Simply holding a single 
spinlock or critical section will not prevent this, but this is only a
problem from within the scheduler.  If a thread enters the scheduler
with a spinlock  or critical section held, the act of the scheduler
picking up sched_lock will bump up td_critnest and prevent preemption.
This of course leaves a hole where the scheduler is entered without
a spinlock held, and Stephan looks like he's cleaning up this hole,
and doing a pretty reasonable job at it.  The easy hack was to just
wrap setrunqueue() in a critical section, but there were still a few
problems with that.
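
A minimal user-space sketch of the deferral rule described above, written
as a toy model and not as kernel code.  The names toy_thread, toy_switch,
toy_critical_enter, toy_critical_exit and toy_maybe_preempt are
hypothetical; the only things taken from this thread are the
"td_critnest > 1" test and the TDP_OWEPREEMPT flag used in the
kern_switch.c patch.  It assumes that holding sched_lock accounts for
exactly one nesting level.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct thread; only the discussed fields. */
struct toy_thread {
	int  critnest;    /* models td_critnest */
	bool owepreempt;  /* models TDP_OWEPREEMPT in td_pflags */
	int  switches;    /* counts simulated mi_switch() calls */
};

/* Simulated context switch: a switch satisfies any owed preemption. */
static void
toy_switch(struct toy_thread *td)
{
	td->owepreempt = false;
	td->switches++;
}

/* Taking a spin lock or entering a critical section bumps the count. */
static void
toy_critical_enter(struct toy_thread *td)
{
	td->critnest++;
}

/*
 * Simplified exit: an owed preemption is honoured only when the
 * outermost level is left.  The real critical_exit() is more involved.
 */
static void
toy_critical_exit(struct toy_thread *td)
{
	assert(td->critnest > 0);
	td->critnest--;
	if (td->critnest == 0 && td->owepreempt)
		toy_switch(td);
}

/*
 * Called with the simulated sched_lock held, so critnest >= 1.
 * One level means only sched_lock is held: switch immediately.
 * More than one level means the caller asked for no switch: defer.
 */
static void
toy_maybe_preempt(struct toy_thread *td)
{
	assert(td->critnest >= 1);
	if (td->critnest > 1)
		td->owepreempt = true;
	else
		toy_switch(td);
}
```

With only the simulated sched_lock held (critnest == 1),
toy_maybe_preempt() switches at once, which is the hole in question.
With one extra toy_critical_enter() the switch is recorded as owed and
happens only when the count drops back to zero.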

I agree that there are better ways to deal with this in the long run,
but please don't distract us from making it work correctly right now.

>     But this problem looks *REALLY* easy to fix... NOT by adding more hacks,
>     but by fixing the originally flawed code.
> 
> :>     I would not reset TDP_OWEPREEMPT there.  If I understand its function
> :>     correctly you need to leave it intact in order to detect preemption
> :>     request races against the scheduler.  Since at that point newtd may
> :>     be non-NULL and thus not cause another scheduling queue check to be
> :>     made before the next switch, you cannot safely clear the flag where you
> :>     are clearing it.
> :
> :This is all running in critical section and we just decided to switch
> :and either have or will pick the best thread. Interrupts are locked. The
> :additional critical section just prevents recursion problems by delaying
> :unwanted switches in maybe_preempt* . Resetting TDP_OWEPREEMPT is
> :perfectly safe since we switch to the thread chosen while everything has
> :been locked.
> :
> :	Stephan
> 
>     I sorta see that, but then again newtd is already set so you are
>     assuming that no side effects have occurred (from calling other scheduler
>     related routines) since newtd was last chosen.  But it is clear that
>     there are a ton of opportunities for side effects either to occur or
>     to occur in the future as the code continues to be modified, which makes
>     this sort of assumption very dangerous and makes the resulting code very
>     fragile.  For example, if someone ever wanted to avoid physically 
>     disabling interrupts with a 'cli' in critical_enter() (and this is 
>     something that could very well happen since neither the original 4.x code
>     or the DragonFly code disables interrupts in this case, as an 
>     optimization), that breaks all of your assumptions.   In fact, the
>     interrupt disablement is being done in the machine-dependent (MD) code,
>     and you are assuming it in machine-independent (MI) code.  This makes
>     your assumption even MORE unsafe.
> 
>     Your goal, with all the problems that the scheduler is having now, should
>     be to make the code more robust, NOT make it more fragile.

Testing is showing that it is becoming more robust and following the
original design goals.  The real problem here is that the design wasn't
well documented, and what was documented wasn't being read by most
people.

Scott

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 19:25:55 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id ACAA516A4CF
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 19:25:55 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 0200C43D48
	for <freebsd-arch@freebsd.org>; Fri,  1 Oct 2004 19:25:55 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 57936 invoked from network); 1 Oct 2004 19:25:53 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 1 Oct 2004 19:25:53 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i91JPqV7003465;
	Fri, 1 Oct 2004 21:25:52 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i91JPpuh003464;
	Fri, 1 Oct 2004 21:25:51 +0200 (CEST)
	(envelope-from pho)
Date: Fri, 1 Oct 2004 21:25:51 +0200
From: Peter Holm <peter@holm.cc>
To: Stephan Uphoff <ups@tree.com>
Message-ID: <20041001192551.GA3381@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
	<1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<1096608201.21577.203.camel@palm.tree.com>
	<20041001141040.GA1556@peter.osted.lan>
	<1096647194.27811.12.camel@palm.tree.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="IS0zKkzwUGydFO0o"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1096647194.27811.12.camel@palm.tree.com>
User-Agent: Mutt/1.4.1i
cc: Peter Holm <peter@holm.cc>
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 19:25:55 -0000


--IS0zKkzwUGydFO0o
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Fri, Oct 01, 2004 at 12:13:14PM -0400, Stephan Uphoff wrote:
> On Fri, 2004-10-01 at 10:10, Peter Holm wrote:
> > On Fri, Oct 01, 2004 at 01:23:21AM -0400, Stephan Uphoff wrote:
> > > On Fri, 2004-10-01 at 00:13, Stephan Uphoff wrote:
> > > 
> > > > I also had overlooked 
> > > > 	 	http://www.holm.cc/stress/log/cons80.html
> > > > Showing that my patch for kern_switch.c (switch_patch) has a bug.
> > > > I will send an updated patch later today.
> > > 
> > > OK - here is the promised patch.
> > > 
> > 
> > For once I'm the bearer of good news. The switch_patch_v2 + the
> > sched_4bsd patch ran the tests for more than one hour without
> > any freeze. The sched_4bsd alone did not stop the freezes. I'm
> > now testing the switch_patch_v2 alone and it's looking good for
> > 55+ minutes of testing.
> 
> Great !
> I guess I should roll a cleaned up cumulative patch soon.
> 
> 	Stephan

I have now been running the stress test for more than 3½ hours, without
any freezes. I have included the two of your changes I have been using.

- Peter

--IS0zKkzwUGydFO0o
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="stephan.combined.diff"

Index: sys/kern/kern_switch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_switch.c,v
retrieving revision 1.95
diff -u -r1.95 kern_switch.c
--- sys/kern/kern_switch.c	19 Sep 2004 18:34:17 -0000	1.95
+++ sys/kern/kern_switch.c	1 Oct 2004 19:06:03 -0000
@@ -315,6 +315,106 @@
 	td->td_priority = newpri;
 	setrunqueue(td, SRQ_BORING);
 }
+
+
+/*
+ * This function is called when a thread is about to be put on a
+ * ksegrp run queue because it has been made runnable or its 
+ * priority has been adjusted and the ksegrp does not have a 
+ * free kse slot.  It determines if a thread from the same ksegrp
+ * should be preempted.  If so, it tries to switch threads
+ * if the thread is on the same cpu or notifies another cpu that
+ * it should switch threads. 
+ */
+
+static void
+maybe_preempt_in_ksegrp(struct thread *td)
+{
+#if  defined(SMP)
+	int highest_pri;
+	struct ksegrp *kg;
+	cpumask_t cpumask,dontuse;
+	struct pcpu *pc;
+	struct pcpu *highest_pcpu;
+	struct thread *running_thread;
+
+#ifndef FULL_PREEMPTION
+	int pri;
+
+	pri = td->td_priority;
+
+	if (!(pri >= PRI_MIN_ITHD && pri <= PRI_MAX_ITHD))
+	  return;
+#endif
+
+	mtx_assert(&sched_lock, MA_OWNED);
+
+	running_thread = curthread;
+
+#if !defined(KSEG_PEEMPT_BEST_CPU)
+	if(running_thread->td_ksegrp != td->td_ksegrp)
+#endif
+		{
+			kg = td->td_ksegrp;
+
+			/* Anyone waiting in front ? */
+			if(td != TAILQ_FIRST(&kg->kg_runq))  {
+				return; /* Yes - wait your turn*/
+			}
+			highest_pri  = td->td_priority;
+			highest_pcpu = NULL;
+			dontuse      = stopped_cpus | idle_cpus_mask;
+
+			/* Find a cpu with the worst priority that runs a thread from the
+			 * same  ksegrp - if multiple exist give first the last run cpu and then
+			 * the current cpu priority 
+			 */
+
+			SLIST_FOREACH(pc, &cpuhead, pc_allcpu) {
+				cpumask = pc->pc_cpumask;
+				if ( (cpumask & dontuse) == 0 && 
+				     pc->pc_curthread->td_ksegrp == kg) {
+					if (pc->pc_curthread->td_priority > highest_pri) {
+						highest_pri  = pc->pc_curthread->td_priority;
+						highest_pcpu = pc;
+					} else if (pc->pc_curthread->td_priority == highest_pri &&
+						   highest_pcpu != NULL) {
+						if (td->td_lastcpu == pc->pc_cpuid ||
+						    (PCPU_GET(cpumask) == cpumask &&
+						     td->td_lastcpu != highest_pcpu->pc_cpuid)) {
+							highest_pcpu = pc;
+						}
+					}
+				}
+			}
+			
+			/* Check if we need to preempt someone */
+			if (highest_pcpu == NULL) return;
+
+			if (PCPU_GET(cpuid) != highest_pcpu->pc_cpuid) {
+				highest_pcpu->pc_curthread->td_flags |= TDF_NEEDRESCHED;
+				ipi_selected(highest_pcpu->pc_cpumask, IPI_AST);
+				return;
+			}
+		}
+#else
+	KASSERT(running_thread->td_ksegrp == td->td_ksegrp,("maybe_preempt_in_ksegrp: No chance to run thread"));
+#endif
+
+	if  (td->td_priority > running_thread->td_priority)
+		return;
+#ifdef PREEMPTION
+	if (running_thread->td_critnest > 1) {
+		running_thread->td_pflags |= TDP_OWEPREEMPT;
+	} else {
+		mi_switch(SW_INVOL, NULL);
+	}
+#else
+	running_thread->td_flags |= TDF_NEEDRESCHED;
+#endif
+	return;
+}
+
 int limitcount;
 void
 setrunqueue(struct thread *td, int flags)
@@ -422,6 +522,7 @@
 	} else {
 		CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d",
 			td, td->td_ksegrp, td->td_proc->p_pid);
+		maybe_preempt_in_ksegrp(td);
 	}
 }
 
Index: sys/kern/sched_4bsd.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/sched_4bsd.c,v
retrieving revision 1.65
diff -u -r1.65 sched_4bsd.c
--- sys/kern/sched_4bsd.c	16 Sep 2004 07:12:59 -0000	1.65
+++ sys/kern/sched_4bsd.c	1 Oct 2004 19:06:03 -0000
@@ -823,6 +823,7 @@
 		TD_SET_CAN_RUN(td);
 	else {
 		td->td_ksegrp->kg_avail_opennings++;
+		critical_enter();
 		if (TD_IS_RUNNING(td)) {
 			/* Put us back on the run queue (kse and all). */
 			setrunqueue(td, SRQ_OURSELF|SRQ_YIELDING);
@@ -834,6 +835,8 @@
 			 */
 			slot_fill(td->td_ksegrp);
 		}
+		critical_exit();
+		td->td_pflags &= ~TDP_OWEPREEMPT;
 	}
 	if (newtd == NULL)
 		newtd = choosethread();

--IS0zKkzwUGydFO0o--

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 21:22:41 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4780616A4CE; Fri,  1 Oct 2004 21:22:41 +0000 (GMT)
Received: from harmony.village.org (rover.village.org [168.103.84.182])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 0678C43D1D; Fri,  1 Oct 2004 21:22:37 +0000 (GMT)
	(envelope-from imp@bsdimp.com)
Received: from localhost (harmony.village.org [10.0.0.6])
	by harmony.village.org (8.13.1/8.13.1) with ESMTP id i91LJMVr053371;
	Fri, 1 Oct 2004 15:19:22 -0600 (MDT)
	(envelope-from imp@bsdimp.com)
Date: Fri, 01 Oct 2004 15:20:46 -0600 (MDT)
Message-Id: <20041001.152046.32721253.imp@bsdimp.com>
To: sah@softcardsystems.com
From: "M. Warner Losh" <imp@bsdimp.com>
In-Reply-To: <Pine.LNX.4.60.0410010851190.14170@athena>
References: <Pine.LNX.4.60.0409300903070.6230@athena>
	<20041001.000452.99281901.imp@bsdimp.com>
	<Pine.LNX.4.60.0410010851190.14170@athena>
X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-arch@freebsd.org
cc: re@freebsd.org
cc: julian@elischer.org
cc: ups@tree.com
Subject: Re: AoE for 4.x
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 21:22:41 -0000

In message: <Pine.LNX.4.60.0410010851190.14170@athena>
            Sam <sah@softcardsystems.com> writes:
: Is that block & char, or do i just need to specify the block?

There's just one major number type in 4.x.  I believe it is the
character device, but every time I say it is foo, someone else proves
me wrong.

Warner

From owner-freebsd-arch@FreeBSD.ORG  Fri Oct  1 21:29:58 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id D136E16A4CE; Fri,  1 Oct 2004 21:29:58 +0000 (GMT)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7C00F43D1F; Fri,  1 Oct 2004 21:29:58 +0000 (GMT)
	(envelope-from scottl@FreeBSD.org)
Received: from [192.168.254.11] (junior-wifi.samsco.home [192.168.254.11])
	(authenticated bits=0)
	by pooker.samsco.org (8.12.11/8.12.10) with ESMTP id i91LUqvn079587;
	Fri, 1 Oct 2004 15:30:52 -0600 (MDT)
	(envelope-from scottl@FreeBSD.org)
Message-ID: <415DCC1A.1030305@FreeBSD.org>
Date: Fri, 01 Oct 2004 15:28:58 -0600
From: Scott Long <scottl@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.2) Gecko/20040831
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "M. Warner Losh" <imp@bsdimp.com>
References: <Pine.LNX.4.60.0409300903070.6230@athena>
	<20041001.000452.99281901.imp@bsdimp.com>
	<Pine.LNX.4.60.0410010851190.14170@athena>
	<20041001.152046.32721253.imp@bsdimp.com>
In-Reply-To: <20041001.152046.32721253.imp@bsdimp.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, hits=0.0 required=3.8 tests=none autolearn=no version=2.63
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on pooker.samsco.org
cc: sah@softcardsystems.com
cc: ups@tree.com
cc: re@FreeBSD.org
cc: julian@elischer.org
cc: freebsd-arch@FreeBSD.org
Subject: Re: AoE for 4.x
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Oct 2004 21:29:59 -0000

M. Warner Losh wrote:
> In message: <Pine.LNX.4.60.0410010851190.14170@athena>
>             Sam <sah@softcardsystems.com> writes:
> : Is that block & char, or do i just need to specify the block?
> 
> There's just one major number type in 4.x.  I believe it is the
> character device, but every time I say it is foo, someone else proves
> me wrong.
> 
> Warner
> 

Correct, only char majors for 4.x.  The bdevmaj field is usually given
'-1'.

Scott

From owner-freebsd-arch@FreeBSD.ORG  Sat Oct  2 05:33:55 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4E72616A4CE
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 05:33:55 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id D1AF143D1F
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 05:33:54 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 93856 invoked from network); 2 Oct 2004 05:33:53 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 2 Oct 2004 05:33:53 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i925XqV7006388;
	Sat, 2 Oct 2004 07:33:52 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i925XpqN006387;
	Sat, 2 Oct 2004 07:33:51 +0200 (CEST)
	(envelope-from pho)
Date: Sat, 2 Oct 2004 07:33:51 +0200
From: Peter Holm <peter@holm.cc>
To: Peter Holm <peter@holm.cc>
Message-ID: <20041002053351.GA6259@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
	<1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<1096608201.21577.203.camel@palm.tree.com>
	<20041001141040.GA1556@peter.osted.lan>
	<1096647194.27811.12.camel@palm.tree.com>
	<20041001192551.GA3381@peter.osted.lan>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20041001192551.GA3381@peter.osted.lan>
User-Agent: Mutt/1.4.1i
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
cc: Julian Elischer <julian@elischer.org>
cc: Stephan Uphoff <ups@tree.com>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2004 05:33:55 -0000

On Fri, Oct 01, 2004 at 09:25:51PM +0200, Peter Holm wrote:
> > > 
> > > For once I'm the bearer of good news. The switch_patch_v2 + the
> > > sched_4bsd patch ran the tests for more than one hour without
> > > any freeze. The sched_4bsd alone did not stop the freezes. I'm
> > > now testing the switch_patch_v2 alone and it's looking good for
> > > 55+ minutes of testing.
> > 
> > Great !
> > I guess I should roll a cleaned up cumulative patch soon.
> > 
> > 	Stephan
> 
> I have now been running the stress test for more than 3½ hours, without
> any freezes. I have included the two of your changes I have been using.
> 
> - Peter

After more testing, I'm sad to report that the freeze is still there.
The patch has however decreased the number of freezes dramatically:

During 14 hours of testing 3 separate freezes have been seen:

24 Giant held for more than 60 sec by td 0xc244e900, pid 27683
31 Giant held for more than 60 sec by td 0xc1b7b600, pid 12098
79 Giant held for more than 60 sec by td 0xc25f3180, pid 75531

- Peter

From owner-freebsd-arch@FreeBSD.ORG  Sat Oct  2 07:11:25 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1FB8F16A4CE; Sat,  2 Oct 2004 07:11:25 +0000 (GMT)
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 61D4F43D2F; Sat,  2 Oct 2004 07:11:24 +0000 (GMT)
	(envelope-from phk@critter.freebsd.dk)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.13.1/8.13.1) with ESMTP id i927B8ab082444;
	Sat, 2 Oct 2004 09:11:09 +0200 (CEST)
	(envelope-from phk@critter.freebsd.dk)
To: "M. Warner Losh" <imp@bsdimp.com>
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
In-Reply-To: Your message of "Fri, 01 Oct 2004 15:20:46 MDT."
             <20041001.152046.32721253.imp@bsdimp.com> 
Date: Sat, 02 Oct 2004 09:11:08 +0200
Message-ID: <82443.1096701068@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
cc: sah@softcardsystems.com
cc: ups@tree.com
cc: re@freebsd.org
cc: julian@elischer.org
cc: freebsd-arch@freebsd.org
Subject: Re: AoE for 4.x 
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2004 07:11:25 -0000

In message <20041001.152046.32721253.imp@bsdimp.com>, "M. Warner Losh" writes:
>In message: <Pine.LNX.4.60.0410010851190.14170@athena>
>            Sam <sah@softcardsystems.com> writes:
>: Is that block & char, or do i just need to specify the block?
>
>There's just one major number type in 4.x.  I believe it is the
>character device, but every time I say it is foo, someone else proves
>me wrong.

It is char.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch@FreeBSD.ORG  Sat Oct  2 13:55:58 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 472E016A4CE
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 13:55:58 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id C66F343D3F
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 13:55:57 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 16261 invoked by uid 89); 2 Oct 2004 13:55:56 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 2 Oct 2004 13:55:56 -0000
Received: (qmail 16251 invoked by uid 89); 2 Oct 2004 13:55:56 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 2 Oct 2004 13:55:56 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i92Dtrmt032748;
	Sat, 2 Oct 2004 09:55:54 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Peter Holm <peter@holm.cc>
In-Reply-To: <20041002053351.GA6259@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	 <1096477932.3733.1471.camel@palm.tree.com>
	 <1096489576.3733.1868.camel@palm.tree.com>
	 <200409291652.29990.jhb@FreeBSD.org>
	 <1096496057.3733.2163.camel@palm.tree.com>
	 <1096603981.21577.195.camel@palm.tree.com>
	 <1096608201.21577.203.camel@palm.tree.com>
	 <20041001141040.GA1556@peter.osted.lan>
	 <1096647194.27811.12.camel@palm.tree.com>
	 <20041001192551.GA3381@peter.osted.lan>
	 <20041002053351.GA6259@peter.osted.lan>
Content-Type: text/plain; charset=ISO-8859-1
Message-Id: <1096725353.27811.836.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Sat, 02 Oct 2004 09:55:53 -0400
Content-Transfer-Encoding: 8bit
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2004 13:55:58 -0000

On Sat, 2004-10-02 at 01:33, Peter Holm wrote:
> On Fri, Oct 01, 2004 at 09:25:51PM +0200, Peter Holm wrote:
> > > > 
> > > > For once I'm the bearer of good news. The switch_patch_v2 + the
> > > > sched_4bsd patch ran the tests for more than one hour without
> > > > any freeze. The sched_4bsd alone did not stop the freezes. I'm
> > > > now testing the switch_patch_v2 alone and it's looking good for
> > > > 55+ minutes of testing.
> > > 
> > > Great !
> > > I guess I should roll a cleaned up cumulative patch soon.
> > > 
> > > 	Stephan
> > 
> > I have now been running the stress test for more than 3½ hours, without
> > any freezes. I have included the two of your changes I have been using.
> > 
> > - Peter
> 
> After more testing, I'm sad to report that the freeze is still there.
> The patch has however decreased the number of freezes dramatically:
> 
> During 14 hours of testing 3 separate freezes have been seen:
> 
> 24 Giant held for more than 60 sec by td 0xc244e900, pid 27683
> 31 Giant held for more than 60 sec by td 0xc1b7b600, pid 12098
> 79 Giant held for more than 60 sec by td 0xc25f3180, pid 75531

You should also run with MUTEX_WAKE_ALL in your config file
AND the mutex patch. I think this is it but will verify later today.
Sorry - have to run - will roll the cumulative patch tonight (EST).

	Stephan

From owner-freebsd-arch@FreeBSD.ORG  Sat Oct  2 18:14:25 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 34A0D16A4CF
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 18:14:25 +0000 (GMT)
Received: from ylpvm29.prodigy.net (ylpvm29-ext.prodigy.net [207.115.57.60])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 92A6343D3F
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 18:14:24 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (adsl-67-124-49-205.dsl.snfc21.pacbell.net
	[67.124.49.205])i92IE5IV016379;	Sat, 2 Oct 2004 14:14:06 -0400
Message-ID: <415EEFFE.5080309@elischer.org>
Date: Sat, 02 Oct 2004 11:14:22 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Peter Holm <peter@holm.cc>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
	<1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<1096608201.21577.203.camel@palm.tree.com>
	<20041001141040.GA1556@peter.osted.lan>
	<1096647194.27811.12.camel@palm.tree.com>
	<20041001192551.GA3381@peter.osted.lan>
	<20041002053351.GA6259@peter.osted.lan>
In-Reply-To: <20041002053351.GA6259@peter.osted.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
cc: Stephan Uphoff <ups@tree.com>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2004 18:14:25 -0000

Peter Holm wrote:
> On Fri, Oct 01, 2004 at 09:25:51PM +0200, Peter Holm wrote:
> 
>>>>For once I'm the bearer of good news. The switch_patch_v2 + the
>>>>sched_4bsd patch ran the tests for more than one hour without
>>>>any freeze. The sched_4bsd alone did not stop the freezes. I'm
>>>>now testing the switch_patch_v2 alone and it's looking good for
>>>>55+ minutes of testing.
>>>
>>>Great !
>>>I guess I should roll a cleaned up cumulative patch soon.
>>>
>>>	Stephan
>>
>>I have now been running the stress test for more than 3½ hours, without
>>any freezes. I have included the two of your changes I have been using.
>>
>>- Peter
> 
> 
> After more testing, I'm sad to report that the freeze is still there.
> The patch has however decreased the number of freezes dramatically:
> 
> During 14 hours of testing 3 separate freezes has been seen:
> 
> 24 Giant held for more than 60 sec by td 0xc244e900, pid 27683
> 31 Giant held for more than 60 sec by td 0xc1b7b600, pid 12098
> 79 Giant held for more than 60 sec by td 0xc25f3180, pid 75531
> 
> - Peter
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
When this happens, drop into the debugger using:

   kdb_enter("Giant too long");

and dump the thread backtrace, plus the output of "show ktr"
if you have KTR enabled (as we discussed before).
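
A rough user-space sketch of the kind of check that could trigger this,
for illustration only. The struct giant_watch bookkeeping and
giant_check() are hypothetical names, and kdb_enter() is stubbed here so
the fragment builds outside the kernel. The 60-second limit is taken from
the log lines quoted above. In the kernel, the real kdb_enter() would stop
in the debugger, where the backtrace and "show ktr" output can be collected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GIANT_HOLD_LIMIT_SEC 60	/* threshold seen in the quoted log lines */

/* Stand-in for the kernel's kdb_enter(); it only records the message. */
static const char *last_kdb_msg = NULL;

static void
kdb_enter(const char *msg)
{
	last_kdb_msg = msg;
}

/* Hypothetical bookkeeping: when Giant was acquired and whether it is held. */
struct giant_watch {
	long acquired_sec;
	int  held;
};

/* Returns 1 and "enters the debugger" if Giant has been held too long. */
static int
giant_check(const struct giant_watch *gw, long now_sec)
{
	if (gw->held && now_sec - gw->acquired_sec > GIANT_HOLD_LIMIT_SEC) {
		kdb_enter("Giant too long");
		return (1);
	}
	return (0);
}
```

Calling giant_check() from a periodic timer is one possible placement;
where the real check lives depends on the local debugging patch.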

From owner-freebsd-arch@FreeBSD.ORG  Sat Oct  2 18:14:53 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 640FC16A4CE
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 18:14:53 +0000 (GMT)
Received: from ylpvm29.prodigy.net (ylpvm29-ext.prodigy.net [207.115.57.60])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 13A3243D41
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 18:14:53 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (adsl-67-124-49-205.dsl.snfc21.pacbell.net
	[67.124.49.205])i92IEYIV016914;	Sat, 2 Oct 2004 14:14:34 -0400
Message-ID: <415EF01B.7000800@elischer.org>
Date: Sat, 02 Oct 2004 11:14:51 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Peter Holm <peter@holm.cc>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096477932.3733.1471.camel@palm.tree.com>
	<1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<1096608201.21577.203.camel@palm.tree.com>
	<20041001141040.GA1556@peter.osted.lan>
	<1096647194.27811.12.camel@palm.tree.com>
	<20041001192551.GA3381@peter.osted.lan>
	<20041002053351.GA6259@peter.osted.lan>
In-Reply-To: <20041002053351.GA6259@peter.osted.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
cc: Stephan Uphoff <ups@tree.com>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2004 18:14:53 -0000

Peter Holm wrote:
> On Fri, Oct 01, 2004 at 09:25:51PM +0200, Peter Holm wrote:
> 
>>>>For once I'm the bearer of good news. The switch_patch_v2 + the
>>>>sched_4bsd patch ran the tests for more than one hour without
>>>>any freeze. The sched_4bsd alone did not stop the freezes. I'm
>>>>now testing the switch_patch_v2 alone and it's looking good for
>>>>55+ minutes of testing.
>>>
>>>Great !
>>>I guess I should roll a cleaned up cumulative patch soon.
>>>
>>>	Stephan
>>
>>I have now been running the stress test for more than 3½ hours, without
>>any freezes. I have included the two of your changes I have been using.
>>
>>- Peter
> 
> 
> After more testing, I'm sad to report that the freeze is still there.
> The patch has however decreased the number of freezes dramatically:
> 
> During 14 hours of testing 3 separate freezes has been seen:
> 
> 24 Giant held for more than 60 sec by td 0xc244e900, pid 27683
> 31 Giant held for more than 60 sec by td 0xc1b7b600, pid 12098
> 79 Giant held for more than 60 sec by td 0xc25f3180, pid 75531

Oh yeah, the output of "show locks" too.

> 
> - Peter


From owner-freebsd-arch@FreeBSD.ORG  Sat Oct  2 18:31:24 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4A9A816A4CE
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 18:31:24 +0000 (GMT)
Received: from relay.pair.com (relay.pair.com [209.68.1.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id C696E43D39
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 18:31:23 +0000 (GMT)
	(envelope-from pho@holm.cc)
Received: (qmail 12625 invoked from network); 2 Oct 2004 18:31:21 -0000
Received: from 0x50a43fc7.hknxx1.adsl-dhcp.tele.dk (HELO peter.osted.lan)
	(80.164.63.199)
	by relay.pair.com with SMTP; 2 Oct 2004 18:31:21 -0000
X-pair-Authenticated: 80.164.63.199
Received: from peter.osted.lan (localhost.osted.lan [127.0.0.1])
	by peter.osted.lan (8.12.10/8.12.10) with ESMTP id i92IVKXh001253;
	Sat, 2 Oct 2004 20:31:20 +0200 (CEST)
	(envelope-from pho@peter.osted.lan)
Received: (from pho@localhost)
	by peter.osted.lan (8.12.10/8.12.10/Submit) id i92IVKVC001252;
	Sat, 2 Oct 2004 20:31:20 +0200 (CEST)
	(envelope-from pho)
Date: Sat, 2 Oct 2004 20:31:20 +0200
From: Peter Holm <peter@holm.cc>
To: Julian Elischer <julian@elischer.org>
Message-ID: <20041002183120.GA1202@peter.osted.lan>
References: <1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<1096608201.21577.203.camel@palm.tree.com>
	<20041001141040.GA1556@peter.osted.lan>
	<1096647194.27811.12.camel@palm.tree.com>
	<20041001192551.GA3381@peter.osted.lan>
	<20041002053351.GA6259@peter.osted.lan> <415EEFFE.5080309@elischer.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="T4sUOijqQbZv57TR"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <415EEFFE.5080309@elischer.org>
User-Agent: Mutt/1.4.1i
cc: Peter Holm <peter@holm.cc>
cc: Stephan Uphoff <ups@tree.com>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2004 18:31:24 -0000


--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Sat, Oct 02, 2004 at 11:14:22AM -0700, Julian Elischer wrote:
> Peter Holm wrote:
> >On Fri, Oct 01, 2004 at 09:25:51PM +0200, Peter Holm wrote:
> >
> >>>>For once I'm the bearer of good news. The switch_patch_v2 + the
> >>>>sched_4bsd patch ran the tests for more than one hour without
> >>>>any freeze. The sched_4bsd alone did not stop the freezes. I'm
> >>>>now testing the switch_patch_v2 alone and it's looking good for
> >>>>55+ minutes of testing.
> >>>
> >>>Great !
> >>>I guess I should roll a cleaned up cumulative patch soon.
> >>>
> >>>	Stephan
> >>
> >>I have now been running the stress test for more than 3½ hours, without
> >>any freezes. I have included the two of your changes I have been using.
> >>
> >>- Peter
> >
> >
> >After more testing, I'm sad to report that the freeze is still there.
> >The patch has however decreased the number of freezes dramatically:
> >
> >During 14 hours of testing 3 separate freezes has been seen:
> >
> >24 Giant held for more than 60 sec by td 0xc244e900, pid 27683
> >31 Giant held for more than 60 sec by td 0xc1b7b600, pid 12098
> >79 Giant held for more than 60 sec by td 0xc25f3180, pid 75531
> >
> >- Peter
> When this happens, drop into the debugger using:
> 
>   kdb_enter("Giant too long");
> 
> and dump the thread backtrace, plus the output of "show ktr"
> if you have KTR enabled (as we discussed before).

OK, right now I'm testing with all of Stephan's patches plus the
MUTEX_WAKE_ALL option. Uptime is 3 3/4 hours and looking good.

-- 
Peter Holm

--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="changes.diff"

Index: sys/kern/kern_mutex.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_mutex.c,v
retrieving revision 1.149
diff -u -r1.149 kern_mutex.c
--- sys/kern/kern_mutex.c	2 Sep 2004 18:59:15 -0000	1.149
+++ sys/kern/kern_mutex.c	2 Oct 2004 14:46:26 -0000
@@ -492,7 +492,9 @@
 		if (v == MTX_CONTESTED) {
 			MPASS(ts != NULL);
 			m->mtx_lock = (uintptr_t)td | MTX_CONTESTED;
+			critical_enter();
 			turnstile_claim(ts);
+			critical_exit();
 			break;
 		}
 #endif
@@ -651,6 +653,9 @@
 #else
 	MPASS(ts != NULL);
 #endif
+
+	critical_enter();
+
 #ifndef PREEMPTION
 	/* XXX */
 	td1 = turnstile_head(ts);
@@ -671,6 +676,7 @@
 	}
 #endif
 	turnstile_unpend(ts);
+	critical_exit();
 
 #ifndef PREEMPTION
 	/*
Index: sys/kern/kern_switch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_switch.c,v
retrieving revision 1.95
diff -u -r1.95 kern_switch.c
--- sys/kern/kern_switch.c	19 Sep 2004 18:34:17 -0000	1.95
+++ sys/kern/kern_switch.c	2 Oct 2004 14:46:27 -0000
@@ -315,6 +315,106 @@
 	td->td_priority = newpri;
 	setrunqueue(td, SRQ_BORING);
 }
+
+
+/*
+ * This function is called when a thread is about to be put on a
+ * ksegrp run queue because it has been made runnable or its 
+ * priority has been adjusted and the ksegrp does not have a 
+ * free kse slot.  It determines if a thread from the same ksegrp
+ * should be preempted.  If so, it tries to switch threads
+ * if the thread is on the same cpu or notifies another cpu that
+ * it should switch threads. 
+ */
+
+static void
+maybe_preempt_in_ksegrp(struct thread *td)
+{
+#if  defined(SMP)
+	int highest_pri;
+	struct ksegrp *kg;
+	cpumask_t cpumask,dontuse;
+	struct pcpu *pc;
+	struct pcpu *highest_pcpu;
+	struct thread *running_thread;
+
+#ifndef FULL_PREEMPTION
+	int pri;
+
+	pri = td->td_priority;
+
+	if (!(pri >= PRI_MIN_ITHD && pri <= PRI_MAX_ITHD))
+	  return;
+#endif
+
+	mtx_assert(&sched_lock, MA_OWNED);
+
+	running_thread = curthread;
+
+#if !defined(KSEG_PEEMPT_BEST_CPU)
+	if(running_thread->td_ksegrp != td->td_ksegrp)
+#endif
+		{
+			kg = td->td_ksegrp;
+
+			/* Anyone waiting in front ? */
+			if(td != TAILQ_FIRST(&kg->kg_runq))  {
+				return; /* Yes - wait your turn*/
+			}
+			highest_pri  = td->td_priority;
+			highest_pcpu = NULL;
+			dontuse      = stopped_cpus | idle_cpus_mask;
+
+			/* Find a cpu with the worst priority that runs a thread from the
+			 * same ksegrp - if multiple exist, give priority first to the
+			 * last-run cpu and then to the current cpu.
+			 */
+
+			SLIST_FOREACH(pc, &cpuhead, pc_allcpu) {
+				cpumask = pc->pc_cpumask;
+				if ( (cpumask & dontuse) == 0 && 
+				     pc->pc_curthread->td_ksegrp == kg) {
+					if (pc->pc_curthread->td_priority > highest_pri) {
+						highest_pri  = pc->pc_curthread->td_priority;
+						highest_pcpu = pc;
+					} else if (pc->pc_curthread->td_priority == highest_pri &&
+						   highest_pcpu != NULL) {
+						if (td->td_lastcpu == pc->pc_cpuid ||
+						    (PCPU_GET(cpumask) == cpumask &&
+						     td->td_lastcpu != highest_pcpu->pc_cpuid)) {
+							highest_pcpu = pc;
+						}
+					}
+				}
+			}
+			
+			/* Check if we need to preempt someone */
+			if (highest_pcpu == NULL) return;
+
+			if (PCPU_GET(cpuid) != highest_pcpu->pc_cpuid) {
+				highest_pcpu->pc_curthread->td_flags |= TDF_NEEDRESCHED;
+				ipi_selected(highest_pcpu->pc_cpumask, IPI_AST);
+				return;
+			}
+		}
+#else
+	KASSERT(running_thread->td_ksegrp == td->td_ksegrp,("maybe_preempt_in_ksegrp: No chance to run thread"));
+#endif
+
+	if  (td->td_priority > running_thread->td_priority)
+		return;
+#ifdef PREEMPTION
+	if (running_thread->td_critnest > 1) {
+		running_thread->td_pflags |= TDP_OWEPREEMPT;
+	} else {
+		mi_switch(SW_INVOL, NULL);
+	}
+#else
+	running_thread->td_flags |= TDF_NEEDRESCHED;
+#endif
+	return;
+}
+
 int limitcount;
 void
 setrunqueue(struct thread *td, int flags)
@@ -422,6 +522,7 @@
 	} else {
 		CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d",
 			td, td->td_ksegrp, td->td_proc->p_pid);
+		maybe_preempt_in_ksegrp(td);
 	}
 }
 
Index: sys/kern/sched_4bsd.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/sched_4bsd.c,v
retrieving revision 1.65
diff -u -r1.65 sched_4bsd.c
--- sys/kern/sched_4bsd.c	16 Sep 2004 07:12:59 -0000	1.65
+++ sys/kern/sched_4bsd.c	2 Oct 2004 14:46:29 -0000
@@ -823,6 +823,7 @@
 		TD_SET_CAN_RUN(td);
 	else {
 		td->td_ksegrp->kg_avail_opennings++;
+		critical_enter();
 		if (TD_IS_RUNNING(td)) {
 			/* Put us back on the run queue (kse and all). */
 			setrunqueue(td, SRQ_OURSELF|SRQ_YIELDING);
@@ -834,6 +835,8 @@
 			 */
 			slot_fill(td->td_ksegrp);
 		}
+		critical_exit();
+		td->td_pflags &= ~TDP_OWEPREEMPT;
 	}
 	if (newtd == NULL)
 		newtd = choosethread();
--- sys/i386/conf/GENERIC	Sun Sep 19 02:52:22 2004
+++ sys/i386/conf/PHO	Sat Oct  2 16:06:19 2004
@@ -66,6 +66,7 @@
 options 	KDB			# Enable kernel debugger support.
 options 	DDB			# Support DDB.
 options 	GDB			# Support remote GDB.
+options         BREAK_TO_DEBUGGER
 options 	INVARIANTS		# Enable calls of extra sanity checking
 options 	INVARIANT_SUPPORT	# Extra sanity checks of internal structures, required by INVARIANTS
 options 	WITNESS			# Enable checks to detect deadlocks and cycles
@@ -285,3 +286,4 @@
 device		firewire	# FireWire bus code
 device		sbp		# SCSI over FireWire (Requires scbus and da)
 device		fwe		# Ethernet over FireWire (non-standard!)
+options           MUTEX_WAKE_ALL          # Needed do not remove

--T4sUOijqQbZv57TR--

From owner-freebsd-arch@FreeBSD.ORG  Sat Oct  2 23:37:41 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5812E16A4CE
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 23:37:41 +0000 (GMT)
Received: from duchess.speedfactory.net (duchess.speedfactory.net
	[66.23.201.84])	by mx1.FreeBSD.org (Postfix) with SMTP id D465C43D1F
	for <freebsd-arch@freebsd.org>; Sat,  2 Oct 2004 23:37:40 +0000 (GMT)
	(envelope-from ups@tree.com)
Received: (qmail 26509 invoked by uid 89); 2 Oct 2004 23:37:39 -0000
Received: from duchess.speedfactory.net (66.23.201.84)
  by duchess.speedfactory.net with SMTP; 2 Oct 2004 23:37:39 -0000
Received: (qmail 26487 invoked by uid 89); 2 Oct 2004 23:37:39 -0000
Received: from unknown (HELO palm.tree.com) (66.23.216.49)
  by duchess.speedfactory.net with SMTP; 2 Oct 2004 23:37:39 -0000
Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1])
	by palm.tree.com (8.12.10/8.12.10) with ESMTP id i92Nbcmt034963;
	Sat, 2 Oct 2004 19:37:38 -0400 (EDT)
	(envelope-from ups@tree.com)
From: Stephan Uphoff <ups@tree.com>
To: Peter Holm <peter@holm.cc>
In-Reply-To: <20041002183120.GA1202@peter.osted.lan>
References: <1096489576.3733.1868.camel@palm.tree.com>
	<200409291652.29990.jhb@FreeBSD.org>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<1096608201.21577.203.camel@palm.tree.com>
	<20041001141040.GA1556@peter.osted.lan>
	<1096647194.27811.12.camel@palm.tree.com>
	<20041001192551.GA3381@peter.osted.lan>
	<415EEFFE.5080309@elischer.org>
	<20041002183120.GA1202@peter.osted.lan>
Content-Type: text/plain
Message-Id: <1096760257.34527.14.camel@palm.tree.com>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6 
Date: Sat, 02 Oct 2004 19:37:37 -0400
Content-Transfer-Encoding: 7bit
cc: Julian Elischer <julian@elischer.org>
cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: scheduler (sched_4bsd) questions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2004 23:37:41 -0000

On Sat, 2004-10-02 at 14:31, Peter Holm wrote:
> OK, right now I'm testing with all of Stephan's patches plus the
> MUTEX_WAKE_ALL option. Uptime is 3 3/4 hours and looking good.

Great.

Your attached diff contains all the fixes needed, so I don't see the
need to post a cumulative patch.

The only thing left to do is to migrate the critical sections from
kern_mutex.c to subr_turnstile.c for readability
(no functional changes).
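
A toy user-space model of what that move might look like, under stated
assumptions. The real critical_enter()/critical_exit() take no argument
and act on curthread; here they take a thread pointer so the fragment is
self-contained. request_preempt() and turnstile_claim_model() are
hypothetical names, the TDP_OWEPREEMPT value is arbitrary, and the
"nesting > 0" test is a simplification of the "td_critnest > 1" test in
the diff. The point is only that the turnstile routine, not its caller,
brackets the work, and that a preemption requested inside the section is
deferred until the outermost exit.

```c
#include <assert.h>

#define TDP_OWEPREEMPT 0x1	/* flag name from the patch; value arbitrary */

/* Toy thread: only the two fields the patch touches. */
struct thread {
	int td_critnest;	/* critical-section nesting depth */
	int td_pflags;		/* private flags */
};

static int switches;		/* counts simulated context switches */

static void
critical_enter(struct thread *td)
{
	td->td_critnest++;
}

/* Leaving the outermost section performs any deferred preemption. */
static void
critical_exit(struct thread *td)
{
	if (td->td_critnest == 1 && (td->td_pflags & TDP_OWEPREEMPT)) {
		td->td_pflags &= ~TDP_OWEPREEMPT;
		switches++;	/* stands in for mi_switch() */
	}
	td->td_critnest--;
}

/* A preemption request: switch now, or note that one is owed. */
static void
request_preempt(struct thread *td)
{
	if (td->td_critnest > 0)
		td->td_pflags |= TDP_OWEPREEMPT;
	else
		switches++;
}

/* Assumed shape after the move: the callee owns the critical section. */
static void
turnstile_claim_model(struct thread *td)
{
	critical_enter(td);
	request_preempt(td);	/* a wakeup that would like to preempt us */
	critical_exit(td);
}
```

With this shape the callers in kern_mutex.c would no longer need their
own critical_enter()/critical_exit() pair around the turnstile calls.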

Maybe it would also be better to just force MUTEX_WAKE_ALL in
kern_mutex.c (#ifndef MUTEX_WAKE_ALL \n#define MUTEX_WAKE_ALL\n#endif)
to avoid temporary configuration file pollution?
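
Written out, the guard would look roughly like the fragment below. The
helper mutex_wake_all_enabled() is hypothetical and exists only to make
the effect visible outside the kernel; kern_mutex.c itself would just
carry the three preprocessor lines near the top of the file.

```c
#include <assert.h>

/* Force the option on regardless of the kernel config file. */
#ifndef MUTEX_WAKE_ALL
#define MUTEX_WAKE_ALL
#endif

/* Hypothetical helper, only to make the result observable. */
static int
mutex_wake_all_enabled(void)
{
#ifdef MUTEX_WAKE_ALL
	return (1);
#else
	return (0);
#endif
}
```

Because of the #ifndef, a config file that already sets the option still
builds without a redefinition warning.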

	Stephan