From owner-cvs-all@FreeBSD.ORG  Wed Jun 23 08:12:37 2004
Return-Path: <owner-cvs-all@FreeBSD.ORG>
Delivered-To: cvs-all@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9E57516A4D0; Wed, 23 Jun 2004 08:12:37 +0000 (GMT)
Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 2364C43D48; Wed, 23 Jun 2004 08:12:37 +0000 (GMT)
	(envelope-from bde@zeta.org.au)
Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au
	[61.8.0.86])i5N8CL4u009349;	Wed, 23 Jun 2004 18:12:21 +1000
Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246])
	i5N8CIao032004;	Wed, 23 Jun 2004 18:12:19 +1000
Date: Wed, 23 Jun 2004 18:12:17 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Julian Elischer <julian@elischer.org>
In-Reply-To: <Pine.BSF.4.21.0406221855450.59196-100000@InterJet.elischer.org>
Message-ID: <20040623172902.C57766@gamplex.bde.org>
References: <Pine.BSF.4.21.0406221855450.59196-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: cvs-src@freebsd.org
cc: src-committers@freebsd.org
cc: David Xu <davidxu@freebsd.org>
cc: cvs-all@freebsd.org
cc: Bruce Evans <bde@freebsd.org>
Subject: Re: cvs commit: src/sys/kern kern_exit.c
X-BeenThere: cvs-all@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: CVS commit messages for the entire tree <cvs-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-all>
List-Post: <mailto:cvs-all@freebsd.org>
List-Help: <mailto:cvs-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Jun 2004 08:12:37 -0000

On Tue, 22 Jun 2004, Julian Elischer wrote:

> On Wed, 23 Jun 2004, David Xu wrote:
>
> > Bruce Evans wrote:
> >  >bde         2004-06-21 14:49:50 UTC
> >  >
> >  >  FreeBSD src repository
> >  >
> >  >  Modified files:
> >  >    sys/kern             kern_exit.c
> >  >  Log:
> >  >  (1) Removed the bogus condition "p->p_pid != 1" on calling sched_exit()
> >  >      from exit1().  sched_exit() must be called unconditionally from
> >  >      exit1().  It was called almost unconditionally because init only
> >  >      exits on system shutdown if at all.
> >  >
> >  >  (2) Removed the comment that presumed to know what sched_exit() does.
> >  >      sched_exit() does different things for the ULE case.  The call
> >  >      became essential when it started doing load average stuff, but its
> >  >      caller should not know that.
> >
> > But this change loses a semantic: most of the time, init is waiting there
> > to recycle runaway processes.  Those processes were not created by init;
> > if you call sched_exit for init unconditionally, the runaway processes'
> > cpu usage is all merged into init.  This is unfair to init.  Is there

Er, the unquoted clause (3) in the commit log says this and more.  Priority
merging is unfair to all parent processes.  It's more of a problem for
shells.

> > any benefit to lowering init's priority under load to slow down recycling
> > speed?  I don't think so.  I think the scheduler's sched_exit should be
> > fixed at the same time to keep this semantic.

I may fix this.  I used to just remove the cpu merging in exit and cpu
inheritance in fork.  I think it was originally to limit creation of new
processes for a special application (wcarchive forking ftpd's).  It
worked too well to limit creation of new processes in general.  When
it was committed, there was no ESTCPULIM to limit growth of cpu.
fork/exec grows the cpu in a fake way, so it wants to be exponential
in the number of children and could grow to 2^30 after just 30 fork+execs
and then overflow to 2^31 on the next one.  Once it got to a few
hundred, it gave maximal (numeric) priority so processes tended not
to run; however if they did manage to fork-exec a few more times, their
cpu could reach 2^30 and then it took a _long_ time for it to decay
back below a few hundred so that the process could run again in
competition with processes with normal cpu/priority.  This caused
mysterious multi-second hangs in shells.
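
The doubling and the slow recovery described above can be checked with a
small user-space model.  This is only a sketch, not kernel code: the
function names are hypothetical, the starting value of 1 is arbitrary, and
the decay step assumes the classic 4BSD once-per-second filter
estcpu = estcpu * (2*load) / (2*load + 1), ignoring the nice term.

```c
#include <stdint.h>

/*
 * Hypothetical model: the child starts with a copy of the parent's estcpu
 * at fork time, and the old exit path adds all of the child's estcpu back
 * to the parent with no clamp.  Each fork+exit cycle therefore doubles the
 * parent's value (uint64_t arithmetic wraps silently past 63 doublings).
 */
static uint64_t
estcpu_after_cycles(uint64_t start, unsigned cycles)
{
	uint64_t parent = start;
	unsigned i;

	for (i = 0; i < cycles; i++) {
		uint64_t child = parent;	/* fork: duplicate parent's cpu */
		parent += child;		/* exit: merge all of it back */
	}
	return (parent);
}

/*
 * Number of once-per-second decay steps needed to bring estcpu below
 * target, using the assumed filter (2*load) / (2*load + 1).
 */
static unsigned
seconds_to_decay(uint64_t estcpu, uint64_t target, unsigned load)
{
	unsigned secs = 0;

	if (target == 0)
		return (0);		/* an unsigned value never drops below 0 */
	while (estcpu >= target) {
		estcpu = estcpu * (2 * load) / (2 * load + 1);
		secs++;
	}
	return (secs);
}
```

Under these assumptions, estcpu_after_cycles(1, 30) is 2^30 and one more
cycle no longer fits in a signed 32-bit int.  With a load average of 1,
seconds_to_decay() needs several tens of steps to get from 2^30 back below
300, which is consistent with the multi-second hangs.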

I've used more limited cpu merging since KSE made it clear that some
sort of cpu inheritance and merging is right:

%%%
Index: sched_4bsd.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/sched_4bsd.c,v
retrieving revision 1.41
diff -u -2 -r1.41 sched_4bsd.c
--- sched_4bsd.c	21 Jun 2004 23:47:47 -0000	1.41
+++ sched_4bsd.c	23 Jun 2004 06:28:01 -0000
@@ -550,9 +662,20 @@

 void
-sched_exit_ksegrp(struct ksegrp *kg, struct ksegrp *child)
+sched_exit_ksegrp(struct ksegrp *parent, struct ksegrp *child)
 {

 	mtx_assert(&sched_lock, MA_OWNED);
-	kg->kg_estcpu = ESTCPULIM(kg->kg_estcpu + child->kg_estcpu);
+	/*
+	 * XXX adding all of the child's cpu to the parent's like we used to
+	 * do would be wrong, since we duplicate the parent's cpu at fork
+	 * time so adding it all back would give exponential growth.  In
+	 * practice, the growth would have been limited by ESTCPULIM, but that
+	 * would be wrong too since it is very nonlinear.  Splitting the cpu
+	 * at fork time would be better, but adding it all back here would
+	 * still give nonlinearities since multiple processes tend to
+	 * accumulate more cpu than single ones.
+	 */
+	if (parent->kg_estcpu < child->kg_estcpu)
+		parent->kg_estcpu = child->kg_estcpu;
 }

%%%
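
To compare the two merge policies in the diff outside the kernel, here is a
minimal sketch.  Everything in it is hypothetical: EX_ESTCPU_MAX is a
made-up clamp standing in for whatever ESTCPULIM() enforces, and each child
is assumed to inherit the parent's estcpu at fork and then accumulate a
fixed amount of real cpu before exiting.

```c
#define EX_ESTCPU_MAX	295u	/* hypothetical clamp, not the kernel's value */

typedef unsigned (*merge_fn)(unsigned parent, unsigned child);

/* Old policy: add all of the child's cpu to the parent's, then clamp. */
static unsigned
merge_sum_clamped(unsigned parent, unsigned child)
{
	unsigned sum = parent + child;

	return (sum > EX_ESTCPU_MAX ? EX_ESTCPU_MAX : sum);
}

/* Policy from the diff: the parent takes the larger of the two values. */
static unsigned
merge_max(unsigned parent, unsigned child)
{

	return (parent < child ? child : parent);
}

/*
 * Run a number of sequential fork+exit cycles.  Each child starts with the
 * parent's estcpu (inheritance at fork) and adds "used" of its own before
 * it exits and is merged back with the given policy.
 */
static unsigned
run_cycles(merge_fn merge, unsigned start, unsigned used, unsigned cycles)
{
	unsigned parent = start;
	unsigned i;

	for (i = 0; i < cycles; i++) {
		unsigned child = parent + used;

		parent = merge(parent, child);
	}
	return (parent);
}
```

In this model the max policy grows the parent by "used" per cycle, so
run_cycles(merge_max, 10, 5, 3) gives 25.  The summing policy roughly
doubles each time (10, 25, 55, 115, 235, ...) until it hits the clamp.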

My 4BSD scheduler needs to limit growth of fake cpu somewhere because
it lets non-fake cpu grow without bound (except for natural bounds
given by actual cpu use and cpu decay).  This is to fix the breakdown of
the decay algorithm that is caused by clamping growth with ESTCPULIM().
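
The "natural bound" can be illustrated with a sketch under stated
assumptions: a process that is charged ticks_per_sec units of estcpu every
second and decayed once per second by the classic filter
(2*load) / (2*load + 1), with the nice term ignored.  The function name and
the tick rate are hypothetical.  In this model estcpu never exceeds
ticks_per_sec * (2*load + 1), even with no clamp at all.

```c
/*
 * Hypothetical model: simulate a cpu-bound process for the given number of
 * seconds, applying the assumed decay first and then charging one second's
 * worth of ticks.  The result approaches, but never exceeds,
 * ticks_per_sec * (2 * load + 1).
 */
static unsigned
steady_state_estcpu(unsigned ticks_per_sec, unsigned load, unsigned seconds)
{
	unsigned estcpu = 0;
	unsigned i;

	for (i = 0; i < seconds; i++) {
		/* decay what has accumulated so far ... */
		estcpu = estcpu * (2 * load) / (2 * load + 1);
		/* ... then charge this second's real cpu use */
		estcpu += ticks_per_sec;
	}
	return (estcpu);
}
```

For example, with 100 ticks per second and a load average of 1 the value
settles just under 300, and with a load average of 4 just under 900.  Fake
cpu added at fork or exit is not limited by this mechanism, which is why it
needs a separate limit.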

> exactly..
>
> Actually this doesn't CHANGE anything because "p->p_pid != 1"
> was ALWAYS TRUE.

The problem is apparently unimportant, because it was only noticed by
code inspection.  ESTCPULIM() limits it in the same way as for shells, and
init doesn't run much so its priority soon decays.  It obviously isn't
important for init to have a higher priority than most processes, else
it would be negatively niced.

Bruce