From owner-freebsd-current@FreeBSD.ORG  Wed Oct 29 07:52:48 2003
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DE9A716A4CE
	for <current@freebsd.org>; Wed, 29 Oct 2003 07:52:47 -0800 (PST)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 54C4343F3F
	for <current@freebsd.org>; Wed, 29 Oct 2003 07:52:46 -0800 (PST)
	(envelope-from bde@zeta.org.au)
Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246])
	by mailman.zeta.org.au (8.9.3p2/8.8.7) with ESMTP id CAA19157;
	Thu, 30 Oct 2003 02:52:36 +1100
Date: Thu, 30 Oct 2003 02:52:36 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Jeff Roberson <jroberson@chesapeake.net>
In-Reply-To: <20031017150929.T6652@gamplex.bde.org>
Message-ID: <20031030023049.P628@gamplex.bde.org>
References: <20031015034832.E30029-100000@mail.chesapeake.net>
 <20031017150929.T6652@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: current@freebsd.org
Subject: Re: More ULE bugs fixed.
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Wed, 29 Oct 2003 15:52:48 -0000

> Test for scheduling buildworlds:
>
> 	cd /usr/src/usr.bin
> 	for i in obj depend all
> 	do
> 		MAKEOBJDIRPREFIX=/somewhere/obj time make -s -j16 $i
> 	done >/tmp/zqz 2>&1
>
> (Run this with an empty /somewhere/obj.  The all stage doesn't quite
> finish.)  On an ABIT BP6 system with a 400MHz and a 366MHz CPU, with
> /usr (including /usr/src) nfs-mounted (with 100 Mbps ethernet and a
> reasonably fast server) and /somewhere/obj ufs1-mounted (on a fairly
> slow disk; no soft-updates), this gives the following times:
>
> SCHED_ULE-yesterday, with not so careful setup:
>        40.37 real         8.26 user         6.26 sys
>       278.90 real        59.35 user        41.32 sys
>       341.82 real       307.38 user        69.01 sys
> SCHED_ULE-today, run immediately after booting:
>        41.51 real         7.97 user         6.42 sys
>       306.64 real        59.66 user        40.68 sys
>       346.48 real       305.54 user        69.97 sys
> SCHED_4BSD-yesterday, with not so careful setup:
>       [same as today except the depend step was 10 seconds slower (real)]
> SCHED_4BSD-today, run immediately after booting:
>        18.89 real         8.01 user         6.66 sys
>       128.17 real        58.33 user        43.61 sys
>       291.59 real       308.48 user        72.33 sys
> SCHED_4BSD-yesterday, with a UP kernel (running on the 366 MHz CPU) with
>     many local changes and not so careful setup:
>        17.39 real         8.28 user         5.49 sys
>       130.51 real        60.97 user        34.63 sys
>       390.68 real       310.78 user        60.55 sys
>
> Summary: SCHED_ULE was more than twice as slow as SCHED_4BSD for the
> obj and depend stages.  These stages have little parallelism.  SCHED_ULE
> was only 19% slower for the all stage.  ...

I reran this with -current (sched_ule.c 1.68, etc.).  Result: no
significant change.  However, with a UP kernel there was no significant
difference between the times for SCHED_ULE and SCHED_4BSD.
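
For reference, the slowdown figures in the quoted summary can be checked
from the real times with a little arithmetic.  A minimal sketch, using the
"run immediately after booting" SMP numbers quoted above; slowdown is only
an illustrative helper name, not part of any tool:

```sh
# Ratio of SCHED_ULE real time to SCHED_4BSD real time for one stage.
# $1 = SCHED_ULE real seconds, $2 = SCHED_4BSD real seconds
slowdown() {
	awk -v ule="$1" -v bsd="$2" 'BEGIN { printf "%.2f\n", ule / bsd }'
}

echo "obj:    $(slowdown 41.51 18.89)"
echo "depend: $(slowdown 306.64 128.17)"
echo "all:    $(slowdown 346.48 291.59)"
```

This gives ratios of about 2.20, 2.39 and 1.19, i.e., more than twice as
slow for obj and depend and about 19% slower for all.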

> Test 5 for fair scheduling related to niceness:
>
> 	for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
> 	do
> 		nice -$i sh -c "while :; do echo -n;done" &
> 	done
> 	time top -o cpu
>
> With SCHED_ULE, this now hangs the system, but it worked yesterday.  Today
> it doesn't get as far as running top and it stops the nfs server responding.
> To unhang the system and see what the above does, run a shell at rtprio 0
> and start top before the above, and use top to kill processes (I normally
> use "killall sh" to kill all the shells generated by tests 1-5, but killall
> doesn't work if it is on nfs when the nfs server is not responding).

This shows problems much more clearly with UP kernels.  It gives the
nice -20 and -16 processes approx. 55% and 50% of the CPU, respectively
(the total is significantly more than 100%), and it gives approx.  0%
of the CPU to the other sh processes (perhaps exactly 0).  It also
apparently gives 0% of the CPU to some important nfs process (I
couldn't see exactly which) so the nfs server stops responding.
SCHED_4BSD errs in the opposite direction by giving too many cycles to
highly niced processes so it is naturally immune to this problem.  With
SMP, SCHED_ULE lets many more processes run.
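
To make the "more than 100%" anomaly concrete, the reported shares can be
summed.  This is only a sketch: total_pcpu is a hypothetical helper, and
the input is the two approximate figures above typed in by hand as
"nice pcpu" pairs, not captured output:

```sh
# Sum the second column (%CPU) of "nice pcpu" pairs read from stdin.
total_pcpu() {
	awk '{ total += $2 } END { printf "%.1f\n", total }'
}

# Approximate shares seen for the nice -20 and nice -16 hogs (UP kernel).
printf '%s\n' '-20 55' '-16 50' | total_pcpu
```

On a UP kernel the true total cannot exceed 100%, so a sum of 105.0 from
just two processes points at the accounting rather than at real CPU time.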

The nfs server also sometimes stops responding with only non-negatively
niced processes (0 through 20 in the above), but it takes longer.

The nfs server restarts if enough of the hog processes are killed.
Apparently nfs has some critical process running at only user priority
and nice 0, and even non-negatively niced processes are enough to prevent
it from running.

Top output with loops like the above shows many anomalies in PRI, TIME,
WCPU and CPU, but no worse than the ones with SCHED_4BSD.  PRI tends to
stick at 139 (the max) with SCHED_ULE.  With SCHED_4BSD, this indicates
that the scheduler has entered an unfair scheduling region.  I don't
know how to interpret it for SCHED_ULE (at first I thought 139 was a
dummy value).

Bruce