Message-ID: <45425D92.8060205@elischer.org>
Date: Fri, 27 Oct 2006 12:27:14 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Thunderbird 1.5.0.7 (Macintosh/20060909)
MIME-Version: 1.0
To: current@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Comments on the  KSE option

John, I appreciate that you have made KSE an option, but the way you
have done it shows a complete misunderstanding of what is there.

What you are calling "KSE" is in fact several different facilities that
are orthogonal.  The one that you have the most trouble with is in fact
not SA-based threading (referred to by most people as "KSE") but, rather,
the fair scheduling code.

The aim of the fair scheduling code is to ensure that if you, as a user, 
make a process that starts 1000 threads, and I as a user, make an 
unthreaded process, then I can still get to the CPU at somewhat similar
rates to you.  A naive scheduler would give you 1000 cpu slots and me 1.
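
To put rough numbers on that, here is a minimal sketch. It assumes a
single CPU and that every runnable thread is always ready to run; the
helper names naive_share() and fair_share() are made up for this
illustration and are not kernel functions.

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage of CPU slots a process receives when
 * every runnable thread in the system gets an equal slot (the "naive"
 * per-thread policy).
 */
static double naive_share(int my_threads, int total_threads)
{
    assert(my_threads >= 0 && total_threads > 0);
    return 100.0 * my_threads / total_threads;
}

/*
 * Hypothetical helper: percentage of CPU slots a process receives when
 * the process, not the thread, is the unit of fairness.
 */
static double fair_share(int nprocs)
{
    assert(nprocs > 0);
    return 100.0 / nprocs;
}
```

With 1000 threads against 1, naive_share(1, 1001) comes out just under
0.1 percent for the unthreaded process, while fair_share(2) gives each
of the two processes 50 percent.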

The current fair scheduler tries to make sure that each process gets
a fair crack at the CPU by holding back some of the runnable threads
from the threaded process until the ones it has in the run queue have
been completed. A bit like telling a young child, "yes you can have
more ice-cream, when you've finished the ice-cream you already have".
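
The hold-back idea can be sketched roughly as below. This is only a toy
model with counters, not the real scheduler code: struct fair_group and
both functions are invented for the illustration, and the real code
keeps actual queues of threads rather than counts.

```c
#include <assert.h>

/* Toy model of one "unit of fairness" (hypothetical, not a kernel struct). */
struct fair_group {
    int limit;      /* max threads allowed on the system run queue */
    int on_runq;    /* threads currently on the system run queue */
    int held_back;  /* runnable threads waiting inside the group */
};

/*
 * A thread in the group becomes runnable.
 * Returns 1 if it goes onto the system run queue, 0 if it is held back.
 */
static int group_make_runnable(struct fair_group *g)
{
    assert(g->limit > 0);
    if (g->on_runq < g->limit) {
        g->on_runq++;
        return 1;
    }
    g->held_back++;
    return 0;
}

/*
 * A thread that was on the system run queue has finished its work.
 * Returns 1 if a held-back thread was promoted in its place, else 0.
 */
static int group_thread_done(struct fair_group *g)
{
    assert(g->on_runq > 0);
    g->on_runq--;
    if (g->held_back > 0) {
        g->held_back--;
        g->on_runq++;
        return 1;
    }
    return 0;
}
```

With limit set to 2, the third runnable thread is held back and only
reaches the system run queue once one of the first two is done.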

I note that David recently (in the last year) disabled the fair
scheduling capacity of the libthr code, but he didn't do it quite right,
so it still does all the work for it and then disregards the
result. This means that not only does a 1000 thread process (libthr)
completely push a nonthreaded process out of the system, but it pays
all the costs in the scheduler for working out how to NOT do that.


The fairness algorithm that you have made 'optional' is a very crude one
and I had thought that by now someone would have written a better one,
but no-one has.

I suggest that you fix your patch in two ways:
1/ you need (at least) 2 options.
     KSE
and
     FAIR_THREADS

Most of the improvements you are seeing come from the second one,
especially all your changes that are in the scheduler. This removes the
fair scheduling capability. It affects all threading libraries that
do not deliberately knacker it. In other words, it should be orthogonal
to what threading library is running.

If it is made a project goal that threads should be unfair, then
I have no objections to removing the code, but it needs to be a decision
that is deliberately taken. It was an initial project goal that threads 
should be fair, and the fact that David has made it ineffective for
libthr (though he still pays the full price for it) is not a reason to
throw it out. (What he does is to assign a new KSEGRP for each thread, 
but he doesn't label it as exempt from fairness so it does all the
work only to discover at the end that it is the only thread on the 
ksegrp, and therefore always eligible to run).
If the correct flags were set, then David's threads
could probably get the same speedup as seen with the KSE option removed,
as all the overhead would be skipped, but then we would be officially
condoning unfair threading.
The change to do this would be to add a ksegrp or thread flag (possibly
thread) called TDF_FAIR_SCHED

and change the few lines in the scheduler that do:
	if ((td->td_proc->p_flag & P_HADTHREADS) == 0) {

to be
	if ((td->td_flags & TDF_FAIR_SCHED) == 0) {


and set that flag in the threading libraries when threads should be
made fair.  Then probably the entire advantage seen by David in the
supersmack tests from unsetting KSE would be seen by simply not setting
that bit.

(It might also just look for:
	if (td->td_ksegrp->kg_numthreads == 1)
and achieve the same thing automatically.)
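
As a rough sketch of the two tests side by side: the structs below are
cut-down stand-ins, not the real kernel definitions, the flag value is
arbitrary (TDF_FAIR_SCHED does not exist yet), and the two function
names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TDF_FAIR_SCHED 0x1   /* proposed flag; this value is an assumption */

/* Cut-down stand-ins for the kernel structures (assumed layout). */
struct ksegrp_model {
    int kg_numthreads;               /* threads in this ksegrp */
};

struct thread_model {
    int td_flags;                    /* per-thread flags */
    struct ksegrp_model *td_ksegrp;  /* ksegrp the thread belongs to */
};

/* Explicit variant: the threading library opts in by setting the flag. */
static int needs_fairness_explicit(const struct thread_model *td)
{
    return (td->td_flags & TDF_FAIR_SCHED) != 0;
}

/* Automatic variant: a ksegrp with a single thread has nothing to arbitrate. */
static int needs_fairness_auto(const struct thread_model *td)
{
    assert(td->td_ksegrp != NULL);
    return td->td_ksegrp->kg_numthreads > 1;
}
```

Either test would let a thread that sits alone in its own ksegrp skip
the fairness path; the flag makes that an explicit choice by the
library, while the count does it implicitly.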


So, the question is:
Do we as a project want to have fair threading or unfair threading?

Should processes with a lot of threads be able to push out processes 
that do the same thing by using a state machine or an event loop?

BTW, another alternative would be to write a different scheduler,
called sched_4bsd-unfair (or similar), and just strip out the fairness
code.  It would be another way of doing much the same thing.

This is a completely different question to whether there should be
an M:N threading library, the existence of which should make no
noticeable difference to the speed of processes that don't use it.

My moral for this story is:
"If you don't understand the bigger picture and you modify things
then you can expect that your modifications may have unforeseen
circumstances."

I, as well as most other people, fall foul of this at various times in our
careers.


============
Technical note:

The current fairness code relies on a substructure of the proc, called
a ksegrp.  This structure represents the "unit of fairness". Most
processes have one of these, so they act as if the unit of fairness is
the entire process.  The concept was that a threaded process would have
one of these for its directly allocated threads, and that they would
act as a group, fairly towards the rest of the system. A process could
also have a library that, unbeknownst to the program proper, would
create its own ksegrp, with its own threads that would act independently
and have their own 'fairness' characteristics, priorities, etc.
Within a ksegrp, only the top N (usually = ncpu) threads are allowed onto
the system run queue to compete with other processes. By assigning a
separate KSEGRP to each thread, the libthr code ensures that each thread
is immediately promoted to the system run queue. However, because the
system code doesn't realise that he is trying to subvert the fairness
code, it still takes the code path that looks at the ksegrp run queues
and does all sorts of other checks.
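
To see why one ksegrp per thread defeats the limit, here is a small
sketch. It assumes threads are spread evenly over the groups and that
every thread is runnable; threads_on_system_runq() is a made-up helper,
not anything in the kernel.

```c
#include <assert.h>

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical model: nthreads runnable threads are spread as evenly as
 * possible over ngroups ksegrps, and each ksegrp may place at most
 * per_group_limit threads on the system run queue.  Returns how many
 * threads end up competing on the system run queue.
 */
static int threads_on_system_runq(int nthreads, int ngroups, int per_group_limit)
{
    assert(nthreads >= 0 && ngroups > 0 && per_group_limit > 0);

    int base = nthreads / ngroups;   /* threads in every group */
    int extra = nthreads % ngroups;  /* groups holding one more thread */

    return extra * min_int(base + 1, per_group_limit)
         + (ngroups - extra) * min_int(base, per_group_limit);
}
```

On a 2-CPU box, 1000 threads in a single ksegrp put only 2 threads on
the system run queue, while 1000 threads in 1000 ksegrps put all 1000
there.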

If someone can come up with a better fairness method (Please!)
then I'm happy to see all that code in the scheduler replaced by
whatever else is chosen (nothing if we REALLY want to see thread 
unfairness).

I think that libthr should be moved back to be "fair" by default,
and that unfair mode should be made optional (if you are root) so
that dedicated servers, where the administrator wants to get all the
performance, and is willing to state explicitly that fairness is
not important to him, can do just that (and for benchmarks).