From owner-freebsd-arch  Fri May 26 11:23:27 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from berserker.bsdi.com (berserker.twistedbit.com [199.79.183.1])
	by hub.freebsd.org (Postfix) with ESMTP id 48C3137C66A
	for <arch@freebsd.org>; Fri, 26 May 2000 11:19:16 -0700 (PDT)
	(envelope-from cp@berserker.bsdi.com)
Received: from berserker.bsdi.com (cp@localhost [127.0.0.1])
	by berserker.bsdi.com (8.9.3/8.9.3) with ESMTP id MAA14981;
	Fri, 26 May 2000 12:19:10 -0600 (MDT)
Message-Id: <200005261819.MAA14981@berserker.bsdi.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: arch@freebsd.org
Subject: Re: Short summary 
From: Chuck Paterson <cp@bsdi.com>
Date: Fri, 26 May 2000 12:19:10 -0600
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Some confusion may have been introduced when I said that we
allowed no interrupts in the kernel at the transitional stage. This
isn't true: we allowed all interrupts, it's just that all they did
was schedule a thread, which got put on a run queue and then got
run on the way out.

I believe that we want to get to the point where all top half code
is running under one mutex with interrupts enabled before adding
any mutexes. One advantage of having a single well known mutex for
unconverted code is that sleep and tsleep can know to release it.
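
As a rough sketch of that last point, here is how a sleep routine could
drop a single well known mutex on entry and take it back on exit. This
is a single-threaded model, not BSD/OS or FreeBSD code: sketch_mutex,
sketch_enter, sketch_exit, sketch_tsleep and the recursion handling are
all hypothetical names and assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the single "giant" mutex; not a real kernel API. */
struct sketch_mutex {
    int owner;      /* id of the owning thread, or -1 if free */
    int recursion;  /* how many times the owner has acquired it */
};

static struct sketch_mutex giant = { -1, 0 };

static void sketch_enter(struct sketch_mutex *m, int tid)
{
    /* Model only: there is no contention, so the lock is free or ours. */
    assert(m->owner == -1 || m->owner == tid);
    m->owner = tid;
    m->recursion++;
}

static void sketch_exit(struct sketch_mutex *m, int tid)
{
    assert(m->owner == tid && m->recursion > 0);
    if (--m->recursion == 0)
        m->owner = -1;
}

/*
 * Because unconverted code only ever holds this one mutex, the sleep
 * routine can release it fully on entry and restore the same depth on
 * exit without the caller doing anything. Returns the depth restored.
 */
static int sketch_tsleep(int tid, void (*block)(void *), void *arg)
{
    int depth = 0;

    if (giant.owner == tid) {
        depth = giant.recursion;
        giant.recursion = 0;
        giant.owner = -1;
    }
    if (block != NULL)
        block(arg);             /* stands in for the actual sleep */
    if (depth > 0) {
        assert(giant.owner == -1);
        giant.owner = tid;
        giant.recursion = depth;
    }
    return depth;
}

/* Example "sleep" body: records who owns giant while the caller sleeps. */
static void sketch_observe(void *arg)
{
    *(int *)arg = giant.owner;
}
```

Calling sketch_tsleep(1, sketch_observe, &seen) with giant held twice by
thread 1 leaves seen at -1 (nobody holds giant during the sleep) and
returns with giant held twice again.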

Running cli'd will prevent IPIs from being delivered. Any IPIs that
need to run synchronously, such as TLB shootdowns, will fail.

The code which implements the mutex operations knows that the
processor should not be cli'd when acquiring or releasing a
non-spin mutex. Aside from making the non-debug code simpler, this
check has proved to be a valuable debugging tool.
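
A minimal sketch of that kind of check, assuming a software model of
the interrupt flag. The names here (sketch_cli, sketch_sti,
sketch_mtx_enter, sketch_mtx_exit, sketch_violations) are hypothetical,
and where a debug kernel would presumably panic, this model just counts
the violation and refuses the operation.

```c
#include <stdbool.h>

static bool sketch_intr_enabled = true;  /* models the CPU interrupt flag */
static int sketch_violations;            /* how many bad calls were caught */

struct sketch_sleep_mtx {
    bool held;
};

static void sketch_cli(void) { sketch_intr_enabled = false; }
static void sketch_sti(void) { sketch_intr_enabled = true; }

/* Debug check: non-spin mutex operations require interrupts enabled. */
static bool sketch_check_intr(void)
{
    if (!sketch_intr_enabled) {
        sketch_violations++;    /* a debug kernel might panic here */
        return false;
    }
    return true;
}

/* Acquire a non-spin mutex; fails if called cli'd or if already held. */
static bool sketch_mtx_enter(struct sketch_sleep_mtx *m)
{
    if (!sketch_check_intr() || m->held)
        return false;
    m->held = true;
    return true;
}

/* Release a non-spin mutex; fails if called cli'd or if not held. */
static bool sketch_mtx_exit(struct sketch_sleep_mtx *m)
{
    if (!sketch_check_intr() || !m->held)
        return false;
    m->held = false;
    return true;
}
```

Any code path that reaches a non-spin mutex with interrupts disabled
shows up immediately in sketch_violations, which is the debugging value
described above.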

Once you have the kernel running with interrupts enabled and under
a single lock, work can begin in parallel on the following, without
actually having to start another processor.

    a)	Putting in the mechanism so you can have safe
	    and unsafe drivers. You need this unless you
	    want to try and convert all the drivers at once.
    b)	The actual code to handle light weight interrupt
	    threads.
    c)	Malloc, which has to be addressed to a first
	    degree before much of the top half work can be done.
    d)	Implementation/porting/debugging of mutex order 
	    checking code.
    e)	Converting drivers and driver sub systems.
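
One possible shape for item (a), sketched under the assumption that
each driver carries a flag saying whether it has been converted. The
sketch_driver structure, the mpsafe flag and sketch_dispatch are
hypothetical; the message does not say how BSD/OS actually marked
drivers as safe or unsafe.

```c
#include <stdbool.h>

static int sketch_giant_depth;     /* models holding the single kernel lock */
static int sketch_giant_acquires;  /* how often the dispatcher took it */

struct sketch_driver {
    const char *name;
    bool mpsafe;                   /* hypothetical "converted" flag */
    void (*handler)(void *);
    void *arg;
};

/*
 * Call a driver's handler. Unconverted (unsafe) drivers run under the
 * single lock; converted (safe) drivers do their own locking and run
 * without it.
 */
static void sketch_dispatch(const struct sketch_driver *d)
{
    if (!d->mpsafe) {
        sketch_giant_depth++;
        sketch_giant_acquires++;
    }
    d->handler(d->arg);
    if (!d->mpsafe)
        sketch_giant_depth--;
}

/* Example handler: records whether the single lock was held when it ran. */
static void sketch_record_giant(void *arg)
{
    *(int *)arg = sketch_giant_depth;
}
```

With a mechanism like this, drivers can be converted one at a time by
flipping the flag once their own locking is in place.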


There are multiple paths to getting to running under a single mutex.
What Matt suggested below can be modified slightly to achieve this.
I'll try to lay out what was done with BSD/OS, though it was a long
time ago. I should also point out that it didn't take very long:
one person for a couple, maybe three, weeks. The actual code took very
little time, but debugging when the most basic stuff is suspect was
painful. The trace stuff in the SMPng code can now trace to memory on
the PCI bus. This would have made things much easier. The spontaneous
reboots were a killer.

This is what we did for BSD/OS, not the modification of Matt's
suggestion to achieve the same thing. It's been a while, so I am
undoubtedly forgetting details. It was all straightforward and
became apparent in the normal course of events.

	1)	Defined basic types for macros.
	2)	Added mutex argument to sleep and tsleep, modified
		all callers to pass in NULL.
	3)	Added code to create the process for an interrupt
		thread. Really just called fork I think. Got
		thread created and safely stopped, not running
		not on run queue, and other process running.
	4)	Did basic testing of mutexes, both in user code
		and then in kernel. Not used for anything but
		test to see that they operated as expected. In
		our case this didn't really get all the bugs
		out.
	5)	Put code in spls so they could be turned into NOPs
		by setting a new_mode variable.
	6)	Added code to top level interrupt handlers to
		put a thread on the run queue and return leaving
		interrupts for that level blocked rather than
		calling the actual handlers for that level if new
		mode was set.
	7)	Added the code to the interrupt threads to
		a) release sched lock (spin mutex with a bad name)
		b) acquire giant (non spin)
		c) call the handlers associated with the
		    thread associated with a given level
		d) release giant
		e) acquire sched
		f) re-enable interrupts
		g) call cpu_switch
	8)	Added code to the mutex routines so they would not do anything
		unless new mode was set.
	9)	Added code to cpu switch to fix up recursion count and
		owner on sched_lock
	10)	Added code around all calls to cpu_switch to acquire
		and release the sched lock.
	11)	Added acquiring and releasing of giant in trap and syscall.
	12)	Made sleep/tsleep give up giant if held on
		entry and acquire on exit.
	13)	Flipped the switch and debugged for a week or so.
		A couple of items above were actually done here, but
		I don't remember which ones.
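
The new_mode switch, the top level interrupt entry and the interrupt
thread body from the steps above can be sketched roughly as follows.
This is a single-threaded model that only records the order of
operations; every name in it (sketch_new_mode, sketch_splraise,
sketch_intr, sketch_ithread_pass and the event log) is hypothetical and
nothing here is taken from the BSD/OS source.

```c
enum sketch_event {
    EV_REL_SCHED, EV_ACQ_GIANT, EV_HANDLER, EV_REL_GIANT,
    EV_ACQ_SCHED, EV_UNMASK, EV_SWITCH
};

#define SKETCH_LOG_MAX      32
#define SKETCH_MAX_HANDLERS 4

static int sketch_new_mode;   /* 0 = old spl-based kernel, 1 = threaded */
static int sketch_cur_spl;    /* models the current spl level */
static enum sketch_event sketch_log[SKETCH_LOG_MAX];
static int sketch_log_len;

static void sketch_trace(enum sketch_event ev)
{
    if (sketch_log_len < SKETCH_LOG_MAX)
        sketch_log[sketch_log_len++] = ev;
}

struct sketch_ithread {
    int level_masked;         /* interrupt level left blocked */
    int runnable;             /* thread is on the run queue */
    int nhandlers;
    void (*handlers[SKETCH_MAX_HANDLERS])(void);
};

/* spl call: changes the level in old mode, does nothing in new mode. */
static int sketch_splraise(int level)
{
    int old = sketch_cur_spl;

    if (!sketch_new_mode)
        sketch_cur_spl = level;
    return old;
}

static void sketch_run_handlers(struct sketch_ithread *it)
{
    for (int i = 0; i < it->nhandlers; i++)
        it->handlers[i]();
}

/* Top level interrupt entry for one level. */
static void sketch_intr(struct sketch_ithread *it)
{
    if (!sketch_new_mode) {
        sketch_run_handlers(it);  /* old behaviour: call handlers directly */
        return;
    }
    it->level_masked = 1;         /* leave this level blocked */
    it->runnable = 1;             /* put the thread on the run queue */
}

/* One pass of the interrupt thread body, entered with sched lock held. */
static void sketch_ithread_pass(struct sketch_ithread *it)
{
    sketch_trace(EV_REL_SCHED);   /* a) release sched lock */
    sketch_trace(EV_ACQ_GIANT);   /* b) acquire giant */
    sketch_run_handlers(it);      /* c) call the handlers for this level */
    sketch_trace(EV_REL_GIANT);   /* d) release giant */
    sketch_trace(EV_ACQ_SCHED);   /* e) acquire sched lock */
    it->level_masked = 0;         /* f) re-enable interrupts for the level */
    sketch_trace(EV_UNMASK);
    it->runnable = 0;
    sketch_trace(EV_SWITCH);      /* g) cpu_switch away */
}

/* Example handler that just logs that it ran. */
static void sketch_handler(void)
{
    sketch_trace(EV_HANDLER);
}
```

Keeping the old path behind the same variable is what makes it possible
to build everything first and only then flip the switch, as in the last
step.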

I only hacked up the non sio mode interrupt code so we never
made the kernel in this configuration run with more than one
processor.

At this point someone else was working on the real interrupt
handlers and could have some confidence that when he started
trying to run them the rest of the system would behave.

The SMPng kernel went directly from the hacked up code above to the
light weight interrupt code. This took a long time, months. This
code is highly optimized assembler which is patched together
on the fly to match the level, type of interrupt source and the
number of handlers which need to be called. The goal, which was
achieved, was to make getting in and out of an interrupt thread as
cheap or cheaper than what was present in the previous kernel. This
was all part of the effort to make sure we would not require two
kernels in the long run. I would have really liked to have an MP
capable kernel in less time. This could have been achieved with
fairly simple code that always uses heavy weight interrupt threads.
The person who really knew how to deal with the APIC was the same
person doing the light weight threads, so this didn't happen in
BSD/OS. We, meaning FreeBSD this time, should certainly consider
this. This is assuming an approach similar to that which I have
outlined is adopted.

While waiting for real interrupt code:
    Really made sleep and tsleep do the right thing. Actually
    this took several go-arounds, but it got close here.
    Made malloc use its own lock.
    Added the run priorities really needed by interrupt threads.
    Added the priority propagation code.
    Added the code for safe and unsafe drivers.
    And mainly worked on the non SMPng kernel. Not because I
	didn't have stuff to do on the SMPng kernel, but other
	stuff got pushed first.
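
For readers unfamiliar with priority propagation, here is a minimal
sketch of the general idea: when a thread blocks on a lock, its priority
is lent to the lock's owner, and on down the chain if that owner is
itself blocked. This is a generic illustration and not the BSD/OS code.
The structures and function names are hypothetical, and it assumes that
a smaller number means a more urgent priority.

```c
#include <stddef.h>

struct sketch_lock;

struct sketch_thread {
    int base_pri;                    /* smaller value = more urgent (assumed) */
    int cur_pri;                     /* base_pri, or a lent priority */
    struct sketch_lock *blocked_on;  /* lock this thread is waiting for */
};

struct sketch_lock {
    struct sketch_thread *owner;
};

/* The waiter blocks on lk: lend its priority along the chain of owners. */
static void sketch_propagate(struct sketch_thread *waiter,
                             struct sketch_lock *lk)
{
    int pri = waiter->cur_pri;

    waiter->blocked_on = lk;
    while (lk != NULL && lk->owner != NULL) {
        struct sketch_thread *o = lk->owner;

        if (o->cur_pri <= pri)
            break;                   /* owner is already urgent enough */
        o->cur_pri = pri;
        lk = o->blocked_on;          /* owner may be blocked in turn */
    }
}

/*
 * The owner releases lk and drops back to its base priority.
 * Simplification: a fuller version would recompute the priority from
 * any other locks the thread still holds.
 */
static void sketch_unlend(struct sketch_thread *owner, struct sketch_lock *lk)
{
    lk->owner = NULL;
    owner->cur_pri = owner->base_pri;
}
```

This matters once interrupt handlers run as threads: an urgent interrupt
thread that blocks on a lock held by a low priority process would
otherwise wait behind everything that outranks that process.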

Chuck

----- Begin Included Message -----

    I think this will work to get the ball rolling.  We can simply
    'cli' in the MP lock code and 'sti' in the MP last-unlock code
    ( i386/i386/mplock.s ).  Then we can turn the spl*() calls into
    NOPs and do the (relatively trivial) fixup to the scheduler.
    Actually, I don't think we would have to touch the scheduler at all
    for this step, it already releases the MP lock and it already supports
    scheduling supervisor contexts to multiple cpu's.  (In fact, we 
    already support lockless system calls even though only a few trivial
    calls do it at the moment!).

    The next step would be to implement interrupt threads and simply
    allow them to be scheduled by the scheduler holding the MP lock.

    After the interrupts are all threaded, we can start removing the MP
    lock and switching subsystems over to use mutexes.

    What do you think?

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
----- End Included Message -----

