From: Chuck Paterson
To: Matthew Dillon
Cc: arch@freebsd.org
Subject: Re: Short summary
Date: Fri, 26 May 2000 12:19:10 -0600
Message-Id: <200005261819.MAA14981@berserker.bsdi.com>

Some confusion may have been introduced when I said that we allowed no interrupts in the kernel at the transitional stage. This isn't true. We allowed all interrupts; it's just that all they did was schedule a thread, which got put on a run queue and then got run on the way out.

I believe that we want to get to the point where all top half code is running under one mutex with interrupts enabled before adding any additional mutexes. One advantage of having a single well known mutex for unconverted code is that sleep and tsleep can know to release it.

Running cli'd will prevent IPIs from being delivered. Any of these that need to run synchronously, such as TLB shootdowns, will fail. The code which implements the mutex operations knows that the processor should not be cli'd when acquiring or releasing a non-spin mutex. Aside from making the non-debug code simpler, this has proved to be a valuable debugging tool.

Once you have the kernel running with interrupts enabled and under a single lock, work can begin in parallel on the following, without actually having to start another processor.

a) Putting in the mechanism so you can have safe and unsafe drivers. You need this unless you want to try to convert all the drivers at once.
b) The actual code to handle light weight interrupt threads.
c) Malloc, which has to be addressed to a first degree before much of the top half work can be done.
d) Implementation/porting/debugging of mutex order checking code.
e) Converting drivers and driver subsystems.

There are multiple paths to getting to running under a single mutex. What Matt suggested below can be modified slightly to achieve this. I'll try to lay out what was done with BSD/OS, though it was a long time ago. I should also point out that it didn't take very long: one person for a couple, maybe 3, weeks. The actual code took very little time, but debugging when the most basic stuff is suspect was painful. The trace stuff in the SMPng code can now trace to memory on the PCI bus. This would have made things much easier. The spontaneous reboots were a killer.

This is what we did for BSD/OS, not the modification of Matt's suggestion to achieve the same thing. It's been a while, so I am undoubtedly forgetting details. It was all straightforward and became apparent in the normal course of events.

1) Defined basic types for macros.
2) Added mutex argument to sleep and tsleep, modified all callers to pass in NULL.
3) Added code to create the process for an interrupt thread. Really just called fork, I think. Got the thread created and safely stopped (not running, not on the run queue) with other processes running.
4) Did basic testing of mutexes, first in user code and then in the kernel. They were not used for anything but tests to see that they operated as expected. In our case this didn't really get all the bugs out.
5) Put code in the spls so they could be turned into NOPs by setting a new_mode variable.
6) Added code to the top level interrupt handlers so that, if new mode was set, they put a thread on the run queue and return, leaving interrupts for that level blocked, rather than calling the actual handlers for that level.
7) Added the code to the interrupt threads to
   a) release sched lock (spin mutex with a bad name)
   b) acquire giant (non-spin)
   c) call the handlers associated with the thread for a given level
   d) release giant
   e) acquire sched
   f) re-enable interrupts
   g) call cpu_switch
8) Added code to the mutex routines so they would not do anything unless new mode was set.
9) Added code to cpu_switch to fix up the recursion count and owner on sched_lock.
10) Added code around all calls to cpu_switch to acquire and release the sched lock.
11) Added acquiring and releasing of giant in trap and syscall.
12) Made sleep/tsleep give up giant if held on entry and acquire it on exit.
13) Flipped the switch and debugged for a week or so. A couple of the items above were actually done here, but I don't remember which ones.

I only hacked up the non sio mode interrupt code, so we never made the kernel in this configuration run with more than one processor.

At this point someone else was working on the real interrupt handlers and could have some confidence that when he started trying to run them the rest of the system would behave.

The SMPng kernel went directly from the hacked up code above to the light weight interrupt code. This took a long time, months. This code is very optimized assembler code which is then patched together on the fly to match the level, the type of interrupt source, and the number of handlers which need to be called. The goal, which was achieved, was to make getting in and out of an interrupt thread as cheap as or cheaper than what was present in the previous kernel. This was all part of the effort to make sure we would not require two kernels in the long run.

I would have really liked to have an MP capable kernel in less time. This could have been achieved with fairly simple code to always use heavy weight interrupt threads. The person who really knew how to deal with the APIC was the same person doing the light weight threads, so this didn't happen in BSD/OS.
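
For illustration, the interrupt thread steps a) through g) listed above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions, not the BSD/OS code: every name in it (model_lock, ithread, ithread_run_once, count_handler and so on) is hypothetical, the locks are plain flags with no real mutual exclusion, and cpu_switch is reduced to a counter. It assumes the thread is entered with the sched lock held, which is inferred from the fact that the first thing it does is release it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model only. Every name here is hypothetical and is not a
 * BSD/OS or FreeBSD kernel interface. */

struct model_lock { bool held; };

static struct model_lock sched_lock;  /* stands in for the spin mutex */
static struct model_lock giant;       /* stands in for the single non-spin mutex */
static int switches;                  /* counts modelled cpu_switch calls */

static void lock_acquire(struct model_lock *l) { assert(!l->held); l->held = true; }
static void lock_release(struct model_lock *l) { assert(l->held); l->held = false; }

enum { MAX_HANDLERS = 4 };
typedef void (*handler_fn)(void *arg);

struct ithread {
    handler_fn handlers[MAX_HANDLERS];  /* handlers sharing this level */
    void *args[MAX_HANDLERS];
    size_t nhandlers;
    bool level_blocked;  /* set by the top level stub that queued the thread */
};

/* Attach a handler to the thread; returns false when the table is full. */
static bool ithread_add_handler(struct ithread *it, handler_fn fn, void *arg)
{
    if (it->nhandlers == MAX_HANDLERS)
        return false;
    it->handlers[it->nhandlers] = fn;
    it->args[it->nhandlers] = arg;
    it->nhandlers++;
    return true;
}

/* Example handler: checks the locking state it runs under, then counts. */
static void count_handler(void *arg)
{
    assert(giant.held && !sched_lock.held);
    ++*(int *)arg;
}

/* One pass of the interrupt thread body. */
static void ithread_run_once(struct ithread *it)
{
    assert(sched_lock.held && it->level_blocked);
    lock_release(&sched_lock);             /* a) release sched lock */
    lock_acquire(&giant);                  /* b) acquire giant */
    for (size_t i = 0; i < it->nhandlers; i++)
        it->handlers[i](it->args[i]);      /* c) call the handlers */
    lock_release(&giant);                  /* d) release giant */
    lock_acquire(&sched_lock);             /* e) acquire sched */
    it->level_blocked = false;             /* f) re-enable interrupts */
    switches++;                            /* g) cpu_switch, modelled as a counter */
}
```

A caller would mark the level blocked and take sched_lock before calling ithread_run_once, mirroring what the top level stub and the switch path are described as doing. The asserts inside the model play the same role the message gives to the checks in the mutex code: they catch a wrong lock state at the point of use.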

We, meaning FreeBSD this time, should certainly consider starting with heavy weight interrupt threads, assuming an approach similar to the one I have outlined is adopted.

While waiting for real interrupt code, I:

   Really made sleep and tsleep do the right thing. Actually this took several go-arounds, but it got close here.
   Made malloc use its own lock.
   Added the run priorities really needed by interrupt threads.
   Added the priority propagation code.
   Added the code for safe and unsafe drivers.
   And mainly worked on the non-SMPng kernel. Not because I didn't have stuff to do on the SMPng kernel, but other stuff got pushed first.

Chuck

----- Begin Included Message -----

I think this will work to get the ball rolling. We can simply 'cli' in the MP lock code and 'sti' in the MP last-unlock code ( i386/i386/mplock.s ). Then we can turn the spl*() calls into NOPs and do the (relatively trivial) fixup to the scheduler. Actually, I don't think we would have to touch the scheduler at all for this step, it already releases the MP lock and it already supports scheduling supervisor contexts to multiple cpu's. (In fact, we already support lockless system calls even though only a few trivial calls do it at the moment!).

The next step would be to implement interrupt threads and simply allow them to be scheduled by the scheduler holding the MP lock.

After the interrupts are all threaded, we can start removing the MP lock and switching subsystems over to use mutexes.

What do you think?

-Matt
Matthew Dillon

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

----- End Included Message -----
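
As a small illustration of two points from the reply above (sleep/tsleep giving up giant on entry and taking it back on exit, and the non-spin mutex operations expecting interrupts to be enabled), here is a minimal user-space sketch. It is a model under assumptions, not the BSD/OS or FreeBSD code: the names model_mtx, model_mtx_enter, model_mtx_exit, tsleep_model, intr_enabled and depth_seen_asleep are all hypothetical, there is a single thread so no owner is tracked, and it assumes giant may be held recursively, so the full depth is dropped and restored.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only. All names are hypothetical. */

struct model_mtx { int depth; };      /* recursion depth; 0 means not held */

static struct model_mtx giant;        /* the single well known mutex */
static bool intr_enabled = true;      /* false would model running cli'd */
static int depth_seen_asleep = -1;    /* giant depth observed while "asleep" */

/* Non-spin mutex operations check that interrupts are enabled. */
static void model_mtx_enter(struct model_mtx *m)
{
    assert(intr_enabled);
    m->depth++;
}

static void model_mtx_exit(struct model_mtx *m)
{
    assert(intr_enabled);
    assert(m->depth > 0);
    m->depth--;
}

/* Give up giant (every recursion level) on entry, restore it on exit. */
static void tsleep_model(void)
{
    int saved = 0;

    while (giant.depth > 0) {
        model_mtx_exit(&giant);
        saved++;
    }

    /* A real kernel would block here; the model only records the state. */
    depth_seen_asleep = giant.depth;

    while (saved-- > 0)
        model_mtx_enter(&giant);
}
```

Because the mutex is a single well known one, the sleep path needs no argument to find it, which is the advantage the message attributes to running unconverted code under one mutex. Setting intr_enabled to false before any mutex operation makes the assert fire, which models the debugging check mentioned there.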