From owner-freebsd-smp Sat Apr  5 09:05:27 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id JAA16776 for smp-outgoing; Sat, 5 Apr 1997 09:05:27 -0800 (PST)
Received: from spinner.DIALix.COM (root@spinner.dialix.com [192.203.228.67]) by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id JAA16767 for ; Sat, 5 Apr 1997 09:05:17 -0800 (PST)
Received: from spinner.DIALix.COM (peter@localhost.DIALix.oz.au [127.0.0.1]) by spinner.DIALix.COM (8.8.5/8.8.5) with ESMTP id BAA18422; Sun, 6 Apr 1997 01:04:44 +0800 (WST)
Message-Id: <199704051704.BAA18422@spinner.DIALix.COM>
X-Mailer: exmh version 2.0gamma 1/27/96
To: cr@jcmax.com (Cyrus Rahman)
cc: smp@freebsd.org
Subject: Re: Questions about mp_lock
In-reply-to: Your message of "Sat, 05 Apr 1997 11:17:25 EST." <9704051617.AA05092@corona.jcmax.com>
Date: Sun, 06 Apr 1997 01:04:44 +0800
From: Peter Wemm
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Cyrus Rahman wrote:
> Could someone who had a hand in implementing the SMP kernel give me a hint
> about why the mp_lock count gets stored in the proc/user structure and
> switched out in cpu_switch()?
>
> Seems kind of weird, since I would expect that a process getting switched in
> or out would always possess exactly one lock, and that any others would be
> the result of interrupts.  But it does appear that something more complicated
> is going on, and I can't exactly figure out what it is.

The main problem is that the kernel can be recursively entered while the flow
of execution is still "in the kernel".  One interrupt can interrupt another's
handler, and a process can take a page fault while doing a copyin, causing the
kernel to be reentered via the trap handlers and end up in the vm system.  The
catch is that when the kernel takes a page fault on a process's behalf, the
odds are that the process is going to sleep while waiting for a block to be
read from disk, etc.

When we context switch, the kernel stack goes with the process.  If we
switched from a context that's three levels deep to one that's only two deep,
we'd return to user mode while still holding the kernel lock; if we switched
from a 2-deep to a 3-deep context, the last part of the unwind in the new
context would run in the kernel without the lock, and the other cpu could
enter the kernel.  So, we switch the nest count with the process.  It's far
from ideal, but it works reasonably well on two cpus.

However, there's plenty of scope for improvement.  Moving the kernel locking
up a layer and having a separate entry/exit lock in the trap/syscall/interrupt
area would be a major win without too much cost.  What we'd gain is that we
could then gradually move to a per-subsystem locking scheme, perhaps based
initially on the syscall or trap type.  It'd be quite possible to have one cpu
in the kernel doing IP checksumming on a packet, another in the vfs system
somewhere, another doing copy-on-write page copies in the vm system, and so
on.  Things like getpid() would need no locking whatsoever.  But that's for
later, once the basics are working.

> Cyrus

Cheers,
-Peter
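
A minimal C sketch of the nest-count handoff described above, assuming a
recursive spinlock as the single kernel lock.  The names here (mp_lock,
mp_nest, p_mp_nest, switch_out/switch_in) are illustrative only and do not
correspond to the actual cpu_switch() assembly:

    /*
     * Sketch: per-process MP-lock nest counting across a context switch.
     * Names are hypothetical; this is not the real FreeBSD-SMP code.
     */
    #include <stdatomic.h>
    #include <stdio.h>

    struct proc {
        int p_mp_nest;          /* saved nesting depth while not running */
    };

    static atomic_flag mp_lock = ATOMIC_FLAG_INIT; /* the one kernel lock */
    static int mp_nest;                            /* depth on this cpu */

    static void get_mplock(void)
    {
        if (mp_nest++ == 0)                    /* only spin on first entry */
            while (atomic_flag_test_and_set(&mp_lock))
                ;                              /* another cpu is in the kernel */
    }

    static void rel_mplock(void)
    {
        if (--mp_nest == 0)                    /* last exit releases the lock */
            atomic_flag_clear(&mp_lock);
    }

    /* What the context switch conceptually does with the count. */
    static void switch_out(struct proc *oldp)
    {
        oldp->p_mp_nest = mp_nest;             /* remember how deep we were */
        mp_nest = 0;
        atomic_flag_clear(&mp_lock);           /* let the other cpu in */
    }

    static void switch_in(struct proc *newp)
    {
        while (atomic_flag_test_and_set(&mp_lock))
            ;                                  /* take the kernel lock back */
        mp_nest = newp->p_mp_nest;             /* resume at the saved depth */
    }

    int main(void)
    {
        struct proc p = { 0 };

        get_mplock();           /* syscall entry */
        get_mplock();           /* nested trap, e.g. page fault in copyin */
        switch_out(&p);         /* sleep waiting for disk: depth 2 saved */
        switch_in(&p);          /* later, resume with depth 2 restored */
        rel_mplock();
        rel_mplock();
        printf("nest after unwind: %d\n", mp_nest);   /* prints 0 */
        return 0;
    }

Without the save/restore in switch_out()/switch_in(), the per-cpu depth would
belong to whichever process happens to be running, and the mismatch between a
2-deep and a 3-deep context is exactly what produces the two failure modes
described above.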