Date: Wed, 27 Oct 1999 01:28:41 -0700 (PDT)
From: Alfred Perlstein <bright@wintelcom.net>
To: smp@freebsd.org
Subject: SMP infoletter #1
Message-ID: <Pine.BSF.4.05.9910270125180.12797-100000@fw.wintelcom.net>
Infoletter #1

This is the start of what I hope will be several informative documents
describing the current and ongoing state of SMP in FreeBSD.  The purpose
is to avoid duplicate research into the current state of FreeBSD's SMP
behavior by those who haven't been following FreeBSD-SMP since 'day one'.
It also points out some areas that are still unclear to me.

This document was written on Tue Oct 26 1999 referencing the HEAD branch
of the code; things may have changed significantly since then.

I also hope that this series helps to shed some light on the low level
routines in the kernel such as trap and interrupt handling, ASTs and
scheduling.  Where possible, direct pointers are given to source code to
reduce the amount of digging one must do to locate routines of interest.

It is also important to note that this document is the result of the
author's investigation into the code, along with much appreciated help
from various members of the FreeBSD development team (Poul-Henning Kamp
(phk), Alan Cox (alc), Matt Dillon (dillon)) and Terry Lambert.  As I am
not the writer of the code there may be missing or incorrect information
contained in this document.  Please email any corrections or comments to
smp@freebsd.org and please make sure I get a CC.  (alfred@freebsd.org)

------------------------------------------------------------

The Big Giant Lock: (src/sys/i386/i386/mplock.s)

The current state of SMP in FreeBSD is built around the Big Giant Lock
(BGL).  The BGL is an exclusive counting semaphore: the lock may be
acquired recursively by a single CPU, and from that point on other CPUs
will spin while waiting to acquire it.  The implementation on i386 is
contained in the file src/sys/i386/i386/mplock.s.

The function 'void MPgetlock(unsigned int *lock)' acquires the BGL.  An
important side effect of MPgetlock is that it routes all interrupts to
the processor that has acquired the lock.  This is done so that if an
interrupt occurs the handler doesn't need to spin waiting for the BGL.
The code responsible for routing the interrupts is the GRAB_HWI macro
within the MPgetlock code, which fiddles with the local APIC's interrupt
priority level.  Other MPlock functions exist in mplock.s to initialize,
test and release the lock.
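To make this a bit more concrete, here is a rough C sketch of the idea.
The real implementation is i386 assembler in mplock.s; the names, the
lock word layout and the GCC __sync builtins used below are illustrative
assumptions on my part, not the actual code (which uses lock cmpxchgl).

	/*
	 * Sketch of MPgetlock/MPrellock.  The lock word is assumed to
	 * hold the owning CPU id in its upper bits and the recursion
	 * count in its lower bits.
	 */
	#define CPU_SHIFT	24		/* assumed: owner in top 8 bits */
	#define COUNT_MASK	0x00ffffff	/* assumed: count in low 24 bits */
	#define LOCK_FREE	0		/* assumed "unowned" value */

	static void
	bgl_getlock(volatile unsigned int *lock, unsigned int cpuid)
	{
		unsigned int owner = cpuid << CPU_SHIFT;
		unsigned int old;

		for (;;) {
			old = *lock;
			if (old == LOCK_FREE) {
				/* Try to grab the free lock with a count of 1. */
				if (__sync_bool_compare_and_swap(lock,
				    LOCK_FREE, owner | 1)) {
					/*
					 * The real MPgetlock also runs GRAB_HWI
					 * here, pointing hardware interrupts at
					 * this CPU via the local APIC's
					 * interrupt priority.
					 */
					return;
				}
			} else if ((old & ~COUNT_MASK) == owner &&
			    (old & COUNT_MASK) != 0) {
				/* We already own it: recurse by bumping the
				 * count; no other CPU can modify it now. */
				*lock = old + 1;
				return;
			}
			/* Another CPU owns the lock: spin until released. */
		}
	}

	static void
	bgl_rellock(volatile unsigned int *lock)
	{
		unsigned int old = *lock;

		if ((old & COUNT_MASK) > 1) {
			*lock = old - 1;	/* drop one recursion level */
		} else {
			/* Fully released; the real code also undoes the
			 * interrupt routing changed by GRAB_HWI. */
			__sync_synchronize();
			*lock = LOCK_FREE;
		}
	}

Roughly speaking, every kernel entry point acquires the lock this way
before touching kernel structures and releases it on the way back out
(or hands its nesting count to another process in cpu_switch(), see the
Scheduler section below).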
---

Usage of the BGL: (src/sys/i386/i386/mplock.s)

The BGL is pushed down (acquired) on all entry into the kernel, whether
by syscall, trap or interrupt.  The file src/sys/i386/i386/exception.s
contains all the initial entry points for syscalls, traps and
interrupts.  Syscalls and 'altsyscalls' acquire the lock through the
macros SYSCALL_LOCK and ALTSYSCALL_LOCK, which map to the assembler
functions _get_syscall_lock and _get_altsyscall_lock on SMP machines
(if SMP is not defined they are not called).  _get_syscall_lock and
_get_altsyscall_lock are also found in src/sys/i386/i386/mplock.s; they
save the contents of the local APIC's interrupt priority and call
MPgetlock.

It would seem that the syscall lock could simply be delayed until entry
into the actual system call (write/read/...), however several issues
arise:

1) fault on copyin of user's syscall arguments

   This is actually a non-issue: if a fault occurs the processor will
   spin to acquire the MPlock before potentially recursing into the
   non-re-entrant VM system.  Although this leaves the processor in a
   faulted state for quite some time, it is no different from when CPU 1
   has the lock and a process running on CPU 2 page faults.  Problem #1
   takes care of itself because of the recursive MPlock.

2) ktrace hooks (src/sys/kern/kern_ktrace.c)

   The ktrace hooks in the syscall path manipulate kernel resources that
   are not MP safe.  ktrace touches many parts of the kernel that need
   work to become MP safe; a temporary solution would be to acquire the
   BGL when entering the ktrace code.

3) STOPEVENT, aka void stopevent(struct proc *, unsigned int,
   unsigned int); (src/sys/kern/sys_process.c)

   stopevent will be called if the process is marked to sleep via
   procfs.  Stopping the process requires entry into the scheduler,
   which is not MP safe.  Again, a temporary hack would be to
   conditionally acquire the MPlock if the condition exists.

---

SPL issues: (src/sys/i386/isa/ipl_funcs.c)

There exists an inherent race condition with the spl() system in an MP
environment.  Consider the following, with the system at splbio:

	process A			process B

	int s;				int s;

	s = splhigh();
	/* spl raised to high; however,
	 * the saved spl 's' has the old
	 * value of splbio */

					s = splhigh();
					/* spl still high */

	splx(s);
	/* processor spl now back at bio
	 * even though B still needs
	 * splhigh */

					splx(s);

Process B may be interrupted in a critical section.  Also note that the
asymmetric nature of the spl system makes it very difficult to pinpoint
locations in the bottom half of the kernel (the part that services
interrupts) that may collide with the top half (user process context).

A short-sighted solution would be to enforce spl as an MPlock, an
exclusive counting semaphore; however, since no locking protocol or
ordering of spl pushdown is required, deadlock becomes a major problem.
The only solution that may work with spl is adding the pushdown of the
BGL when first asserting any level of spl and releasing the MPlock when
spl0 is reached.  It may also be interesting to see what a separate lock
based only on spl would accomplish; moving to a model where the spl
entry points become our new BGL might also be something to investigate.
Since spl is used only for short-term mutual exclusion it may actually
work nicely as a coarse grained locking system for the time being.

---

Simple locks: (src/sys/i386/i386/simplelock.s)

Cursory research into the CVS logs reveals, on the file
kern/vfs_syscalls.c:

    1.28  Thu Jul 13 8:47:42 1995 UTC  by davidg
    Diffs to 1.27

    NOTE: libkvm, w, ps, 'top', and any other utility which depends on
    struct proc or any VM system structure will have to be rebuilt!!!

    Much needed overhaul of the VM system. Included in this first round
    of changes:
    ...
    4) simple_lock's removed. Discussion with several people reveals
    that the SMP locking primitives used in the VM system aren't likely
    the mechanism that we'll be adopting. Even if it were, the locking
    that was in the code was very inadequate and would have to be mostly
    re-done anyway. The locking in a uni-processor kernel was a no-op
    but went a long way toward making the code difficult to read and
    debug.

However, with the Lite/2 merge they were re-introduced and the kernel is
now littered with them.  The ones in place seem somewhat adequate for
short-term exclusion; essentially they are spinlocks.

What's interesting is that the simplelocks seem to provide MP
synchronization for lockmgr locks, yet the code is littered with calls
to unsafe functions such as MALLOC.  It looks like someone decided to do
the hard stuff first.  Why are the simplelocks necessary (besides their
use in the lockmgr) if the kernel is still guarded by the BGL?
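For reference, since they are essentially spinlocks, a simple lock boils
down to little more than a spinning test-and-set.  A minimal illustrative
sketch in C follows; the names mirror the kernel's simple_lock interface,
but the layout and the use of GCC __sync builtins here are my own
stand-ins for the assembler in simplelock.s:

	struct simplelock {
		volatile unsigned int lock_data;	/* 0 = free, 1 = held */
	};

	static void
	s_lock_init(struct simplelock *lkp)
	{
		lkp->lock_data = 0;
	}

	static void
	s_lock(struct simplelock *lkp)
	{
		/* Spin until we atomically change the lock from free to held. */
		while (!__sync_bool_compare_and_swap(&lkp->lock_data, 0, 1))
			;	/* busy wait; the holder is on another CPU */
	}

	static void
	s_unlock(struct simplelock *lkp)
	{
		/* Release; the builtin stores 0 with release semantics. */
		__sync_lock_release(&lkp->lock_data);
	}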
---

Scheduler:

The scheduler in cpu_switch() (src/sys/i386/i386/swtch.s) saves the
current nesting level of the process's MPlock (after masking off the
CPUid bits from it) into the PCB (process control block) (lines 317-324)
before attempting to switch to another process, where it restores the
next process's nesting level (lines 453-455).
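Rendered as illustrative C (the actual code is assembler in swtch.s, and
the structure, the field pcb_mpnest and the bit layout below are
assumptions carried over from the earlier sketch, not the real
definitions), the MPlock half of the switch looks roughly like:

	#define CPU_SHIFT	24		/* assumed lock word layout */
	#define COUNT_MASK	0x00ffffff

	struct pcb {				/* process control block (sketch) */
		unsigned int	pcb_mpnest;	/* saved MPlock nesting level */
		/* ... saved registers, stack pointer, etc ... */
	};

	extern volatile unsigned int mp_lock;	/* the Big Giant Lock word */

	static void
	cpu_switch_mplock(struct pcb *oldpcb, struct pcb *newpcb,
	    unsigned int cpuid)
	{
		/* Save the outgoing process's nesting level with the
		 * CPUid bits masked off. */
		oldpcb->pcb_mpnest = mp_lock & COUNT_MASK;

		/* ... pick the next runnable process, switch stacks ... */

		/* Restore the incoming process's nesting level under
		 * this CPU. */
		mp_lock = (cpuid << CPU_SHIFT) | newpcb->pcb_mpnest;
	}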
---

-Alfred Perlstein - [bright@rush.net|alfred@freebsd.org]
Wintelcom systems administrator and programmer
http://www.wintelcom.net/ [bright@wintelcom.net]