From owner-freebsd-hackers Sat Jun 12 3:40:31 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from quark.ChrisBowman.com (crbowman.erols.com [209.122.47.155]) by hub.freebsd.org (Postfix) with ESMTP id C224614CDF for ; Sat, 12 Jun 1999 03:40:24 -0700 (PDT) (envelope-from crb@ChrisBowman.com)
Received: from fermion (crb@fermion.ChrisBowman.com [10.0.1.2]) by quark.ChrisBowman.com (8.9.3/8.9.3) with SMTP id FAA04934; Sat, 12 Jun 1999 05:40:10 -0500 (EST) (envelope-from crb@ChrisBowman.com)
Message-Id: <199906121040.FAA04934@quark.ChrisBowman.com>
X-Sender: crb@quark
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0.1
Date: Sat, 12 Jun 1999 06:37:50 -0400
To: Arun Sharma
From: "Christopher R. Bowman"
Subject: Re: High syscall overhead?
Cc: "David E. Cross" , freebsd-hackers@FreeBSD.ORG
In-Reply-To:
References: <"David E. Cross"'s message of "Fri, 11 Jun 1999 10:40:37 -0400"> <199906111440.KAA70517@cs.rpi.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

At 11:15 PM 6/11/99 -0700, Arun Sharma wrote:
>"David E. Cross" writes:
>
>> Looking through the exception.s it appears that on entry to the
>> kernel an MP lock is obtained... I thought we had splX(); to
>> protect concurancy in the kernel.
>
>Can someone explain to me why is SYSCALL_LOCK necessary ? It certainly
>seems to hurt system call performance on a MP machine.
>
>Also, is there any data on lock contention in FreeBSD ? Is anyone
>working on decomposing some of the giant locks ?

I can't speak authoritatively since I don't know specifically what SYSCALL_LOCK is, but if it is what is often referred to on this list as the Giant Kernel Lock(tm), then the following should generally apply.

When dealing with data structures there are often areas of code that must run without any possibility of being interrupted or of executing at the same time as another specific piece of code. By way of example, consider the simple act of incrementing a shared variable. On most modern RISC processors this is essentially four instructions:

    LOAD  r2, #var     load the address of variable var into register r2
    LOAD  r1, r2       load the value stored in var into cpu register r1
    ADD   r1, r1, #1   add one to r1 and store the result in r1
    STORE r2, r1       store the incremented value of var back into memory

Suppose that this simple four-instruction sequence is interrupted after the second load and before the add, and some time later execution resumes at the add instruction. Let's first assume that we are on a uniprocessor machine, in which case there are only two ways this sequence of instructions can stop before the add and come back later:

1) If we are in user mode, a page boundary can fall between the load and the add, and this could lead to a page fault exception (a page fault on the instruction fetch for the add). This cannot happen in kernel mode, since if we were in kernel mode we would be inside the kernel, and the kernel wires down all of its code pages (this might change in the future, but probably only for pages that are guaranteed never to be executed again, like the startup splash screen code).

2) An interrupt could occur such that the load completes but the add does not start. This could happen for a variety of reasons: perhaps a disk request has finished, maybe the sound card needs new data, or a clock interrupt signaling the end of a process's quantum may arrive. Such an interrupt may occur whether the code is running in user mode or in kernel mode.
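Just to make the race concrete before going on, here is a small userland sketch of the same lost-update problem (my own illustration, not anything from the kernel source; the names are made up). Two threads increment the same variable with no locking, so each one runs the load/add/store sequence above and some increments are usually lost on an MP box:

    /*
     * Illustration only: the lost-update race in userland C with POSIX
     * threads.  Each thread increments the shared variable a million
     * times; without any locking the final value is usually less than
     * the expected 2000000 on a multiprocessor machine.
     */
    #include <pthread.h>
    #include <stdio.h>

    static volatile long var = 0;           /* the shared variable */

    static void *
    bump(void *arg)
    {
            int i;

            (void)arg;
            for (i = 0; i < 1000000; i++)
                    var++;                  /* compiles to load, add, store */
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t t1, t2;

            pthread_create(&t1, NULL, bump, NULL);
            pthread_create(&t2, NULL, bump, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);

            /* Expect 2000000; without locking it usually comes up short. */
            printf("var = %ld\n", var);
            return (0);
    }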
In either of the above two cases the processor typically saves its current context/state, switches to an interrupt or exception context/state, and goes off to execute either a page fault handler or an interrupt handler, as appropriate. If we were in kernel mode at the time of the interrupt, the context is restored at the conclusion of the handler and execution resumes where it was interrupted. If we were in user mode, the processor may or may not resume executing the user process immediately after the interrupt is processed, but generally (barring a signal delivery, for instance) when the process does resume execution its context/state is restored and it resumes at the add.

Now, having examined how the example code could have its execution interrupted, let us further suppose that the code that executes between the load and the add also increments the same variable. If it happens to run a similar four-instruction sequence, then the value that gets stored in memory doesn't reflect the desired effect of the two pieces of code: it appears as if the variable was incremented only once when it was intended to be incremented twice. If this happens to kernel code it can have catastrophic effects.

To avoid this we must have some way of preventing the two pieces of code from intermingling their execution. In the FreeBSD (and any single-threaded) kernel this is handled by locking out interrupts around these so-called critical sections of code. So, for instance, if a variable were shared only with the network interrupt handler, the kernel would make an x = splnet() call before the second load and an splx(x) call after the store. Since the network interrupt is prevented from being serviced between the spl calls, the kernel is guaranteed that the load, add, and store will proceed without problems, and our problem is solved. We can see that this is so: the network interrupt which changes the variable has its execution delayed until after the kernel updates the variable, and by assumption the other interrupts, which may occur and won't be delayed, don't change the variable and thus don't present a problem.

Now consider the multiprocessor case. We have the same problem we had in the single-processor case, but now we also have a new variant. Suppose that both processors enter the kernel and both want to execute kernel code which updates the same variable. If the second processor performs its load before the first processor has performed its store, we again get improper results, and our spl mechanism for delaying interrupts doesn't help, for two reasons: a) locking out interrupts usually occurs on only one processor, and thus the other processor could still take an interrupt that would lead to a conflict; b) even if we assume that we could lock interrupts out of both processors, the two processors could both be executing the code as part of plain kernel code, a situation we could never have on a uniprocessor machine. Somehow we have got to prevent both processors from being in the kernel and executing these kinds of mutually exclusive code sections at the same time. If our kernel code starts out, as ours did, assuming it runs only on uniprocessor machines, then usually only the variables and data structures shared with an interrupt handler are protected. If we then hack on second-processor support, we have to somehow lock the data structures and global variables against simultaneous modification.
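Before moving on to the multiprocessor solutions, here is a rough sketch of what the uniprocessor spl-style protection described above looks like in C. splnet() and splx() are the real interrupt-priority-level interfaces; the function and variable names are made up for the example, and since this is kernel-side code it isn't something you can compile as a standalone program:

    /*
     * Rough sketch of spl-style protection around a shared counter.
     * "net_pkts_seen" and count_packet() are hypothetical names invented
     * for this example; the counter is assumed to also be incremented by
     * the network interrupt handler.
     */
    static long net_pkts_seen;

    void
    count_packet(void)
    {
            int s;

            s = splnet();           /* block network interrupts */
            net_pkts_seen++;        /* the load/add/store is now safe */
            splx(s);                /* restore the previous interrupt level */
    }

As described above, on an MP machine this alone isn't sufficient, since the other processor can still be running the same code at the same time.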
A simple first-order solution would be to lock out other processors from entering the kernel once one processor enters it. To do this we need only add code at the start of the syscall path to check for another processor already being in the kernel and to wait for it to exit before we continue. This is the so-called Giant Kernel Lock(tm) method, since one lock covers all of the kernel code. It is easy to implement and incurs little overhead. If the processors spend most of their time executing user code, they may only rarely have to wait to enter the kernel: two encryption processes may hardly ever enter the kernel and may execute very efficiently. On the other hand, two processes that read their pid in a loop might spend most of their time waiting, and thus seem to run almost as if there were only one processor.

The alternative to the Giant Kernel Lock(tm) is so-called fine-grained locking, wherein locking is pushed down closer to the data structures. With fine-grained locking two processors might be executing in the kernel at the same time, but only if they don't need the same resources: one might be doing a disk read while the other queues up a character for the serial port. Fine-grained locking has the potential for higher parallelism and thus better throughput, since processes may not have to wait as long, but the larger number of locks with their many required lock and unlock operations adds overhead. Furthermore, the design is more difficult and error prone, since the interaction of the numerous locks may result in deadlock or livelock situations every bit as problematic as the problem they try to solve.

Well, there you have it. I hope this answers some of the questions you had. I am sure I made a few mistakes here and there, but the overall problem is laid out for you, and I am sure others will step in and correct me where I am mistaken.

"Now you know, and knowing is half the battle" -- G.I. Joe, Sunday morning cartoons.
--------
Christopher R. Bowman
crb@ChrisBowman.com
http://www.ChrisBowman.com/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message