From owner-freebsd-bugs Fri May 12 02:36:11 1995 Return-Path: bugs-owner Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id CAA20537 for bugs-outgoing; Fri, 12 May 1995 02:36:11 -0700 Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.34]) by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id CAA20495 for ; Fri, 12 May 1995 02:35:39 -0700 Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.9/8.6.9) id TAA20816; Fri, 12 May 1995 19:32:58 +1000 Date: Fri, 12 May 1995 19:32:58 +1000 From: Bruce Evans Message-Id: <199505120932.TAA20816@godzilla.zeta.org.au> To: maddox@CS.Berkeley.EDU, ylo@fx7.cs.hut.fi Subject: Re: i386/395: CRITICAL PROBLEM: spl functions implemented incorrectly Cc: freebsd-bugs@freefall.cdrom.com, maddox@redwood.CS.Berkeley.EDU Sender: bugs-owner@FreeBSD.org Precedence: bulk >You recently reported a bug in "spl.h" : >> ... >> This means that the compiler has full knowledge about the >> semantics, side-effects, and dependencies of these functions. >> The compiler inlines these functions, and can mix (reorder) the >> resulting code with other code in the function that calls the >> spl function. No it can't. `static inline' functions have the same semantics as ordinary static functions. They aren't equivalent to macros. Reordering is severely limited by the C Standard. The ISO version section 5.1.2.3 says: "Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of these operations are all _side effects_. At ... _sequence points_, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place". Note that static [inline] functions are not special. spl*() all modify the (non-volatile) object `cpl', so the above applies to them. Making `cpl' volatile wouldn't affect this and would be wrong (it isn't volatile as far as C routines can tell since interrupt handlers preserve it). > ... >>Fix: > ... >> i386/include/spl.h: >> - Move the GENSPL macro and its uses, and spl0 and splx to >> i386/386/sys_machdep.c. Edit the macro and functions to >> make them normal non-static, non-inline functions. > .... This makes optimizations harder but doesn't stop them. The linker is allowed to do global optimizations. The standard also allows entire expressions, function calls and even accesses to volatile objects to be optimized away if it can be shown that the side effects are not needed. This is the usual "as if" rule explictly stated. The current implementation of spl*() only relies on the compiler not being able to show if the assignments to cpl are neeeded. If it can show that then it is broken :-) (they are needed for interrupt handling). It would take a global optimizer noticing that there are no Standard C signal handlers in the kernel to think it can prove that any global variable is not needed. When we have a global optimizer then this will be easy to fix by using `cpl' in a dummy signal handler. The average device driver probably depends on non-reordering in many other places, but probably has more bugs from not declaring enough variables as volatile. >I think there is another potential problem lurking in spl.h, in the following: > /* > * ipending has to be volatile so that it is read every time it is accessed > * in splx() and spl0(), but we don't want it to be read nonatomically when > * it is changed. Pretending that ipending is a plain int happens to give > * suitable atomic code for "ipending |= constant;". > */ > #define setdelayed() (*(unsigned *)&ipending |= loadandclear(&idelayed)) > #define setsoftast() (*(unsigned *)&ipending |= SWI_AST_PENDING) >... >I assume that the remark above regarding "suitable atomic code" means >that the bits are set in place in memory, using a single instruction. >Just because GCC happens to do this today doesn't mean it will tomorrow. Yes. I'm willing to examine the generated code. I would be more concerned about this if all the drivers were careful about it. >As the internal execution mechanisms of new-generation x86 processors become >more heavily parallel and RISC-like, instruction sequences involving >architecturally >unnecessary register temporaries may turn out to be faster. I wouldn't bet on It already has the same speed on a 486 (one cycle to load, 1 to OR, 1 to store), but it clobbers a register, so ORing directly into memory is the best possible on i386's and i486's. >the stability >of a compiler's choice of instructions in future releases, and I doubt that >anyone is >going to check this each time a new version of GCC is integrated into FreeBSD. I always read the diffs for i386.md. >I would suggest defining these functions as in-line assembly-language >functions. >In any case, this is the sort of thing that should be done in assembler, not C. I agree about this for setting `ipending', but not for spl*() on a uniprocessor. spl*() really can be written in C in this case. On multiprocessors many other things will break. The setting of `ipending' is modeled on the setting of `netisr' in : #define schednetisr(anisr) { netisr |= 1<<(anisr); setsoftnet(); } `netisr' is volatile, so this is guaranteed to go through a register and lose any bits that have been set by interrupt handlers. This may not be a problem in practice - it is safe iff schednetisr() is always done at ipl >= imp so that it can't be interrupted by net interrupt handlers. More care is required setting `ipending' since interrupts can occur, e.g., setsoftast() is potentially interruptable by setsofttty(). Bruce