From owner-freebsd-smp Thu Aug 14 12:45:15 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id MAA08411 for smp-outgoing; Thu, 14 Aug 1997 12:45:15 -0700 (PDT) Received: from Ilsa.StevesCafe.com (Ilsa.StevesCafe.com [205.168.119.129]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id MAA08403 for ; Thu, 14 Aug 1997 12:45:10 -0700 (PDT) Received: from Ilsa.StevesCafe.com (localhost [127.0.0.1]) by Ilsa.StevesCafe.com (8.8.6/8.8.5) with ESMTP id NAA02468; Thu, 14 Aug 1997 13:44:34 -0600 (MDT) Message-Id: <199708141944.NAA02468@Ilsa.StevesCafe.com> X-Mailer: exmh version 2.0gamma 1/27/96 From: Steve Passe To: Poul-Henning Kamp cc: Poul-Henning Kamp , smp@freebsd.org Subject: Re: bug ? In-reply-to: Your message of "Thu, 14 Aug 1997 21:05:26 +0200." <3668.871585526@critter.dk.tfs.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 14 Aug 1997 13:44:34 -0600 Sender: owner-freebsd-smp@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Poul, > >> this is from sys/kern/kern_lock.c, isn't line 76 & 77 bogus ? > >> > >> 64 | */ > >> 65 | int lock_wait_time = 100; > >> 66 | #define PAUSE(lkp, wanted) > > \ > >> 67 | if (lock_wait_time > 0) { > > \ > >> 68 | int i; > > \ > >> 69 | > > \ > >> 70 | simple_unlock(&lkp->lk_interlock); > > \ > >> 71 | for (i = lock_wait_time; i > 0; i--) > > \ > >> 72 | if (!(wanted)) > > \ > >> 73 | break; > > \ > >> 74 | simple_lock(&lkp->lk_interlock); > > \ > >> 75 | } > > \ > >> 76 | if (!(wanted)) > > \ > >> 77 | break; > >> 78 | > >> 79 | #else /* NCPUS == 1 */ > > > >I think it's necessary. In a multiprocessor system there is a chance > >another CPU could grab the lock between lines 73 & 74. So line 76 is a > >test to guarantee that its still "!(wanted)" after re-acquiring the > >simple_lock. There is also the possibility that you timeout without getting > >"wanted", line 76 deals with this. > > Ok, but what does the "break" break out of ? First, I haven't spent any time reading this section of code bwfore today, so take anything I say as 'probably'. Second, I once was of the impression that the lite2 lockmanager code was well tested, but I now have my doubts about that so we need to look at everything in there with some suspicion. What "break" breaks out of obviosly depends on where the macro is used, but to use one example, look at: #define ACQUIRE(lkp, error, extflags, wanted) \ PAUSE(lkp, wanted); \ for (error = 0; wanted; ) { \ (lkp)->lk_waitcount++; \ simple_unlock(&(lkp)->lk_interlock); \ error = tsleep((void *)lkp, (lkp)->lk_prio, \ (lkp)->lk_wmesg, (lkp)->lk_timo); \ simple_lock(&(lkp)->lk_interlock); \ (lkp)->lk_waitcount--; \ if (error) \ break; \ if ((extflags) & LK_SLEEPFAIL) { \ error = ENOLCK; \ break; \ } \ } - here the 'break' bypasses the code which goes on to sleep on the lock. In other words, first we attempt to obtain the complex lock conditions we want (ie, wanted). if we get that within the PAUSE section we break out of whatever upper block contains the ACQUIRE/PAUSE macro. If we dont get the lock, we sleep on it in the following "for (error = 0; wanted; ) { ..." section. --- > That's where it looks bogus to me... > > I also don't understand why you don't check at the top, but always > take a simple_unlock/simple_lock pair, even in the best case situation.. I also can't see any reason why the check isn't done 1st, but as I said I am very unfamiliar with this section of code right now so I couldn't say whether this is corret or incorrect with any certainity. However it looks like its only an inefficiency, as oppossed to a flaw in the logic, so I would suggest leaving it alone for now till we get the lock manager working. Right now the kernel panics almost immediately if the lockmanager is turned on in the SMP kernel. -- Steve Passe | powered by smp@csn.net | Symmetric MultiProcessor FreeBSD