From owner-freebsd-smp  Thu Aug 14 12:45:15 1997
Return-Path: <owner-freebsd-smp>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id MAA08411
          for smp-outgoing; Thu, 14 Aug 1997 12:45:15 -0700 (PDT)
Received: from Ilsa.StevesCafe.com (Ilsa.StevesCafe.com [205.168.119.129])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id MAA08403
          for <smp@freebsd.org>; Thu, 14 Aug 1997 12:45:10 -0700 (PDT)
Received: from Ilsa.StevesCafe.com (localhost [127.0.0.1])
	by Ilsa.StevesCafe.com (8.8.6/8.8.5) with ESMTP id NAA02468;
	Thu, 14 Aug 1997 13:44:34 -0600 (MDT)
Message-Id: <199708141944.NAA02468@Ilsa.StevesCafe.com>
X-Mailer: exmh version 2.0gamma 1/27/96
From: Steve Passe <smp@csn.net>
To: Poul-Henning Kamp <phk@critter.dk.tfs.com>
cc: Poul-Henning Kamp <phk@dk.tfs.com>, smp@freebsd.org
Subject: Re: bug ? 
In-reply-to: Your message of "Thu, 14 Aug 1997 21:05:26 +0200."
             <3668.871585526@critter.dk.tfs.com> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 14 Aug 1997 13:44:34 -0600
Sender: owner-freebsd-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Poul,

> >> this is from sys/kern/kern_lock.c, isn't line 76 & 77 bogus ?
> >> 
> >>    64 |  */
> >>    65 | int lock_wait_time = 100;
> >>    66 | #define PAUSE(lkp, wanted)                                          
> >    \
> >>    67 |                 if (lock_wait_time > 0) {                           
> >    \
> >>    68 |                         int i;                                      
> >    \
> >>    69 |                                                                     
> >    \
> >>    70 |                         simple_unlock(&lkp->lk_interlock);          
> >    \
> >>    71 |                         for (i = lock_wait_time; i > 0; i--)        
> >    \
> >>    72 |                                 if (!(wanted))                      
> >    \
> >>    73 |                                         break;                      
> >    \
> >>    74 |                         simple_lock(&lkp->lk_interlock);            
> >    \
> >>    75 |                 }                                                   
> >    \
> >>    76 |                 if (!(wanted))                                      
> >    \
> >>    77 |                         break;
> >>    78 | 
> >>    79 | #else /* NCPUS == 1 */
> >
> >I think it's necessary.  In a multiprocessor system there is a chance
> >another CPU could grab the lock between lines 73 & 74.  So line 76 is a
> >test to guarantee that its still "!(wanted)" after re-acquiring the
> >simple_lock.  There is also the possibility that you timeout without getting
> >"wanted", line 76 deals with this.
> 
> Ok, but what does the "break" break out of ?  

First, I haven't spent any time reading this section of code bwfore today,
so take anything I say as 'probably'.  Second, I once was of the impression
that the lite2 lockmanager code was well tested, but I now have my doubts
about that so we need to look at everything in there with some suspicion.

What "break" breaks out of obviosly depends on where the macro is used, but
to use one example, look at:

#define ACQUIRE(lkp, error, extflags, wanted)				\
	PAUSE(lkp, wanted);						\
	for (error = 0; wanted; ) {					\
		(lkp)->lk_waitcount++;					\
		simple_unlock(&(lkp)->lk_interlock);			\
		error = tsleep((void *)lkp, (lkp)->lk_prio,		\
		    (lkp)->lk_wmesg, (lkp)->lk_timo);			\
		simple_lock(&(lkp)->lk_interlock);			\
		(lkp)->lk_waitcount--;					\
		if (error)						\
			break;						\
		if ((extflags) & LK_SLEEPFAIL) {			\
			error = ENOLCK;					\
			break;						\
		}							\
	}

-
here the 'break' bypasses the code which goes on to sleep on the lock.
In other words, first we attempt to obtain the complex lock conditions
we want (ie, wanted).  if we get that within the PAUSE section we break out
of whatever upper block contains the ACQUIRE/PAUSE macro.  If we dont
get the lock, we sleep on it in the following "for (error = 0; wanted; )
{ ..." section. 

---
> That's where it looks bogus to me...
> 
> I also don't understand why you don't check at the top, but always
> take a simple_unlock/simple_lock pair, even in the best case situation..

I also can't see any reason why the check isn't done 1st, but as I said I
am very unfamiliar with this section of code right now so I couldn't say 
whether this is corret or incorrect with any certainity.  However it looks
like its only an inefficiency, as oppossed to a flaw in the logic, so I would
suggest leaving it alone for now till we get the lock manager working.  Right
now the kernel panics almost immediately if the lockmanager is turned on
in the SMP kernel.

--
Steve Passe	| powered by
smp@csn.net	|            Symmetric MultiProcessor FreeBSD