From owner-freebsd-bugs  Fri Apr 18 10:38:17 1997
Return-Path: <owner-bugs>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id KAA20556
          for bugs-outgoing; Fri, 18 Apr 1997 10:38:17 -0700 (PDT)
Received: from nlsystems.com (nlsys.demon.co.uk [158.152.125.33])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id KAA20548
          for <freebsd-bugs@freefall.freebsd.org>; Fri, 18 Apr 1997 10:38:12 -0700 (PDT)
Received: from herring.nlsystems.com (herring.nlsystems.com [10.0.0.2])
	by nlsystems.com (8.8.5/8.8.5) with SMTP id SAA01297;
	Fri, 18 Apr 1997 18:38:04 +0100 (BST)
Date: Fri, 18 Apr 1997 18:38:04 +0100 (BST)
From: Doug Rabson <dfr@nlsystems.com>
To: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
cc: freebsd-bugs@freefall.freebsd.org
Subject: Re: kern/3304: NFS V2 readdir hangs
In-Reply-To: <199704181600.JAA13507@freefall.freebsd.org>
Message-ID: <Pine.BSF.3.95q.970418182954.428H-100000@herring.nlsystems.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-bugs@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

On Fri, 18 Apr 1997, Thomas David Rivers wrote:

> The following reply was made to PR kern/3304; it has been noted by GNATS.
> 
> From: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
> To: ponds!lakes.water.net!rivers, ponds!khavrinen.lcs.mit.edu!wollman
> Cc: ponds!freefall.freebsd.org!freebsd-gnats-submit
> Subject: Re: kern/3304: NFS V2 readdir hangs
> Date: Fri, 18 Apr 1997 11:49:35 -0400 (EDT)
> 
>  More information...
>  
>  
>  Here's the scenario I've now determined (via more printf()s in the
>  kernel):
>  
>     1) nfs_request() is called from readdirrpc().
>  
>     2) nfs_request malloc's a nfsreq block, which is used
>        by rcvlock()... the lock is granted; we go down to
>        soreceive() and wind up tsleeping in sbwait().
>  
>     3) At this point, a vnode lookup() operation is called.
>        The lookup() isn't satisfied from the cache; so 
>        we call nfs_request() to get the information.
>  
>     4) This nfs_request() malloc's a different nfsreq block.
>        The "lock" is granted since rcvlock() works on addresses
>        from the nfsreq block; these are different addresses, the
>        lock is granted.  We wind down to soreceive()
>        again.
>  
>     5) udp_intr() is called because a UDP packet arrived...
>        this is, presumably, the packet we're expecting from 2).
>        *however* the last request we received was from 4).
>        That is the nfsreq this packet winds up being associated
>        with; but - it is totally wrong.  
>  

Nope.  The lock is taken on a flag word in the struct nfsmount (flagp =
&rep->r_nmp->nm_flag), not on the nfsreq.  That word is shared by all the
requests and nfsnodes on the same mountpoint.  The code in nfs_reply is
supposed to keep looping until the reply for myrep is received.  Any other
replies received along the way are matched against the list of outstanding
requests, and their owners will notice when they wake up and try to re-get
the rcvlock.
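
To make that loop concrete, here is a rough user-space sketch of the idea.
It is a toy model under stated assumptions, not the kernel source: every
name in it (mount_model, req_model, rcv_lock, wait_for_reply, RCV_LOCKED)
is made up for illustration, the "wire" is just an array of transaction
ids, and where the kernel would sleep the model simply returns an error.

```c
/* Toy user-space model; none of these names are the real FreeBSD NFS code. */
#include <assert.h>
#include <stddef.h>

#define RCV_LOCKED 0x1          /* hypothetical bit in the per-mount flag word */

struct mount_model {            /* stands in for struct nfsmount */
    int flag;                   /* one word shared by every request on the mount */
    const unsigned *wire;       /* simulated replies (xids) in arrival order */
    size_t wire_len, wire_pos;
};

struct req_model {              /* stands in for struct nfsreq */
    struct mount_model *mnt;
    unsigned xid;               /* transaction id pairing request and reply */
    int have_reply;
    struct req_model *next;     /* list of outstanding requests */
};

/* Take the receive lock. The lock word lives in the mount, not the request. */
int rcv_lock(struct req_model *rep)
{
    int *flagp = &rep->mnt->flag;

    if (rep->have_reply)
        return 1;               /* another caller already delivered our reply */
    if (*flagp & RCV_LOCKED)
        return -1;              /* a kernel would sleep here; the model cannot */
    *flagp |= RCV_LOCKED;
    return 0;
}

void rcv_unlock(struct req_model *rep)
{
    assert(rep->mnt->flag & RCV_LOCKED);
    rep->mnt->flag &= ~RCV_LOCKED;
}

/*
 * Loop until the reply for myrep has arrived. Returns how many packets this
 * caller pulled off the wire, or -1 if the wire ran dry or the lock was busy.
 */
int wait_for_reply(struct req_model *myrep, struct req_model *outstanding)
{
    int received = 0;

    for (;;) {
        struct mount_model *m = myrep->mnt;
        struct req_model *p;
        unsigned xid;
        int r = rcv_lock(myrep);

        if (r == 1)
            return received;    /* nothing to receive, reply is already here */
        if (r < 0)
            return -1;
        if (m->wire_pos >= m->wire_len) {
            rcv_unlock(myrep);
            return -1;
        }
        xid = m->wire[m->wire_pos++];
        received++;
        rcv_unlock(myrep);

        /* Hand the reply to whichever outstanding request owns this xid. */
        for (p = outstanding; p != NULL; p = p->next) {
            if (p->xid == xid) {
                p->have_reply = 1;
                break;
            }
        }
        if (myrep->have_reply)
            return received;    /* otherwise keep looping for our own reply */
    }
}
```

In this model, if a lookup request (xid 2) waits while the readdir reply
(xid 1) arrives first, the lookup's loop credits that reply to the readdir
request and carries on until its own reply shows up.  When the readdir
request later calls wait_for_reply(), rcv_lock() sees have_reply set and it
returns without receiving anything.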

>   So; we're left with the lookup() failing with an ENOENT (#2),
>  and the nfs_request from #2 hanging; never being woken up.
>  
>    I think that pretty well describes my findings.

I really need a packet trace to get a picture of what is happening here.
Could you run 'tcpdump -vv -s300' on a third machine and send me the
trace?

>  
>    Perhaps the rcvlock() needs to change to lock on something other
>  than the nfsreq block... does anyone have any suggestions?

As mentioned above, the lock is shared by all requests on the same mount
point.

--
Doug Rabson				Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.			Phone: +44 181 951 1891