Date:      Mon, 21 Apr 1997 14:59:53 -0400 (EDT)
From:      Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To:        ponds!nlsystems.com!dfr, ponds!lakes.water.net!rivers
Cc:        ponds!freefall.cdrom.com!freebsd-bugs
Subject:   Re: kern/3304: NFS V2 readdir hangs
Message-ID:  <199704211859.OAA02323@lakes.water.net>


> 
> What appears to be happening is that numb is making a 4096-byte
> readdir request for the first block of the large directory.  You can see
> this in the trace as request id b6cff051 (btw. you may find it useful to
> grep the log for nfs to separate the wood from the trees; next time we
> should add 'port nfs' to the tcpdump command).  The reply is sent but for
> some reason it never makes it into soreceive.
> 
> You can see that numb retries the request with the same xid several times
> but never receives the reply.  My guess is that something between numb and
> sundog has corrupted the packet and it is failing the checksum in
> udp_input.  What we need to do is find out how far up the protocol stack
> the packet goes.  I suggest adding printfs to udp_input and ip_input where
> they drop packets with bad checksums (line 154 in udp_usrreq.c).  You
> should also be able to see it with 'netstat -p udp' and 'netstat -p ip'.

 Here's the output of those netstat commands:

Script started on Mon Apr 21 14:11:18 1997
# netstat -p udp
udp:
	129 datagrams received
	0 with incomplete header
	0 with bad data length field
	0 with bad checksum
	0 dropped due to no socket
	13 broadcast/multicast datagrams dropped due to no socket
	5 dropped due to full socket buffers
	0 not for hashed pcb
	111 delivered
	116 datagrams output
# netstat -p ip
ip:
	180 total packets received
	0 bad header checksums
	0 with size smaller than minimum
	0 with data size < data length
	0 with header length < data size
	0 with data length < header length
	0 with bad options
	0 with incorrect version number
	15 fragments received
	0 fragments dropped (dup or out of space)
	0 fragments dropped after timeout
	5 packets reassembled ok
	130 packets for this host
	0 packets for unknown/unsupported protocol
	0 packets forwarded
	40 packets not forwardable
	0 redirects sent
	116 packets sent from this host
	0 packets sent with fabricated ip header
	0 output packets dropped due to no bufs, etc.
	0 output packets discarded due to no route
	0 output datagrams fragmented
	0 fragments created
	0 datagrams that can't be fragmented
# exit

Script done on Mon Apr 21 14:11:25 1997

No checksum problems - but I do notice the "5 dropped due to full socket
buffers" line... could that be the reason?
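
To make that reading of the counters explicit, here is a small sketch in C
that tallies the drop counters from the 'netstat -p udp' output above. The
struct and function names are made up for illustration; they are not kernel
or netstat identifiers. Note that netstat may derive the "delivered" figure
by subtraction, in which case the balance check only confirms the output was
read correctly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the counters printed by 'netstat -p udp'. */
struct udp_counters {
    long received;          /* datagrams received */
    long incomplete_header; /* with incomplete header */
    long bad_length;        /* with bad data length field */
    long bad_checksum;      /* with bad checksum */
    long no_socket;         /* dropped due to no socket */
    long no_socket_bcast;   /* broadcast/multicast dropped due to no socket */
    long full_sockbuf;      /* dropped due to full socket buffers */
    long delivered;         /* delivered */
};

/* Sum of every counter that represents a datagram thrown away on input. */
static long udp_dropped(const struct udp_counters *c)
{
    assert(c != NULL);
    return c->incomplete_header + c->bad_length + c->bad_checksum +
           c->no_socket + c->no_socket_bcast + c->full_sockbuf;
}

/* Received datagrams that are neither delivered nor counted as dropped. */
static long udp_unaccounted(const struct udp_counters *c)
{
    assert(c != NULL);
    return c->received - c->delivered - udp_dropped(c);
}
```

With the numbers above (129 received, 13 broadcast/multicast drops, 5 full
socket buffer drops, 111 delivered), udp_dropped() gives 18 and
udp_unaccounted() gives 0, so the only unicast losses recorded are the 5
full-buffer drops.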

> 
> You might also try this (untested) hack which should limit readdirs to
> smaller bites:
> 
> Index: nfs_vfsops.c
> ===================================================================
> RCS file: /home/smp/sys/nfs/nfs_vfsops.c,v
> retrieving revision 1.1.1.5
> diff -u -r1.1.1.5 nfs_vfsops.c
> --- nfs_vfsops.c	1997/04/18 07:09:39	1.1.1.5
> +++ nfs_vfsops.c	1997/04/21 17:19:58
> @@ -748,6 +748,7 @@
>  	}
>  	if (nmp->nm_readdirsize > maxio)
>  		nmp->nm_readdirsize = maxio;
> +	nmp->nm_readdirsize = 1024; /* XXX */
>  
>  	if ((argp->flags & NFSMNT_MAXGRPS) && argp->maxgrouplist >= 0 &&
>  		argp->maxgrouplist <= NFS_MAXGRPS)
> 

 Yes! - this particular change does work around the problem.  I'm
able to run my "ls -lR" and have it complete successfully, although
there are some strange 'lags' every now and then.  I've been running
it continuously for a few minutes now, with no hangs.
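
One thing that does change with the smaller readdirsize is how many IP
fragments a reply needs. The sketch below is only an assumption-laden
estimate, not a diagnosis: it assumes a 1500-byte MTU, a 20-byte IP header
with no options, and a guessed 100 bytes of RPC/NFS reply overhead
(RPC_OVERHEAD_GUESS is a made-up figure). frags_needed() is a hypothetical
helper, not a kernel function.

```c
#include <assert.h>

static const unsigned IP_HDR = 20;               /* IP header, no options */
static const unsigned UDP_HDR = 8;               /* UDP header */
static const unsigned RPC_OVERHEAD_GUESS = 100;  /* assumed, not measured */

/* Number of IP fragments needed to carry a UDP payload of the given size. */
static unsigned frags_needed(unsigned udp_payload, unsigned mtu)
{
    assert(mtu >= IP_HDR + 8);
    /* Every fragment except the last carries a multiple of 8 data bytes. */
    unsigned per_frag = (mtu - IP_HDR) & ~7u;
    unsigned total = UDP_HDR + udp_payload;
    return (total + per_frag - 1) / per_frag;    /* round up */
}

/* Fragments for a readdir reply carrying readdirsize bytes of entries. */
static unsigned readdir_reply_frags(unsigned readdirsize, unsigned mtu)
{
    return frags_needed(readdirsize + RPC_OVERHEAD_GUESS, mtu);
}
```

Under those assumptions a 4096-byte readdir reply needs 3 fragments, while
a 1024-byte reply fits in a single unfragmented packet. The 'netstat -p ip'
output above shows 15 fragments received and 5 packets reassembled, which
averages 3 fragments per datagram, though nothing here proves those were
readdir replies.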

 Now - a good question, which you asked, is why those packets are
getting blocked.

 Another question I have is why this worked with 2.1.5 - did it
always use a lower readdirsize, or is another problem in 2.2.1 simply
masked by lowering the readdirsize?

 I'm happy to investigate this further - and *overjoyed* that NFS
seems to be working for me...  let me know what I can do at this end.

	 - Thanks! -
	- Dave Rivers -


