From owner-freebsd-bugs Mon Apr 21 12:50:49 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id MAA09963 for bugs-outgoing; Mon, 21 Apr 1997 12:50:49 -0700 (PDT)
Received: from dg-rtp.dg.com (dg-rtp.rtp.dg.com [128.222.1.2]) by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id MAA09953 for ; Mon, 21 Apr 1997 12:50:44 -0700 (PDT)
Received: by dg-rtp.dg.com (5.4R3.10/dg-rtp-v02) id AA20155; Mon, 21 Apr 1997 15:50:05 -0400
Received: from ponds by dg-rtp.dg.com.rtp.dg.com; Mon, 21 Apr 1997 15:50 EDT
Received: from lakes.water.net (lakes [10.0.0.3]) by ponds.water.net (8.8.3/8.7.3) with ESMTP id OAA27131; Mon, 21 Apr 1997 14:53:27 -0400 (EDT)
Received: (from rivers@localhost) by lakes.water.net (8.8.3/8.6.9) id OAA02323; Mon, 21 Apr 1997 14:59:53 -0400 (EDT)
Date: Mon, 21 Apr 1997 14:59:53 -0400 (EDT)
From: Thomas David Rivers
Message-Id: <199704211859.OAA02323@lakes.water.net>
To: ponds!nlsystems.com!dfr, ponds!lakes.water.net!rivers
Subject: Re: kern/3304: NFS V2 readdir hangs
Cc: ponds!freefall.cdrom.com!freebsd-bugs
Content-Type: text
Sender: owner-bugs@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>
> What appears to be happening is that numb is making a 4096-byte
> readdir request for the first block of the large directory. You can see
> this in the trace as request id b6cff051 (btw. you may find it useful to
> grep the log for nfs to separate the wood from the trees; next time we
> should add 'port nfs' to the tcpdump command). The reply is sent but for
> some reason it never makes it into soreceive.
>
> You can see that numb retries the request with the same xid several times
> but never receives the reply. My guess is that something between numb and
> sundog has corrupted the packet and it is failing the checksum in
> udp_input. What we need to do is find out how far up the protocol stack
> the packet goes.
> I suggest adding printfs to udp_input and ip_input where
> they drop packets with bad checksums (line 154 in udp_usrreq.c). You
> should also be able to see it with 'netstat -p udp' and 'netstat -p ip'.

Here's the output of those netstat commands:

Script started on Mon Apr 21 14:11:18 1997
# netstat -p udp
udp:
	129 datagrams received
	0 with incomplete header
	0 with bad data length field
	0 with bad checksum
	0 dropped due to no socket
	13 broadcast/multicast datagrams dropped due to no socket
	5 dropped due to full socket buffers
	0 not for hashed pcb
	111 delivered
	116 datagrams output
# netstat -p ip
ip:
	180 total packets received
	0 bad header checksums
	0 with size smaller than minimum
	0 with data size < data length
	0 with header length < data size
	0 with data length < header length
	0 with bad options
	0 with incorrect version number
	15 fragments received
	0 fragments dropped (dup or out of space)
	0 fragments dropped after timeout
	5 packets reassembled ok
	130 packets for this host
	0 packets for unknown/unsupported protocol
	0 packets forwarded
	40 packets not forwardable
	0 redirects sent
	116 packets sent from this host
	0 packets sent with fabricated ip header
	0 output packets dropped due to no bufs, etc.
	0 output packets discarded due to no route
	0 output datagrams fragmented
	0 fragments created
	0 datagrams that can't be fragmented
# exit

Script done on Mon Apr 21 14:11:25 1997

No checksum problems, but I do notice the "5 dropped due to full socket
buffers" line... could that be the reason?

>
> You might also try this (untested) hack which should limit readdirs to
> smaller bites:
>
> Index: nfs_vfsops.c
> ===================================================================
> RCS file: /home/smp/sys/nfs/nfs_vfsops.c,v
> retrieving revision 1.1.1.5
> diff -u -r1.1.1.5 nfs_vfsops.c
> --- nfs_vfsops.c	1997/04/18 07:09:39	1.1.1.5
> +++ nfs_vfsops.c	1997/04/21 17:19:58
> @@ -748,6 +748,7 @@
>  	}
>  	if (nmp->nm_readdirsize > maxio)
>  		nmp->nm_readdirsize = maxio;
> +	nmp->nm_readdirsize = 1024;	/* XXX */
>  
>  	if ((argp->flags & NFSMNT_MAXGRPS) && argp->maxgrouplist >= 0 &&
>  		argp->maxgrouplist <= NFS_MAXGRPS)
>

Yes! This particular change does work around the problem. I'm able to run
my "ls -lR" and have it complete successfully. There are some strange
'lags' every now and then, but it does work. I've been running it
continuously for a few minutes now with no hangs.

Now, a good question, which you asked, is why those packets are getting
blocked. Another question I have is why this worked with 2.1.5: did it
always use a lower readdirsize, or is another problem in 2.2.1 simply
masked by lowering the readdirsize?

I'm happy to investigate this further, and *overjoyed* that NFS seems to
be working for me... let me know what I can do at this end.

	- Thanks! -
		- Dave Rivers -