Date: Sun, 10 Jan 2010 16:36:18 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Mikolaj Golub <to.my.trociny@gmail.com>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: FreeBSD NFS client/Linux NFS server issue
Message-ID: <Pine.GSO.4.63.1001101623540.4616@muncher.cs.uoguelph.ca>
In-Reply-To: <86ocl272mb.fsf@kopusha.onet>
References: <86ocl272mb.fsf@kopusha.onet>

On Sun, 10 Jan 2010, Mikolaj Golub wrote:

>
> For one of the incidents we were tcpdumping the "problem" NFS connection for
> about 1 hour, and during this hour activity was observed only once:
>
> 08:20:38.281422 IP (tos 0x0, ttl 64, id 56110, offset 0, flags [DF], proto TCP (6), length 140) 172.30.10.27.344496259 > 172.30.10.121.2049: 88 access fh[1:9300:10df8001] 003f
> 08:20:38.281554 IP (tos 0x0, ttl 64, id 26624, offset 0, flags [DF], proto TCP (6), length 52) 172.30.10.121.2049 > 172.30.10.27.971: ., cksum 0xca5e (correct), 89408667:89408667(0) ack 1517941890 win 46 <nop,nop,timestamp 901975640 111169517>
>
> The client sent an RPC ACCESS request for the root exported inode and received
> a TCP ack in response (so the TCP connection was ok), but did not receive any
> RPC reply from the server.
>
> So it looks like the problem is on the NFS server side. But to me it looks a
> bit strange that the FreeBSD client is sending RPC packets so rarely.
> Shouldn't it retransmit them more frequently? For another incident we
> monitored the TCP connection for 4 minutes and did not see any packets then.
> Unfortunately we can't run tcpdump for a long time, as these are production
> servers and we need to reboot hosts to restore normal operations.
>

For NFSv3 over TCP, there was no RFC specification, so client behaviour
when the server failed to reply to an RPC was essentially undefined. (For
NFSv4, a client isn't allowed to retry a non-NULL RPC on the same TCP
connection and a server is expected to reply to all RPCs received on the
connection or do a disconnect, but that's NFSv4 not NFSv3.)

I think the new krpc in FreeBSD8 does do a slow timeout on RPCs over TCP
for NFSv3 and eventually does a retry, but I didn't write the code, so I'm
not absolutely sure. (I'll try and remember to take a look, or maybe dfr
can comment?) However, this krpc code isn't used for FreeBSD7.

Bottom line is I don't think the client does a retry until it sees the TCP
connection break and if the server isn't replying to the RPC nor
disconnecting the TCP connection, it'll be stuck as you describe.

I think you have three choices:
1 - Fix the NFS server so that it does reply or disconnects, if that is
    possible. (I have no idea if the Linux NFS server can be reconfigured?)
2 - Switch to using UDP (which will retry RPCs when no reply is received).
3 - Try a FreeBSD8 system and see if it works ok, then upgrade if that's
    practical?

rick
ps: As an historical note, I think I implemented NFS over TCP before anyone
    else and assumed that a server would reply to all RPC requests, so
    retries at the RPC level wouldn't be necessary. Others, like Sun,
    implemented NFS over TCP with RPC timeout/retries and then slowly came
    over to my way of thinking, but it wasn't spelled out until NFSv4.
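
The difference between "retry only when the TCP connection breaks" and
"retry on an RPC-level timeout" can be sketched as a toy simulation. This
is only an illustration under stated assumptions, not FreeBSD or krpc code:
DroppingServer, wait_for_disconnect, timeout_retry and the tick/timeout
values are all hypothetical names and numbers made up for this sketch, and
the "server" is just an object that silently drops its first few requests
without ever closing the connection.

```python
from dataclasses import dataclass


@dataclass
class DroppingServer:
    """Hypothetical server: silently drops the first `drop_first` RPCs.

    It never disconnects, which models "TCP acks arrive, RPC replies do not".
    """
    drop_first: int
    requests_seen: int = 0

    def handle(self, xid):
        self.requests_seen += 1
        if self.requests_seen <= self.drop_first:
            return None  # no RPC reply, connection stays up
        return f"reply xid={xid}"


def wait_for_disconnect(server, xid, ticks):
    """Send once, then wait; resend only if the connection breaks.

    The simulated connection never breaks, so no resend ever happens.
    Returns (number_of_sends, reply_or_None).
    """
    reply = server.handle(xid)
    sends = 1
    for _ in range(ticks):
        if reply is not None:
            break
        # Connection is still up and no reply arrived: keep waiting.
    return sends, reply


def timeout_retry(server, xid, ticks, timeout):
    """Resend the same xid whenever `timeout` ticks pass without a reply.

    Returns (number_of_sends, reply_or_None).
    """
    sends = 0
    reply = None
    waited = timeout  # forces the initial send on the first tick
    for _ in range(ticks):
        if reply is not None:
            break
        if waited >= timeout:
            reply = server.handle(xid)
            sends += 1
            waited = 0
        waited += 1
    return sends, reply


stuck = wait_for_disconnect(DroppingServer(drop_first=2), xid=1, ticks=10)
recovered = timeout_retry(DroppingServer(drop_first=2), xid=1, ticks=10, timeout=3)
```

In this model the first policy sends the request once and never gets a
reply, while the second gets a reply on its third send. A real server that
never replies at all would of course defeat both policies.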