Date: Wed, 11 Jun 2003 00:46:03 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Jason Stone <freebsd-performance@dfmm.org>
Cc: freebsd-performance@freebsd.org
Subject: Re: Slow disk write speeds over network
Message-ID: <3EE6DE3B.8B8EE11B@mindspring.com>
References: <20030610150502.H14379-100000@walter>
Jason Stone wrote:
> > You haven't said if you were using UDP or TCP for the mounts; you
> > should definitely use TCP with FreeBSD NFS servers; it's also just
> > generally a good idea, since UDP frags act as a fixed non-sliding
> > window: NFS over UDP sucks.
>
> Huh.  I thought that the conventional wisdom was that on a local network
> with no packet loss (and therefore no re-transmission penalties), udp was
> way faster because the overhead was so much less.
>
> Sorry if this seems like a pretty basic question, but can you explain
> this?

Sure:

1)  There is no such thing as no packet loss.

2)  The UDP packets are reassembled in a reassembly queue on the
    receiver.  While this is happening, you can only have one
    datagram outstanding at a time.  With TCP, you get a sliding
    window; with UDP, you stall waiting for the reassembly,
    effectively giving you a non-sliding window (request/response,
    with round trip latencies per packet, instead of two of them
    amortized across a 100M file transfer).

3)  When a packet is lost, the UDP retransmit code is rather
    crufty.  It resends the whole series of packets, and you eat
    the overhead for that.  TCP, on the other hand, can do
    selective acknowledgement, or, if it's not supported by both
    ends, it can at least acknowledge the packets that did get
    through, saving you a retransmit.

4)  FreeBSD's UDP fragment reassembly buffer code is well known
    to pretty much suck.  This is true of most UDP fragment
    reassembly code in the universe, however, and is not that
    specific to FreeBSD.  So sending UDP packets that get fragged
    because they're larger than your MTU is not a very clever way
    of achieving a fixed window size larger than the MTU (see also
    #2, above, for why you do not want to use an effectively
    fixed window protocol anyway).
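A rough sketch of the arithmetic behind points 2 and 3, written as a small Python model. Everything in it is an illustrative assumption, not a measurement: the function names are made up for this example, the link is idealized (fixed round trip time, fixed bandwidth, a TCP window large enough to keep the pipe full, no slow start), and the figure of six fragments per datagram assumes roughly 8K datagrams on a 1500-byte MTU.

```python
import math


def stop_and_wait_seconds(total_bytes, chunk_bytes, rtt_s, bandwidth_bps):
    """One datagram outstanding at a time: every chunk pays a full round trip."""
    chunks = math.ceil(total_bytes / chunk_bytes)
    wire_time = total_bytes * 8 / bandwidth_bps
    return chunks * rtt_s + wire_time


def sliding_window_seconds(total_bytes, rtt_s, bandwidth_bps, setup_rtts=2):
    """Pipe kept full: only a couple of round trips are paid for the whole transfer.

    setup_rtts=2 mirrors the "two of them amortized" remark; it is an assumption.
    """
    wire_time = total_bytes * 8 / bandwidth_bps
    return setup_rtts * rtt_s + wire_time


def datagram_loss_probability(fragment_loss, fragments):
    """A fragmented datagram is unusable if any single fragment is lost."""
    return 1 - (1 - fragment_loss) ** fragments


if __name__ == "__main__":
    size = 100 * 10**6   # 100M transfer, in bytes
    chunk = 8192         # assumed rsize/wsize
    rtt = 0.0002         # assumed 200 microsecond LAN round trip
    bw = 100e6           # assumed 100 Mbit/s link

    print("stop-and-wait :", stop_and_wait_seconds(size, chunk, rtt, bw), "s")
    print("sliding window:", sliding_window_seconds(size, rtt, bw), "s")
    print("datagram loss at 0.1% fragment loss, 6 fragments:",
          datagram_loss_probability(0.001, 6))
```

Under these assumptions the per-chunk round trips add a couple of seconds to a transfer whose wire time is eight seconds, and a 0.1% fragment loss rate becomes roughly a 0.6% chance of having to resend a whole datagram. Slower switches or smaller chunks make the gap wider.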
Even if there were no packet loss at all with UDP, unless all your
data is around the size of one rsize/wsize/packet, the combined RTT
overhead for even a moderately large number of packets in a single
run is enough to trigger the amortized cost of the additional TCP
overhead being lower than the UDP overhead from the latency.

Depending on your hardware (switch latency, half duplex, etc.), you
could also be talking about a significant combined bandwidth delay
product.

Now add to all this that you have to send explicit ACKs with UDP,
while you can use piggy-back ACKs on the return payloads for TCP.

I think the idea that UDP was OK for nearly-lossless short-haul came
about from people who couldn't code a working TCP NFS client.  8-).

-- Terry
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3EE6DE3B.8B8EE11B>