Date: Mon, 11 Mar 2013 21:25:45 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Garrett Wollman <wollman@hergotha.csail.mit.edu>
Cc: freebsd-net@freebsd.org, andre@freebsd.org
Subject: Re: Limits on jumbo mbuf cluster allocation
Message-ID: <22122027.3796089.1363051545440.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201303111605.r2BG5I6v073052@hergotha.csail.mit.edu>
Garrett Wollman wrote:
> In article <513DB550.5010004@freebsd.org>, andre@freebsd.org writes:
>
> >Garrett's problem is receive side specific and NFS can't do much
> >about it.
>
> Unless, of course, NFS is holding on to received mbufs for a longer
> time.

The NFS server only holds onto received mbufs until it performs the
RPC requested. Of course, if the server hits its load limit, there will
then be a backlog of RPC requests --> the received mbufs for those
requests will be held for a longer time.

To be honest, I'd consider a lot of non-empty receive queues for TCP
connections to the NFS server an indication that it is near or at its
load limit. (Sure, if you run netstat a lot, you will occasionally see
a non-empty queue here or there, but I would not expect many of them
to be non-empty much of the time.) If that is the case, the question
becomes "what is the bottleneck?". Below I suggest getting rid of the
DRC, in case it is the bottleneck for your server.

> Well, I have two problems: one is running out of mbufs (caused, we
> think, by ixgbe requiring 9k clusters when it doesn't actually need
> them), and one is livelock. Allowing potentially hundreds of clients
> to queue 2 MB of requests before TCP pushes back on them helps to
> sustain the livelock once it gets started, and of course those packets
> will be of the 9k jumbo variety, which makes the first problem worse
> as well.

The problem for the receive side is "how small should you make it?".
Suppose only one client is active and it is flushing writes for a
large file written into that client's buffer cache.
--> If you set the receive size just big enough for one Write, the
    client ends up doing:
    - send one write, wait a long while for the NFS_OK reply
    - send the next write, wait a long while for the NFS_OK reply
    and so on
--> the write-back takes a long time, even though no other client is
    generating load on the server
--> the user of this client won't be happy

If you make the receive side large enough to handle several Write
requests, the above works much faster, however...
- the receive size is now large enough to accept many, many other RPC
  requests (a Write request is 64Kbytes+, whereas Read requests are
  typically less than 100 bytes)

Even if you set the receive size to the minimum that will handle one
Write request, that still allows the client to issue something like
650 Read requests. Since NFS clients wait for replies to the RPC
requests they send, they will only queue so many requests before they
stop sending and wait for some replies. This does delay the "feedback"
somewhat, but I'd argue that buffering requests in the server's receive
queue helps when clients generate bursts of requests on a server that
is well below its load limit. (There is a little sketch of this sizing
trade-off at the end of this message.)

Now, I'm not sure I understand what you mean by "livelock":
A - Do you mean that the server becomes unresponsive and generates
    almost no RPC replies, with all the clients reporting "NFS server
    not responding"? or
B - Do you mean that the server keeps responding to RPCs at a steady
    rate, but that rate is slower than what the clients (and their
    users) would like to see?

If it is B, I'd just consider that hitting the server's load limit.
For either A or B, I'd suggest that you disable the DRC for TCP
connections (email me if you need a patch for that), which will have
a couple of effects:
1 - It will keep the DRC from defining the server's load limit. (If
    the DRC is the server's bottleneck, this will raise the load limit
    to whatever the next bottleneck is.)
2 - If the mbuf clusters held by the DRC are somehow contributing to
    the mbuf cluster allocation problem for the receive side of the
    network interface, this would alleviate that. (I'm not saying it
    fixes the problem, but it might allow the server to avoid it until
    the driver guys come up with a good solution for it.)
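Since I mentioned a patch: I won't reproduce the real code here, but
the idea is a one-line bypass at the front of the DRC lookup. A rough
sketch in C (nfsrv_getcache_default(), nfsrv_cachetcp, ND_TCP and
RC_DOIT are stand-ins I made up, not the actual identifiers in the
sources):

    /*
     * Hypothetical sketch only, not the actual patch.  For an RPC
     * that arrived over TCP, skip the duplicate request cache
     * entirely: no cache lookup and no cached-reply mbufs held, so
     * the DRC can no longer define the load limit.
     */
    #define RC_DOIT  0              /* "no cached reply, execute the RPC" */
    #define ND_TCP   0x01           /* request arrived on a TCP connection */

    struct nfsrv_descript_sketch {
            int     nd_flag;        /* transport/protocol flags */
    };

    static int nfsrv_cachetcp = 0;  /* imagined tunable: 0 = no DRC for TCP */

    int nfsrv_getcache_default(struct nfsrv_descript_sketch *nd); /* real lookup */

    int
    nfsrv_getcache_sketch(struct nfsrv_descript_sketch *nd)
    {

            if (nfsrv_cachetcp == 0 && (nd->nd_flag & ND_TCP) != 0)
                    return (RC_DOIT);
            return (nfsrv_getcache_default(nd));
    }

(UDP traffic would still go through the cache, since retransmitted
UDP requests are the case the DRC was designed for.)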
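And going back to the receive-size question: here's a minimal
user-space illustration of the sizing trade-off, using the rough
numbers from above. (Illustration only; the in-kernel server sizes
its socket buffers via soreserve(), not setsockopt().)

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <stdio.h>

    #define WRITE_RPC_MAX  (64 * 1024 + 400) /* one 64K Write + headers (rough) */
    #define READ_RPC_SIZE  100               /* a Read request is ~100 bytes */

    /* Cap a connection's receive buffer at one Write request. */
    int
    set_rcvbuf(int s)
    {
            int rcvbuf = WRITE_RPC_MAX;

            /*
             * TCP now pushes back on the client after a single queued
             * Write, but the same budget still admits hundreds of Reads.
             */
            if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf,
                sizeof(rcvbuf)) == -1) {
                    perror("setsockopt(SO_RCVBUF)");
                    return (-1);
            }
            printf("rcvbuf=%d bytes -> about %d queued Read RPCs\n",
                rcvbuf, rcvbuf / READ_RPC_SIZE);
            return (0);
    }

Even at this minimum the printf reports the ~650 figure mentioned
above, which is the point: a receive size chosen for a single Write
still lets a client queue hundreds of Reads, so no setting gives you
both quick write-back and tight feedback.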
rick

> -GAWollman
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"