Date: Wed, 25 Feb 2015 17:08:22 -0500
From: Garrett Wollman <wollman@csail.mit.edu>
To: freebsd-fs@freebsd.org
Cc: freebsd-net@freebsd.org
Subject: Implementing backpressure in the NFS server
Message-ID: <21742.18390.976511.707403@khavrinen.csail.mit.edu>
Here's the scenario:

1) A small number of (Linux) clients run a large number of processes (compute jobs) that read large files sequentially out of an NFS filesystem.  Each process is reading from a different file.

2) The clients are behind a network bottleneck.

3) The Linux NFS client will issue NFS3PROC_READ RPCs (potentially including read-ahead) independently for each process.

4) The network bottleneck does not serve to limit the rate at which read RPCs can be issued, because the requests are small (it's only the responses that are large).

5) Even if the responses are delayed, causing one process to block, there are sufficient other processes still runnable to allow more reads to be issued.

6) On the server side, because these are requests for different file handles, they will get steered to different NFS service threads by the generic RPC queueing code.

7) Each service thread will process its read to completion, and then block when the reply is transmitted because the socket buffer is full.

8) As more reads continue to be issued by the clients, more and more service threads get stuck waiting on the socket buffer, until all of the nfsd threads are blocked.

9) The server is now almost completely idle.  Incoming requests can only be serviced when one of the nfsd threads finally manages to put its pending reply on the socket send queue, at which point it can return to the RPC code and pick up one request -- which, because the incoming queues are full of pending reads from the problem clients, is likely to get stuck in the same place.  Lather, rinse, repeat.

What should happen here?  As an administrator, I can certainly increase the number of NFS service threads until there are sufficient threads available to handle all of the offered load -- but the load varies widely over time, and it's likely that I would run into other resource constraints if I did this without limit.  (Is 1000 threads practical?
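The starvation in steps 6-9 can be sketched with a toy model.  The thread-pool size, socket-buffer depth, and drain rate below are illustrative numbers chosen for the sketch, not FreeBSD defaults:

```python
def run_pool(nfsd_threads=32, sockbuf_slots=4, rpcs=100, drain_every=10):
    """Toy model of steps 6-9: a fixed pool of nfsd threads, each of
    which blocks after finishing a read if the per-client socket send
    buffer is already full of replies.

    Returns (idle, blocked) thread counts when the run ends."""
    idle = nfsd_threads   # threads free to pick up an RPC
    blocked = 0           # threads stuck waiting on the socket buffer
    queued = 0            # replies sitting in the send buffer
    for rpc in range(rpcs):
        if idle == 0:
            break                  # step 8/9: the whole pool is stuck
        idle -= 1                  # a thread picks up the read
        if queued < sockbuf_slots:
            queued += 1            # reply fits; thread returns to the pool
            idle += 1
        else:
            blocked += 1           # step 7: sleeps until the buffer drains
        # The bottlenecked link drains far fewer replies than the
        # clients issue requests.
        if rpc % drain_every == drain_every - 1 and queued:
            queued -= 1
    return idle, blocked
```

With the slow link (one reply drained per ten requests) the pool collapses to zero idle threads; if the link keeps up (`drain_every=1`), no thread ever blocks.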
What happens when a different mix of RPCs comes in -- will it livelock the server?)

I'm of the opinion that we need at least one of the following things to mitigate this issue, but I don't know the RPC code well enough to judge how feasible either one is:

a) Admission control.  RPCs should not be removed from the receive queue if the transmit queue is over some high-water mark.  This would ensure that a problem client behind a network bottleneck like this one eventually feels backpressure, via TCP window contraction if nothing else.  It would also make it more likely that other clients still get their RPCs processed even when most service threads are taken up by the problem clients.

b) Fairness scheduling.  There should be some parameter, configurable by the administrator, that restricts the number of nfsd threads any one client can occupy, independent of how many requests it has pending.  A really advanced scheduler would allow bursting over the limit for some small number of requests.

Does anyone else have thoughts, or even implementation ideas, on this?

-GAWollman
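Both proposals amount to a dispatch check made before a thread dequeues a client's next RPC.  A minimal sketch, assuming hypothetical per-client bookkeeping (the `ClientState` fields, `XMIT_HIWAT`, the per-client cap, and the burst allowance are all illustrative, not existing FreeBSD knobs):

```python
from dataclasses import dataclass

XMIT_HIWAT = 256 * 1024   # illustrative high-water mark on queued reply bytes
PER_CLIENT_MAX = 4        # illustrative cap on threads per client

@dataclass
class ClientState:
    xmit_queued: int = 0      # bytes of replies waiting in the send buffer
    threads_in_use: int = 0   # nfsd threads currently serving this client
    burst_left: int = 2       # requests allowed above the per-client cap

def can_dispatch(c: ClientState) -> bool:
    """Decide whether an nfsd thread may take this client's next RPC."""
    # Proposal (a): if the transmit queue is over the high-water mark,
    # leave the request on the receive queue, so the client eventually
    # feels backpressure through TCP window contraction.
    if c.xmit_queued > XMIT_HIWAT:
        return False
    # Proposal (b): cap the threads one client can occupy, with a small
    # burst allowance before the cap bites.
    if c.threads_in_use < PER_CLIENT_MAX:
        return True
    if c.burst_left > 0:
        c.burst_left -= 1
        return True
    return False
```

Under this check, a client with a backlogged send buffer is refused outright, and a client already holding its share of threads gets only its burst credits before further requests wait on the receive queue.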