Date:      Fri, 13 Mar 2015 00:21:33 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Tim Borgeaud <timothy.borgeaud@framestore.com>
Cc:        freebsd-net@freebsd.org, Mark Hills <mark.hills@framestore.com>
Subject:   Re: A defensive NFS server (sbwait, flow control)
Message-ID:  <2143160693.12216521.1426220493289.JavaMail.root@uoguelph.ca>
In-Reply-To: <CADqOPxsAeViRBJ5a6z2LodikKx1EqE_Na7QsUF43tXX8K3PCFQ@mail.gmail.com>

Tim Borgeaud wrote:
> Hi again FreeBSD folks,
> 
> A short while ago I sent a couple of emails regarding the idea of
> 'fair share' NFS scheduling. Amongst others, Garrett Wollman
> replied, and also sent a related email "Implementing backpressure
> in the NFS server". The common theme: to provide a level of
> service in situations where requests from some clients might tie
> up a lot of resources.
> 
> There are various issues to consider. We might say that we're
> looking at 'defensive' functionality, and we've made some
> experimental progress to:
> 
>   round-robin requests from different users
> 
>   provide some limits (flow control) on the number of requests
>   read from a single transport
> 
> Testing has highlighted the issue that Garrett mentioned. A client
> can make a set of concurrent requests and then tie up nfsd threads
> as they attempt to send replies.
> 
> To be more specific, we seem to be hitting the situation in which
> an nfsd thread sits in the "sbwait" state, waiting for a reply to
> be drained (by a suspended client). Other threads subsequently
> pick up requests for the same transport and then queue up waiting
> for a lock on the same socket.
> 
> I'm not sure of the exact situation in which the sbwait arises.
> It's easily repeatable only where there are multiple concurrent
> requests from the same transport.
> 
> Our testing is fairly synthetic, but it looks like this is a
> situation that has been noticed before. Having a pool of spare
> threads doesn't seem like a very robust solution. In fact, if more
> than the server minimum get tied up, then, if load and threads
> fall, we end up with _all_ remaining threads blocked (no more NFS
> service).
> 
I will note that kernel nfsd threads are pretty cheap. Kostik noted
that the main resource they hold is their kernel stack. For a 64bit
system (which I'd expect most NFS servers to be), I think you can
afford to have lots of them. (I think the 256 limit should be bumped
up quite a bit, since that limit seemed appropriate for a single-core
i386 and doesn't make much sense for a multicore 64bit system.)

I haven't heard more from Garrett, but he seemed to have resolved
his case by bumping the thread limit to 256.
If you are doing testing, make sure to have this set to 256 (the
-n 256 argument to nfsd). Any less doesn't make much sense for a
heavily loaded server. (If you want to test with more, just email me
and I can show you the one-line source patch to increase it beyond
256.) There will always be folk who say running too many is a waste,
but it is not much of a waste, and you can set a minimum and maximum
instead of just -n 256 and allow the number of them to fluctuate.
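
For example, in /etc/rc.conf something like this (just a sketch; the
flag spellings vary by release, so check nfsd(8) on your system):

    # Illustrative /etc/rc.conf settings for the nfsd thread pool.
    nfs_server_enable="YES"
    # A fixed pool of 256 threads, serving UDP and TCP:
    nfs_server_flags="-u -t -n 256"
    # Or let the pool float between a minimum and a maximum:
    #nfs_server_flags="-u -t --minthreads 32 --maxthreads 256"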

I think the challenge w.r.t. any scheme that limits resources for
a single client is that there will be cases where only one NFS client
(or a few) will be sending RPCs to the server at a given time, and
it seems unfair to limit the client in that case.
You might envision a system that tracks the "number of active clients",
but this will be difficult, since NFS traffic tends to be very
bursty (and, as such, the number of active clients can vary dramatically).

> How to address this particular issue is not obvious to me. I assume
> there
> are options including:
> 
>  - prevent thread blocking when sending replies
> 
>  - timeouts for sending replies (NFS or RPC level?)
> 
Can't drop requests when busy. Sending an NFSERR_DELAY is possible
for NFSv3 and NFSv4 (it doesn't exist for NFSv2, if you care).
It won't be easy to modify the code to do this for an NFS RPC already
in progress, since the thread will be buried in VFS calls or something
like that. The strategy would have to select certain RPC requests for
the NFSERR_DELAY reply before processing the RPC (ie. at the
beginning).
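
A rough user-space model of that dispatch-time check is below. None
of these names (should_send_delay(), rpc_begin(), MAX_INFLIGHT_PER_CONN)
exist in the FreeBSD kernel; in the real code the counter would hang
off the TCP connection's state:

    #include <stdatomic.h>
    #include <stdbool.h>

    #define MAX_INFLIGHT_PER_CONN 16        /* arbitrary threshold */

    struct conn {
            atomic_int inflight;            /* RPCs being processed now */
    };

    /*
     * Called before an RPC is handed to an nfsd thread: true means
     * reply NFSERR_DELAY immediately, before any VFS work starts,
     * which is far easier than aborting an RPC mid-flight.
     */
    static bool
    should_send_delay(struct conn *c)
    {
            return (atomic_load(&c->inflight) >= MAX_INFLIGHT_PER_CONN);
    }

    /* Bracket the actual processing of each accepted RPC. */
    static void rpc_begin(struct conn *c) { atomic_fetch_add(&c->inflight, 1); }
    static void rpc_end(struct conn *c)   { atomic_fetch_sub(&c->inflight, 1); }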

>  - serialize the sending of nfs/rpc replies to avoid multiple
>    nfsd threads waiting on the same transport.
> 
This will be a big performance hit when only a few clients are active
at a given time. Maybe N nfsd threads per TCP connection, but you still
run into how big to make the N. (Just easier to create lots of nfsd
threads. At some point the server will run out of other resources
before all the nfsd threads are busy. A 64bit system dedicated to
being an NFS server could have thousands of these, I think.)
(I know you said that you did not think this was a robust way to
 handle the problem, but since they are pretty lightweight, I do not
 see a big problem having lots of them.)
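
If you did want to experiment with such a cap anyway, a toy
pthreads-style model might look like this (N_PER_CONN and all the
names are made up; kernel code would block differently):

    #include <semaphore.h>

    #define N_PER_CONN 4        /* the N that is hard to choose */

    /*
     * One gate per TCP connection: at most N_PER_CONN nfsd threads
     * service it at a time; the rest wait here instead of piling
     * up behind the socket lock.
     */
    struct conn_gate {
            sem_t slots;
    };

    static void
    conn_gate_init(struct conn_gate *g)
    {
            sem_init(&g->slots, 0, N_PER_CONN);
    }

    static void conn_enter(struct conn_gate *g) { sem_wait(&g->slots); }
    static void conn_leave(struct conn_gate *g) { sem_post(&g->slots); }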

> Does anyone have any thoughts about this? Either this particular
> issue or a more general direction for a 'defensive' server?
> 
What you have to be wary of here is that NFS servers are expected
to try very hard to reply to RPCs. There is the NFSERR_DELAY reply
that can be sent to tell a client to try again later, but this should
probably only be done if the server can't generate a "real" reply.
(As above, the NFS server cannot just drop requests, at least over TCP,
 since the client will wait a long time before retrying an RPC. Also,
 for NFSv4, the retry must be done on a new TCP connection. Not something
 that you want to have happening often.)

When a server is overloaded (which is about the only time you should
see a large queue of outstanding RPCs), it might be possible to implement
a strategy where clients get NFSERR_DELAY replies in a round-robin
fashion, but the coding won't be trivial.
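
The selection itself could be as simple as rotating a cursor over
the table of active connections. A sketch, with choose_delay_victim()
and the high-water mark invented for illustration:

    #include <stdbool.h>
    #include <stddef.h>

    struct client {
            int id;             /* stand-in for per-connection state */
    };

    static size_t next_victim;  /* rotates across the client table */

    static bool
    overloaded(size_t queued_rpcs, size_t high_water)
    {
            return (queued_rpcs > high_water);
    }

    /*
     * Under overload, pick the next client in rotation; its new
     * requests get NFSERR_DELAY until the pressure drops.
     */
    static struct client *
    choose_delay_victim(struct client *tbl, size_t nclients)
    {
            return (&tbl[next_victim++ % nclients]);
    }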

If you are willing to pay a performance penalty when only a few clients
are active, you can simply limit the system to N nfsd threads per
client (probably actually per TCP connection, since there isn't really
a concept of a client for NFSv3).

I did like the suggestion that the NFS server not accept RPC requests
when reply(s) on that TCP connection are stuck in sbwait. I haven't
looked to see what implementing that would take.
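
The rough shape would be "is there room in the send buffer for a
reply before we read another request". In the kernel that information
is in the socket's send buffer (sbspace()); a user-space model on
FreeBSD can ask via the FIONSPACE ioctl (the threshold and function
name below are just a sketch):

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/filio.h>
    #include <stdbool.h>

    /*
     * Only accept another request from this socket if a reply of
     * 'need' bytes could be queued without blocking in sbwait.
     */
    static bool
    reply_room(int sock, int need)
    {
            int space;

            if (ioctl(sock, FIONSPACE, &space) == -1)
                    return (true);  /* can't tell; don't stall */
            return (space >= need);
    }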

rick
ps: freebsd-fs or freebsd-current might be better mailing lists
    for NFS stuff than freebsd-net.

> --
> Tim Borgeaud
> Systems Developer
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
> 


