Date: Fri, 13 Mar 2015 00:21:33 -0400 (EDT)
From: Rick Macklem
To: Tim Borgeaud
Cc: freebsd-net@freebsd.org, Mark Hills
Subject: Re: A defensive NFS server (sbwait, flow control)
Message-ID: <2143160693.12216521.1426220493289.JavaMail.root@uoguelph.ca>
List-Id: Networking and TCP/IP with FreeBSD

Tim Borgeaud wrote:
> Hi again FreeBSD folks,
>
> A short while ago I sent
> a couple of emails regarding the idea of 'fair share' NFS
> scheduling. Amongst others, Garrett Wollman replied, and also sent a
> related email "Implementing backpressure in the NFS server". The
> common theme: to provide a level of service in situations where
> requests from some clients might tie up a lot of resources.
>
> There are various issues to consider. We might say that we're looking
> at 'defensive' functionality, and we've made some experimental
> progress to:
>
>   round robin requests from different users
>
>   provide some limits (flow control) on the number of requests read
>   from a single transport
>
> Testing has highlighted the issue that Garrett mentioned. A client
> can make a set of concurrent requests and then tie up nfsd threads as
> they attempt to send replies.
>
> To be more specific, we seem to be hitting the situation in which an
> nfsd thread sits in the "sbwait" state, waiting for a reply to be
> drained (by a suspended client). Other threads subsequently pick up
> requests for the same transport and then queue up waiting for a lock
> on the same socket.
>
> I'm not sure of the exact situation in which the sbwait arises. It's
> easily repeatable only where there are multiple concurrent requests
> from the same transport.
>
> Our testing is fairly synthetic, but it looks like this is a
> situation that has been noticed before. Having a pool of spare
> threads doesn't seem like a very robust solution. In fact, if more
> than the server minimum get tied up, then, if load and threads fall,
> we end up with _all_ remaining threads blocked (no more nfs service).
>
I will note that kernel nfsd threads are pretty cheap. Kostik noted
that the main resource each one holds is its kernel stack. For a 64bit
system (which I'd expect most NFS servers to be), I think you can
afford to have lots of them.
(I think the 256 limit should be bumped up quite a bit, since that
limit seemed appropriate for a single core i386 and doesn't make much
sense for a multicore 64bit system.)
I haven't heard more from Garrett, but he seemed to have resolved his
case by bumping the thread limit to 256.

If you are doing testing, make sure to set this to 256 with the
"-n 256" argument to nfsd. Any less doesn't make much sense for a
heavily loaded server. (If you want to test with more, just email and
I can show you the one line source patch to increase it beyond 256.)

There will always be folk that say running too many is a waste, but it
is not much of a waste, and you can set a minimum and maximum instead
of just "-n 256" and allow the number of them to fluctuate.

I think the challenge w.r.t. any scheme that limits resources for a
single client is that there will be cases where only one NFS client
(or a few) will be sending RPCs to the server at a given time, and it
seems unfair to limit the client in that case. You might envision a
system that tracks "number of active clients", but this will be
difficult, since NFS traffic tends to be very bursty (and, as such,
the number of active clients can vary dramatically).

> How to address this particular issue is not obvious to me. I assume
> there are options including:
>
> - prevent thread blocking when sending replies
>
> - timeouts for sending replies (NFS or RPC level?)
>
The server can't drop requests when busy. Sending an NFSERR_DELAY is
possible for NFSv3 and NFSv4 (it doesn't exist for NFSv2, if you
care). It won't be easy to modify the code to do this for an NFS RPC
in progress, since it will be buried in VFS calls or something like
that. The strategy would have to select certain RPC requests for the
NFSERR_DELAY reply before processing the RPC (ie. at the beginning).

> - serialize the sending of nfs/rpc replies to avoid multiple
>   nfsd threads waiting on the same transport.
>
This will be a big performance hit when only a few clients are active
at a given time. Maybe N nfsd threads per TCP connection, but you
still run into how big to make the N. (It is just easier to create
lots of nfsd threads. At some point the server will run out of other
resources before all the nfsd threads are busy. A 64bit system
dedicated to being an NFS server could have thousands of these, I
think.)
(I know you said that you did not think this was a robust way to
handle the problem, but since they are pretty lightweight, I do not
see a big problem having lots of them.)

> Does anyone have any thoughts about this? Either this particular
> issue or more general direction for a 'defensive' server?
>
What you have to be wary of here is that NFS servers are expected to
try "very hard" to reply to RPCs. There is the NFSERR_DELAY reply that
can be sent to tell a client to try again later, but this should
probably only be done if the server can't generate a "real reply".
(As above, the NFS server cannot just drop requests, at least over
TCP, since the client will wait a long time before retrying an RPC.
Also, for NFSv4, the retry must be done on a new TCP connection. Not
something that you want to have happening often.)

When a server is overloaded (which is about the only time you should
see a large queue of outstanding RPCs), it might be possible to
implement a strategy where clients get NFSERR_DELAY replies in a
round-robin fashion, but the coding won't be trivial.

If you are willing to pay a performance penalty when only a few
clients are active, you can simply limit the system to N nfsd threads
for each client (probably actually each TCP connection, since there
isn't really a concept of a client for NFSv3).

I did like the suggestion that the NFS server not accept RPC requests
when reply(s) on that TCP connection are stuck in sbwait. I haven't
looked to see what implementing that would take.
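To make the last two ideas a bit more concrete, here is a rough sketch of the selection logic only, written as a toy Python model rather than kernel code. Everything in it is hypothetical and chosen for illustration (the Conn class, the send_blocked flag, next_request() and per_conn_limit); none of it corresponds to names in the FreeBSD sources. It assumes the server can tell when a reply on a connection is stuck, and it hands out requests round-robin while skipping connections that are stuck or already hold N threads.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Conn:
    """Hypothetical model of one client TCP connection (transport)."""
    name: str
    pending: deque = field(default_factory=deque)  # RPCs read but not yet started
    in_service: int = 0         # RPCs currently held by nfsd threads
    send_blocked: bool = False  # True while a reply is stuck (the sbwait case)


def next_request(conns, per_conn_limit):
    """Pick the next RPC round-robin, skipping ineligible connections.

    conns is a deque used as a rotating list of Conn objects.
    Returns (conn, rpc), or None if no connection is eligible.
    """
    for _ in range(len(conns)):
        conn = conns[0]
        conns.rotate(-1)  # the next call starts with the following connection
        if conn.send_blocked:
            continue      # do not accept more work while a reply is stuck
        if conn.in_service >= per_conn_limit:
            continue      # at most N threads per connection
        if conn.pending:
            conn.in_service += 1
            return conn, conn.pending.popleft()
    return None


if __name__ == "__main__":
    a = Conn("a", deque(["a1", "a2", "a3"]))
    b = Conn("b", deque(["b1"]))
    conns = deque([a, b])
    a.send_blocked = True            # pretend a reply to "a" is stuck
    print(next_request(conns, 2))    # only "b" is eligible here
```

A thread finishing a reply would decrement in_service and clear send_blocked in this model. The performance penalty mentioned above shows up as next_request() returning None while requests are still queued for a connection that has reached its limit.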
rick
ps: freebsd-fs or freebsd-current might be better mailing lists for
    NFS stuff than freebsd-net.

> --
> Tim Borgeaud
> Systems Developer