From owner-freebsd-net@FreeBSD.ORG Wed Feb 25 13:13:23 2015
From: Tim Borgeaud
Date: Wed, 25 Feb 2015 13:12:58 +0000
Subject: Re: NFS: kernel modules (loading/unloading) and scheduling
To: Garrett Wollman
Cc: freebsd-fs@freebsd.org, freebsd-net@freebsd.org, rmacklem@uoguelph.ca,
 Mark Hills
In-Reply-To: <201502250244.t1P2iu6N094346@hergotha.csail.mit.edu>
References: <388835013.10159778.1424820357923.JavaMail.root@uoguelph.ca>
 <201502250244.t1P2iu6N094346@hergotha.csail.mit.edu>
List-Id: Networking and TCP/IP with FreeBSD

Hi Rick, Garrett & others,

Thanks for the replies and useful info.

I may take a look at enabling some kernel module reloading, but, if the
usual approach is rebooting, I expect that this won't be an issue.

Regarding the NFS functionality itself, I can give a bit more of an
overview of our (Framestore) NFS use and what NFS server functionality we
are considering.

We have NFS storage systems accessed by both an HPC cluster and multiple
desktop users. Like many organizations, we want to have, and get the most
from, high-performance NFS storage, in terms of both total IO and low
latency. One of the demands we face is to provide users, whether cluster
nodes or more interactive desktops, with a reasonable level of service
from heavily loaded systems. We do want to allow "users" to apply high
load, but would like to avoid such requests starving others.
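
As a very rough illustration of the fairness we are after, the idea is to
pick the next request round-robin across users rather than in strict
arrival order. The following is a simplified, user-space sketch with
made-up names (none of them come from the in-kernel RPC/svc code):

/*
 * Hypothetical sketch of per-user fair selection of pending RPC
 * requests.  The only point here is the round-robin selection.
 */
#include <stdio.h>

#define NUSERS 3
#define QLEN   4

struct pending_rpc {
    int xid;                /* request id; -1 marks an empty slot */
};

struct user_queue {
    const char *name;
    struct pending_rpc q[QLEN];
    int head;               /* next request of this user to service */
};

/*
 * Pick the next request by cycling over users instead of taking
 * requests in arrival order, so one busy user cannot starve the rest.
 */
static struct pending_rpc *
next_request(struct user_queue users[], int nusers, int *cursor)
{
    for (int i = 0; i < nusers; i++) {
        struct user_queue *u = &users[(*cursor + i) % nusers];

        if (u->head < QLEN && u->q[u->head].xid != -1) {
            /* Resume the scan after this user next time. */
            *cursor = (*cursor + i + 1) % nusers;
            return (&u->q[u->head++]);
        }
    }
    return (NULL);          /* nothing pending */
}

int
main(void)
{
    /* One busy user and two light users; selection interleaves them. */
    struct user_queue users[NUSERS] = {
        { "batch",   { {10}, {11}, {12}, {13} }, 0 },
        { "desktop", { {20}, {-1}, {-1}, {-1} }, 0 },
        { "render",  { {30}, {31}, {-1}, {-1} }, 0 },
    };
    int cursor = 0;
    struct pending_rpc *r;

    while ((r = next_request(users, NUSERS, &cursor)) != NULL)
        printf("service xid %d\n", r->xid);
    return (0);
}

Only the selection policy is shown; queueing, wakeups and affinity are
left out.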

We are considering scheduling NFS RPCs such that both client transports
(xprt) and users themselves are given a fair share, including where a
single user is using multiple transports. At the same time, we don't want
to hurt performance by losing file handle affinity or similar (though it
may be quite nice to remove that responsibility from the RPC/NFS layer).

We've already prototyped a 'fair' schedule when servicing the RPC
requests, which appears to be an improvement. But, as Garrett pointed
out, we may have moved on to bottlenecks in sending replies and,
possibly, reading requests. It may be that, in cases such as slow
clients, overall performance could also be improved.

--
Tim Borgeaud
Systems Developer

On 25 February 2015 at 02:44, Garrett Wollman
<wollman@hergotha.csail.mit.edu> wrote:

> In article
> <388835013.10159778.1424820357923.JavaMail.root@uoguelph.ca>,
> rmacklem@uoguelph.ca writes:
>
> >I tend to think that a bias towards doing Getattr/Lookup over Read/Write
> >may help performance (the old "shortest job first" principle), but I'm
> >not sure you'll have a big enough queue of outstanding RPCs under normal
> >load for this to make a real difference.
>
> I don't think this is a particularly relevant condition here.  There
> are lots of ways RPCs can pile up where you really need to do better
> work-sharing than the current implementation does.  One example is a
> client that issues lots of concurrent reads (e.g., a compute node
> running dozens of parallel jobs).  Two such systems on gigabit NICs
> can easily issue large reads fast enough to cause 64 nfsd service
> threads to block while waiting for the socket send buffer to drain.
> Meanwhile, the file server is completely idle, but unable to respond
> to incoming requests, and the other users get angry.  Rather than
> assigning new threads to requests from the slow clients, it would be
> better to let the requests sit until the send buffer drains, and
> process other clients' requests instead of letting the resources get
> monopolized by a single user.
>
> Lest you think this is purely hypothetical: we actually experienced
> this problem today, and I verified with "procstat -kk" that all of the
> nfsd threads were in fact blocked waiting for send buffer space to
> open up.  I was able to restore service immediately by increasing the
> number of nfsd threads, but I'm unsure to what extent I can do this
> without breaking other things or hitting other bottlenecks.[1]  So I
> have a user asking me why I haven't enabled fair-share scheduling for
> NFS, and I'm going to have to tell him the answer is "no such thing".
>
> -GAWollman
>
> [1] What would the right number actually be?  We could potentially
> have many thousands of threads in a compute cluster all operating
> simultaneously on the same filesystem, well within the I/O capacity of
> the server, and we'd really like to degrade gracefully rather than
> falling over when a single slow client soaks up all of the nfsd worker
> threads.
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
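
Garrett, regarding the nfsd threads stuck waiting for send buffer space:
the policy you describe (park the reply until the buffer drains and keep
the threads for other clients) might look roughly like the sketch below.
It is a simplified user-space illustration with made-up names, not the
actual svc/nfsd code; in the kernel the space check would presumably be
something along the lines of sbspace() on the transport's socket.

/*
 * Hypothetical sketch of "check before dispatch": only hand a reply to
 * a worker if the client's socket can absorb it; otherwise park it and
 * go service other clients.
 */
#include <stdbool.h>
#include <stdio.h>

struct client {
    const char *name;
    long sndbuf_free;       /* stand-in for the socket's free send space */
};

struct reply {
    struct client *clnt;
    long len;
};

/* Dispatch only if the reply fits; never let a worker block on a send. */
static bool
try_dispatch(struct reply *r)
{
    if (r->len > r->clnt->sndbuf_free) {
        printf("park reply for %s (needs %ld, %ld free)\n",
            r->clnt->name, r->len, r->clnt->sndbuf_free);
        return (false);     /* stays queued; no thread is tied up */
    }
    r->clnt->sndbuf_free -= r->len;
    printf("send %ld bytes to %s\n", r->len, r->clnt->name);
    return (true);
}

int
main(void)
{
    struct client slow = { "slow-client", 32768 };
    struct client fast = { "fast-client", 1048576 };
    struct reply work[] = {
        { &slow, 65536 },   /* big read reply, no room: parked */
        { &fast, 65536 },   /* fits: serviced immediately */
        { &fast, 65536 },
    };

    for (size_t i = 0; i < sizeof(work) / sizeof(work[0]); i++)
        try_dispatch(&work[i]);
    return (0);
}

The point is only that the decision to hand work to a thread happens
before the thread can block on a slow client's socket, rather than after.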