Date:      Tue, 24 Feb 2015 18:25:57 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Tim Borgeaud <timothy.borgeaud@framestore.com>
Cc:        freebsd-net@freebsd.org, Alexander Motin <mav@freebsd.org>, "Kenneth D. Merry" <ken@freebsd.org>, Mark Hills <mark.hills@framestore.com>
Subject:   Re: NFS: kernel modules (loading/unloading) and scheduling
Message-ID:  <388835013.10159778.1424820357923.JavaMail.root@uoguelph.ca>
In-Reply-To: <CADqOPxsJ4Sjt_u6+h5B8sFWFzOHQA28E69H0LnxxZg1UPeup7g@mail.gmail.com>

Tim Borgeaud wrote:
> Hi FreeBSD folks,
>
> here at Framestore, Mark Hills (cc'd) has been looking at how NFS
> servers schedule/prioritize incoming requests, with a view towards a
> client/user 'fair share' of a service.
>
> We are taking a look at trying out some simple approaches to queuing
> up and handling requests.
>
> There are, obviously, various matters to deal with here, and several
> complications. However, an initial consideration is how we might best
> develop and test with the FreeBSD kernel code. Specifically, whether
> we would be able to unload and reload NFS-related kernel modules, or
> whether there are any other alternatives.
>
> It looks like there are several modules involved, such as krpc,
> nfssvc, nfscommon and nfslock, which we can build outside of the
> kernel itself, but which do not all support unloading.
>
> Not being able to reload a module does seem to present a hurdle to
> development and, therefore, I'd like to know how FreeBSD developers
> have managed development of the NFS functionality and what approaches
> may be recommended.
>
> It occurs to me that it would be possible for us to consider adding
> some ability to unload some of these modules, even if it were not
> suitable for anything other than development. Therefore, an extension
> of my main, more general, query is to ask how straightforward or
> fundamentally difficult this may be (for the NFS modules)?
>
Well, the ones that don't allow unloading do so because they can't safely
be unloaded. For most (maybe all), the problem is that there is no way for
the module to know if there is an RPC in progress, so unloading them when
there is traffic arriving from clients could cause a crash.
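
To make that concrete, here is a rough sketch (made-up names, nothing that
exists in the tree) of the bookkeeping the modules currently lack: count the
RPCs in flight and refuse MOD_UNLOAD while the count is non-zero.

/* Hypothetical sketch only; none of these names exist in FreeBSD. */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <machine/atomic.h>

static volatile u_int example_rpcs_inflight;

/* Would be called at the start of each RPC dispatch. */
static void
example_rpc_enter(void)
{
	atomic_add_int(&example_rpcs_inflight, 1);
}

/* Would be called when the RPC completes (including error paths). */
static void
example_rpc_exit(void)
{
	atomic_subtract_int(&example_rpcs_inflight, 1);
}

/* A MOD_UNLOAD handler could then refuse while work is in flight. */
static int
example_unload_ok(void)
{
	return (example_rpcs_inflight == 0 ? 0 : EBUSY);
}

Even with that, there is a window between the check and the next request
arriving, which is why quiescing client traffic first matters.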

If you can block all incoming NFS RPC requests (maybe just turn off the
net interface(s)?), then you could probably unload/reload them without
causing any disaster. (Since this hasn't been done, there may be some
memory leaks and maybe a variable that needs to be re-initialized in the
load.)
If you grep for MOD_UNLOAD, you can probably find the code that returns
an error and just comment that out.
--> Then you can try it and, if it doesn't crash on the unload...
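
For reference, the shape you'll be grepping for is the standard module event
handler below (generic, made-up names, not the real nfsd one); the error
return in the MOD_UNLOAD case is the part you'd comment out for a
development-only build.

/* Generic FreeBSD module event handler shape; the names are made up. */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/kernel.h>
#include <sys/module.h>

static int
example_modevent(module_t mod, int type, void *data)
{
	int error = 0;

	switch (type) {
	case MOD_LOAD:
		/* Any globals would need re-initializing here on a re-load. */
		break;
	case MOD_UNLOAD:
		/*
		 * Refuse the unload, since in-progress RPCs can't be
		 * accounted for.  Commenting this out (development only!)
		 * lets kldunload proceed at your own risk.
		 */
		error = EOPNOTSUPP;
		break;
	default:
		error = EOPNOTSUPP;
		break;
	}
	return (error);
}

static moduledata_t example_mod = {
	"example",		/* module name */
	example_modevent,	/* event handler */
	NULL			/* extra data */
};
DECLARE_MODULE(example, example_mod, SI_SUB_VFS, SI_ORDER_ANY);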

I would have thought a server reboot wouldn't take much longer than
the unload/reload and would do the same thing safely.

Now, on the "big picture":
- I think you'll find that the "interesting case" is when there are
  several RPC requests to choose from. However, the catch-22 here is
  that, when you get an NFS server into that state, it is heavily
  overloaded and, as such, isn't performing at an acceptable level.
  Yes, there probably are short bursts of RPC traffic where the
  server can choose an RPC ordering, but I suspect these will usually
  be a burst of I/O on a single file (due to read-aheads/write-behinds,
  client-side buffer cache flushing, etc.).
--> This brings us to "file handle affinity".
  Before FreeBSD had shared vnode locks, the vnode lock serialized all
  operations on a given vnode (i.e. file). If these RPCs were handed to
  different nfsd threads, those threads were all tied up doing RPCs for
  one file serially and weren't available for other RPCs.
  --> This was "solved" by assigning one nfsd thread to do the Ops for a
      given file handle.
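
(As a sketch of what "an nfsd thread per file handle" amounts to, here is a
hypothetical hash, not the actual nfsd/krpc code: hash the handle and use the
result to pick the service thread, so all Ops on one file stay in order on
one thread.)

/* Sketch only: hypothetical structures, not the actual nfsd/krpc ones. */
#include <stddef.h>
#include <stdint.h>

#define	EXAMPLE_NFSD_THREADS	32	/* number of service threads */

/* An opaque NFS file handle, treated as a blob of bytes. */
struct example_fhandle {
	uint8_t	fh_bytes[28];
};

/* FNV-1a over the handle bytes; any stable hash would do. */
static uint32_t
example_fh_hash(const struct example_fhandle *fh)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < sizeof(fh->fh_bytes); i++) {
		h ^= fh->fh_bytes[i];
		h *= 16777619u;
	}
	return (h);
}

/*
 * Pick the thread (queue index) that owns this file handle, so every RPC
 * on the same file is serviced by the same thread, in arrival order.
 */
static unsigned int
example_fh_affinity(const struct example_fhandle *fh)
{
	return (example_fh_hash(fh) % EXAMPLE_NFSD_THREADS);
}
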
Then shared vnode locks came along and allowed many ops to be done on a given
file concurrently via different nfsd threads, which I'd argue is a good
thing?
--> Unfortunately, ken@ found that when read/write ops were done on ZFS
    "out of sequential order", ZFS's sequential I/O heuristic would fail
    and decide that the I/O was random. This caused a big performance hit
    for ZFS.
    --> As such, he found that file handle affinity does help w.r.t. I/O
        performance for ZFS, because it reduced the "out-of-sequential-orderness"
        of the I/O ops.
    When I discussed this with mav@, he felt that ZFS's sequential I/O
    heuristic needed to be fixed/improved and that was where the problem
    should be attacked.
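
To make the ZFS point concrete, a naive sequential-detection heuristic looks
something like the illustration below (not ZFS's actual prefetch code). With
two nfsd threads racing, a client stream issued as 0K, 8K, 16K, 24K can reach
the filesystem as 0K, 16K, 8K, 24K, so the streak never builds and the
detector calls it random.

/* Illustration of a naive sequential-I/O detector; not ZFS code. */
#include <stdbool.h>
#include <stdint.h>

struct example_seq_state {
	uint64_t next_expected;	/* offset the next request "should" start at */
	int	 streak;	/* consecutive in-order requests seen */
};

static bool
example_is_sequential(struct example_seq_state *st, uint64_t off, uint64_t len)
{
	if (off == st->next_expected)
		st->streak++;
	else
		st->streak = 0;		/* out of order: looks "random" */
	st->next_expected = off + len;
	return (st->streak >= 2);	/* only prefetch after a short streak */
}

File handle affinity helps because it keeps the per-file requests in arrival
order; fixing it on the ZFS side would presumably mean making the heuristic
tolerate mild reordering (one option would be to track a small window of
recent offsets rather than a single expected one).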

So, I think that the benefit of file handle affinity when used with
shared vnode locking is still an open question (in general, ignoring
the above ZFS case).
Since file handle affinity is very hard to do for NFSv4, I like the idea
of fixing ZFS.

I tend to think that a bias towards doing Getattr/Lookup over Read/Write
may help performance (the old "shortest job first" principle), but I'm not
sure you'll have a big enough queue of outstanding RPCs under normal load
for this to make a real difference.

I don't think you want to delay doing any RPC "waiting" for a preferred
RPC to arrive. Any delay like this will increase RPC response time, which
is the main NFS performance issue.
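
If you do experiment with that bias, the shape I'd imagine is roughly the
sketch below (made-up names, not the actual svc/krpc structures): prefer a
queued Getattr/Lookup over a queued Read/Write, but only choose among
requests that are already queued, never sleeping in the hope that a cheaper
RPC shows up.

/* Sketch with hypothetical names; queues assumed set up with TAILQ_INIT(). */
#include <sys/queue.h>
#include <stddef.h>

struct example_rpc {
	TAILQ_ENTRY(example_rpc) link;
	int	is_metadata;	/* Getattr, Lookup, Access, ... */
};

TAILQ_HEAD(example_rpcq, example_rpc);

struct example_sched {
	struct example_rpcq	meta;	/* cheap metadata RPCs */
	struct example_rpcq	data;	/* Read/Write RPCs */
};

/*
 * Dequeue the next RPC to service.  The bias applies only to requests that
 * are already queued; if both queues are empty, return NULL right away
 * instead of delaying in the hope that a metadata RPC arrives.
 */
static struct example_rpc *
example_next_rpc(struct example_sched *s)
{
	struct example_rpc *r;

	if ((r = TAILQ_FIRST(&s->meta)) != NULL)
		TAILQ_REMOVE(&s->meta, r, link);
	else if ((r = TAILQ_FIRST(&s->data)) != NULL)
		TAILQ_REMOVE(&s->data, r, link);
	return (r);
}

You would also want a cap on how many metadata RPCs get serviced in a row,
so a sustained Lookup storm can't starve the Read/Write queue.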

rick
ps: NFS types are more likely to read freebsd-fs@ than freebsd-net@.
pss: I hope ken and mav don't mind me adding them as cc's.

> Thanks very much
>
> --
> Tim Borgeaud
> Systems Developer
> Framestore
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"


