Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 28 Jan 2014 19:32:13 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        J David <j.david.lists@gmail.com>
Cc:        freebsd-net@freebsd.org, Garrett Wollman <wollman@freebsd.org>
Subject:   Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID:  <312973812.17975525.1390955533440.JavaMail.root@uoguelph.ca>
In-Reply-To: <CABXB=RQksyZq43=jLw3wJT5vLzuK4h5cgE=Lj4caq1RgOBa8gA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
------=_Part_17975523_770461322.1390955533437
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

J David wrote:
> On Tue, Jan 28, 2014 at 10:35 AM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
> > Since messgaes are sent quickly and then mbufs released, except for
> > the DRC in the server, I think avoiding large allocations for
> > server
> > replies that may be cached is the case to try and avoid.
> > Fortunately
> > the large replies will be for read and readdir and these don't need
> > to be cached by the DRC. As such, a patch that uses 4K clusters in
> > the server for read, readdir and 4K clusters for write requests in
> > the client, should be appropriate, I think?
> 
> m_getm2 appears to consistent produce "right-sized" results.  The
> relevant code is:
> 
>     while (len > 0) {
> 
>         if (len > MCLBYTES)
> 
>             mb = m_getjcl(how, type, (flags & M_PKTHDR),
> 
>                 MJUMPAGESIZE);
> 
>         else if (len >= MINCLSIZE)
> 
>             mb = m_getcl(how, type, (flags & M_PKTHDR));
> 
>         else if (flags & M_PKTHDR)
> 
>             mb = m_gethdr(how, type);
> 
>         else
> 
>             mb = m_get(how, type);
> 
> /* ... */
> 
>     }
> 
> So it allocates the shortest possible chain and uses the best-fit
> cluster for the last (or only) block in the chain.
> 
> It's probably the use of this function in m_uiotombuf or somewhere
> very similar that prevents tools like iperf from encountering this
> same issue.
> 
> Getting this same logic into the NFS code seems like it would be a
> good thing, in terms of reducing code duplication, increasing
> performance, and leveraging a well-tested code path.
> 
For the server generating read replies, I suspect this is the case and
that is what Garrett Wollman's patch does. However, readdir builds up
the reply in small chunks via NFSM_BUILD() and this will require an extra
argument that says "allocate a big cluster". Since it builds the reply in
small chunks, it cannot use m_getm2().

I haven't looked at the client side write yet, so I don't know if m_getm2()
is feasible for it or not.

Hopefully Garrett and/or you will be able to do some testing of it
and report back w.r.t. performance gains, etc. Once we have that,
we can decide if this is an appropriate commit to head.

Since I suspect it will take some time for Garrett to do this, please
try my simple patch in your test environment, mostly to determine if
the fail count goes to 0 (and also count calls to m_collapse() without/with
the patch, since those will impact performance, too).

Thanks in advance for trying the patch, rick
ps: Attached again, just in case you don't already have it.

> It may raise portability concerns, but it does seem likely that other
> OS's to which the NFS code could potentially be ported have similar
> mechanisms these days.  Possibly it would be worthwhile to examine
> whether the NFS code could choose a slightly different point of
> abstraction.  Or, if that's undesirable, maybe asking the
> hypothetical
> person doing such a port to cross that bridge when they come to it is
> not unreasonable, since that would be the person most likely to be
> intimately familiar with the relevant details of both OS's.
> 
As I mentioned before, I am no longer concerned about portability.
The discussion about portability was meant to explain why the code
was written the way it was and, yes, I did note that "portability is
nice" but did not intend to imply that that should limit modifications
to the code that improve it for FreeBSD.

> Also, looking at GAWollman's patch, an mbuf+cluster allocator that
> kicks back a prewired iovec seems really handy.  Is that something
> that would be useful elsewhere in the kernel, or is NFS just kind of
> a
> special case because it's just moving data around, not across weird
> boundaries like device drivers and anything user mode-facing does?
> 
> Thanks!
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
> 

------=_Part_17975523_770461322.1390955533437
Content-Type: text/x-patch; name=4kmcl.patch
Content-Disposition: attachment; filename=4kmcl.patch
Content-Transfer-Encoding: base64

LS0tIGZzL25mcy9uZnNwb3J0Lmguc2F2MgkyMDE0LTAxLTI2IDE4OjQzOjQ3LjAwMDAwMDAwMCAt
MDUwMAorKysgZnMvbmZzL25mc3BvcnQuaAkyMDE0LTAxLTI2IDE5OjA0OjI3LjAwMDAwMDAwMCAt
MDUwMApAQCAtMTUzLDE0ICsxNTMsMjcgQEAKIAkJCU1HRVRIRFIoKG0pLCBNX1dBSVRPSywgTVRf
REFUQSk7IAlcCiAJCX0gCQkJCQkJXAogCX0gd2hpbGUgKDApCi0jZGVmaW5lCU5GU01DTEdFVCht
LCB3KQlkbyB7IAkJCQkJXAotCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOyAJCQlcCi0J
CXdoaWxlICgobSkgPT0gTlVMTCApIHsgCQkJCVwKLQkJCSh2b2lkKSBuZnNfY2F0bmFwKFBaRVJP
LCAwLCAibmZzbWdldCIpOwlcCi0JCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOyAJCVwK
LQkJfSAJCQkJCQlcCi0JCU1DTEdFVCgobSksICh3KSk7CQkJCVwKKyNpZiBNSlVNUEFHRVNJWkUg
PiBNQ0xCWVRFUworI2RlZmluZQlORlNNQ0xHRVQobSwgdykJZG8gewkgCQkJCQlcCisJCShtKSA9
IG1fZ2V0amNsKE1fV0FJVE9LLCBNVF9EQVRBLCAwLCBNSlVNUEFHRVNJWkUpOwlcCisJCXdoaWxl
ICgobSkgPT0gTlVMTCkgewkgCQkJCVwKKwkJCSh2b2lkKW5mc19jYXRuYXAoUFpFUk8sIDAsICJu
ZnNtZ2V0Iik7CQlcCisJCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOwkgCQlcCisJCQlp
ZiAoKG0pICE9IE5VTEwpCQkJCVwKKwkJCQlNQ0xHRVQoKG0pLCAodykpOwkJCVwKKwkJfQkgCQkJ
CQkJXAogCX0gd2hpbGUgKDApCisjZWxzZQorI2RlZmluZQlORlNNQ0xHRVQobSwgdykJZG8gewkg
CQkJCQlcCisJCShtKSA9IG1fZ2V0amNsKE1fV0FJVE9LLCBNVF9EQVRBLCAwLCBNQ0xCWVRFUyk7
CQlcCisJCXdoaWxlICgobSkgPT0gTlVMTCkgewkgCQkJCVwKKwkJCSh2b2lkKW5mc19jYXRuYXAo
UFpFUk8sIDAsICJuZnNtZ2V0Iik7CQlcCisJCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEp
OwkgCQlcCisJCQlpZiAoKG0pICE9IE5VTEwpCQkJCVwKKwkJCQlNQ0xHRVQoKG0pLCAodykpOwkJ
CVwKKwkJfQkgCQkJCQkJXAorCX0gd2hpbGUgKDApCisjZW5kaWYKICNkZWZpbmUJTkZTTUNMR0VU
SERSKG0sIHcpIGRvIHsgCQkJCVwKIAkJTUdFVEhEUigobSksIE1fV0FJVE9LLCBNVF9EQVRBKTsJ
CVwKIAkJd2hpbGUgKChtKSA9PSBOVUxMICkgeyAJCQkJXAotLS0gZnMvbmZzc2VydmVyL25mc19u
ZnNkcG9ydC5jLnNhdjIJMjAxNC0wMS0yNiAxODo1NDoyOS4wMDAwMDAwMDAgLTA1MDAKKysrIGZz
L25mc3NlcnZlci9uZnNfbmZzZHBvcnQuYwkyMDE0LTAxLTI2IDE4OjU2OjA4LjAwMDAwMDAwMCAt
MDUwMApAQCAtNTY2LDggKzU2Niw3IEBAIG5mc3Zub19yZWFkbGluayhzdHJ1Y3Qgdm5vZGUgKnZw
LCBzdHJ1Y3QKIAlsZW4gPSAwOwogCWkgPSAwOwogCXdoaWxlIChsZW4gPCBORlNfTUFYUEFUSExF
TikgewotCQlORlNNR0VUKG1wKTsKLQkJTUNMR0VUKG1wLCBNX1dBSVRPSyk7CisJCU5GU01DTEdF
VChtcCwgTV9XQUlUT0spOwogCQltcC0+bV9sZW4gPSBORlNNU0laKG1wKTsKIAkJaWYgKGxlbiA9
PSAwKSB7CiAJCQltcDMgPSBtcDIgPSBtcDsKQEAgLTYzNiw4ICs2MzUsNyBAQCBuZnN2bm9fcmVh
ZChzdHJ1Y3Qgdm5vZGUgKnZwLCBvZmZfdCBvZmYsCiAJICovCiAJaSA9IDA7CiAJd2hpbGUgKGxl
ZnQgPiAwKSB7Ci0JCU5GU01HRVQobSk7Ci0JCU1DTEdFVChtLCBNX1dBSVRPSyk7CisJCU5GU01D
TEdFVChtLCBNX1dBSVRPSyk7CiAJCW0tPm1fbGVuID0gMDsKIAkJc2l6ID0gbWluKE1fVFJBSUxJ
TkdTUEFDRShtKSwgbGVmdCk7CiAJCWxlZnQgLT0gc2l6Owo=
------=_Part_17975523_770461322.1390955533437--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?312973812.17975525.1390955533440.JavaMail.root>