Date: Fri, 25 Nov 2016 16:34:31 +0100
From: "O. Hartmann" <ohartman@zedat.fu-berlin.de>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Alan Somers <asomers@freebsd.org>, Rick Macklem <rmacklem@uoguelph.ca>, FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: NFSv4 performance degradation with 12.0-CURRENT client
Message-ID: <20161125163431.693b1ce0@thor.walstatt.dynvpn.de>
In-Reply-To: <20161124203542.GU54029@kib.kiev.ua>
References: <CAOtMX2jJ2XoQyVG1c04QL7NTJn1pg38s=XEgecE38ea0QoFAOw@mail.gmail.com> <20161124090811.GO54029@kib.kiev.ua> <YTXPR01MB0189E0B1DB5B16EE6B388B7DDDB60@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM> <CAOtMX2hBXAJN_udED-u5%2B6UznR2%2BW88xgb=RqKSZL65Z3%2BcKOw@mail.gmail.com> <20161124203542.GU54029@kib.kiev.ua>
On Thu, 24 Nov 2016 22:35:42 +0200,
Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Thu, Nov 24, 2016 at 11:42:41AM -0700, Alan Somers wrote:
> > On Thu, Nov 24, 2016 at 5:53 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > >
> > > On Wed, Nov 23, 2016 at 10:17:25PM -0700, Alan Somers wrote:
> > >> I have a FreeBSD 10.3-RELEASE-p12 server exporting its home
> > >> directories over both NFSv3 and NFSv4. I have a TrueOS client (based
> > >> on 12.0-CURRENT on the drm-next-4.7 branch, built on 28-October)
> > >> mounting the home directories over NFSv4. At first, everything is
> > >> fine and performance is good. But if the client does a buildworld
> > >> using sources on NFS and locally stored objects, performance slowly
> > >> degrades. The degradation is most noticeable with metadata-heavy
> > >> operations. For example, "ls -l" in a directory with 153 files takes
> > >> less than 0.1 seconds right after booting. But the longer the
> > >> buildworld goes on, the slower it gets. Eventually that same "ls -l"
> > >> takes 19 seconds. When the home directories are mounted over NFSv3
> > >> instead, I see no degradation.
> > >>
> > >> top shows negligible CPU consumption on the server, and very high
> > >> consumption on the client when using NFSv4 (nearly 100%). The
> > >> NFS-using process is spending almost all of its time in system mode,
> > >> and dtrace shows that almost all of its time is spent in
> > >> ncl_getpages().
> > >>
> > > A couple of things you could do when it is slow (as well as what Kostik suggested):
> > > - nfsstat -c -e on client and nfsstat -e -s on server, to see what RPCs are being
> > >   done and how quickly. (nfsstat -s -e will also show you how big the DRC is,
> > >   although a large DRC should show up as increased CPU consumption on the server)
> > > - capture packets with tcpdump -s 0 -w test.pcap host <other-one>
> > >   - then you can email me test.pcap as an attachment. I can look at it in wireshark
> > >     and see if there seem to be protocol and/or TCP issues. (You can look at it in
> > >     wireshark yourself, then look for NFS4ERR_xxx, TCP segment retransmits...)
> > >
> > > If you are using either "intr" or "soft" on the mounts, try without those mount
> > > options. (The Bugs section of mount_nfs recommends against using them. If an RPC
> > > fails due to these options, something called a seqid# can be "out of sync" between
> > > client/server and that causes serious problems.)
> > > --> These seqid#s are not used by NFSv4.1, so you could try that by adding
> > > "minorversion=1" to your mount options.
> > >
> > > Good luck with it, rick
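
The mount-option advice above can be checked mechanically. The following is only a minimal sketch: the function name review_nfs_options and the sample option strings are made up for illustration, and it merely looks for the "intr" and "soft" options and for "minorversion=1" in a comma-separated option list. It does not read the real mount table.

```python
# Hypothetical helper, not part of any FreeBSD tool: it inspects a
# comma-separated NFS mount option string as one would write it in fstab.
DISCOURAGED = ("intr", "soft")


def review_nfs_options(options):
    """Return (discouraged options found, whether minorversion=1 is set)."""
    opts = [o.strip() for o in options.split(",") if o.strip()]
    flagged = [o for o in opts if o in DISCOURAGED]
    uses_v41 = "minorversion=1" in opts
    return flagged, uses_v41


if __name__ == "__main__":
    # Example option strings are assumptions, not taken from the thread.
    for sample in ("nfsv4,rw,soft,intr", "nfsv4,rw,minorversion=1"):
        flagged, uses_v41 = review_nfs_options(sample)
        print(sample, "->", "discouraged:", flagged, "NFSv4.1:", uses_v41)
```

An empty list together with NFSv4.1 in use would correspond to the configuration suggested above.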
> >
> > I've reproduced the issue on stock FreeBSD 12, and I've also learned
> > that nullfs is a required factor. Doing the buildworld directly on
> > the NFS mount doesn't cause any slowdown, but doing a buildworld on
> > the nullfs copy of the NFS mount does. The slowdown affects the base
> > NFS mount as well as the nullfs copy. Here is the nfsstat output for
> > both server and client during "ls -al" on the client:
> >
> > nfsstat -e -s -z
> >
> > Server Info:
> >    Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
> >        800         0       121         0         0         2         0         0
> >     Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
> >          0         0         0         0         0         0         0         8
> >      Mknod    Fsstat    Fsinfo  PathConf    Commit   LookupP   SetClId SetClIdCf
> >          0         0         0         0         1         3         0         0
> >       Open  OpenAttr OpenDwnGr  OpenCfrm DelePurge   DeleRet     GetFH      Lock
> >          0         0         0         0         0         0       123         0
> >      LockT     LockU     Close    Verify   NVerify     PutFH  PutPubFH PutRootFH
> >          0         0         0         0         0       674         0         0
> >      Renew RestoreFH    SaveFH   Secinfo RelLckOwn  V4Create
> >          0         0         0         0         0         0
> > Server:
> >  Retfailed    Faults   Clients
> >          0         0         0
> >  OpenOwner     Opens LockOwner     Locks    Delegs
> >          0         0         0         0         0
> > Server Cache Stats:
> >     Inprog      Idem  Non-idem    Misses CacheSize   TCPPeak
> >          0         0         0       674     16738     16738
> >
> > nfsstat -e -c -z
> > Client Info:
> > Rpc Counts:
> >    Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
> >         60         0       119         0         0         0         0         0
> >     Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
> >          0         0         0         0         0         0         0         3
> >      Mknod    Fsstat    Fsinfo  PathConf    Commit   SetClId SetClIdCf      Lock
> >          0         0         0         0         0         0         0         0
> >      LockT     LockU      Open   OpenCfr
> >          0         0         0         0
> >  OpenOwner     Opens LockOwner     Locks    Delegs  LocalOwn LocalOpen LocalLOwn
> >       5638    141453         0         0         0         0         0         0
> >  LocalLock
> >          0
> > Rpc Info:
> >   TimedOut   Invalid X Replies   Retries  Requests
> >          0         0         0         0       662
> > Cache Info:
> >  Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
> >       1275        58       837       121         0         0         0         0
> >  BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
> >          1         0         6         0         1         0
> >
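
The nfsstat output above alternates a line of counter names with a line of values, so it can be turned into a dictionary for easier comparison between runs. This is a rough sketch under stated assumptions: parse_nfsstat_rows is a made-up name, the sample holds only two row pairs copied from the client output, and the whitespace split only works for rows whose counter names are single words (it would mis-split "X Replies" or "Attr Hits").

```python
def parse_nfsstat_rows(text):
    """Pair each line of counter names with the line of values below it."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    counters = {}
    for names, values in zip(lines[0::2], lines[1::2]):
        if len(names) != len(values):
            raise ValueError(f"cannot pair {names!r} with {values!r}")
        counters.update(zip(names, (int(v) for v in values)))
    return counters


# Two name/value row pairs taken from the client-side output quoted above.
SAMPLE = """\
LockT LockU Open OpenCfr
0 0 0 0
OpenOwner Opens LockOwner Locks Delegs LocalOwn LocalOpen LocalLOwn
5638 141453 0 0 0 0 0 0
"""

if __name__ == "__main__":
    counters = parse_nfsstat_rows(SAMPLE)
    # Print only the non-zero counters so they stand out.
    for name, value in counters.items():
        if value:
            print(f"{name:10s} {value}")
```

On this sample the only non-zero counters are OpenOwner and Opens, which stand out against the zeros in the rest of those rows.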
> > And here are the most popular stack traces of "ls -al", as observed by
> > dtrace. The number beneath each stack is the number of times dtrace
> > observed that exact stack:
> >
> > kernel`bcmp+0x21
> > kernel`vinactive+0xc6
> > kernel`vputx+0x30e
> > kernel`kern_statat+0x165
> > kernel`sys_fstatat+0x2c
> > kernel`amd64_syscall+0x314
> > kernel`vputx+0x30e
> > kernel`NDFREE+0xaa
> > kernel`sys___acl_get_link+0x82
> > kernel`amd64_syscall+0x314
> > kernel`0xffffffff80eb95fb
> > 96
> >
> > kernel`nfscl_doclose+0x383
> > kernel`vinactive+0xc6
> > kernel`vputx+0x30e
> > kernel`NDFREE+0xaa
> > kernel`sys___acl_get_link+0x82
> > kernel`amd64_syscall+0x314
> > kernel`0xffffffff80eb95fb
> > 183
> >
> > kernel`nfscl_doclose+0x383
> > kernel`vinactive+0xc6
> > kernel`vputx+0x30e
> > kernel`kern_statat+0x165
> > kernel`sys_fstatat+0x2c
> > kernel`amd64_syscall+0x314
> > kernel`0xffffffff80eb95fb
> > 189
> >
> > kernel`lock_delay+0x52
> > kernel`nfs_lookup+0x337
> > kernel`VOP_LOOKUP_APV+0xda
> > kernel`lookup+0x6a2
> > kernel`namei+0x57e
> > kernel`sys___acl_get_link+0x55
> > kernel`amd64_syscall+0x314
> > kernel`0xffffffff80eb95fb
> > 194
> >
> > kernel`lock_delay+0x52
> > kernel`ncl_getattrcache+0x28
> > kernel`nfs_getattr+0x92
> > kernel`VOP_GETATTR_APV+0xda
> > kernel`vn_stat+0xa3
> > kernel`kern_statat+0xde
> > kernel`sys_fstatat+0x2c
> > kernel`amd64_syscall+0x314
> > kernel`0xffffffff80eb95fb
> > 196
> >
> > What role could nullfs be playing?
>
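
To see where the samples concentrate, the five stack counts quoted above can be summed by their topmost frame. This is only a sketch: the frame names and counts are copied from the quoted dtrace output, while the function name samples_by_leaf and the grouping by leaf frame are illustrative choices, not something done in the thread.

```python
from collections import Counter

# (topmost frame, sample count) for each of the five stacks quoted above.
STACKS = [
    ("bcmp", 96),
    ("nfscl_doclose", 183),
    ("nfscl_doclose", 189),
    ("lock_delay", 194),
    ("lock_delay", 196),
]


def samples_by_leaf(stacks):
    """Sum the sample counts of all stacks that share the same top frame."""
    totals = Counter()
    for leaf, count in stacks:
        totals[leaf] += count
    return totals


if __name__ == "__main__":
    totals = samples_by_leaf(STACKS)
    grand_total = sum(totals.values())
    for leaf, count in totals.most_common():
        share = 100.0 * count / grand_total
        print(f"{leaf:15s} {count:5d} {share:5.1f}%")
```

Grouped this way, lock_delay accounts for 390 samples, nfscl_doclose for 372 and bcmp for 96, out of 858 samples in these five stacks.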
> Can you check two things:
> 1. Does an NFSv3 mount used with nullfs in the same way cause the same issue, or not?
> You already said that NFSv3 somehow was not affected, but given the
> discovery that nullfs is part of the scenario, can you please confirm
> that this is still the case?
> 2. If you add the nocache option to the nullfs mount that degrades, does the
> problem go away?
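
To compare the configurations asked about above (NFSv3 versus NFSv4 under nullfs, with and without nocache), it may help to time the same metadata walk on each mount at intervals during the buildworld. The following is a minimal sketch, not a replacement for "ls -l": time_metadata_walk is a made-up name, the directories are whatever paths are given on the command line, and it only issues one lstat-style call per entry, without the ACL lookups visible in the stacks above.

```python
import os
import sys
import time


def time_metadata_walk(directory):
    """Stat every entry of a directory; return (entry count, seconds)."""
    start = time.perf_counter()
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # On Unix this issues a stat system call for each entry.
            entry.stat(follow_symlinks=False)
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Pass the mount points to compare as command-line arguments.
    for path in sys.argv[1:]:
        n, seconds = time_metadata_walk(path)
        print(f"{path}: {n} entries in {seconds:.3f} s")
```

Running it periodically against the same directory through each mount gives a number that can be compared with the 0.1 second and 19 second figures reported earlier in the thread.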
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
I'm curious if this problem could also affect "poudriere" if related to nullfs.
