Date:      Sun, 26 Jan 2014 21:16:54 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Adam McDougall <mcdouga9@egr.msu.edu>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID:  <1629593139.16590858.1390789014324.JavaMail.root@uoguelph.ca>
In-Reply-To: <52DC1241.7010004@egr.msu.edu>

Adam McDougall wrote:
> Also try rsize=32768,wsize=32768 in your mount options, made a huge
> difference for me.  I've noticed slow file transfers on NFS in 9 and
> finally did some searching a couple months ago, someone suggested it
> and
> they were on to something.
> 
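
(For reference, setting those is just a matter of the rsize/wsize mount
options on the client, e.g. something like
"mount -t nfs -o nfsv3,tcp,rsize=32768,wsize=32768 server:/export /mnt",
where server:/export and /mnt are placeholders for the actual export and
mount point.)
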
I have a "hunch" that might explain why 64K NFS reads/writes perform
poorly in some network environments.
A 64K NFS read reply/write request consists of a list of 34 mbufs when
passed to TCP via sosend(), with a total data length of around 65680 bytes.
Looking at a couple of drivers (virtio and ixgbe), it appears they expect
no more than 32-33 mbufs in a list for a 65535 byte TSO xmit. I think
(I don't have anything that does TSO to confirm this) that NFS will pass
a longer list than that (34 plus a TCP/IP header).
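
(Rough arithmetic, assuming the data is carried in ordinary 2K mbuf
clusters: 65536 / 2048 = 32 clusters for the data alone, plus an mbuf
or two for the RPC header and any partial cluster, which is where the
34 comes from; add the mbuf TCP prepends for the TCP/IP header and the
chain handed to the driver is about 35 entries, just over that 32-33
limit.)
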
At a glance, it appears that the drivers call m_defrag() or m_collapse()
when the mbuf list won't fit in their scatter table (32 or 33 elements),
and if that fails, they just silently drop the data without sending it.
If I'm right, there would be considerable overhead from
m_defrag()/m_collapse(), and near disaster if they fail to fix the
problem and the data is silently dropped instead of xmited.
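
For illustration only, the transmit path I have in mind looks roughly
like this (generic placeholder names, not code lifted from virtio or
ixgbe):

	/*
	 * Map the mbuf chain for DMA.  If it has too many segments for
	 * the scatter table, try to compact the chain, and drop the
	 * packet if that fails.
	 */
	error = bus_dmamap_load_mbuf_sg(txr->txtag, map, m, segs, &nsegs,
	    BUS_DMA_NOWAIT);
	if (error == EFBIG) {
		struct mbuf *n;

		n = m_defrag(m, M_NOWAIT);	/* or m_collapse() */
		if (n == NULL) {
			m_freem(m);		/* silently dropped */
			return (ENOBUFS);
		}
		m = n;
		error = bus_dmamap_load_mbuf_sg(txr->txtag, map, m, segs,
		    &nsegs, BUS_DMA_NOWAIT);
	}

The m_defrag() copy is the overhead I'm worried about, and the m_freem()
with no retry is the silent drop.
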

Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE clusters,
so the mbuf count drops from 34 to 18.
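
In case the attachment gets mangled: the change is simply to have the
NFSMCLGET() macro grab a page-size jumbo cluster instead of a standard
2K cluster, roughly

	/* before: an mbuf plus a 2K cluster */
	MGET(m, M_WAITOK, MT_DATA);
	MCLGET(m, M_WAITOK);

	/* after: an mbuf with a 4K (MJUMPAGESIZE) cluster attached */
	m = m_getjcl(M_WAITOK, MT_DATA, 0, MJUMPAGESIZE);

so each mbuf in the chain carries twice as much data.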

If anyone has a TSO scatter/gather enabled net interface and can test this
patch on it with NFS I/O (default of 64K rsize/wsize) when TSO is enabled
and see what effect it has, that would be appreciated.
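
(Toggling TSO for comparison is just ifconfig's tso/-tso flag, e.g.
"ifconfig ix0 -tso" to turn it off and "ifconfig ix0 tso" to turn it
back on; substitute your interface name for ix0.)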

Btw, thanks go to Garrett Wollman for suggesting the change to MJUMPAGESIZE
clusters.

rick
ps: If the attachment doesn't make it through and you want the patch, just
    email me and I'll send you a copy.

> On 01/19/2014 09:32, Alfred Perlstein wrote:
> > 9.x has pretty poor mbuf tuning by default.
> > 
> > I hit nearly the same problem and raising the mbufs worked for me.
> > 
> > I'd suggest raising that and retrying.
> > 
> > -Alfred
> > 
> > On 1/19/14 12:47 AM, J David wrote:
> >> While setting up a test for other purposes, I noticed some really
> >> horrible NFS performance issues.
> >>
> >> To explore this, I set up a test environment with two FreeBSD
> >> 9.2-RELEASE-p3 virtual machines running under KVM.  The NFS server
> >> is
> >> configured to serve a 2 gig mfs on /mnt.
> >>
> >> The performance of the virtual network is outstanding:
> >>
> >> Server:
> >>
> >> $ iperf -c 172.20.20.169
> >>
> >> ------------------------------------------------------------
> >>
> >> Client connecting to 172.20.20.169, TCP port 5001
> >>
> >> TCP window size: 1.00 MByte (default)
> >>
> >> ------------------------------------------------------------
> >>
> >> [  3] local 172.20.20.162 port 59717 connected with 172.20.20.169
> >> port
> >> 5001
> >>
> >> [ ID] Interval       Transfer     Bandwidth
> >>
> >> [  3]  0.0-10.0 sec  16.1 GBytes  13.8 Gbits/sec
> >>
> >> $ iperf -s
> >>
> >> ------------------------------------------------------------
> >>
> >> Server listening on TCP port 5001
> >>
> >> TCP window size: 1.00 MByte (default)
> >>
> >> ------------------------------------------------------------
> >>
> >> [  4] local 172.20.20.162 port 5001 connected with 172.20.20.169
> >> port
> >> 45655
> >>
> >> [ ID] Interval       Transfer     Bandwidth
> >>
> >> [  4]  0.0-10.0 sec  15.8 GBytes  13.6 Gbits/sec
> >>
> >>
> >> Client:
> >>
> >>
> >> $ iperf -s
> >>
> >> ------------------------------------------------------------
> >>
> >> Server listening on TCP port 5001
> >>
> >> TCP window size: 1.00 MByte (default)
> >>
> >> ------------------------------------------------------------
> >>
> >> [  4] local 172.20.20.169 port 5001 connected with 172.20.20.162
> >> port
> >> 59717
> >>
> >> [ ID] Interval       Transfer     Bandwidth
> >>
> >> [  4]  0.0-10.0 sec  16.1 GBytes  13.8 Gbits/sec
> >>
> >> ^C$ iperf -c 172.20.20.162
> >>
> >> ------------------------------------------------------------
> >>
> >> Client connecting to 172.20.20.162, TCP port 5001
> >>
> >> TCP window size: 1.00 MByte (default)
> >>
> >> ------------------------------------------------------------
> >>
> >> [  3] local 172.20.20.169 port 45655 connected with 172.20.20.162
> >> port
> >> 5001
> >>
> >> [ ID] Interval       Transfer     Bandwidth
> >>
> >> [  3]  0.0-10.0 sec  15.8 GBytes  13.6 Gbits/sec
> >>
> >>
> >> The performance of the mfs filesystem on the server is also good.
> >>
> >> Server:
> >>
> >> $ sudo mdconfig -a -t swap -s 2g
> >>
> >> md0
> >>
> >> $ sudo newfs -U -b 4k -f 4k /dev/md0
> >>
> >> /dev/md0: 2048.0MB (4194304 sectors) block size 4096, fragment size 4096
> >>
> >> using 43 cylinder groups of 48.12MB, 12320 blks, 6160 inodes.
> >>
> >> with soft updates
> >>
> >> super-block backups (for fsck_ffs -b #) at:
> >>
> >>   144, 98704, 197264, 295824, 394384, 492944, 591504, 690064, 788624,
> >>   887184, 985744, 1084304, 1182864, 1281424, 1379984, 1478544, 1577104,
> >>   1675664, 1774224, 1872784, 1971344, 2069904, 2168464, 2267024, 2365584,
> >>   2464144, 2562704, 2661264, 2759824, 2858384, 2956944, 3055504, 3154064,
> >>   3252624, 3351184, 3449744, 3548304, 3646864, 3745424, 3843984, 3942544,
> >>   4041104, 4139664
> >>
> >> $ sudo mount /dev/md0 /mnt
> >>
> >> $ cd /mnt
> >>
> >> $ sudo iozone -e -I -s 512m -r 4k -i 0 -i 1 -i 2
> >>
> >> Iozone: Performance Test of File I/O
> >>
> >>          Version $Revision: 3.420 $
> >>
> >> [...]
> >>
> >>                                                            random  random
> >>              KB  reclen   write rewrite    read    reread    read   write
> >>          524288       4  560145 1114593   933699   831902   56347  158904
> >>
> >>
> >> iozone test complete.
> >>
> >>
> >> But introduce NFS into the mix and everything falls apart.
> >>
> >> Client:
> >>
> >> $ sudo mount -o tcp,nfsv3 f12.phxi:/mnt /mnt
> >>
> >> $ cd /mnt
> >>
> >> $ sudo iozone -e -I -s 512m -r 4k -i 0 -i 1 -i 2
> >>
> >> Iozone: Performance Test of File I/O
> >>
> >>          Version $Revision: 3.420 $
> >>
> >> [...]
> >>
> >>                                                            random  random
> >>              KB  reclen   write rewrite    read    reread    read   write
> >>          524288       4   67246    2923   103295  1272407  172475     196
> >>
> >>
> >> And the above took 48 minutes to run, compared to 14 seconds for
> >> the
> >> local version.  So it's 200x slower over NFS.  The random write
> >> test
> >> is over 800x slower.  Of course NFS is slower, that's expected,
> >> but it
> >> definitely wasn't this exaggerated in previous releases.
> >>
> >> To emphasize that iozone reflects real workloads here, I tried
> >> doing
> >> an svn co of the 9-STABLE source tree over NFS but after two hours
> >> it
> >> was still in llvm so I gave up.
> >>
> >> While all this not-much-of-anything NFS traffic is going on, both
> >> systems are essentially idle.  The process on the client sits in
> >> "newnfs" wait state with nearly no CPU.  The server is completely
> >> idle
> >> except for the occasional 0.10% in an nfsd thread, which otherwise
> >> spend their lives in rpcsvc wait state.
> >>
> >> Server iostat:
> >>
> >> $ iostat -x -w 10 md0
> >>
> >>                         extended device statistics
> >>
> >> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> >>
> >> [...]
> >>
> >> md0        0.0  36.0     0.0     0.0    0   1.2   0
> >> md0        0.0  38.8     0.0     0.0    0   1.5   0
> >> md0        0.0  73.6     0.0     0.0    0   1.0   0
> >> md0        0.0  53.3     0.0     0.0    0   2.5   0
> >> md0        0.0  33.7     0.0     0.0    0   1.1   0
> >> md0        0.0  45.5     0.0     0.0    0   1.8   0
> >>
> >> Server nfsstat:
> >>
> >> $ nfsstat -s -w 10
> >>
> >>   GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
> >>
> >> [...]
> >>
> >>        0      0      0    471    816      0      0      0
> >>
> >>        0      0      0    480    751      0      0      0
> >>
> >>        0      0      0    481     36      0      0      0
> >>
> >>        0      0      0    469    550      0      0      0
> >>
> >>        0      0      0    485    814      0      0      0
> >>
> >>        0      0      0    467    503      0      0      0
> >>
> >>        0      0      0    473    345      0      0      0
> >>
> >>
> >> Client nfsstat:
> >>
> >> $ nfsstat -c -w 10
> >>
> >>   GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
> >>
> >> [...]
> >>
> >>        0      0      0      0    518      0      0      0
> >>
> >>        0      0      0      0    498      0      0      0
> >>
> >>        0      0      0      0    503      0      0      0
> >>
> >>        0      0      0      0    474      0      0      0
> >>
> >>        0      0      0      0    525      0      0      0
> >>
> >>        0      0      0      0    497      0      0      0
> >>
> >>
> >> Server vmstat:
> >>
> >> $ vmstat -w 10
> >>
> >>   procs      memory      page                    disks      faults      cpu
> >>   r b w     avm    fre   flt  re  pi  po    fr  sr vt0 vt1   in    sy   cs us sy id
> >>
> >> [...]
> >>
> >>   0 4 0    634M  6043M    37   0   0   0     1   0   0   0 1561    46 3431  0  2 98
> >>   0 4 0    640M  6042M    62   0   0   0    28   0   0   0 1598    94 3552  0  2 98
> >>   0 4 0    648M  6042M    38   0   0   0     0   0   0   0 1609    47 3485  0  1 99
> >>   0 4 0    648M  6042M    37   0   0   0     0   0   0   0 1615    46 3667  0  2 98
> >>   0 4 0    648M  6042M    37   0   0   0     0   0   0   0 1606    45 3678  0  2 98
> >>   0 4 0    648M  6042M    37   0   0   0     0   0   1   0 1561    45 3377  0  2 98
> >>
> >>
> >> Client vmstat:
> >>
> >> $ vmstat -w 10
> >>
> >>   procs      memory      page                    disks      faults      cpu
> >>   r b w     avm    fre   flt  re  pi  po    fr  sr md0 da0   in    sy   cs us sy id
> >>
> >> [...]
> >>
> >>   0 0 0    639M   593M    33   0   0   0  1237   0   0   0  281  5575 1043  0  3 97
> >>   0 0 0    639M   591M     0   0   0   0   712   0   0   0  235   122  889  0  2 98
> >>   0 0 0    639M   583M     0   0   0   0   571   0   0   1  227   120  851  0  2 98
> >>   0 0 0    639M   592M   198   0   0   0  1212   0   0   0  251  2497  950  0  3 97
> >>   0 0 0    639M   586M     0   0   0   0   614   0   0   0  250   121  924  0  2 98
> >>   0 0 0    639M   586M     0   0   0   0   765   0   0   0  250   120  918  0  3 97
> >>
> >>
> >> Top on the KVM host says it is 93-95% idle and that each VM sits
> >> around 7-10% CPU.  So basically nobody is doing anything.  There's
> >> no
> >> visible bottleneck, and I've no idea where to go from here to
> >> figure
> >> out what's going on.
> >>
> >> Does anyone have any suggestions for debugging this?
> >>
> >> Thanks!
> >> _______________________________________________
> >> freebsd-net@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> >> To unsubscribe, send any mail to
> >> "freebsd-net-unsubscribe@freebsd.org"
> >>
> > 
> > _______________________________________________
> > freebsd-net@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to
> > "freebsd-net-unsubscribe@freebsd.org"
> 
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
> 

------=_Part_16590856_824730477.1390789014322
Content-Type: text/x-patch; name=4kmcl.patch
Content-Disposition: attachment; filename=4kmcl.patch
Content-Transfer-Encoding: base64

LS0tIGZzL25mcy9uZnNwb3J0Lmguc2F2MgkyMDE0LTAxLTI2IDE4OjQzOjQ3LjAwMDAwMDAwMCAt
MDUwMAorKysgZnMvbmZzL25mc3BvcnQuaAkyMDE0LTAxLTI2IDE5OjA0OjI3LjAwMDAwMDAwMCAt
MDUwMApAQCAtMTUzLDE0ICsxNTMsMjcgQEAKIAkJCU1HRVRIRFIoKG0pLCBNX1dBSVRPSywgTVRf
REFUQSk7IAlcCiAJCX0gCQkJCQkJXAogCX0gd2hpbGUgKDApCi0jZGVmaW5lCU5GU01DTEdFVCht
LCB3KQlkbyB7IAkJCQkJXAotCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOyAJCQlcCi0J
CXdoaWxlICgobSkgPT0gTlVMTCApIHsgCQkJCVwKLQkJCSh2b2lkKSBuZnNfY2F0bmFwKFBaRVJP
LCAwLCAibmZzbWdldCIpOwlcCi0JCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOyAJCVwK
LQkJfSAJCQkJCQlcCi0JCU1DTEdFVCgobSksICh3KSk7CQkJCVwKKyNpZiBNSlVNUEFHRVNJWkUg
PiBNQ0xCWVRFUworI2RlZmluZQlORlNNQ0xHRVQobSwgdykJZG8gewkgCQkJCQlcCisJCShtKSA9
IG1fZ2V0amNsKE1fV0FJVE9LLCBNVF9EQVRBLCAwLCBNSlVNUEFHRVNJWkUpOwlcCisJCXdoaWxl
ICgobSkgPT0gTlVMTCkgewkgCQkJCVwKKwkJCSh2b2lkKW5mc19jYXRuYXAoUFpFUk8sIDAsICJu
ZnNtZ2V0Iik7CQlcCisJCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEpOwkgCQlcCisJCQlp
ZiAoKG0pICE9IE5VTEwpCQkJCVwKKwkJCQlNQ0xHRVQoKG0pLCAodykpOwkJCVwKKwkJfQkgCQkJ
CQkJXAogCX0gd2hpbGUgKDApCisjZWxzZQorI2RlZmluZQlORlNNQ0xHRVQobSwgdykJZG8gewkg
CQkJCQlcCisJCShtKSA9IG1fZ2V0amNsKE1fV0FJVE9LLCBNVF9EQVRBLCAwLCBNQ0xCWVRFUyk7
CQlcCisJCXdoaWxlICgobSkgPT0gTlVMTCkgewkgCQkJCVwKKwkJCSh2b2lkKW5mc19jYXRuYXAo
UFpFUk8sIDAsICJuZnNtZ2V0Iik7CQlcCisJCQlNR0VUKChtKSwgTV9XQUlUT0ssIE1UX0RBVEEp
OwkgCQlcCisJCQlpZiAoKG0pICE9IE5VTEwpCQkJCVwKKwkJCQlNQ0xHRVQoKG0pLCAodykpOwkJ
CVwKKwkJfQkgCQkJCQkJXAorCX0gd2hpbGUgKDApCisjZW5kaWYKICNkZWZpbmUJTkZTTUNMR0VU
SERSKG0sIHcpIGRvIHsgCQkJCVwKIAkJTUdFVEhEUigobSksIE1fV0FJVE9LLCBNVF9EQVRBKTsJ
CVwKIAkJd2hpbGUgKChtKSA9PSBOVUxMICkgeyAJCQkJXAotLS0gZnMvbmZzc2VydmVyL25mc19u
ZnNkcG9ydC5jLnNhdjIJMjAxNC0wMS0yNiAxODo1NDoyOS4wMDAwMDAwMDAgLTA1MDAKKysrIGZz
L25mc3NlcnZlci9uZnNfbmZzZHBvcnQuYwkyMDE0LTAxLTI2IDE4OjU2OjA4LjAwMDAwMDAwMCAt
MDUwMApAQCAtNTY2LDggKzU2Niw3IEBAIG5mc3Zub19yZWFkbGluayhzdHJ1Y3Qgdm5vZGUgKnZw
LCBzdHJ1Y3QKIAlsZW4gPSAwOwogCWkgPSAwOwogCXdoaWxlIChsZW4gPCBORlNfTUFYUEFUSExF
TikgewotCQlORlNNR0VUKG1wKTsKLQkJTUNMR0VUKG1wLCBNX1dBSVRPSyk7CisJCU5GU01DTEdF
VChtcCwgTV9XQUlUT0spOwogCQltcC0+bV9sZW4gPSBORlNNU0laKG1wKTsKIAkJaWYgKGxlbiA9
PSAwKSB7CiAJCQltcDMgPSBtcDIgPSBtcDsKQEAgLTYzNiw4ICs2MzUsNyBAQCBuZnN2bm9fcmVh
ZChzdHJ1Y3Qgdm5vZGUgKnZwLCBvZmZfdCBvZmYsCiAJICovCiAJaSA9IDA7CiAJd2hpbGUgKGxl
ZnQgPiAwKSB7Ci0JCU5GU01HRVQobSk7Ci0JCU1DTEdFVChtLCBNX1dBSVRPSyk7CisJCU5GU01D
TEdFVChtLCBNX1dBSVRPSyk7CiAJCW0tPm1fbGVuID0gMDsKIAkJc2l6ID0gbWluKE1fVFJBSUxJ
TkdTUEFDRShtKSwgbGVmdCk7CiAJCWxlZnQgLT0gc2l6Owo=
------=_Part_16590856_824730477.1390789014322--


