From owner-freebsd-net@FreeBSD.ORG Sun Jan 19 08:47:27 2014
Date: Sun, 19 Jan 2014 03:47:25 -0500
Subject: Terrible NFS performance under 9.2-RELEASE?
From: J David
To: freebsd-net@freebsd.org, freebsd-stable, freebsd-virtualization@freebsd.org

While setting up a test for other purposes, I noticed some really
horrible NFS performance issues.  To explore this, I set up a test
environment with two FreeBSD 9.2-RELEASE-p3 virtual machines running
under KVM.  The NFS server is configured to serve a 2 gig mfs on /mnt.

The performance of the virtual network is outstanding:

Server:

$ iperf -c 172.20.20.169
------------------------------------------------------------
Client connecting to 172.20.20.169, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[  3] local 172.20.20.162 port 59717 connected with 172.20.20.169 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  16.1 GBytes  13.8 Gbits/sec
$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[  4] local 172.20.20.162 port 5001 connected with 172.20.20.169 port 45655
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  15.8 GBytes  13.6 Gbits/sec

Client:

$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[  4] local 172.20.20.169 port 5001 connected with 172.20.20.162 port 59717
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  16.1 GBytes  13.8 Gbits/sec
^C$ iperf -c 172.20.20.162
------------------------------------------------------------
Client connecting to 172.20.20.162, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[  3] local 172.20.20.169 port 45655 connected with 172.20.20.162 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  15.8 GBytes  13.6 Gbits/sec

The performance of the mfs filesystem on the server is also good.

Server:

$ sudo mdconfig -a -t swap -s 2g
md0
$ sudo newfs -U -b 4k -f 4k /dev/md0
/dev/md0: 2048.0MB (4194304 sectors) block size 4096, fragment size 4096
        using 43 cylinder groups of 48.12MB, 12320 blks, 6160 inodes.
        with soft updates
super-block backups (for fsck_ffs -b #) at:
 144, 98704, 197264, 295824, 394384, 492944, 591504, 690064, 788624,
 887184, 985744, 1084304, 1182864, 1281424, 1379984, 1478544, 1577104,
 1675664, 1774224, 1872784, 1971344, 2069904, 2168464, 2267024, 2365584,
 2464144, 2562704, 2661264, 2759824, 2858384, 2956944, 3055504, 3154064,
 3252624, 3351184, 3449744, 3548304, 3646864, 3745424, 3843984, 3942544,
 4041104, 4139664
$ sudo mount /dev/md0 /mnt
$ cd /mnt
$ sudo iozone -e -I -s 512m -r 4k -i 0 -i 1 -i 2
        Iozone: Performance Test of File I/O
                Version $Revision: 3.420 $
[...]
                                                    random  random
      KB  reclen   write rewrite    read    reread    read   write
  524288       4  560145 1114593  933699   831902   56347  158904

iozone test complete.

But introduce NFS into the mix and everything falls apart.

Client:

$ sudo mount -o tcp,nfsv3 f12.phxi:/mnt /mnt
$ cd /mnt
$ sudo iozone -e -I -s 512m -r 4k -i 0 -i 1 -i 2
        Iozone: Performance Test of File I/O
                Version $Revision: 3.420 $
[...]
                                                    random  random
      KB  reclen   write rewrite    read    reread    read   write
  524288       4   67246    2923  103295  1272407  172475     196

And the above took 48 minutes to run, compared to 14 seconds for the
local version.  So it's 200x slower over NFS.  The random write test is
over 800x slower.  Of course NFS is slower, that's expected, but it
definitely wasn't this exaggerated in previous releases.
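For reference, the slowdown factors quoted above fall straight out of
the two iozone result rows (this is just my arithmetic on the numbers
already shown, all figures in KB/s):

```python
# Back-of-the-envelope check of the slowdown factors, using the
# iozone throughput numbers (KB/s) from the local and NFS runs.
local_rand_write = 158904   # local mfs, random write
nfs_rand_write = 196        # same test over NFS

# Wall-clock: ~48 minutes over NFS vs ~14 seconds locally.
overall = 48 * 60 / 14
rand_write = local_rand_write / nfs_rand_write

print(f"overall run: ~{overall:.0f}x slower")
print(f"random write: ~{rand_write:.0f}x slower")
```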
To emphasize that iozone reflects real workloads here, I tried doing an
svn co of the 9-STABLE source tree over NFS, but after two hours it was
still in llvm, so I gave up.

While all this not-much-of-anything NFS traffic is going on, both
systems are essentially idle.  The process on the client sits in
"newnfs" wait state with nearly no CPU.  The server is completely idle
except for the occasional 0.10% in an nfsd thread; the nfsd threads
otherwise spend their lives in rpcsvc wait state.

Server iostat:

$ iostat -x -w 10 md0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
[...]
md0        0.0  36.0     0.0     0.0    0   1.2   0
md0        0.0  38.8     0.0     0.0    0   1.5   0
md0        0.0  73.6     0.0     0.0    0   1.0   0
md0        0.0  53.3     0.0     0.0    0   2.5   0
md0        0.0  33.7     0.0     0.0    0   1.1   0
md0        0.0  45.5     0.0     0.0    0   1.8   0

Server nfsstat:

$ nfsstat -s -w 10
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
[...]
      0      0      0    471    816      0      0      0
      0      0      0    480    751      0      0      0
      0      0      0    481     36      0      0      0
      0      0      0    469    550      0      0      0
      0      0      0    485    814      0      0      0
      0      0      0    467    503      0      0      0
      0      0      0    473    345      0      0      0

Client nfsstat:

$ nfsstat -c -w 10
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
[...]
      0      0      0      0    518      0      0      0
      0      0      0      0    498      0      0      0
      0      0      0      0    503      0      0      0
      0      0      0      0    474      0      0      0
      0      0      0      0    525      0      0      0
      0      0      0      0    497      0      0      0

Server vmstat:

$ vmstat -w 10
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr vt0 vt1   in   sy   cs us sy id
[...]
 0 4 0    634M  6043M    37   0   0   0     1   0   0   0 1561   46 3431  0  2 98
 0 4 0    640M  6042M    62   0   0   0    28   0   0   0 1598   94 3552  0  2 98
 0 4 0    648M  6042M    38   0   0   0     0   0   0   0 1609   47 3485  0  1 99
 0 4 0    648M  6042M    37   0   0   0     0   0   0   0 1615   46 3667  0  2 98
 0 4 0    648M  6042M    37   0   0   0     0   0   0   0 1606   45 3678  0  2 98
 0 4 0    648M  6042M    37   0   0   0     0   0   1   0 1561   45 3377  0  2 98

Client vmstat:

$ vmstat -w 10
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr md0 da0   in   sy   cs us sy id
[...]
 0 0 0    639M   593M    33   0   0   0  1237   0   0   0  281 5575 1043  0  3 97
 0 0 0    639M   591M     0   0   0   0   712   0   0   0  235  122  889  0  2 98
 0 0 0    639M   583M     0   0   0   0   571   0   0   1  227  120  851  0  2 98
 0 0 0    639M   592M   198   0   0   0  1212   0   0   0  251 2497  950  0  3 97
 0 0 0    639M   586M     0   0   0   0   614   0   0   0  250  121  924  0  2 98
 0 0 0    639M   586M     0   0   0   0   765   0   0   0  250  120  918  0  3 97

Top on the KVM host says it is 93-95% idle and that each VM sits around
7-10% CPU.  So basically nobody is doing anything.  There's no visible
bottleneck, and I've no idea where to go from here to figure out what's
going on.

Does anyone have any suggestions for debugging this?

Thanks!
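One sanity check on the numbers above (my arithmetic, under the
assumption that each nfsstat -w 10 row is a count of RPCs seen during
the 10-second interval rather than a per-second rate): the server-side
write counts at the 4 KB record size work out to roughly 200 KB/s, the
same ballpark as the 196 KB/s iozone random-write figure.

```python
# Sanity-check the server-side nfsstat write counts against iozone.
# Assumption: "nfsstat -s -w 10" prints counts per 10-second interval.
interval_s = 10
record_kb = 4  # iozone -r 4k

# Write counts per interval from the server nfsstat output above.
writes_per_interval = [816, 751, 36, 550, 814, 503, 345]
avg_writes = sum(writes_per_interval) / len(writes_per_interval)

throughput_kb_s = avg_writes * record_kb / interval_s
print(f"~{avg_writes:.0f} write RPCs / 10 s -> ~{throughput_kb_s:.0f} KB/s")
```

In other words, the server is committing only a couple hundred KB/s of
4 KB writes while otherwise sitting idle, which matches the iozone
result rather than any hardware or network limit.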