From owner-freebsd-net@FreeBSD.ORG Tue Jan 21 02:01:14 2014
Date: Mon, 20 Jan 2014 21:01:11 -0500 (EST)
From: Rick Macklem
To: J David
Cc: freebsd-net@freebsd.org, Adam McDougall
Subject: Re: Terrible NFS performance under 9.2-RELEASE?

Since this is getting long-winded, I'm going to "cheat" and top post.
(Don't-top-post flame suit on ;-)

You could try setting net.inet.tcp.delayed_ack=0 via sysctl. I just
looked and it appears that TCP delays ACKs for a while, even when
TCP_NODELAY is set (I didn't know that). I honestly don't know how
much effect, if any, these delayed ACKs will have, but if you disable
them, you can see what happens.

rick

> Date: Sun, 19 Jan 2014 23:08:04 -0500
> From: J David
> To: Rick Macklem
> Subject: Re: Terrible NFS performance under 9.2-RELEASE?
>
> On Sun, Jan 19, 2014 at 9:32 AM, Alfred Perlstein wrote:
> > I hit nearly the same problem and raising the mbufs worked for me.
> >
> > I'd suggest raising that and retrying.
>
> That doesn't seem to be an issue here; mbufs are well below max on
> both client and server, and all the "delayed"/"denied" lines are
> 0/0/0.
>
> On Sun, Jan 19, 2014 at 12:58 PM, Adam McDougall wrote:
> > Also try rsize=32768,wsize=32768 in your mount options; it made a
> > huge difference for me.
>
> This does make a difference, but inconsistently.
>
> In order to test this further, I created a Debian guest on the same
> host as these two FreeBSD hosts and re-ran the tests with it acting
> as both client and server, and ran them for both 32k and 64k.
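As an aside for anyone skimming the thread: the three knobs suggested so
far are all quick to try. A rough sketch follows; the server name, export
path, and cluster count below are illustrative placeholders, not values
taken from this thread:

    # Rick's suggestion: disable delayed ACKs (add to /etc/sysctl.conf to persist)
    sysctl net.inet.tcp.delayed_ack=0

    # Alfred's suggestion: raise the mbuf cluster limit (can also be set in /boot/loader.conf)
    sysctl kern.ipc.nmbclusters=131072

    # Adam's suggestion: smaller NFS I/O sizes on the client mount
    mount -t nfs -o tcp,rsize=32768,wsize=32768 server:/export /mnt/nfs
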
>
> Findings:
>
>                          write  rewrite    read   reread  random read  random write
> S:FBSD,C:FBSD,Z:64k      67246     2923  103295  1272407       172475           196
> S:FBSD,C:FBSD,Z:32k      11951    99896  223787  1051948       223276         13686
> S:FBSD,C:DEB,Z:64k       11414    14445   31554    30156        30368         13799
> S:FBSD,C:DEB,Z:32k       11215    14442   31439    31026        29608         13769
> S:DEB,C:FBSD,Z:64k       36844   173312  313919  1169426       188432         14273
> S:DEB,C:FBSD,Z:32k       66928   120660  257830  1048309       225807         18103
>
> So the rsize/wsize makes a difference between two FreeBSD nodes, but
> with a Debian node as either client or server, it no longer seems to
> matter much. And /proc/mounts on the debian box confirms that it
> negotiates and honors the 64k size as a client.
>
> On Sun, Jan 19, 2014 at 6:36 PM, Rick Macklem wrote:
> > Yes, it shouldn't make a big difference but it sometimes does. When
> > it does, I believe that indicates there is a problem with your
> > network fabric.
>
> Given that this is an entirely virtual environment, if your belief is
> correct, where would supporting evidence be found?
>
> As far as I can tell, there are no interface errors reported on the
> host (checking both taps and the bridge) or any of the guests,
> nothing in sysctl dev.vtnet of concern, etc. Also the improvement
> from using debian on either side, even with 64k sizes, seems
> counterintuitive.
>
> To try to help vindicate the network stack, I did iperf -d between
> the two FreeBSD nodes while the iozone was running:
>
> Server:
>
> $ iperf -s
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 1.00 MByte (default)
> ------------------------------------------------------------
> [  4] local 172.20.20.162 port 5001 connected with 172.20.20.169 port 37449
> ------------------------------------------------------------
> Client connecting to 172.20.20.169, TCP port 5001
> TCP window size: 1.00 MByte (default)
> ------------------------------------------------------------
> [  6] local 172.20.20.162 port 28634 connected with 172.20.20.169 port 5001
> Waiting for server threads to complete. Interrupt again to force quit.
> [ ID] Interval       Transfer     Bandwidth
> [  6]  0.0-10.0 sec  15.8 GBytes  13.6 Gbits/sec
> [  4]  0.0-10.0 sec  15.6 GBytes  13.4 Gbits/sec
>
> Client:
>
> $ iperf -c 172.20.20.162 -d
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 1.00 MByte (default)
> ------------------------------------------------------------
> ------------------------------------------------------------
> Client connecting to 172.20.20.162, TCP port 5001
> TCP window size: 1.00 MByte (default)
> ------------------------------------------------------------
> [  5] local 172.20.20.169 port 32533 connected with 172.20.20.162 port 5001
> [  4] local 172.20.20.169 port 5001 connected with 172.20.20.162 port 36617
> [ ID] Interval       Transfer     Bandwidth
> [  5]  0.0-10.0 sec  15.6 GBytes  13.4 Gbits/sec
> [  4]  0.0-10.0 sec  15.5 GBytes  13.3 Gbits/sec
>
> mbuf usage is pretty low.
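A quick way to watch just the numbers that matter for Alfred's mbuf
theory (cluster usage versus the max, plus the denied/delayed counters)
is to filter the netstat -m output, something like:

    netstat -m | egrep 'mbuf clusters|denied|delayed'
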
>
> Server:
>
> $ netstat -m
> 545/4075/4620 mbufs in use (current/cache/total)
> 535/1819/2354/131072 mbuf clusters in use (current/cache/total/max)
> 535/1641 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/2034/2034/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 1206K/12792K/13999K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> Client:
>
> $ netstat -m
> 1841/3544/5385 mbufs in use (current/cache/total)
> 1172/1198/2370/32768 mbuf clusters in use (current/cache/total/max)
> 512/896 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/2314/2314/16384 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/8192 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/4096 16k jumbo clusters in use (current/cache/total/max)
> 2804K/12538K/15342K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> Here's 60 seconds of netstat -ss for ip and tcp from the server with
> the 64k mount running iozone:
>
> ip:
>     4776 total packets received
>     4758 packets for this host
>     18 packets for unknown/unsupported protocol
>     2238 packets sent from this host
> tcp:
>     2244 packets sent
>         1427 data packets (238332 bytes)
>         5 data packets (820 bytes) retransmitted
>         812 ack-only packets (587 delayed)
>     2235 packets received
>         1428 acks (for 238368 bytes)
>         2007 packets (91952792 bytes) received in-sequence
>         225 out-of-order packets (325800 bytes)
>     1428 segments updated rtt (of 1426 attempts)
>     5 retransmit timeouts
>     587 correct data packet header predictions
>     225 SACK options (SACK blocks) sent
>
> And with the 32k mount:
>
> ip:
>     24172 total packets received
>     24167 packets for this host
>     5 packets for unknown/unsupported protocol
>     26130 packets sent from this host
> tcp:
>     26130 packets sent
>         23506 data packets (5362120 bytes)
>         2624 ack-only packets (454 delayed)
>     21671 packets received
>         18143 acks (for 5362192 bytes)
>         20278 packets (756617316 bytes) received in-sequence
>         96 out-of-order packets (145964 bytes)
>     18143 segments updated rtt (of 17469 attempts)
>     1093 correct ACK header predictions
>     3449 correct data packet header predictions
>     111 SACK options (SACK blocks) sent
>
> So the 32k mount sends about 6x the packet volume. (This is on
> iozone's linear write test.)
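For anyone reproducing this kind of measurement: the counters netstat -s
prints are cumulative since boot, so a 60-second sample like the one above
is easiest to get by diffing two snapshots (the same works with -p ip).
A minimal sketch, with scratch file names made up here:

    netstat -s -p tcp > /tmp/tcp.before
    sleep 60
    netstat -s -p tcp > /tmp/tcp.after
    diff /tmp/tcp.before /tmp/tcp.after
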
> One thing I've noticed is that when the 64k connection bogs down, it
> seems to "poison" things for a while. For example, iperf will start
> doing this afterward:
>
> From the client to the server:
>
> $ iperf -c 172.20.20.162
> ------------------------------------------------------------
> Client connecting to 172.20.20.162, TCP port 5001
> TCP window size: 1.00 MByte (default)
> ------------------------------------------------------------
> [  3] local 172.20.20.169 port 14337 connected with 172.20.20.162 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.1 sec  4.88 MBytes  4.05 Mbits/sec
>
> Ouch! That's quite a drop from 13 Gbit/sec. Weirdly, iperf to the
> debian node is not affected:
>
> From the client to the debian node:
>
> $ iperf -c 172.20.20.166
> ------------------------------------------------------------
> Client connecting to 172.20.20.166, TCP port 5001
> TCP window size: 1.00 MByte (default)
> ------------------------------------------------------------
> [  3] local 172.20.20.169 port 24376 connected with 172.20.20.166 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  20.4 GBytes  17.5 Gbits/sec
>
> From the debian node to the server:
>
> $ iperf -c 172.20.20.162
> ------------------------------------------------------------
> Client connecting to 172.20.20.162, TCP port 5001
> TCP window size: 23.5 KByte (default)
> ------------------------------------------------------------
> [  3] local 172.20.20.166 port 43166 connected with 172.20.20.162 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  12.9 GBytes  11.1 Gbits/sec
>
> But if I let it run for longer, it will apparently figure things out
> and creep back up to normal speed and stay there until NFS strikes
> again. It's like the kernel is caching some sort of hint that
> connectivity to that other host sucks, and it has to either expire or
> be slowly overcome.
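That "caching some sort of hint" guess has a plausible mechanism behind
it: FreeBSD keeps per-destination TCP metrics (RTT estimates, ssthresh,
etc.) in the TCP host cache and uses them to seed new connections to the
same peer, so a connection that ended badly can leave a pessimistic
ssthresh behind for that address. Assuming I have the sysctl names right,
it can be inspected and flushed like this:

    # dump the cached metrics for the server's address
    sysctl net.inet.tcp.hostcache.list | grep 172.20.20.162

    # request a purge of all entries at the next pruning pass
    # (entries also age out on their own)
    sysctl net.inet.tcp.hostcache.purge=1
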
>
> Client:
>
> $ iperf -c 172.20.20.162 -t 60
> ------------------------------------------------------------
> Client connecting to 172.20.20.162, TCP port 5001
> TCP window size: 1.00 MByte (default)
> ------------------------------------------------------------
> [  3] local 172.20.20.169 port 59367 connected with 172.20.20.162 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-60.0 sec  56.2 GBytes  8.04 Gbits/sec
>
> Server:
>
> $ netstat -I vtnet1 -ihw 1
>               input      (vtnet1)         output
>    packets  errs idrops    bytes  packets  errs    bytes colls
>        7     0     0      420        0     0        0     0
>        7     0     0      420        0     0        0     0
>        8     0     0      480        0     0        0     0
>        8     0     0      480        0     0        0     0
>        7     0     0      420        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>       11     0     0      12k        3     0      206     0   <--- starts here
>       17     0     0     227k       10     0      660     0
>       17     0     0     408k       10     0      660     0
>       17     0     0     417k       10     0      660     0
>       17     0     0     425k       10     0      660     0
>       17     0     0     438k       10     0      660     0
>       17     0     0     444k       10     0      660     0
>       16     0     0     453k       10     0      660     0
>               input      (vtnet1)         output
>    packets  errs idrops    bytes  packets  errs    bytes colls
>       16     0     0     463k       10     0      660     0
>       16     0     0     469k       10     0      660     0
>       16     0     0     482k       10     0      660     0
>       16     0     0     487k       10     0      660     0
>       16     0     0     496k       10     0      660     0
>       16     0     0     504k       10     0      660     0
>       18     0     0     510k       10     0      660     0
>       16     0     0     521k       10     0      660     0
>       17     0     0     524k       10     0      660     0
>       17     0     0     538k       10     0      660     0
>       17     0     0     540k       10     0      660     0
>       17     0     0     552k       10     0      660     0
>       17     0     0     554k       10     0      660     0
>       17     0     0     567k       10     0      660     0
>       16     0     0     568k       10     0      660     0
>       16     0     0     581k       10     0      660     0
>       16     0     0     582k       10     0      660     0
>       16     0     0     595k       10     0      660     0
>       16     0     0     595k       10     0      660     0
>       16     0     0     609k       10     0      660     0
>       16     0     0     609k       10     0      660     0
>               input      (vtnet1)         output
>    packets  errs idrops    bytes  packets  errs    bytes colls
>       16     0     0     620k       10     0      660     0
>       16     0     0     623k       10     0      660     0
>       17     0     0     632k       10     0      660     0
>       17     0     0     637k       10     0      660     0
>     8.7k     0     0     389M     4.4k     0     288k     0
>      42k     0     0     2.1G      21k     0     1.4M     0
>      41k     0     0     2.1G      20k     0     1.4M     0
>      38k     0     0     1.9G      19k     0     1.2M     0
>      40k     0     0     2.0G      20k     0     1.3M     0
>      40k     0     0     2.0G      20k     0     1.3M     0
>      40k     0     0       2G      20k     0     1.3M     0
>      39k     0     0       2G      20k     0     1.3M     0
>      43k     0     0     2.2G      22k     0     1.4M     0
>      42k     0     0     2.2G      21k     0     1.4M     0
>      39k     0     0       2G      19k     0     1.3M     0
>      38k     0     0     1.9G      19k     0     1.2M     0
>      42k     0     0     2.1G      21k     0     1.4M     0
>      44k     0     0     2.2G      22k     0     1.4M     0
>      41k     0     0     2.1G      20k     0     1.3M     0
>      41k     0     0     2.1G      21k     0     1.4M     0
>      40k     0     0     2.0G      20k     0     1.3M     0
>               input      (vtnet1)         output
>    packets  errs idrops    bytes  packets  errs    bytes colls
>      43k     0     0     2.2G      22k     0     1.4M     0
>      41k     0     0     2.1G      20k     0     1.3M     0
>      40k     0     0     2.0G      20k     0     1.3M     0
>      42k     0     0     2.2G      21k     0     1.4M     0
>      39k     0     0       2G      19k     0     1.3M     0
>      42k     0     0     2.1G      21k     0     1.4M     0
>      40k     0     0     2.0G      20k     0     1.3M     0
>      42k     0     0     2.1G      21k     0     1.4M     0
>      38k     0     0       2G      19k     0     1.3M     0
>      39k     0     0       2G      20k     0     1.3M     0
>      45k     0     0     2.3G      23k     0     1.5M     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>
> It almost looks like something is limiting it to 10 packets per
> second. So confusing! TCP super slow start?
>
> Thanks!
>
> (Sorry Rick, forgot to reply all so you got an extra! :( )
>
> Also, here's the netstat from the client side showing the 10 packets
> per second limit and eventual recovery:
>
> $ netstat -I net1 -ihw 1
>               input       (net1)          output
>    packets  errs idrops    bytes  packets  errs    bytes colls
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>       15     0     0      962       11     0     114k     0
>       17     0     0     1.1k       10     0     368k     0
>       17     0     0     1.1k       10     0     411k     0
>       17     0     0     1.1k       10     0     425k     0
>       17     0     0     1.1k       10     0     432k     0
>       17     0     0     1.1k       10     0     439k     0
>       17     0     0     1.1k       10     0     452k     0
>       16     0     0       1k       10     0     457k     0
>       16     0     0       1k       10     0     467k     0
>       16     0     0       1k       10     0     477k     0
>       16     0     0       1k       10     0     481k     0
>       16     0     0       1k       10     0     495k     0
>       16     0     0       1k       10     0     498k     0
>       16     0     0       1k       10     0     510k     0
>       16     0     0       1k       10     0     515k     0
>       16     0     0       1k       10     0     524k     0
>       17     0     0     1.1k       10     0     532k     0
>               input       (net1)          output
>    packets  errs idrops    bytes  packets  errs    bytes colls
>       17     0     0     1.1k       10     0     538k     0
>       17     0     0     1.1k       10     0     548k     0
>       17     0     0     1.1k       10     0     552k     0
>       17     0     0     1.1k       10     0     562k     0
>       17     0     0     1.1k       10     0     566k     0
>       16     0     0       1k       10     0     576k     0
>       16     0     0       1k       10     0     580k     0
>       16     0     0       1k       10     0     590k     0
>       17     0     0     1.1k       10     0     594k     0
>       16     0     0       1k       10     0     603k     0
>       16     0     0       1k       10     0     609k     0
>       16     0     0       1k       10     0     614k     0
>       16     0     0       1k       10     0     623k     0
>       16     0     0       1k       10     0     626k     0
>       17     0     0     1.1k       10     0     637k     0
>       18     0     0     1.1k       10     0     637k     0
>      17k     0     0     1.1M      34k     0     1.7G     0
>      21k     0     0     1.4M      42k     0     2.1G     0
>      20k     0     0     1.3M      39k     0       2G     0
>      19k     0     0     1.2M      38k     0     1.9G     0
>      20k     0     0     1.3M      41k     0     2.0G     0
>               input       (net1)          output
>    packets  errs idrops    bytes  packets  errs    bytes colls
>      20k     0     0     1.3M      40k     0     2.0G     0
>      19k     0     0     1.2M      38k     0     1.9G     0
>      22k     0     0     1.5M      45k     0     2.3G     0
>      20k     0     0     1.3M      40k     0     2.1G     0
>      20k     0     0     1.3M      40k     0     2.1G     0
>      18k     0     0     1.2M      36k     0     1.9G     0
>      21k     0     0     1.4M      41k     0     2.1G     0
>      22k     0     0     1.4M      44k     0     2.2G     0
>      21k     0     0     1.4M      43k     0     2.2G     0
>      20k     0     0     1.3M      41k     0     2.1G     0
>      20k     0     0     1.3M      40k     0     2.0G     0
>      21k     0     0     1.4M      43k     0     2.2G     0
>      21k     0     0     1.4M      43k     0     2.2G     0
>      20k     0     0     1.3M      40k     0     2.0G     0
>      21k     0     0     1.4M      43k     0     2.2G     0
>      19k     0     0     1.2M      38k     0     1.9G     0
>      21k     0     0     1.4M      42k     0     2.1G     0
>      20k     0     0     1.3M      40k     0     2.0G     0
>      21k     0     0     1.4M      42k     0     2.1G     0
>      20k     0     0     1.3M      40k     0     2.0G     0
>      20k     0     0     1.3M      40k     0     2.0G     0
>               input       (net1)          output
>    packets  errs idrops    bytes  packets  errs    bytes colls
>      24k     0     0     1.6M      48k     0     2.5G     0
>     6.3k     0     0     417k      12k     0     647M     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>        6     0     0      360        0     0        0     0
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"