Date: Fri, 26 Jun 2015 20:42:08 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Scott Larson <stl@wiredrive.com>
Cc: Gerrit Kühn <gerrit.kuehn@aei.mpg.de>, freebsd-net@freebsd.org, Carsten Aulbert <carsten.aulbert@aei.mpg.de>
Subject: Re: NFS on 10G interface terribly slow
Message-ID: <1629011632.413406.1435365728977.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <CAFt8naF7xmZW8bgVrhrL=CaPXiVURqDLsNN5-NHDg=hiv-Qmtw@mail.gmail.com>
References: <20150625145238.12cf9da3b368ef0b9a30f193@aei.mpg.de> <CAFt8naF7xmZW8bgVrhrL=CaPXiVURqDLsNN5-NHDg=hiv-Qmtw@mail.gmail.com>
Scott Larson wrote:
> We've got 10.0 and 10.1 servers accessing Isilon and Nexenta via NFS
> with Intel 10G gear, and bursting to near wire speed with the stock
> MTU/rsize/wsize works as expected. TSO definitely needs to be enabled
> for that performance.
Btw, can you tell us what Intel chip(s) you're using?
For example, from the "ix" driver:
#define IXGBE_82598_SCATTER		100
#define IXGBE_82599_SCATTER		32
This implies that the 82598 won't have problems with 64K TSO segments,
but the 82599 will end up doing calls to m_defrag(), which copies the
entire mbuf list into 32 new mbuf clusters for each 64K TSO segment.
--> Even for one driver, different chips may result in different NFS perf.
Btw, it appears that the driver in head/current now sets
if_hw_tsomaxsegcount, but the driver in stable/10 does not. This means
that the 82599 chip will end up doing the m_defrag() calls for 10.x.
(A sketch of both pieces appears at the end of this message.)

rick

> The fact that iperf gives you the expected throughput but NFS does not
> would have me looking at tuning for the NFS platform. Other things to
> look at: Are all the servers involved negotiating the correct speed and
> duplex, with TSO? Does the network stack need to be tuned with whatever
> its equivalents of maxsockbuf and send/recvbuf are? Do the switch ports
> and NIC counters show any drops or errors? On the FBSD servers you could
> also run 'netstat -i -w 1' under load to see if drops are occurring
> locally, or 'systat -vmstat' for resource contention problems. But
> again, we run a similar setup here and no such issues have appeared.
>
> Scott Larson
> Lead Systems Administrator, Wiredrive <https://www.wiredrive.com/>
> T 310 823 8238 x1106 | M 310 904 8818
>
> On Thu, Jun 25, 2015 at 5:52 AM, Gerrit Kühn <gerrit.kuehn@aei.mpg.de>
> wrote:
>
> > Hi all,
> >
> > We have a recent FreeBSD 10.1 installation here that is supposed to
> > act as an nfs (v3) client to an Oracle X4-2L server running Solaris
> > 11.2. We have Intel 10-Gigabit X540-AT2 NICs on both ends, and iperf
> > is showing plenty of bandwidth (9.x Gbit/s) in both directions.
> > However, nfs appears to be terribly slow, especially for writing:
> >
> > root@crest:~ # dd if=/dev/zero of=/net/hellpool/Z bs=1024k count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes transferred in 20.263190 secs (51747824 bytes/sec)
> >
> > Reading appears to be faster, but still far away from full bandwidth:
> >
> > root@crest:~ # dd of=/dev/null if=/net/hellpool/Z bs=1024k
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes transferred in 5.129869 secs (204406000 bytes/sec)
> >
> > We have already tried to tune the rsize/wsize parameters, but they
> > appear to have little (if any) impact on these results. Also, neither
> > stripping rxcsum, txcsum, tso etc. from the interface nor increasing
> > the MTU to 9000 for jumbo frames improved anything.
> > It is quite embarrassing to achieve far less than 1GbE performance
> > with 10GbE equipment.
> > Are there any hints what else might be causing this (and how to fix
> > it)?
> >
> > cu
> > Gerrit
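
Below is a minimal sketch of the two pieces discussed above: a driver
advertising its scatter/gather limit to the TSO code via
if_hw_tsomaxsegcount, and the m_defrag() fallback a transmit path takes
when a TSO chain still maps to too many DMA segments. It is illustrative
only, not the actual ix(4) source; the foo_* names, the softc layout and
the if_hw_tsomaxsegsize value are placeholders.

/*
 * Illustrative sketch only -- not the actual ix(4) source.  The foo_*
 * names, the softc layout and the if_hw_tsomaxsegsize value are
 * placeholders.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/socket.h>

#include <net/if.h>
#include <net/if_var.h>

#include <machine/bus.h>

#define	FOO_SCATTER	32		/* cf. IXGBE_82599_SCATTER */

struct foo_softc {
	bus_dma_tag_t	tx_tag;
	bus_dmamap_t	tx_map;
};

/*
 * Attach time: advertise the chip's segment limit so tcp_output()
 * never builds a TSO chain with more mbufs than the DMA engine can
 * take (the if_hw_tsomaxsegcount setting mentioned above).
 */
static void
foo_set_tso_limits(struct ifnet *ifp)
{

	ifp->if_hw_tsomaxsegcount = FOO_SCATTER;
	ifp->if_hw_tsomaxsegsize = MCLBYTES;	/* placeholder value */
}

/*
 * Transmit path: if the chain needs more than FOO_SCATTER segments,
 * bus_dmamap_load_mbuf_sg() fails with EFBIG and the usual fallback
 * is m_defrag(), which copies the whole (up to 64K) chain into fresh
 * clusters -- the copy referred to above.
 */
static int
foo_xmit(struct foo_softc *sc, struct mbuf **m_headp)
{
	bus_dma_segment_t segs[FOO_SCATTER];
	int error, nsegs;

	error = bus_dmamap_load_mbuf_sg(sc->tx_tag, sc->tx_map,
	    *m_headp, segs, &nsegs, BUS_DMA_NOWAIT);
	if (error == EFBIG) {
		struct mbuf *m;

		m = m_defrag(*m_headp, M_NOWAIT);
		if (m == NULL) {
			m_freem(*m_headp);
			*m_headp = NULL;
			return (ENOBUFS);
		}
		*m_headp = m;
		error = bus_dmamap_load_mbuf_sg(sc->tx_tag, sc->tx_map,
		    *m_headp, segs, &nsegs, BUS_DMA_NOWAIT);
	}
	if (error != 0)
		return (error);
	/* ... fill the TX descriptors from segs[0..nsegs-1] ... */
	return (0);
}

With the limit advertised at attach time, tcp_output() should never hand
the driver a TSO chain longer than the hardware can map, so the
EFBIG/m_defrag() path (and its full copy of the chain) is normally
avoided.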