Date:      Fri, 26 Jun 2015 20:42:08 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Scott Larson <stl@wiredrive.com>
Cc:        Gerrit Kühn <gerrit.kuehn@aei.mpg.de>,  freebsd-net@freebsd.org, carsten aulbert <carsten.aulbert@aei.mpg.de>
Subject:   Re: NFS on 10G interface terribly slow
Message-ID:  <1629011632.413406.1435365728977.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <CAFt8naF7xmZW8bgVrhrL=CaPXiVURqDLsNN5-NHDg=hiv-Qmtw@mail.gmail.com>
References:  <20150625145238.12cf9da3b368ef0b9a30f193@aei.mpg.de> <CAFt8naF7xmZW8bgVrhrL=CaPXiVURqDLsNN5-NHDg=hiv-Qmtw@mail.gmail.com>

Scott Larson wrote:
> We've got 10.0 and 10.1 servers accessing Isilon and Nexenta via NFS
> with Intel 10G gear, and bursting to near wire speed with the stock
> MTU/rsize/wsize works as expected. TSO definitely needs to be enabled for
> that performance.
Btw, can you tell us what Intel chip(s) you're using?

For example, from the "ix" driver:
#define IXGBE_82598_SCATTER             100
#define IXGBE_82599_SCATTER             32

This implies that the 82598 won't have problems with 64K TSO segments, but
the 82599 will end up calling m_defrag(), which copies the entire mbuf list
into 32 new mbuf clusters for every such segment.
--> Even for one driver, different chips may result in different NFS performance.
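
To put numbers on that: a 64K TSO segment built from standard 2K mbuf
clusters already spans 32 clusters before a header mbuf is counted, so it
can't fit under the 82599's limit. A throwaway userland check (illustrative
only; 2048 is MCLBYTES on a stock kernel, and 64K is assumed as the worst
case the stack will hand the driver):

#include <stdio.h>

int
main(void)
{
    const int tso_bytes = 65536;    /* ~64K TSO segment (assumed worst case) */
    const int mclbytes = 2048;      /* MCLBYTES on a stock FreeBSD kernel */
    const int scatter_82598 = 100;  /* IXGBE_82598_SCATTER */
    const int scatter_82599 = 32;   /* IXGBE_82599_SCATTER */

    /* payload clusters plus (at least) one mbuf for the TCP/IP headers */
    int mbufs = (tso_bytes + mclbytes - 1) / mclbytes + 1;

    printf("mbufs per 64K TSO segment: %d\n", mbufs);
    printf("82598 (limit %d): %s\n", scatter_82598,
        mbufs <= scatter_82598 ? "fits" : "needs m_defrag()");
    printf("82599 (limit %d): %s\n", scatter_82599,
        mbufs <= scatter_82599 ? "fits" : "needs m_defrag()");
    return (0);
}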

Btw, it appears that the driver in head/current now sets if_hw_tsomaxsegcount,
but the driver in stable/10 does not. This means that the 82599 chip will end
up doing the m_defrag() calls for 10.x.
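
For what it's worth, the head change amounts to the driver telling the stack
its limits at attach time via the if_hw_tsomax* fields of struct ifnet. Very
roughly (a sketch from memory, not the committed diff; the actual values are
the driver's to pick):

    /*
     * Sketch only: advertise the chip's DMA scatter limit so that
     * tcp_output() never builds a TSO chain the hardware can't take,
     * which is what makes the m_defrag() copies unnecessary.
     * Values here are illustrative.
     */
    ifp->if_hw_tsomax = IP_MAXPACKET;                 /* cap on total TSO bytes */
    ifp->if_hw_tsomaxsegcount = IXGBE_82599_SCATTER;  /* cap on DMA segments */
    ifp->if_hw_tsomaxsegsize = 2048;                  /* bytes per segment (one cluster) */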

rick

> The fact iperf gives you the expected throughput but NFS
> does not would have me looking at tuning for the NFS platform. Other things
> to look at: Are all the servers involved negotiating the correct speed and
> duplex, with TSO? Does it need to have the network stack tuned with
> whatever its equivalent of maxsockbuf and send/recvbuf are? Do the switch
> ports and NIC counters show any drops or errors? On the FBSD servers you
> could also run 'netstat -i -w 1' under load to see if drops are occurring
> locally, or 'systat -vmstat' for resource contention problems. But again, we
> have a similar setup here and no such issues have appeared.
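
Just to make the FreeBSD-side knobs concrete: those are sysctls
(kern.ipc.maxsockbuf and net.inet.tcp.sendbuf_max/recvbuf_max), easiest to
check with sysctl(8). If you'd rather read them from a program, a minimal
sketch like the one below works; it only knows the stock FreeBSD OIDs, so
the Solaris side needs to be checked with its own tools.

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <string.h>

/* Print one integer-valued sysctl, coping with int vs. long width. */
static void
show(const char *oid)
{
    union { int i; long l; } val;
    size_t len = sizeof(val);

    memset(&val, 0, sizeof(val));
    if (sysctlbyname(oid, &val, &len, NULL, 0) == -1) {
        printf("%s: <unavailable>\n", oid);
        return;
    }
    if (len == sizeof(int))
        printf("%s = %d\n", oid, val.i);
    else
        printf("%s = %ld\n", oid, val.l);
}

int
main(void)
{
    /* Stock FreeBSD socket-buffer limits. */
    show("kern.ipc.maxsockbuf");
    show("net.inet.tcp.sendbuf_max");
    show("net.inet.tcp.recvbuf_max");
    return (0);
}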
>
>
> Scott Larson
> Lead Systems Administrator, Wiredrive <https://www.wiredrive.com/>
> T 310 823 8238 x1106  |  M 310 904 8818
>
> On Thu, Jun 25, 2015 at 5:52 AM, Gerrit Kühn <gerrit.kuehn@aei.mpg.de>
> wrote:
>
> > Hi all,
> >
> > We have a recent FreeBSD 10.1 installation here that is supposed to act as
> > nfs (v3) client to an Oracle X4-2L server running Solaris 11.2.
> > We have Intel 10-Gigabit X540-AT2 NICs on both ends; iperf is showing
> > plenty of bandwidth (9.x Gbit/s) in both directions.
> > However, nfs appears to be terribly slow, especially for writing:
> >
> > root@crest:~ # dd if=/dev/zero of=/net/hellpool/Z bs=1024k count=1000
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes transferred in 20.263190 secs (51747824 bytes/sec)
> >
> >
> > Reading appears to be faster, but still far away from full bandwidth:
> >
> > root@crest:~ # dd of=/dev/null if=/net/hellpool/Z bs=1024k
> > 1000+0 records in
> > 1000+0 records out
> > 1048576000 bytes transferred in 5.129869 secs (204406000 bytes/sec)
> >
> >
> > We have already tried to tune the rsize/wsize parameters, but they appear to
> > have little (if any) impact on these results. Also, neither stripping
> > rxsum, txsum, tso, etc. from the interface nor increasing the MTU to 9000 for
> > jumbo frames improved anything.
> > It is quite embarrassing to achieve way less than 1GbE performance with
> > 10GbE equipment. Are there any hints as to what else might be causing this (and
> > how to fix it)?
> >
> >
> > cu
> >   Gerrit
> >
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"


