Date: Fri, 26 Jun 2015 19:53:50 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Damien Fleuriot <ml@my.gd>
Cc: Gerrit Kühn <gerrit.kuehn@aei.mpg.de>, freebsd-net@freebsd.org
Subject: Re: NFS on 10G interface terribly slow
Message-ID: <1709150198.407064.1435362830724.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <CAE63ME4wwXhwL4T411aGFeo3BC1bfEdZa71N1FyWNyV7+p15oA@mail.gmail.com>
References: <20150625145238.12cf9da3b368ef0b9a30f193@aei.mpg.de>
 <623856025.328424.1435279751389.JavaMail.zimbra@uoguelph.ca>
 <20150626115943.7d0b441cda2c6cc5b817b181@aei.mpg.de>
 <CAE63ME4wwXhwL4T411aGFeo3BC1bfEdZa71N1FyWNyV7+p15oA@mail.gmail.com>

Damien Fleuriot wrote:
> Gerrit,
>
> Everyone's talking about the network performance and, to some extent, NFS
> tuning.
> I would argue that given your iperf results, the network itself is not at
> fault.
>
In this case, I think you might be correct. However, I need to note that
NFS traffic is very different from what iperf generates, and a good result
from iperf does not imply that there isn't a network-related problem
causing NFS grief. A couple of examples:

- NFS generates TSO segments that are sometimes just under 64K in length.
  If the network interface has TSO enabled but cannot handle a list of 35
  or more transmit segments (mbufs in the list), this can cause problems.
  Systems more than about one year old could fail completely when the TSO
  segment + IP header exceeded 64K for network interfaces limited to 32
  transmit segments (32 * MCLBYTES == 64K). Also, some interfaces used
  m_collapse() to try to fix the case where the TSO segment had too many
  transmit segments in it, and that almost always failed (you need to use
  m_defrag()).
  --> The worst-case failures have been fixed by reducing the default
      maximum TSO segment size to slightly less than 64K (by the maximum
      MAC header length). However, drivers limited to fewer than 35
      transmit segments (which includes at least one of the most common
      Intel chips) still end up generating a lot of overhead by calling
      m_defrag() over and over again (with the possibility of failure if
      mbuf clusters become exhausted).
  --> To fix this well, net device drivers need to set a field called
      if_hw_tsomaxsegcount, but if you look in -head, you won't find it
      set in many drivers. (I've posted to freebsd-net multiple times
      asking the net device driver authors to do this, but it hasn't
      happened yet.) This is usually avoided by disabling TSO.
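To make that concrete, here is a minimal sketch (not taken from any real
driver; the function name and the numeric limits are invented for
illustration) of a driver advertising its transmit limits before
ether_ifattach(), so the stack never builds a TSO packet the hardware
cannot handle in one transmit:

    #include <sys/param.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <net/if_var.h>
    #include <netinet/in.h>
    #include <netinet/ip.h>

    /*
     * Sketch only: a hypothetical NIC whose DMA engine takes at most 32
     * segments per transmit.  The numbers are examples, not the limits
     * of any real chip; check the hardware documentation.
     */
    static void
    example_set_tso_limits(struct ifnet *ifp)
    {
            ifp->if_hw_tsomax = IP_MAXPACKET;     /* max TSO frame size */
            ifp->if_hw_tsomaxsegcount = 32;       /* max mbufs per TSO frame */
            ifp->if_hw_tsomaxsegsize = MCLBYTES;  /* max bytes per segment */
            /* ... followed by the usual ether_ifattach(ifp, lladdr); */
    }

Once these limits are advertised, the stack sizes the TSO bursts itself and
the driver no longer needs to fall back on m_defrag() for oversized chains.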
Another failure case I've seen in the past was where a network interface
would drop a packet in a stream of closely spaced packets on the receive
side while concurrently transmitting. (NFS traffic is bi-directional, and
it is common to be receiving and transmitting on a TCP socket
concurrently.) NFS traffic is also very bursty, and that seems to cause
problems for certain network interfaces. These cases can usually be worked
around by reducing rsize and wsize. (Reducing rsize and wsize also "fixes"
the 64K TSO segment problem, since the TSO segments won't be as large.)
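As an example (the sizes are only illustrative and the server path is made
up), a client mount with reduced I/O sizes would look something like:

    # example only: 32K reads/writes instead of the usual 64K
    mount -t nfs -o nfsv3,rsize=32768,wsize=32768 server:/export /mnt

The same rsize/wsize options can go in the fstab options field for a
permanent mount.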
There are also issues w.r.t. exhaustion of kernel address space (the area
used for mbuf cluster mapping) when jumbo packets are used, since they
result in mbuf clusters of multiple sizes being allocated.

I think you can see that not all of these will be evident from iperf
results.

rick

> In your first post I see no information regarding the local performance
> of your disks, without the NFS that is.
>
> You may want to look into that first and ensure you get good read and
> write results on the Solaris box, before trying to fix that which might
> not be at fault.
> Perhaps your NFS implementation is already giving you the maximum speed
> the disks can achieve, or close enough.
>
> You may also want to compare the results with another NFS client to the
> Oracle server, say, god forbid, a *nux box for example.
>
>
> On 26 June 2015 at 11:59, Gerrit Kühn <gerrit.kuehn@aei.mpg.de> wrote:
>
> > On Thu, 25 Jun 2015 20:49:11 -0400 (EDT) Rick Macklem
> > <rmacklem@uoguelph.ca> wrote about Re: NFS on 10G interface terribly
> > slow:
> >
> > RM> Recent commits to stable/10 (not in 10.1) done by Alexander Motin
> > RM> (mav@) might help w.r.t. write performance (it avoids large writes
> > RM> doing synchronous writes when the wcommitsize is exceeded). If you
> > RM> can try stable/10, that might be worth it.
> >
> > Ok, I'll schedule an update then, I guess. OTOH, Scott reported that a
> > similar setup is working fine for him with 10.0 and 10.1, so there is
> > probably not much to gain. I'll try anyway...
> >
> > RM> Otherwise, the main mount option you can try is "wcommitsize",
> > RM> which you probably want to make larger.
> >
> > Hm, which size would you recommend? I cannot find anything about this
> > setting, not even what the default value would be. Is this reflected
> > in some sysctl, or how can I find out what the actual value is?
> >
> >
> > cu
> > Gerrit
> > _______________________________________________
> > freebsd-net@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

Want to link to this message? Use this URL:
<https://mail-archive.FreeBSD.org/cgi/mid.cgi?1709150198.407064.1435362830724.JavaMail.zimbra>