Date: Thu, 8 Oct 2015 08:32:27 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Hans Petter Selasky <hps@selasky.org>
Cc: Daniel Braniss <danny@cs.huji.ac.il>, pyunyh@gmail.com,
    FreeBSD Net <freebsd-net@freebsd.org>,
    FreeBSD stable <freebsd-stable@freebsd.org>
Subject: Re: ix(intel) vs mlxen(mellanox) 10Gb performance
Message-ID: <855415533.28142431.1444307547414.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <56163896.3020907@selasky.org>
References: <1D52028A-B39F-4F9B-BD38-CB1D73BF5D56@cs.huji.ac.il>
 <49173B1F-7B5E-4D59-8651-63D97B0CB5AC@cs.huji.ac.il>
 <1815942485.29539597.1440370972998.JavaMail.zimbra@uoguelph.ca>
 <55DAC623.60006@selasky.org>
 <62C7B1A3-CC6B-41A1-B254-6399F19F8FF7@cs.huji.ac.il>
 <2112273205.29795512.1440419111720.JavaMail.zimbra@uoguelph.ca>
 <1E679659-BA50-42C3-B569-03579E322685@cs.huji.ac.il>
 <56163896.3020907@selasky.org>
Hans Petter Selasky wrote:
> Hi,
>
> I've now MFC'ed r287775 to 10-stable and 9-stable. I hope this will
> resolve the issues with m_defrag() being called on too-long mbuf
> chains due to an off-by-one in the driver TSO parameters, and that
> it will be easier to maintain these parameters in the future.
>
> Some comments were made that we might want an option to select
> whether the IP header should be counted or not. Certain network
> drivers require copying the whole ETH/TCP/IP header into a separate
> memory area and can then handle one more data payload mbuf per TSO
> packet; others require DMA-ing of the whole mbuf TSO chain. I think
> it is acceptable to leave one TX DMA segment slot free for the case
> where 2K mbuf clusters are being used for TSO. From my experience
> the limitation typically kicks in when 2K mbuf clusters are used for
> TSO instead of 4K ones: 65536 / 4096 = 16 mbufs, whereas
> 65536 / 2048 = 32 mbufs. If an ethernet hardware driver has a limit
> of 24 data segments (mlxen), and assuming that each mbuf represents
> a single segment, then if the majority of mbufs being transmitted
> are 2K clusters we may have a small, 1/24 = 4.2%, loss of TX
> capability per TSO packet. From what I've seen using iperf, which
> calls m_uiotombuf(), which in turn calls m_getm2(), MJUMPPAGESIZE'ed
> mbuf clusters are preferred for large data transfers, so this issue
> should only show up when NODELAY is set on the socket and the writes
> are small from the application's point of view. If an application
> writes small amounts of data per send() system call, degraded
> performance is to be expected anyway.

Btw, last year I did some testing with NFS generating chains of 4K
(page size) clusters instead of 2K (MCLBYTES) ones. Although it was
not easy to reproduce, I was able to fragment the KVM used for the
clusters badly enough that allocations would fail. (I could only make
it happen when the code used 4K clusters for large NFS
requests/replies and 2K clusters otherwise, resulting in a mix of
allocations of both sizes.) As such, I never committed the changes to
head. Any kernel change that does 4K cluster allocations needs to be
tested carefully, including on a small machine (like the i386 I
have), imho.

> Please file a PR if it becomes an issue.
>
> Someone asked me to MFC r287775 to 10.X release as well. Is this
> still required?
>
> --HPS

Thanks for doing this, rick

> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
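To make the parameters above concrete, here is a minimal sketch (not
taken from mlxen, ix, or any other driver) of how an ethernet driver
on FreeBSD 10/11 could advertise its TSO limits through the
if_hw_tsomax* fields that r287775 adjusts; the EXAMPLE_HW_TX_SEGS
constant and the function name are invented for illustration:

    #include <sys/param.h>
    #include <sys/socket.h>
    #include <sys/mbuf.h>
    #include <net/if.h>
    #include <net/if_var.h>

    /* Hardware S/G list length; 24 matches the mlxen example above. */
    #define EXAMPLE_HW_TX_SEGS 24

    static void
    example_set_tso_params(struct ifnet *ifp)
    {
        /* Largest TSO payload; 65536 matches the arithmetic above. */
        ifp->if_hw_tsomax = 65536;
        /*
         * Payload mbufs allowed per TSO packet.  One slot is kept
         * free for the header mbuf the stack may prepend; whether
         * that header mbuf is counted here is the off-by-one that
         * r287775 straightens out.
         */
        ifp->if_hw_tsomaxsegcount = EXAMPLE_HW_TX_SEGS - 1;
        /* Largest single DMA segment; a 4K page cluster fits in one. */
        ifp->if_hw_tsomaxsegsize = MJUMPPAGESIZE;
    }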
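Similarly, the allocation behaviour Hans Petter describes can be shown
in a short sketch; only m_getm2() itself is real FreeBSD API here, the
wrapper and its comments are illustrative assumptions:

    #include <sys/param.h>
    #include <sys/mbuf.h>

    /*
     * Illustrative wrapper; m_getm2() is the allocator behind
     * m_uiotombuf().  For any remaining length above MCLBYTES (2K),
     * m_getm2() grabs an MJUMPPAGESIZE (4K) cluster, so a 64K
     * transfer becomes ~16 mbufs; built from 2K clusters instead it
     * would take ~32, which is what makes a 24-segment S/G limit
     * pinch.
     */
    static struct mbuf *
    example_alloc_chain(int len)
    {
        /* M_WAITOK: sleep, rather than fail, until clusters free up. */
        return (m_getm2(NULL, len, M_WAITOK, MT_DATA, M_PKTHDR));
    }

Rick's caveat attaches to this same path: once 4K and 2K cluster
allocations are mixed, the cluster KVM can fragment to the point
where allocations fail, hence his advice to test such changes on a
small (e.g. i386) machine.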