From owner-freebsd-stable@freebsd.org Tue Aug 18 21:54:49 2015
Date: Tue, 18 Aug 2015 17:54:08 -0400 (EDT)
From: Rick Macklem
To: Hans Petter Selasky
Cc: Daniel Braniss, FreeBSD Net, Christopher Forgeron, FreeBSD stable,
    Slawa Olhovchenkov
Message-ID: <1325951625.25292515.1439934848268.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <55D333D6.5040102@selasky.org>
Subject: Re: ix(intel) vs mlxen(mellanox) 10Gb performance

Hans Petter Selasky wrote:
> On 08/18/15 14:53, Rick Macklem wrote:
> > ifp->if_hw_tsomax = 65518;
> > ifp->if_hw_tsomaxsegcount = IXGBE_82599_SCATTER;
> > ifp->if_hw_tsomaxsegsize = 2048;
>
> Hi,
>
> If IXGBE_82599_SCATTER is the maximum scatter/gather entries the
> hardware can do, remember to subtract one fragment for the TCP/IP-header
> mbuf!
>
Ouch! Yes, I now see that the code that counts the # of mbufs comes before
the code that adds the tcp/ip header mbuf.

In my opinion, this should be fixed outside the driver, by setting
if_hw_tsomaxsegcount to whatever the driver provides - 1. It is not the
driver's responsibility to know whether a tcp/ip header mbuf will be added,
and this is a lot less confusing than expecting the driver author to know
to subtract one.
(I had mistakenly thought that tcp_output() had added the tcp/ip header
mbuf before the loop that counts mbufs in the list. Btw, this tcp/ip header
mbuf also has leading space for the MAC layer header.)

> I think there is an off-by-one here:
>
> ifp->if_hw_tsomax = 65518;
> ifp->if_hw_tsomaxsegcount = IXGBE_82599_SCATTER - 1;
> ifp->if_hw_tsomaxsegsize = 2048;
>
> Refer to:
>
> >  *
> >  * NOTE: The TSO limits only apply to the data payload part of
> >  * a TCP/IP packet. That means there is no need to subtract
> >  * space for ethernet-, vlan-, IP- or TCP- headers from the
> >  * TSO limits unless the hardware driver in question requires
> >  * so.
>
This comment suggests that the driver author doesn't need to do this.
However, unless this is fixed in tcp_output(), the above patch should be
applied to the driver.

> In sys/net/if_var.h
>
> Thank you!
>
> --HPS
>
The problem I see is that, after calculating how many mbufs can be in the
TSO segment, the code in tcp_output() will have calculated a value for
"len" that will always be less than "tp->t_maxopd - optlen" when the
if_hw_tsomaxsegcount limit has been hit (see where it does a "break;" out
of the while loop).
--> This does not imply "too many small fragments" for NFS, just that the
    driver's transmit segment limit has been reached, where most of the
    mbufs are clusters, but not the first ones.

As such, the code:

	/*
	 * In case there are too many small fragments
	 * don't use TSO:
	 */
	if (len <= max_len) {
		len = max_len;
		sendalot = 1;
		tso = 0;
	}

will always be executed in this case, and "tso" gets set to 0. Not what we
want to happen, imho.

The above code block is what I suggested should be commented out or deleted
for the test.

It appears you should also add the "- 1" in the driver,
sys/dev/ixgbe/if_ix.c.

rick
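
The off-by-one discussed above can be illustrated with a small, simplified
C model. This is only a sketch under stated assumptions, not the
tcp_output() code: the names tso_plan, tso_plan_chain and tso_total_segs
are hypothetical, the mbuf chain is reduced to an array of lengths, and the
model ignores the separate if_hw_tsomax byte limit. It only shows how many
hardware segments a packet needs once the tcp/ip header mbuf is prepended,
with and without reserving one segment for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: what the data part of the chain consumes. */
struct tso_plan {
	size_t payload;   /* data bytes accepted into the TSO burst */
	size_t data_segs; /* hardware segments used by the data mbufs */
};

/*
 * Simplified model of limiting a chain by segment count and size.
 * mbuf_len[]  : lengths of the data mbufs, in chain order
 * hw_segcount : segments the hardware accepts per packet
 * hw_segsize  : maximum bytes per segment (must be nonzero)
 * reserve_hdr : nonzero to set one segment aside for the header mbuf
 */
static struct tso_plan
tso_plan_chain(const size_t *mbuf_len, size_t nmbufs,
    size_t hw_segcount, size_t hw_segsize, int reserve_hdr)
{
	struct tso_plan plan = { 0, 0 };
	size_t budget = hw_segcount;

	assert(hw_segsize != 0);
	if (reserve_hdr && budget > 0)
		budget--;

	for (size_t i = 0; i < nmbufs; i++) {
		/* Segments needed, rounded up; an empty mbuf still uses one. */
		size_t frags = (mbuf_len[i] + hw_segsize - 1) / hw_segsize;

		if (frags == 0)
			frags = 1;
		if (frags > budget) {
			/* Take only what the remaining segments can carry. */
			plan.payload += budget * hw_segsize;
			plan.data_segs += budget;
			break;
		}
		plan.payload += mbuf_len[i];
		plan.data_segs += frags;
		budget -= frags;
	}
	return plan;
}

/* Segments the hardware sees: data segments plus one header mbuf. */
static size_t
tso_total_segs(const struct tso_plan *plan)
{
	return plan->data_segs + 1;
}
```

For example, with 40 data mbufs of 1024 bytes each, a limit of 32 segments
(an illustrative value, not a claim about IXGBE_82599_SCATTER) and a
segment size of 2048, calling tso_plan_chain() with reserve_hdr = 0 accepts
32 data segments, so the packet needs 33 once the header mbuf is added.
With reserve_hdr = 1 it accepts 31 data segments and the total stays at 32.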