From owner-freebsd-net@freebsd.org Wed Aug 19 07:50:23 2015
Return-Path:
Delivered-To: freebsd-net@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
    by mailman.ysv.freebsd.org (Postfix) with ESMTP id 264AF9BDE74;
    Wed, 19 Aug 2015 07:50:23 +0000 (UTC)
    (envelope-from hps@selasky.org)
Received: from mail.turbocat.net (heidi.turbocat.net [88.198.202.214])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (Client did not present a certificate)
    by mx1.freebsd.org (Postfix) with ESMTPS id D8AEBA2C;
    Wed, 19 Aug 2015 07:50:22 +0000 (UTC)
    (envelope-from hps@selasky.org)
Received: from laptop015.home.selasky.org (cm-176.74.213.204.customer.telag.net [176.74.213.204])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    by mail.turbocat.net (Postfix) with ESMTPSA id 995EE1FE023;
    Wed, 19 Aug 2015 09:50:19 +0200 (CEST)
Subject: Re: ix(intel) vs mlxen(mellanox) 10Gb performance
To: pyunyh@gmail.com
References: <1D52028A-B39F-4F9B-BD38-CB1D73BF5D56@cs.huji.ac.il>
    <17871443-E105-4434-80B1-6939306A865F@cs.huji.ac.il>
    <473274181.23263108.1439814072514.JavaMail.zimbra@uoguelph.ca>
    <7F892C70-9C04-4468-9514-EDBFE75CF2C6@cs.huji.ac.il>
    <805850043.24018217.1439848150695.JavaMail.zimbra@uoguelph.ca>
    <9D8B0503-E8FA-43CA-88F0-01F184F84D9B@cs.huji.ac.il>
    <1721122651.24481798.1439902381663.JavaMail.zimbra@uoguelph.ca>
    <55D333D6.5040102@selasky.org>
    <1325951625.25292515.1439934848268.JavaMail.zimbra@uoguelph.ca>
    <55D429A4.3010407@selasky.org>
    <20150819074212.GB964@michelle.fasterthan.com>
Cc: Rick Macklem , FreeBSD stable , FreeBSD Net , Slawa Olhovchenkov ,
    Christopher Forgeron , Daniel Braniss
From: Hans Petter Selasky
Message-ID: <55D43590.8050508@selasky.org>
Date: Wed, 19 Aug 2015 09:51:44 +0200
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <20150819074212.GB964@michelle.fasterthan.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 19 Aug 2015 07:50:23 -0000

On 08/19/15 09:42, Yonghyeon PYUN wrote:
> On Wed, Aug 19, 2015 at 09:00:52AM +0200, Hans Petter Selasky wrote:
>> On 08/18/15 23:54, Rick Macklem wrote:
>>> Ouch! Yes, I now see that the code that counts the # of mbufs is
>>> before the code that adds the tcp/ip header mbuf.
>>>
>>> In my opinion, this should be fixed by setting if_hw_tsomaxsegcount
>>> to whatever the driver provides - 1. It is not the driver's
>>> responsibility to know if a tcp/ip header mbuf will be added and is
>>> a lot less confusing than expecting the driver author to know to
>>> subtract one. (I had mistakenly thought that tcp_output() had added
>>> the tcp/ip header mbuf before the loop that counts mbufs in the
>>> list. Btw, this tcp/ip header mbuf also has leading space for the
>>> MAC layer header.)
>>>
>>
>> Hi Rick,
>>
>> Your question is good. With the Mellanox hardware we have separate
>> so-called inline data space for the TCP/IP headers, so if the TCP stack
>> subtracts something, then we would need to add something to the limit,
>> because then the scatter gather list is only used for the data part.
>>
>
> I think all drivers in tree don't subtract 1 for
> if_hw_tsomaxsegcount. Probably touching Mellanox driver would be
> simpler than fixing all other drivers in tree.

Hi,

If you change the behaviour, don't forget to update and/or add comments
describing it. Maybe the amount of subtraction could be defined by some
macro? Then drivers which inline the headers can subtract it?

Your suggestion is fine by me.
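
To make the macro idea concrete, here is a minimal sketch in plain C. It is only a user-space model, not kernel code: `TSO_HDR_SEGS`, `struct tso_limits`, `inline_headers` and `tso_payload_seg_limit()` are hypothetical names invented for illustration, and only if_hw_tsomaxsegcount from the discussion above is a real field. The assumption is that the stack reserves one scatter/gather entry for the TCP/IP header mbuf unless the driver says the headers go into separate inline space.

```c
#include <stdio.h>

/* Hypothetical: S/G entries consumed by the TCP/IP header mbuf. */
#define TSO_HDR_SEGS 1u

/* Hypothetical model of what a driver reports; not the real struct ifnet. */
struct tso_limits {
    unsigned hw_segs;        /* S/G entries the hardware offers per TSO frame */
    int      inline_headers; /* nonzero if headers use separate inline space */
};

/*
 * Number of payload mbufs the stack may chain before it prepends the
 * header mbuf. Drivers that inline the headers keep the full count.
 */
static unsigned
tso_payload_seg_limit(const struct tso_limits *l)
{
    if (l->inline_headers)
        return l->hw_segs;
    if (l->hw_segs <= TSO_HDR_SEGS)
        return 0; /* no room left for payload segments */
    return l->hw_segs - TSO_HDR_SEGS;
}

/* Print the effective payload limit for one driver configuration. */
static void
tso_print_limit(const char *name, const struct tso_limits *l)
{
    printf("%s: payload segment limit %u\n", name, tso_payload_seg_limit(l));
}
```

With 32 hardware entries, a driver that puts the header in the scatter/gather list would get a payload limit of 31, while one that inlines the headers would keep all 32. Whether this belongs in a macro, a flag, or the driver itself is exactly the open question in this thread.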
We tried to preserve the initial TSO limits, and I believe that the TSO
limits never accounted for IP/TCP/ETHERNET/VLAN headers!

>
>> Maybe it can be controlled by some kind of flag, if all the three TSO
>> limits should include the TCP/IP/ethernet headers too. I'm pretty sure
>> we want both versions.
>>
>
> Hmm, I'm afraid it's already complex. Drivers have to tell almost
> the same information to both bus_dma(9) and network stack.

You're right, it's complicated. Not sure if bus_dma can provide an API
for this though.

--HPS