From: Hans Petter Selasky
Date: Thu, 8 Oct 2015 11:34:14 +0200
Subject: Re: ix(intel) vs mlxen(mellanox) 10Gb performance
To: Daniel Braniss, Rick Macklem
Cc: pyunyh@gmail.com, FreeBSD stable, FreeBSD Net
Message-ID: <56163896.3020907@selasky.org>
In-Reply-To: <1E679659-BA50-42C3-B569-03579E322685@cs.huji.ac.il>

Hi,

I've now MFC'ed r287775 to 10-stable and 9-stable. I hope this will resolve the issues with m_defrag() being called on mbuf chains that are too long, due to an off-by-one in the driver TSO parameters, and that it will be easier to maintain these parameters in the future.

Some comments were made that we might want to have an option to select whether the IP-header should be counted or not. Certain network drivers require copying of the whole ETH/TCP/IP-header into separate memory areas, and can then handle one more data payload mbuf for TSO. Others require DMA-ing of the whole mbuf TSO chain. I think it is acceptable to have one TX-DMA segment slot free, in case of 2K mbuf clusters being used for TSO.

From my experience the limitation typically kicks in when 2K mbuf clusters are used for TSO instead of 4K mbuf clusters: 65536 / 4096 = 16, whereas 65536 / 2048 = 32. If an Ethernet hardware driver has a limitation of 24 data segments (mlxen), and assuming that each mbuf represents a single segment, then if the majority of mbufs being transmitted are 2K clusters we may have a small, 1/24 ≈ 4.2%, loss of TX capability per TSO packet.

From what I've seen using iperf, which in turn calls m_uiotombuf(), which in turn calls m_getm2(), MJUMPAGESIZE-sized mbuf clusters are preferred for large data transfers, so this issue might only happen when TCP_NODELAY is set on the socket and the writes are small from the application's point of view. If an application is writing small amounts of data per send() system call, it is expected to degrade the system performance. Please file a PR if it becomes an issue.

Someone asked me to MFC r287775 to the 10.X release as well. Is this still required?

--HPS
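
For anyone who wants to check the segment arithmetic quoted in the message, here is a rough sketch in Python. It is not driver code. The function and constant names are made up for illustration, and it assumes, as the message does, that every mbuf cluster maps to exactly one DMA segment and that a full TSO payload is 65536 bytes.

```python
# Illustrative only: these names are hypothetical and do not exist in the
# FreeBSD kernel or in any driver.

TSO_MAX_BYTES = 65536   # TSO payload size used in the arithmetic above
MLXEN_MAX_SEGS = 24     # data-segment limit quoted above for mlxen


def segments_needed(payload_bytes, cluster_bytes):
    """Clusters (one DMA segment each) needed to hold the payload."""
    # Integer ceiling division, so a partial cluster counts as one segment.
    return (payload_bytes + cluster_bytes - 1) // cluster_bytes


def hits_limit(payload_bytes, cluster_bytes, max_segs):
    """True if the chain needs more segments than the hardware allows."""
    return segments_needed(payload_bytes, cluster_bytes) > max_segs


def reserved_slot_loss(max_segs):
    """Fraction of TX capability given up by keeping one segment slot free."""
    return 1 / max_segs


if __name__ == "__main__":
    for cluster in (4096, 2048):
        segs = segments_needed(TSO_MAX_BYTES, cluster)
        over = hits_limit(TSO_MAX_BYTES, cluster, MLXEN_MAX_SEGS)
        print(cluster, segs, "over limit" if over else "fits")
    print(round(100 * reserved_slot_loss(MLXEN_MAX_SEGS), 1), "percent")
```

With 4K clusters the full payload fits within 24 segments, while with 2K clusters it does not, which is the case where the limit matters and where the one reserved slot costs about 1/24 of the per-packet capacity.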