Date: Tue, 20 Sep 2016 11:11:35 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Lyndon Nerenberg
Cc: FreeBSD Stable <freebsd-stable@freebsd.org>
Subject: Re: LAGG and Jumbo Frames
Message-ID: <20160920081135.GH2960@zxy.spb.ru>
In-Reply-To: <42A03EA9-7F8E-446E-B430-7431AB9CE2E6@orthanc.ca>

On Mon, Sep 19, 2016 at 03:59:20PM -0700, Lyndon Nerenberg wrote:

> > On Sep 19, 2016, at 3:08 PM, Slawa Olhovchenkov wrote:
> >
> > This is because RTT of this link for jumbo frames higher 1500 bytes
> > frame for store-and-forward switch chain.
>
> For TCP, RTT isn't really a factor (in this scenario),

I don't see that scenario in the first message.
For my scenario this is the limiting factor.

> as the windowing and congestion avoidance algorithms will adapt to the
> actual bandwidth-delay product of the link, and the delays in each
> direction will be symmetrical.
>
> Now the ack for a single 9000 octet packet will take longer than
> that for a 1500 octet one, but that's because you're sending six
> times as many octets before the ACK can be generated.  The time to
> send six 1500 octet packets and receive the ACK from the sixth packet
> is going to be comparable to that of receiving the ack from a single
> 9000 octet packet.  It's simple arithmetic to calculate the extra
> protocol header overhead for 6x1500 vs 1x9000.

The time to send six 1500 octet packets is significantly less than the
time to send one 9000 octet packet across multiple switches:

H1-[S1]-[S2]-[S3]-H2

Sending a single 1500 octet packet from H1 to S1 over a 1 Gbit link:
(1500+14+4+12+8)*8/10^9 = 12 us
The switch adds a 3 us forwarding delay.
The same holds for S1-S2, S2-S3 and S3-H2.
The 2nd packet trails the 1st by 12 us; packets 3..6 do the same.
Sending all six packets (5 trailing packets, 4 hops):
(12+3)*4 + 12*5 = 120 us.

Sending a single 9000 octet packet from H1 to S1 over a 1 Gbit link:
(9000+14+4+12+8)*8/10^9 = 72 us
The switch adds a 3 us forwarding delay.
Sending a single 9000 octet packet over 4 hops: (72+3)*4 = 300 us.

300/120 = 2.5 times slower.
(A small script at the end of this message reproduces this arithmetic.)

> If there *is* a significant difference (beyond the extra protocol
> header overhead), it's time to take a very close look at the NICs you
> are using in the end hosts.
> A statistically significant difference would hint at poor interrupt
> handling performance on the part of one or more of the NICs and their
> associated device drivers.
>
> The intermediate switch overhead will be a constant (unless the switch
> backplane becomes saturated from unrelated traffic).

You are overlooking the serialisation time.
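
For anyone who wants to play with the numbers, here is a small Python
sketch of the arithmetic above. It is purely illustrative and not part
of the original exchange; the 1 Gbit/s links, the 3 us per-hop
forwarding delay and the 14+4+12+8 bytes of per-frame overhead are the
same assumptions used in the figures above.

    #!/usr/bin/env python
    # Illustrative sketch only (not from the thread): store-and-forward
    # transit time for N back-to-back frames over a chain of 1 Gbit/s
    # links.  The 3 us per-hop delay and the per-frame overhead are the
    # assumptions used in the figures above.

    LINK_BPS = 10**9              # link speed, bits per second
    HOP_DELAY = 3e-6              # assumed per-hop forwarding delay, seconds
    OVERHEAD = 14 + 4 + 12 + 8    # header + FCS + inter-frame gap + preamble, bytes

    def wire_time(payload):
        # Serialisation time of one frame on a single link.
        return (payload + OVERHEAD) * 8.0 / LINK_BPS

    def transit(payload, nframes, hops):
        # The first frame is stored and forwarded at every hop; the
        # remaining frames pipeline behind it.
        return (wire_time(payload) + HOP_DELAY) * hops \
               + wire_time(payload) * (nframes - 1)

    t_std   = transit(1500, 6, 4)   # six standard frames, H1-S1-S2-S3-H2
    t_jumbo = transit(9000, 1, 4)   # one jumbo frame over the same path
    print("6 x 1500: %.0f us" % (t_std * 1e6))      # ~120 us
    print("1 x 9000: %.0f us" % (t_jumbo * 1e6))    # ~300 us
    print("ratio:    %.1f"    % (t_jumbo / t_std))  # ~2.5

The point is the same as in the calculation: with store-and-forward
switches the jumbo frame pays its full serialisation time again at
every hop, which is where the ~2.5x comes from.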