From owner-freebsd-net@FreeBSD.ORG Mon Mar 2 12:11:06 2015
Message-ID: <54F45387.3060306@selasky.org>
Date: Mon, 02 Mar 2015 13:11:51 +0100
From: Hans Petter Selasky
To: "freebsd-net@freebsd.org", Navdeep Parhar, Jack F Vogel
Subject: Very Large LRO in FreeBSD
List-Id: Networking and TCP/IP with FreeBSD

Hi,

I would like to move forward with very large LRO support for FreeBSD. I currently have the following patch for review:

https://reviews.freebsd.org/D1761

D1761 basically extends the current LRO support in a more or less hackish way. We need very large LRO support in order to reduce the number of calls into the TCP stack when running at 40 Gbit/s and above.
The current LRO limitations force us to call the TCP stack every 64 KBytes, and at rates above 40 Gbit/s we spend a significant amount of time in "tcp_input()". Much of the work done by "tcp_input()" at these rates is also not very useful. We need to assume error-free transmission to get very high rates anyway. Tests done at work showed CPU usage reduced by a factor of between 2 and 4 per high-speed TCP stream.

There is another possible approach: add a multiple-packet input function, if_input_multi(), to "struct ifnet", as outlined below. The purpose of such a function would be to skip the LRO-ing in the network drivers and instead forward an array of mbufs containing all the received packets:

    void if_input_multi(struct mbuf **ppmbuf, uint8_t log2_size);

if_input_multi() then begins by quick sorting the packets according to:

1) ethernet address
2) vlan prefix
3) IP address
4) TCP port numbers
5) received sequence number

We want the size to be a power of 2 to allow very quick sorting. if_input_multi() will then collect the packets which go to the same destination, remove the headers from all of them, and forward them like this:

    typedef int pr_multi_input_t(mbuf_array, num_mbufs, &off, proto);

If a pr_multi_input_t method is not available, there will be a fallback to "pr_input_t".

Any comments? Anyone already working on such a feature?

--HPS
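
For illustration only, the sort-and-collect step described above could be sketched roughly as follows in userland C. This is not code from D1761. The struct pkt_key layout and the names pkt_key_cmp(), pkt_sort() and pkt_flow_run() are hypothetical, the key is assumed to have been extracted from each packet's headers beforehand, only IPv4 is handled, and libc qsort() stands in for whatever sort the kernel would really use. Sequence number wraparound is ignored.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-packet sort key, filled in from the packet headers. */
struct pkt_key {
	uint8_t  ether_dst[6];        /* 1) ethernet address */
	uint16_t vlan_tag;            /* 2) vlan prefix */
	uint32_t ip_src, ip_dst;      /* 3) IP address (IPv4 only here) */
	uint16_t th_sport, th_dport;  /* 4) TCP port numbers */
	uint32_t th_seq;              /* 5) received sequence number */
	void    *m;                   /* stand-in for the struct mbuf pointer */
};

/* Three-way compare of two unsigned values: -1, 0 or 1. */
#define	CMP(a, b)	(((a) > (b)) - ((a) < (b)))

/* Compare everything except the sequence number (keys 1 to 4). */
static int
pkt_key_flow_cmp(const struct pkt_key *a, const struct pkt_key *b)
{
	int c;

	if ((c = memcmp(a->ether_dst, b->ether_dst, 6)) != 0)
		return (c);
	if ((c = CMP(a->vlan_tag, b->vlan_tag)) != 0)
		return (c);
	if ((c = CMP(a->ip_src, b->ip_src)) != 0)
		return (c);
	if ((c = CMP(a->ip_dst, b->ip_dst)) != 0)
		return (c);
	if ((c = CMP(a->th_sport, b->th_sport)) != 0)
		return (c);
	return (CMP(a->th_dport, b->th_dport));
}

/* qsort() comparator: flow first, then sequence number (no wraparound). */
static int
pkt_key_cmp(const void *pa, const void *pb)
{
	const struct pkt_key *a = pa, *b = pb;
	int c;

	if ((c = pkt_key_flow_cmp(a, b)) != 0)
		return (c);
	return (CMP(a->th_seq, b->th_seq));
}

/* Sort 2^log2_size keys so that packets of one flow become adjacent. */
static void
pkt_sort(struct pkt_key *keys, uint8_t log2_size)
{
	qsort(keys, (size_t)1 << log2_size, sizeof(keys[0]), pkt_key_cmp);
}

/* Number of consecutive packets, starting at keys[start], in the same flow. */
static size_t
pkt_flow_run(const struct pkt_key *keys, size_t n, size_t start)
{
	size_t end = start;

	while (end < n && pkt_key_flow_cmp(&keys[start], &keys[end]) == 0)
		end++;
	return (end - start);
}
```

After pkt_sort(), each run reported by pkt_flow_run() is a group of packets for one connection, ordered by sequence number. That is the unit which would be handed to a pr_multi_input_t style method in one call.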