Date:      Mon, 02 Mar 2015 13:11:51 +0100
From:      Hans Petter Selasky <hps@selasky.org>
To:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>,  Navdeep Parhar <np@FreeBSD.org>, Jack F Vogel <jfvogel@gmail.com>
Subject:   Very Large LRO in FreeBSD
Message-ID:  <54F45387.3060306@selasky.org>

Hi,

I would like to move forward with very large LRO support for
FreeBSD. I currently have the following patch up for review:

https://reviews.freebsd.org/D1761

D1761 extends the current LRO support in a somewhat hackish way.

We need very large LRO support in order to reduce the number of calls
into the TCP stack at 40GBit and above. The current LRO limitations
force us to call the TCP stack every 64KBytes. At rates above 40GBit
we spend a significant amount of time in "tcp_input()", and much of
the work done by "tcp_input()" at these rates is not very useful. We
need to assume error-free transmission to get very high rates anyway.


Testing done at work showed that CPU usage per high-speed TCP stream
was reduced by a factor of between 2 and 4.


There is another possible approach, which is to add a multi-packet
input function, if_input_multi(), to "struct ifnet", as outlined
below.

The purpose of such a function would be to skip the LRO-ing in the 
network drivers, and instead forward an array of mbufs with all the 
received packets.

void if_input_multi(struct mbuf **ppmbuf, uint8_t log2_size);

The if_input_multi() function then begins by quick-sorting the packets
according to:

1) ethernet address
2) vlan prefix
3) IP address
4) TCP port numbers
5) received sequence number

We want the size to be a power of 2 to allow very quick sorting.
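
To make the sort order concrete, here is a minimal userland sketch of
the comparison. It is not taken from D1761. "struct pkt_key",
pkt_key_cmp() and pkt_sort() are hypothetical names, the key is
assumed to be already extracted from each packet (IPv4 only), and
libc qsort() stands in for whatever power-of-2 sort the kernel would
really use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat sort key; a real driver would parse it from the mbuf. */
struct pkt_key {
	uint8_t  ether_dst[6];	/* 1) ethernet address */
	uint16_t vlan_tag;	/* 2) vlan prefix */
	uint32_t ip_src;	/* 3) IP address (IPv4 assumed) */
	uint32_t ip_dst;
	uint16_t tcp_sport;	/* 4) TCP port numbers */
	uint16_t tcp_dport;
	uint32_t tcp_seq;	/* 5) received sequence number */
};

/* Three-way compare yielding -1, 0 or 1 without overflow. */
#define	KEY_CMP(a, b)	(((a) > (b)) - ((a) < (b)))

/* qsort() comparator over an array of pointers to keys. */
static int
pkt_key_cmp(const void *pa, const void *pb)
{
	const struct pkt_key *a = *(const struct pkt_key * const *)pa;
	const struct pkt_key *b = *(const struct pkt_key * const *)pb;
	int r;

	if ((r = memcmp(a->ether_dst, b->ether_dst, 6)) != 0)
		return (r);
	if ((r = KEY_CMP(a->vlan_tag, b->vlan_tag)) != 0)
		return (r);
	if ((r = KEY_CMP(a->ip_src, b->ip_src)) != 0)
		return (r);
	if ((r = KEY_CMP(a->ip_dst, b->ip_dst)) != 0)
		return (r);
	if ((r = KEY_CMP(a->tcp_sport, b->tcp_sport)) != 0)
		return (r);
	if ((r = KEY_CMP(a->tcp_dport, b->tcp_dport)) != 0)
		return (r);
	/*
	 * Wrap-aware sequence compare. Assumes all sequence numbers of
	 * one flow in a batch lie within 2^31 of each other.
	 */
	return (KEY_CMP((int32_t)(a->tcp_seq - b->tcp_seq), 0));
}

/* Sort 2^log2_size packet keys so that each flow becomes one ordered run. */
static void
pkt_sort(struct pkt_key **pkts, uint8_t log2_size)
{
	assert(log2_size < 16);	/* arbitrary sanity limit for this sketch */
	qsort(pkts, (size_t)1 << log2_size, sizeof(pkts[0]), pkt_key_cmp);
}
```

After pkt_sort() returns, packets belonging to the same connection are
adjacent and ordered by sequence number, which is what the collection
step needs.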

Then if_input_multi() will collect the packets which go to the same
destination, remove the headers from all of them, and forward them
like this:

typedef int pr_multi_input_t(mbuf_array, num_mbufs, &off, proto);

In case a pr_multi_input_t method is not available there will be a 
fallback to "pr_input_t".
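
The collect-and-forward step with the fallback could look roughly like
the sketch below. This is only an illustration in userland C, not
kernel code. "struct pkt", "struct proto_ops", deliver_sorted() and
the two example handlers are hypothetical stand-ins for the mbuf and
protosw machinery, and a single flow_id field stands in for the full
header comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf whose headers are already parsed. */
struct pkt {
	int flow_id;	/* packets with equal flow_id share a destination */
	int len;	/* payload length in bytes */
};

typedef int multi_input_fn(struct pkt **pkts, size_t num_pkts);
typedef int single_input_fn(struct pkt *pkt);

/* Hypothetical per-protocol method table. */
struct proto_ops {
	multi_input_fn	*input_multi;	/* may be NULL */
	single_input_fn	*input;		/* always present */
};

static size_t delivered_bytes;	/* updated by the example handlers */

/* Example handler: consume a whole run of packets in one call. */
static int
example_input_multi(struct pkt **pkts, size_t num_pkts)
{
	for (size_t i = 0; i < num_pkts; i++)
		delivered_bytes += (size_t)pkts[i]->len;
	return (0);
}

/* Example handler: consume a single packet. */
static int
example_input(struct pkt *pkt)
{
	delivered_bytes += (size_t)pkt->len;
	return (0);
}

/*
 * Walk a sorted packet array, find each run of packets for the same
 * flow and hand it to the protocol. Uses one call per run when the
 * multi-input method exists, else one call per packet. Returns the
 * number of calls made into the protocol.
 */
static size_t
deliver_sorted(struct pkt **pkts, size_t n, const struct proto_ops *ops)
{
	size_t calls = 0;
	size_t start = 0;

	assert(ops->input != NULL);
	while (start < n) {
		size_t end = start + 1;

		while (end < n && pkts[end]->flow_id == pkts[start]->flow_id)
			end++;
		if (ops->input_multi != NULL) {
			ops->input_multi(&pkts[start], end - start);
			calls++;
		} else {
			for (size_t i = start; i < end; i++) {
				ops->input(pkts[i]);
				calls++;
			}
		}
		start = end;
	}
	return (calls);
}
```

The return value makes the saving visible: with the multi-input method
present the number of calls equals the number of flows in the batch,
otherwise it equals the number of packets.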

Any comments?

Anyone already working on such a feature?

--HPS


