Date: Wed, 17 Sep 2014 10:27:39 +0200
From: Stefano Garzarella <stefanogarzarella@gmail.com>
To: freebsd-current <freebsd-current@freebsd.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, gnn@neville-neil.com
Cc: Luigi Rizzo <rizzo@iet.unipi.it>
Subject: [RFC] Patch to add Software/Generic Segmentation Offload (GSO) support in FreeBSD
Message-ID: <CAO0mX5bDoCe8oRNmd%2BUBbr4bJcgQEzhAQKyTsyxEQciyvGTdgQ@mail.gmail.com>

Hi all,

During my master's thesis, under the supervision of Prof. Luigi Rizzo, I have recently worked on a project to add GSO (Generic Segmentation Offload) support to FreeBSD. I will present this project at EuroBSDcon 2014, in Sofia (Bulgaria), on September 28, 2014.

Following is a brief description of our project:

The use of large frames makes network communication much less demanding for the CPU. Yet backward compatibility and slow links require the use of 1500-byte or smaller frames. Modern NICs with hardware TCP segmentation offloading (TSO) address this problem. However, a generic software version (GSO) provided by the OS is still useful on paths with no suitable hardware, such as between virtual machines or with older or buggy NICs.

Much of the advantage of TSO comes from crossing the network stack only once per (large) segment instead of once per 1500-byte frame. GSO does the same for both segmentation (TCP) and fragmentation (UDP) by performing these operations as late as possible. Ideally, this could be done within the device driver, but that would require modifications to all drivers. A more convenient, similarly effective approach is to segment just before the packet is passed to the driver (in ether_output()).

Our preliminary implementation supports TCP and UDP on IPv4/IPv6; it only intercepts packets larger than the MTU (others are left unchanged), and only when GSO is marked as enabled for the interface.

Segments larger than the MTU are not split in tcp_output(), udp_output(), or ip_output(), but marked with a flag (contained in m_pkthdr.csum_flags), which is processed by ether_output() just before calling the device driver.

ether_output(), through gso_dispatch(), splits the large frame as needed, creating headers and possibly computing checksums if the hardware does not support checksum offloading.
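
The late splitting step described above can be illustrated with a small userspace sketch. It is not code from the patch: the struct and function names (gso_sketch_seg, gso_sketch_count, gso_sketch_split) are hypothetical, and it only computes how a large TCP payload would be cut into MSS-sized pieces (payload offset, length, and sequence number per segment), leaving out mbuf handling, header construction, and checksums.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one output segment; not taken from the patch. */
struct gso_sketch_seg {
	size_t   off;	/* offset of this segment's payload in the large packet */
	size_t   len;	/* payload bytes carried by this segment */
	uint32_t seq;	/* TCP sequence number of the first payload byte */
};

/* Number of segments needed to carry payload_len bytes, mss bytes at a time. */
static size_t
gso_sketch_count(size_t payload_len, size_t mss)
{
	assert(mss > 0);
	return (payload_len + mss - 1) / mss;
}

/*
 * Fill segs[] with the layout of each segment.
 * Returns the number of segments written, or 0 if segs[] is too small
 * (or if there is no payload to split).
 */
static size_t
gso_sketch_split(size_t payload_len, size_t mss, uint32_t first_seq,
    struct gso_sketch_seg *segs, size_t max_segs)
{
	size_t n = gso_sketch_count(payload_len, mss);
	size_t i, off = 0;

	if (n > max_segs)
		return 0;
	for (i = 0; i < n; i++) {
		size_t left = payload_len - off;

		segs[i].off = off;
		segs[i].len = left < mss ? left : mss;	/* last one may be short */
		segs[i].seq = first_seq + (uint32_t)off;	/* wraps modulo 2^32 */
		off += segs[i].len;
	}
	return n;
}
```

For example, a 4000-byte payload with an assumed MSS of 1448 bytes yields three segments of 1448, 1448, and 1104 bytes. In the real stack each of these would also need its own IP and TCP headers and, where the hardware cannot do it, its own checksum.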

In experiments against an LRO-enabled receiver (otherwise TSO/GSO are ineffective) we have seen the following performance, measured at different clock speeds (because at top speed the 10G link becomes the bottleneck):

Testing environment (all with Intel 10Gbit NICs):
  Sender:         FreeBSD 11-CURRENT - CPU i7-870 at 2.93 GHz + Turboboost
  Receiver:       Linux 3.12.8 - CPU i7-3770K at 3.50 GHz + Turboboost
  Benchmark tool: netperf 2.6.0

--- TCP/IPv4 packets (checksum offloading enabled) ---
Freq.     TSO      GSO      none     Speedup
[GHz]    [Mbps]   [Mbps]   [Mbps]    GSO-none
2.93      9347     9298     8308      12 %
2.53      9266     9401     6771      39 %
2.00      9408     9294     5499      69 %
1.46      9408     8087     4075      98 %
1.05      9408     5673     2884      97 %
0.45      6760     2206     1244      77 %

--- TCP/IPv6 packets (checksum offloading enabled) ---
Freq.     TSO      GSO      none     Speedup
[GHz]    [Mbps]   [Mbps]   [Mbps]    GSO-none
2.93      7530     6939     4966      40 %
2.53      5133     7145     4008      78 %
2.00      5965     6331     3152     101 %
1.46      5565     5180     2348     121 %
1.05      8501     3607     1732     108 %
0.45      3665     1505      651     131 %

--- UDP/IPv4 packets (9K) ---
Freq.     GSO      none     Speedup
[GHz]    [Mbps]   [Mbps]    GSO-none
2.93      9440     8084      17 %
2.53      7772     6649      17 %
2.00      6336     5338      19 %
1.46      4748     4014      18 %
1.05      3359     2831      19 %
0.45      1312     1120      17 %

--- UDP/IPv6 packets (9K) ---
Freq.     GSO      none     Speedup
[GHz]    [Mbps]   [Mbps]    GSO-none
2.93      7281     6197      18 %
2.53      5953     5020      19 %
2.00      4804     4048      19 %
1.46      3582     3004      19 %
1.05      2512     2092      20 %
0.45       998      826      21 %

We tried to change the network stack as little as possible to add GSO support. To avoid changing the API/ABI, we temporarily used spare fields in struct tcpcb (TCP Control Block) and struct ifnet to store some information related to GSO (enabled, max burst size, etc.).

The code that performs the segmentation/fragmentation is contained in the files gso.[h|c] in sys/net. We used 4 bits in m_pkthdr.csum_flags (CSUM_GSO_MASK) to encode the packet type (TCP/IPv4, TCP/IPv6, etc.), so that the TCP/IP/Ethernet headers of each packet do not have to be inspected to determine it.
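
The 4-bit packet-type encoding just described can be sketched as follows. This is only an illustration under stated assumptions: the bit position, the SKETCH_* names, and the type values are hypothetical and are not the definitions of CSUM_GSO_MASK used in the patch; csum_flags is assumed to be a 32-bit field here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the real bit positions in the patch may differ. */
#define SKETCH_GSO_SHIFT	28
#define SKETCH_GSO_MASK		((uint32_t)0xF << SKETCH_GSO_SHIFT)

/* Hypothetical packet types; 0 means "no GSO requested". */
enum sketch_gso_type {
	SKETCH_GSO_NONE = 0,
	SKETCH_GSO_TCP4,
	SKETCH_GSO_TCP6,
	SKETCH_GSO_UDP4,
	SKETCH_GSO_UDP6
};

/* Store the packet type in the 4 reserved bits, keeping all other flags. */
static uint32_t
sketch_gso_set(uint32_t csum_flags, enum sketch_gso_type t)
{
	assert((unsigned)t <= 0xF);
	return (csum_flags & ~SKETCH_GSO_MASK) |
	    ((uint32_t)t << SKETCH_GSO_SHIFT);
}

/* Recover the packet type without looking at any protocol header. */
static enum sketch_gso_type
sketch_gso_get(uint32_t csum_flags)
{
	return (enum sketch_gso_type)
	    ((csum_flags & SKETCH_GSO_MASK) >> SKETCH_GSO_SHIFT);
}
```

With such a layout, a test of the form (flags & MASK) != 0 is enough to tell whether a packet needs GSO, and the extracted value can select the right segmentation or fragmentation routine directly.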

In ether_output_frame(), if the packet requires GSO ((m->m_pkthdr.csum_flags & CSUM_GSO_MASK) != 0), it is segmented or fragmented, and the resulting packets are then sent to the device driver.

At https://github.com/stefano-garzarella/freebsd-gso you can find the kernel patches for FreeBSD-current, FreeBSD 10-stable, and FreeBSD 9-stable, a simple application (gso-stats.c) that prints the GSO statistics, and PicoBSD images with GSO support.

At https://github.com/stefano-garzarella/freebsd-gso-src you can get the FreeBSD source with the GSO patch applied (separate branches for FreeBSD current, 10-stable, and 9-stable).

Any feedback, comments, or questions are welcome.

Thank you very much,
Stefano Garzarella

----------------------------------------------------------------------
How to use GSO:

- Apply the right kernel patch.

- To compile GSO support, add 'options GSO' to your kernel config file and rebuild the kernel.

- To manage the GSO parameters there are some sysctls:
  - net.inet.tcp.gso - enables GSO on TCP communications (!= 0)
  - net.inet.udp.gso - enables GSO on UDP communications (!= 0)
  - for each interface:
    - net.gso.dev."ifname".max_burst - GSO burst length limit [default: IP_MAXPACKET = 65535]
    - net.gso.dev."ifname".enable_gso - enables GSO on the "ifname" interface (!= 0)

- To show statistics:
  - make sure that the GSO_STATS macro is defined in sys/net/gso.h
  - use the simple gso-stats.c application to access the sysctl net.gso.stats, which contains the address of the gsostats structure (defined in gso.h) that records the statistics (compile with -I/path/to/kernel/src/patched/)
----------------------------------------------------------------------

-- 
*Stefano Garzarella*
stefano.garzarella@gmail.com