Date: Sat, 20 Aug 2016 13:48:12 GMT
From: vincenzo@FreeBSD.org
To: svn-soc-all@FreeBSD.org
Subject: socsvn commit: r308087 - soc2016/vincenzo
Message-ID: <201608201348.u7KDmCql099004@socsvn.freebsd.org>
Author: vincenzo
Date: Sat Aug 20 13:48:12 2016
New Revision: 308087

URL: http://svnweb.FreeBSD.org/socsvn/?view=rev&rev=308087

Log:
  Add README for the project

Added:
  soc2016/vincenzo/README

Added: soc2016/vincenzo/README
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ soc2016/vincenzo/README	Sat Aug 20 13:48:12 2016	(r308087)
@@ -0,0 +1,225 @@
+== High-performance TCP/IP networking for bhyve VMs using netmap passthrough ==
+
+	Student: Vincenzo Maffione (vincenzo AT freebsd DOT org)
+	Mentor:  Luigi Rizzo (luigi AT freebsd DOT org)
+
+
+=========================== Project proposal ==========================
+
+Netmap passthrough (ptnetmap) has recently been introduced on the Linux and
+FreeBSD platforms, where the QEMU-KVM and bhyve hypervisors allow VMs to
+exchange over 20 Mpps through VALE switches. Unfortunately, the original
+ptnetmap implementation could not exchange packets with the guest TCP/IP
+stack; it only supported guest applications running directly over netmap.
+Moreover, ptnetmap could not support multi-ring netmap ports.
+
+I have recently developed a prototype of ptnet, a new multi-ring
+paravirtualized device for Linux and QEMU/KVM that builds on ptnetmap to allow
+VMs to exchange TCP traffic at 20 Gbps, while still offering the same ptnetmap
+performance to native netmap applications.
+
+In this project I would like to implement ptnet for FreeBSD and bhyve, which
+currently cannot carry TCP/IP traffic at such high rates. Taking the above
+prototype as a reference, the following work is required:
+
+ - Implement a ptnet driver for FreeBSD guests that is able to attach to netmap
+   to support native netmap applications (estimated new code ~700 loc).
+ - Export a network interface to the FreeBSD guest kernel that allows ptnet
+   to be used by the network stack, including virtio-net header support
+   (estimated new code ~800 loc).
+ - Extend bhyve to emulate the ptnet device model and interact with the netmap
+   instance used by the hypervisor (estimated new code ~600 loc).
+
+
+================== An overview of netmap and ptnetmap ====================
+
+Netmap is a framework for high-performance network I/O. It exposes a
+hardware-independent API which allows userspace applications to interact
+directly with NIC hardware rings, in order to receive and transmit Ethernet
+frames. Rings are always accessed in the context of system calls, and NIC
+interrupts are used to notify applications about NIC processing completion.
+The performance boost of netmap w.r.t. the traditional socket API primarily
+comes from: (i) batching, since it is possible to send/receive hundreds of
+packets with a single system call, and (ii) preallocation of packet buffers
+and memory mapping of those buffers into the application address space.
+
+Several netmap extensions have been developed to support virtualization.
+Netmap support for various paravirtualized drivers - e.g. virtio-net, Xen
+netfront/netback - allows netmap applications to run in the guest over fast
+paravirtualized I/O devices.
+
+The Virtual Ethernet (VALE) software switch, which supports scalable
+high-performance local communication (over 20 Mpps between two switch ports),
+can then be used to connect multiple VMs together.
+
+However, in a typical scenario with two communicating netmap applications
+running in different VMs (on the same host) connected through a VALE switch,
+the journey of a packet is still quite convoluted. As a matter of fact,
+while netmap is fast on both the host (the VALE switch) and the guest
+(interaction between application and the emulated device), each packet still
+needs to be processed by the hypervisor, which has to emulate the
+device model used in the guest (e.g. e1000, virtio-net). The emulation
+involves device-specific overhead - queue processing, format conversions,
+packet copies, address translations, etc.
+As a consequence, the maximum packet rate between the two VMs is often
+limited to 2-5 Mpps.
+
+To overcome these limitations, ptnetmap has been introduced as a passthrough
+technique that completely avoids hypervisor processing in the packet
+datapath, unlocking the full potential of netmap also for virtual machine
+environments.
+With ptnetmap, a netmap port on the host can be exposed to the guest in a
+protected way, so that netmap applications in the guest can directly access
+the rings and packet buffers of the host port, avoiding all the extra overhead
+involved in the emulation of network devices. System calls issued by guest
+applications on ptnetmap ports are served by kernel threads (one per ring)
+running in the netmap host.
+
+Similarly to VirtIO paravirtualization, synchronization between the
+guest netmap (driver) and the host netmap (kernel threads) happens through a
+shared memory area called the Communication Status Block (CSB), which is used
+to store producer-consumer state and notification suppression flags.
+
+Two notification mechanisms need to be supported by the hypervisor to allow
+guest and host netmap to wake each other up.
+On QEMU/bhyve, notifications from guest to host are implemented with accesses
+to I/O registers, which cause a trap into the hypervisor. Notifications in the
+other direction are implemented using the KVM/bhyve interrupt injection
+mechanisms. MSI-X interrupts are used since they have less overhead than
+traditional PCI interrupts.
+
+Since I/O register accesses and interrupts are very expensive in the common
+case of hardware-assisted virtualization, they are suppressed when not needed,
+i.e. whenever the host (or the guest) is actively polling the CSB to
+check for more work. From a high-level perspective, the system tries to
+dynamically switch between polling operation under high load and
+interrupt-based operation under lower load.
+
+
+===================== The ptnet paravirtualized device ================
+
+The original ptnetmap implementation required ptnetmap-enabled
+virtio-net/e1000 drivers. Only the notification functionality of those
+devices was reused, while the datapath (e.g. e1000 rings or virtio-net
+Virtual Queues) was completely bypassed.
+
+The ptnet device has been introduced as a cleaner approach to ptnetmap that
+also adds the ability to interact with the standard TCP/IP network stack
+and supports multi-ring netmap ports. The introduction of a new device model
+does not limit the adoption of this solution, since ptnet drivers are
+distributed together with netmap, and hypervisor modifications are needed in
+any case.
+
+The ptnet device belongs to the class of paravirtualized devices, like
+virtio-net. Unlike virtio-net, however, ptnet does not define an interface
+to exchange packets (datapath); the existing netmap API is used instead.
+However, a CSB - cleaned up and extended to support an arbitrary number of
+rings - is still used for producer-consumer synchronization and notification
+suppression.
+
+A number of device registers are used for configuration (number of rings and
+slots, device MAC address, supported features, ...), while "kick" registers
+are used for guest-to-host notifications.
+Moreover, the ptnetmap kthread infrastructure has already been extended to
+support an arbitrary number of rings, where each ring is currently served
+by a different kernel thread.
+
+
+=============================== Deliverables ===============================
+
+==== D1 (due by week 3) ====
+Implement a ptnet driver for FreeBSD guests which only supports native netmap
+applications. This new driver can be tested using Linux and QEMU-KVM as the
+hypervisor, which already supports ptnetmap and emulates the ptnet device
+model.
+Since the datapath will be equivalent, we expect the same performance as
+the original ptnetmap (over 20 Mpps for VALE ports, 14.88 Mpps for 10 Gbit
+hardware ports).
+
+==== D2 (due by mid-term) ====
+Extend the ptnet FreeBSD driver to export a regular network interface to the
+FreeBSD kernel. In terms of latency, we expect performance similar to the
+ptnet Linux driver.
+
+==== D3 (due by week 9) ====
+Extend the ptnet FreeBSD driver to support TCP Segmentation Offloading (TSO)
+and checksum offloading by means of the virtio-net header, similarly to
+what is done in the Linux driver. After this step we expect TCP
+performance similar to that of Linux.
+
+==== D4 (due by the end of the project) ====
+Implement the emulation of the ptnet device model in bhyve, starting from a
+bhyve version supporting netmap and ptnetmap, which is already available.
+At this point we expect FreeBSD guests over bhyve to see TCP/IP
+throughput and latency similar to Linux guests over QEMU-KVM (about 20 Gbps
+for TCP bulk traffic and about 40 thousand HTTP-like transactions per second
+between two guests connected through a VALE switch).
+
+
+================================= Milestones ================================
+
+Start date: 2016/05/23
+
+Estimated end date: 2016/08/23
+
+Timetable:
+
+ * Week 1-2:  Write ptnet FreeBSD driver supporting netmap native applications [D1] --> COMPLETED
+ * Week 3:    Tests, bug-fixing, and performance evaluation [D1] --> COMPLETED
+ * Week 4-5:  Write FreeBSD network interface support for the ptnet driver [D2] --> COMPLETED
+ * Week 6:    Tests, bug-fixing [D2]. Prepare documents for mid-term evaluation. --> COMPLETED
+ * Week 7-8:  Add virtio-net header support to the ptnet driver [D3] --> COMPLETED
+ * Week 9:    Tests, bug-fixing, and performance evaluation [D3] --> COMPLETED
+ * Week 10:   Write ptnet device model emulation for bhyve [D4] --> COMPLETED
+ * Week 11:   Test and performance evaluation over bhyve [D4]
+ * Week 12:   Clean code and prepare documentation for final evaluation.
+
+
+========================== Final submission =================================
+
+The final code of my project is available in the following SVN repository:
+
+  https://svnweb.freebsd.org/socsvn/soc2016/vincenzo/
+
+which tracks FreeBSD head (11.0-CURRENT).
+
+Moreover, all modifications I made to netmap (see below) have also been
+merged into the netmap git repository, so they can also be found at
+https://github.com/luigirizzo/netmap.
+
+My code modifications belong to two different subsystems:
+
+  (1) netmap, where I added the ptnet device driver, implemented as a single
+      source file, named head/sys/dev/netmap/if_ptnet.c.
+      The file is available at the following link in the SVN repository:
+      https://svnweb.freebsd.org/socsvn/soc2016/vincenzo/head/sys/dev/netmap/if_ptnet.c?view=markup
+
+      Moreover, some code reorganization and bug-fixing in other parts of
+      netmap were necessary, including rearrangements of the ptnet driver
+      for Linux that I had already developed. A complete patch (which
+      also includes the if_ptnet.c FreeBSD driver) can be obtained with
+      the following command on the github netmap repository:
+
+        git diff --author="Vincenzo Maffione" 09936864fa5b67b82ef4a9907819b7018e9a38f2 master
+
+  (2) bhyve, where I reworked and fixed the netmap support and added the
+      emulation of the ptnet device.
+      The code modifications can be obtained with the following svn diff
+      command on the SVN repository:
+
+        $ svn diff -r 302612 usr.sbin/bhyve
+
+
+A modified version of QEMU that supports ptnet (not developed in the
+context of this GSoC project) is available here:
+
+  https://github.com/vmaffione/qemu/tree/ptnet
+
+
+
+======================= Useful links ==============================
+
+ * [0] http://info.iet.unipi.it/~luigi/netmap/
+ * [1] https://wiki.freebsd.org/SummerOfCode2016/PtnetDriverAndDeviceModel#preview
+ * [2] https://svnweb.freebsd.org/socsvn/soc2016/vincenzo/
+ * [3] https://github.com/luigirizzo/netmap
+ * [4] https://github.com/vmaffione/qemu/tree/ptnet