From: Pavel Vazharov <pavel@x3me.net>
Date: Mon, 4 May 2020 15:18:01 +0300
To: freebsd-net@freebsd.org
Subject: Fwd: Performance troubleshooting of FreeBSD networking stack and/or kevent functionality
Hi there,

First, I want to say that I understand that the following questions are very broad and possibly only indirectly related to FreeBSD networking (I'm not sure). It's just that, after more than a week spent on the issue described below, the only option I can see is to ask for help or some piece of advice.

There is a project called F-Stack. It glues the networking stack from FreeBSD 11.01 on top of DPDK: DPDK is used to get the packets from the network card into user space, and the FreeBSD stack then handles those packets, also in user space. F-Stack also provides a socket API and an epoll API; the latter internally uses kqueue/kevent from FreeBSD.
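For reference, this is roughly how I picture an epoll-style wait mapped onto kevent(2). It is only a minimal sketch of the general idea, not F-Stack's actual implementation; all names prefixed with my_ are made up for illustration:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdint.h>

/* Minimal sketch of an epoll_wait()-like call on top of kevent(2).
 * Not F-Stack's actual code; the my_* names are illustrative only. */

#define MY_EPOLLIN  0x1
#define MY_EPOLLOUT 0x4

struct my_epoll_event {
    uint32_t events;  /* MY_EPOLLIN / MY_EPOLLOUT bits */
    void    *udata;   /* user data registered with the descriptor */
};

/* Register a descriptor for read readiness (EV_CLEAR gives
 * edge-triggered behaviour, similar to EPOLLET). */
static int
my_epoll_add_read(int kq, int fd, void *udata)
{
    struct kevent kev;
    EV_SET(&kev, fd, EVFILT_READ, EV_ADD | EV_CLEAR, 0, 0, udata);
    return kevent(kq, &kev, 1, NULL, 0, NULL);
}

/* Collect pending events; timeout_ms == 0 is the non-blocking poll
 * the proxy loop relies on. */
static int
my_epoll_wait(int kq, struct my_epoll_event *out, int maxevents, int timeout_ms)
{
    struct kevent evs[128];
    struct timespec ts = { timeout_ms / 1000, (timeout_ms % 1000) * 1000000L };
    int n, i;

    if (maxevents > 128)
        maxevents = 128;
    n = kevent(kq, NULL, 0, evs, maxevents, &ts);
    for (i = 0; i < n; i++) {
        out[i].udata  = evs[i].udata;
        out[i].events = 0;
        if (evs[i].filter == EVFILT_READ)
            out[i].events |= MY_EPOLLIN;
        if (evs[i].filter == EVFILT_WRITE)
            out[i].events |= MY_EPOLLOUT;
    }
    return n;
}

(In the real application the equivalent call is F-Stack's ff_epoll_wait(), which is visible in the profile further down.)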
We made a setup to test the performance of a transparent TCP proxy based on F-Stack against the same proxy running on the standard Linux kernel. We ran the tests on a KVM guest with 2 cores (Intel(R) Xeon(R) Gold 6139 CPU @ 2.30GHz) and 32 GB RAM; a 10 Gbps NIC was attached in passthrough mode. The application-level code (the part which handles the epoll notifications and memcpy-s data between the sockets) is 100% the same in both proxy applications. Both proxies are single threaded, and in all tests we pinned the application to core 1. For the test with the standard Linux application, the interrupts from the network card were pinned to the same core 1.

Here are the test results:

1. The Linux-based proxy was able to handle about 1.7-1.8 Gbps before it started to throttle the traffic. No visible CPU usage was observed on core 0 during the tests; only core 1, where the application and the IRQs were pinned, took the load.

2. The DPDK+FreeBSD proxy was able to handle 700-800 Mbps before it started to throttle the traffic. No visible CPU usage was observed on core 0 during the tests; only core 1, where the application was pinned, took the load.

3. We did another test with the DPDK+FreeBSD proxy just to get more information about the problem. We disabled the TCP proxy functionality and let the packets simply be IP-forwarded by the FreeBSD stack. In this test we reached up to 5 Gbps without the traffic being throttled; we just don't have more traffic to redirect there at the moment.

4. We profiled the DPDK+FreeBSD proxy with Linux perf under about 200 Mbps of traffic, just to check whether some functionality shows up as a visible bottleneck. If I understand the results correctly, the application spends most of its time reading packets from the network card, and after that most of the time goes into kevent-related functionality:

# Children  Self    Samples  Command         Shared Object    Symbol
# ........  ......  .......  ..............  ...............  ..................................
    43.46%  39.67%  9071     xproxy.release  xproxy.release   [.] main_loop
            |
            |--35.31%--main_loop
            |          |
            |           --3.71%--_recv_raw_pkts_vec_avx2
            |
            |--5.44%--0x305f6e695f676e69
            |          main_loop
            |
             --2.68%--0
                       main_loop

    25.51%   0.00%  0        xproxy.release  xproxy.release   [.] 0x0000000000cdbc40
            |
            ---0xcdbc40
               |
               |--5.03%--__cap_rights_set
               |--4.65%--kern_kevent
               |--3.85%--kqueue_kevent
               |--3.62%--__cap_rights_init
               |--3.45%--kern_kevent_fp
               |--1.90%--fget
               |--1.61%--uma_zalloc_arg
                --1.40%--fget_unlocked

    10.01%   0.00%  0        xproxy.release  [unknown]        [k] 0x00007fa0761d8010
            |
            ---0x7fa0761d8010
               |
               |--4.23%--ff_kevent_do_each
               |--2.33%--net::ff_epoll_reactor_impl::process_events  <-- Only this function is ours
               |--1.96%--kern_kevent
                --1.48%--ff_epoll_wait

     7.13%   7.12%  1627     xproxy.release  xproxy.release   [.] kqueue_kevent
            |
            |--3.84%--0xcdbc40
            |          kqueue_kevent
            |--2.41%--0
            |          kqueue_kevent
             --0.88%--kqueue_kevent

     6.82%   0.00%  0        xproxy.release  [unknown]        [.] 0x0000000001010010
            |
            ---0x1010010
               |
               |--2.40%--uma_zalloc_arg
                --1.22%--uma_zero_item

5. We did another profiling pass, this time with intrusive timing of some blocks of code, again under around 200 Mbps of traffic, and found once more that about 30% of the application time is spent in the epoll_wait function, which just sets up the parameters for kern_kevent and calls it. The whole application can very roughly be represented in the following way (a sketch of this loop is shown right after the list):

- Read incoming packets from the network card
- Write pending outgoing packets to the network card
- Push the incoming packets to the FreeBSD stack
- Call epoll_wait/kevent without waiting  <- About 25-30% of the application time seems to be spent here
- Handle the events
- Loop from the beginning
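To make the structure of that loop concrete, here is a rough sketch of it. The rte_eth_rx_burst() call is the real DPDK API; flush_pending_tx(), push_to_freebsd_stack(), poll_events_nonblocking() and handle_events() are just placeholders for our application/F-Stack glue, not actual F-Stack API names, and this is not the real main_loop code:

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Placeholders for the application/F-Stack glue; not real F-Stack names. */
void flush_pending_tx(uint16_t port, uint16_t queue);
void push_to_freebsd_stack(struct rte_mbuf *pkt);
int  poll_events_nonblocking(void);
void handle_events(int nevents);

/* Rough sketch of the structure of main_loop from the profile above. */
static void
main_loop(uint16_t port, uint16_t queue)
{
    struct rte_mbuf *rx_pkts[BURST_SIZE];

    for (;;) {
        /* 1. Write pending outgoing packets to the network card. */
        flush_pending_tx(port, queue);

        /* 2. Read a burst of incoming packets from the network card. */
        uint16_t nb_rx = rte_eth_rx_burst(port, queue, rx_pkts, BURST_SIZE);

        /* 3. Push the incoming packets into the user-space FreeBSD stack. */
        for (uint16_t i = 0; i < nb_rx; i++)
            push_to_freebsd_stack(rx_pkts[i]);

        /* 4. Poll kevent/epoll with a zero timeout; this is where
         *    the ~25-30% of the application time seems to go. */
        int nevents = poll_events_nonblocking();

        /* 5. Handle the ready sockets (memcpy data between them). */
        handle_events(nevents);
    }
}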
Here is the FreeBSD configuration that was used for the tests:

[freebsd.boot]
hz=100
fd_reserve=1024
kern.ncallout=524288
kern.sched.slice=1
kern.maxvnodes=524288
kern.ipc.nmbclusters=262144
kern.ipc.maxsockets=524000
net.inet.ip.fastforwarding=1
net.inet.tcp.syncache.hashsize=32768
net.inet.tcp.syncache.bucketlimit=32
net.inet.tcp.syncache.cachelimit=1048576
net.inet.tcp.tcbhashsize=524288
net.inet.tcp.syncache.rst_on_sock_fail=0
net.link.ifqmaxlen=4096
kern.features.inet6=0
net.inet6.ip6.auto_linklocal=0
net.inet6.ip6.accept_rtadv=2
net.inet6.icmp6.rediraccept=1
net.inet6.ip6.forwarding=0

[freebsd.sysctl]
kern.maxfiles=524288
kern.maxfilesperproc=524288
kern.ipc.soacceptqueue=4096
kern.ipc.somaxconn=4096
kern.ipc.maxsockbuf=16777216
kern.ipc.nmbclusters=262144
kern.ipc.maxsockets=524288
net.link.ether.inet.maxhold=5
net.inet.ip.redirect=0
net.inet.ip.forwarding=1
net.inet.ip.portrange.first=1025
net.inet.ip.portrange.last=65535
net.inet.ip.intr_queue_maxlen=4096
net.inet.tcp.syncache.rst_on_sock_fail=0
net.inet.tcp.rfc1323=1
net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.sendspace=16384
net.inet.tcp.recvspace=16384
net.inet.tcp.cc.algorithm=cubic
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.sack.enable=1
net.inet.tcp.msl=2000
net.inet.tcp.delayed_ack=1
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1

Something important! We have added functionality to the FreeBSD networking stack which allows us to open transparent TCP sockets when the first data packet after the 3-way handshake is received. I can explain why we need this functionality, if needed, and I can also show you the code/patch. I have checked this functionality multiple times, and I can't see how it could lead to throttling the traffic, causing packets to be dropped, and the whole thing no longer responding to regular pings and arpings because of the packet drops. This functionality was only applied to TCP traffic on port 80 during the tests. I mean, of course, I could be missing something, but bugs in this functionality usually lead to completely broken TCP connections or stuck connections due to a wrong TCP window. At least this is my experience so far, having implemented similar functionality in the Linux kernel, which we have been using for 3-4 years already. But again, I could be wrong here.

From the above tests and measurements I made the following conclusions/observations:

- The FreeBSD stack has no problems forwarding 5 Gbps of traffic, so the performance decrease should be caused by one of the layers above it: the TCP handling in the stack, the kevent functionality, or the way the application uses kevent.
- The kevent functionality appears in the CPU profile with much higher numbers than any application code. This could be because the application is using kevent in some wrong way, or simply because the function is called so frequently. On the other hand, all of the functions in the loop are called equally often.
- For the Linux proxy case, the IRQs may be handled on a given core, but the actual packet processing within the networking stack could happen on both cores, and this could lead to better performance. However, we did not observe visible CPU usage on core 0 during the tests.

And finally, after this long post, here are my questions:

1. Does somebody have observations or educated guesses about what amount of traffic I should expect the FreeBSD stack + kevent to handle in the above scenario? Are the numbers low or expected?
2. Can somebody think of kevent specifics, compared to Linux epoll, which could lead to worse performance? For example, the usage of the EV_CLEAR flag, or reading too many or too few events at a time?
3. Are there counters in the FreeBSD stack I can check which would point me to potential bottlenecks?
4. Can somebody give me some other advice: what more to check/debug/profile, or what config/sysctl settings to tweak, to improve the performance of the DPDK+FreeBSD based proxy?

One last thing which I have been thinking about in the last few days. As far as I know, interrupts always preempt the currently running user-space code. So, if this is right, in the Linux case we will have much more time spent handling interrupts and much less time spent in user space handling the epoll notifications. The situation is different in the F-Stack application loop. There, the pending packets are sent, 32 packets are read from the network card and pushed to the FreeBSD stack, kevent is called, and the loop repeats. This means that the time slice for reading packets and processing them in the stack is limited by the kevent call. So I thought that, just as a test, changing the ratio between packet processing and kevent calls should improve the situation. I did a test where kevent was not called on every iteration of the loop but only once 1024 packets had been read and pushed to the network stack. However, for some reason this didn't improve the situation either, and currently I have no explanation for that, too. Maybe I did something wrong when testing.

Any help is appreciated!

Thanks in advance,
Pavel.