From: "Alexander V. Chernikov"
Date: Tue, 10 Jun 2014 21:33:15 +0400
To: Bryan Venteicher, current@FreeBSD.org, net@FreeBSD.org
Subject: Re: dhclient sucks cpu usage...
Message-ID: <5397415B.5070409@FreeBSD.org>
In-Reply-To: <20140610162443.GD31367@funkthat.com>
References: <20140610000246.GW31367@funkthat.com> <100488220.4292.1402369436876.JavaMail.root@daemoninthecloset.org> <5396CD41.2080300@FreeBSD.org> <20140610162443.GD31367@funkthat.com>

On 10.06.2014 20:24, John-Mark Gurney wrote:
> Alexander V. Chernikov wrote this message on Tue, Jun 10, 2014 at 13:17 +0400:
>> On 10.06.2014 07:03, Bryan Venteicher wrote:
>>> Hi,
>>>
>>> ----- Original Message -----
>>>> So, after finding out that nc has a stupidly small buffer size (2k
>>>> even though there is space for 16k), I was still not getting as good
>>>> performance using nc between machines, so I decided to generate some
>>>> flame graphs to try to identify issues... (Thanks to whoever included a
>>>> full set of modules, including dtraceall, on the memstick!)
>>>>
>>>> So, the first one is:
>>>> https://www.funkthat.com/~jmg/em.stack.svg
>>>>
>>>> As I was browsing around, em_handle_que was consuming quite a bit
>>>> of cpu usage for only doing ~50MB/sec over gige.. Running top -SH shows
>>>> me that the taskqueue for em was consuming about 50% cpu... Also pretty
>>>> high for only 50MB/sec... Looking closer, you'll see that bpf_mtap is
>>>> consuming ~3.18% (under ether_nh_input).. I know I'm not running tcpdump
>>>> or anything, but I think dhclient uses bpf to be able to inject packets
>>>> and listen in on them, so I killed off dhclient, and instantly the
>>>> taskqueue thread for em dropped down to 40% CPU... (transfer rate only
>>>> marginally improves, if it does)
>>>>
>>>> I decided to run another flame graph w/o dhclient running:
>>>> https://www.funkthat.com/~jmg/em.stack.nodhclient.svg
>>>>
>>>> and now _rxeof drops from 17.22% to 11.94%, pretty significant...
>>>>
>>>> So, if you care about performance, don't run dhclient...
>>>>
>>> Yes, I've noticed the same issue.
>>> It can absolutely kill performance
>>> in a VM guest. It is much more pronounced on only some of my systems,
>>> and I hadn't tracked it down yet. I wonder if this is fallout from
>>> the callout work, or if there was some bpf change.
>>>
>>> I've been using the kludgey workaround patch below.
>> Hm, pretty interesting.
>> dhclient should set up a proper filter (and it looks like it does so:
>> 13:10 [0] m@ptichko s netstat -B
>>   Pid  Netif   Flags      Recv      Drop     Match Sblen Hblen Command
>>  1224    em0 -ifs--l  41225922         0        11     0     0 dhclient
>> )
>> see "match" count.
>> And BPF itself adds the cost of a read rwlock (+ bpf_filter() calls for
>> each consumer on the interface).
>> It should not introduce significant performance penalties.
> Don't forget that it has to process the returning ack's... So, you're
Well, it can still be captured with a proper filter like
"ip && udp && port 67 or port 68".
We're using tcpdump at high packet rates (>1M pps) and it does not
influence the process _much_.
We should probably convert its rwlock to an rmlock and use per-cpu
counters for statistics, but that's a different story.
> looking around 10k+ pps that you have to handle and pass through the
> filter... That's a lot of packets to process...
>
> Just for a bit more "double check", instead of using the HD as a
> source, I used /dev/zero... I ran a netstat -w 1 -I em0 when
> running the test, and I was getting ~50.7MiB/s w/ dhclient running and
> then I killed dhclient and it instantly jumped up to ~57.1MiB/s.. So I
> launched dhclient again, and it dropped back to ~50MiB/s...
dhclient uses different BPF descriptors for reading and writing (and it
moves the write descriptor to the privileged child process via fork()).
The problem we're facing is that dhclient does not set _any_ read filter
on the write descriptor:
21:27 [0] zfscurr0# netstat -B
  Pid  Netif   Flags      Recv      Drop     Match Sblen Hblen Command
 1529    em0 --fs--l     86774     86769     86784  4044  3180 dhclient
-------------------------------------------- ^^^^^ -------------------
 1526    em0 -ifs--l     86789         0         1     0     0 dhclient
so all traffic is pushed down to that descriptor, introducing contention
on the BPF descriptor mutex. (That's why I've asked for netstat -B output.)
Please try the attached patch to fix this.
This is not the right way to fix it; we'd better change BPF behaviour so
that write-only consumers are not attached to interface readers at all.
This has been partially implemented as the net.bpf.optimize_writers hack,
but it does not work for direct BPF consumers (those not using the
pcap(3) API); see the sketch after the quoted output below.
>
> and some of this slowness is due to nc using small buffers which I will
> fix shortly..
>
> And with witness disabled it goes from 58MiB/s to 65.7MiB/s.. In
> both cases, that's a 13% performance improvement by running w/o
> dhclient...
>
> This is using the latest memstick image, r266655, on a Lenovo T61:
> FreeBSD 11.0-CURRENT #0 r266655: Sun May 25 18:55:02 UTC 2014
>     root@grind.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
> FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
> WARNING: WITNESS option enabled, expect reduced performance.
> CPU: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz (1995.05-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x6fb  Family=0x6  Model=0xf  Stepping=11
>   Features=0xbfebfbff
>   Features2=0xe3bd
>   AMD Features=0x20100800
>   AMD Features2=0x1
>   TSC: P-state invariant, performance statistics
> real memory  = 2147483648 (2048 MB)
> avail memory = 2014019584 (1920 MB)
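To make that last point concrete, here is a minimal, hypothetical sketch
(illustrative names, not dhclient or kernel code) of what a direct,
write-only BPF consumer can do today: open a descriptor, bind it to the
interface, and install a reject-all read filter so inbound packets are
never matched and buffered on it. The attached patch applies the same
idea to dhclient's write descriptor.

/*
 * Hypothetical sketch: a direct write-only BPF consumer that installs
 * a reject-all read filter on its descriptor.
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/bpf.h>

#include <err.h>
#include <fcntl.h>
#include <string.h>

static struct bpf_insn reject_all[] = {
	BPF_STMT(BPF_RET + BPF_K, 0)	/* accept 0 bytes of every packet */
};

int
open_bpf_write_only(const char *ifname)
{
	struct ifreq ifr;
	struct bpf_program p;
	int fd;

	/* /dev/bpf is the cloning BPF device. */
	if ((fd = open("/dev/bpf", O_RDWR)) == -1)
		err(1, "open(/dev/bpf)");

	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) == -1)
		err(1, "BIOCSETIF(%s)", ifname);

	/* Drop everything on the read side; this descriptor only writes. */
	memset(&p, 0, sizeof(p));
	p.bf_len = sizeof(reject_all) / sizeof(reject_all[0]);
	p.bf_insns = reject_all;
	if (ioctl(fd, BIOCSETF, &p) == -1)
		err(1, "BIOCSETF");

	return (fd);
}

A real consumer would typically also install a write filter (BIOCSETWF)
and lock the descriptor (BIOCLOCK), as dhclient already does.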
[Attachment: dhclient_fix.diff]

Index: sbin/dhclient/bpf.c
===================================================================
--- sbin/dhclient/bpf.c	(revision 266306)
+++ sbin/dhclient/bpf.c	(working copy)
@@ -131,6 +131,11 @@ struct bpf_insn dhcp_bpf_wfilter[] = {
 
 int dhcp_bpf_wfilter_len = sizeof(dhcp_bpf_wfilter) / sizeof(struct bpf_insn);
 
+struct bpf_insn dhcp_bpf_dfilter[] = {
+	BPF_STMT(BPF_RET+BPF_K, 0)
+};
+int dhcp_bpf_dfilter_len = sizeof(dhcp_bpf_dfilter) / sizeof(struct bpf_insn);
+
 void
 if_register_send(struct interface_info *info)
 {
@@ -160,6 +165,12 @@ if_register_send(struct interface_info *info)
 	if (ioctl(info->wfdesc, BIOCSETWF, &p) < 0)
 		error("Can't install write filter program: %m");
 
+	/* Set deny-all read filter for write socket */
+	p.bf_len = dhcp_bpf_dfilter_len;
+	p.bf_insns = dhcp_bpf_dfilter;
+	if (ioctl(info->wfdesc, BIOCSETFNR, &p) < 0)
+		error("Can't install deny-all read filter program: %m");
+
 	if (ioctl(info->wfdesc, BIOCLOCK, NULL) < 0)
 		error("Cannot lock bpf");
 
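For contrast with the deny-all program added above, here is a sketch of
what a "proper" DHCP read filter looks like in classic BPF, roughly
equivalent to "udp dst port 68" on untagged Ethernet (offsets and jump
targets assume a 14-byte Ethernet header; dhclient's real filter patches
the port at runtime). A filter like this is why the read descriptor in
the first netstat -B output above shows only 11 matches out of ~41M
received packets.

/*
 * Sketch of a DHCP-client read filter in classic BPF.
 */
#include <sys/types.h>
#include <net/bpf.h>
#include <net/ethernet.h>
#include <netinet/in.h>

struct bpf_insn dhcp_read_filter[] = {
	/* Load the Ethernet type field. */
	BPF_STMT(BPF_LD + BPF_H + BPF_ABS, 12),
	/* Not IPv4?  Reject. */
	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, ETHERTYPE_IP, 0, 8),
	/* Load the IP protocol field. */
	BPF_STMT(BPF_LD + BPF_B + BPF_ABS, 23),
	/* Not UDP?  Reject. */
	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 6),
	/* Load the fragment offset field. */
	BPF_STMT(BPF_LD + BPF_H + BPF_ABS, 20),
	/* Fragment?  Reject. */
	BPF_JUMP(BPF_JMP + BPF_JSET + BPF_K, 0x1fff, 4, 0),
	/* X <- IP header length. */
	BPF_STMT(BPF_LDX + BPF_B + BPF_MSH, 14),
	/* Load the UDP destination port. */
	BPF_STMT(BPF_LD + BPF_H + BPF_IND, 16),
	/* Not the DHCP client port (68)?  Reject. */
	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 68, 0, 1),
	/* Accept the whole packet. */
	BPF_STMT(BPF_RET + BPF_K, (u_int)-1),
	/* Reject. */
	BPF_STMT(BPF_RET + BPF_K, 0)
};

Only matched packets are copied into the descriptor's buffer; everything
else is dropped right after the filter run, which is why a properly
filtered read descriptor barely shows up in the profile.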