From owner-svn-src-head@freebsd.org Wed Dec 19 22:17:25 2018
Date: Thu, 20 Dec 2018 09:17:02 +1100 (EST)
From: Bruce Evans
To: Bruce Evans
cc: Andrew Gallatin, Slava Shwartsman, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r341578 - head/sys/dev/mlx5/mlx5_en
In-Reply-To: <20181219023918.E1895@besplex.bde.org>
Message-ID: <20181220075104.H1360@besplex.bde.org>
References: <201812051420.wB5EKwxr099242@repo.freebsd.org> <9e09a2f8-d9fd-7fde-8e5a-b7c566cdb6a9@cs.duke.edu> <20181218033137.Q2217@besplex.bde.org> <20181219001335.A1412@besplex.bde.org> <20181219023918.E1895@besplex.bde.org>

On Wed, 19 Dec 2018, Bruce Evans wrote:

> On Wed, 19 Dec 2018, Bruce Evans wrote:
>
>> On Mon, 17 Dec 2018, Andrew Gallatin wrote:
>>
>>> On 12/17/18 2:08 PM, Bruce Evans wrote:
>> * ...
>>>> iflib uses queuing techniques to significantly pessimize em NICs with 1
>>>> hardware queue.  On fast machines, it attempts to do 1 context switch
>>>> per
>> ...
>>> This can happen even w/o contention when "abdicate" is enabled in mp
>>> ring. I complained about this as well, and the default was changed in
>>> mp ring to not always "abdicate" (eg, switch to the tq to handle the
>>> packet). Abdication substantially pessimizes Netflix style web uncontended
>>> workloads, but it generally helps small packet forwarding.
>>>
>>> It is interesting that you see the opposite. I should try benchmarking
>>> with just a single ring.
>>
>> Hmm, I didn't remember "abdicated" and never knew about the sysctl for it
>> (the sysctl is newer), but I noticed the slowdown from near the first
>> commit for it (r323954) and already used the following workaround for it:
>> ...
>> This essentially just adds back the previous code with a flag to check
>> both versions. Hopefully the sysctl can do the same thing.
>
> It doesn't. Setting tx_abdicate to 1 gives even more context switches
> (almost twice as many, 800k/sec instead of 400k/sec, on i386 pessimized by
> INVARIANTS, WITNESS, !WITNESS_SKIPSPIN, 4G KVA and more. Without
> ...

I now understand most of the slownesses and variations in benchmarks.

Short summary:
After arcane tuning including a sysctl only available in my version of
SCHED_4BSD, on amd64 iflib in -current runs as fast as old em with
EM_MULTIQUEUE and no other tuning in FreeBSD-11; i386 also needs a CPU
almost 3 times faster to compensate for the overhead of having 4G KVA
(but no other security pessimizations in either).

Long summary:
iflib with tx_abdicate=0 runs a bit like old em without EM_MULTIQUEUE,
provided the NIC is I218V and not PRO1000 and/or the CPU is too slow to
saturate the NIC and/or the network. iflib is just 10% slower. Neither
does excessive context switches to tgq with I218V (context switches seem
to be limited to not much more than 2 per h/w interrupt, and h/w
interrupts are normally moderated to 8kHz). However, iflib does excessive
context switches for PRO1000. I don't know if this is for hardware
reasons or just for dropping packets.

iflib with tx_abdicate=1 runs a bit like old em with EM_MULTIQUEUE. Due
to general slowness, even a 4GHz i7 has difficulty saturating 1Gbps
ethernet with small packets. tx_abdicate=1 allows it to saturate by using
tgq more.
This causes lots of context switches and otherwise uses lots of CPU (60%
of a 4GHz i7 for iflib). Old em with EM_MULTIQUEUE gives identical kpps
and saturation and dropped packets for spare cycles on the CPU producing
the packets, but I think it does less context switches and uses less CPU
for tgq.

This is mostly for the I218V. I got apparently-qualitatively-different
results on i386 because I mostly tested i386 with the PRO1000 where there
are excessive context switches on both amd64 and i386 with tx_abdicate=0.
tx_abdicate=1 gives even more excessive context switches (about twice as
many) for the PRO1000.

I got apparently-qualitatively-different results for some old benchmarks
because I used an old version of FreeBSD (r332488) for many of them, and
also had version problems within this version. iflib in this version
forces tx_abdicate=1. I noticed the extra context switches from this long
ago, and had an option which defaulted to using older iflib code which
seemed to work better. But I misedited the non-default case of this and
had the double drainage check bug that was added in -current in r366560
and fixed in -current in r341824. This gave excessive extra context
switches, so the commit that added abdication (r323954) seemed to be even
slower than it was.

The fastest case by a significant amount (saturation on I218V using 1.6
times less CPU) is with netblast bound to the same CPU as tgq,
PREEMPTION* not configured, my scheduler modification that reduces
preemption even further (with this modification selected using a sysctl),
and tx_abdicate=1. Then the scheduler modification delays most switches
to tgq, and tx_abdicate=1 apparently allows such context switches when
they are useful (I think netblast fills a queue and then tx_abdicate=1
gives a context switch immediately, but tx_abdicate=0 doesn't give a
context switch soon enough).
But without the scheduler modification, this is the slowest case
(tx_abdicate=1 forces context switches to tgq after every packet, and
since netblast is bound to the same CPU, it can't run). In both cases,
only 1 CPU is used, but the context switches reduce throughput by about a
factor of 2.

It is less clear why throughput counting dropped packets is lower for
netblast not bound and tx_abdicate=0. Then tgq apparently doesn't run
promptly enough to saturate the net, but netblast has its own CPU so it
doesn't stop when tgq runs so it should be able to produce even more
packets (many more dropped ones) than in the fastest case. This might be
caused by lock contention.

> When netblast is bound to the tgq's CPU, the tgq actually runs on another
> CPU. Apparently, the binding is weak or this is a bugfeature in my
> scheduler.

It is a feature that I have forgotten about. It was originally a bug, but
I happened to notice that it reduced the context switches for iflib long
ago, so made it a feature. I didn't notice then that it also improved
throughput significantly. I made it the default for !PREEMPTION only,
then forgot that it was stronger than !PREEMPTION.

The feature is to not reschedule on all CPUs when a thread becomes
runnable. Only reschedule on the current CPU. Also, run an idle CPU if
there is one. This is not suitable for general use since it results in
low priority threads staying running until the end of their quantum or
voluntary context switch instead of running a higher priority thread.
Plain !PREEMPTION does much the same thing for threads at a user
priority, but preempts from user priority to kernel priority.

> ...
> Another test with amd64 and I218V instead of PRO1000:
>
>   netblast bound,   !abdicate: 1243kpps sent,    0kpps dropped (16k/sec csw)
>   netblast unbound, !abdicate: 1236kpps sent,    0kpps dropped (16k/sec csw)
>   netblast bound,    abdicate: 1485kpps sent,  243kpps dropped (16k/sec csw)
>   netblast unbound,  abdicate: 1407kpps sent,  1.7kpps dropped (850k/sec csw)

All working correctly, except the throughput is a bit low with !abdicate
and abdicate takes too much CPU. This must be with my anti-preemption for
the low csw in the 3rd case.

> There is an i386 dependency after all!  !abdicate works on amd64 but not
> on i386 to prevent the excessive context switches. Unfortunately, it also
> reduces kpps by almost 20% and leaves no spare CPU for dropping packets.

The dependency was actually on the NIC.

> Why would tx_abdicate=0 give extra context switches for i386 but not
> for amd64? More interestingly, what does it do wrong to lose 20% in
> kpps sent and more in kpps dropped?

Apparently, some dependency on the NIC. kpps is lost because less CPU is
used and 1 4GHz CPU can't keep up. (I use CC and CFLAGS optimized for
debugging. This costs about 10% in sys time.)

> Another test with PREEMPTION*:
>
>   netblast bound,   !abdicate: same as above
>   netblast unbound, !abdicate: same as above
>   netblast bound,    abdicate:  578kpps sent, 0kpps dropped (1160k/sec csw)
>   netblast unbound,  abdicate: 1106kpps sent, 0kpps dropped (850k/sec csw)
>
> That is, abdicate with PREEMPTION to make it work as fully intended
> destroys performance for the netblast bound case where it fixes most
> performance problems without PREEMPTION; for the network unbound case it
> only reduces performance by 30%. It uses the same amount of CPU as
> !PREEMPTION.

This is because PREEMPTION makes the netblast unbound case run into tgq
always. Unquoted benchmarks show that the netblast unbound case is slower
than usual because mis-scheduling makes netblast run into tgq sometimes.
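For readers who want to put the quoted rates in perspective, the following is a minimal sketch of two pieces of arithmetic: the theoretical frame rate of a 1Gbps link and the number of context switches per packet sent. It assumes that the "small packets" are minimum-size 64-byte Ethernet frames, which the message does not state; the function names are made up for this illustration and are not part of netblast, iflib or any FreeBSD tool.

```c
#include <assert.h>

/*
 * Bytes occupied on the wire per frame beyond the frame itself:
 * 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes
 * inter-frame gap.
 */
#define ETHER_WIRE_OVERHEAD_BYTES 20

/*
 * Theoretical maximum frames per second on a link of link_bps bits/sec
 * for frames of frame_bytes bytes (frame size includes the FCS).
 * Hypothetical helper, for illustration only.
 */
double
line_rate_pps(double link_bps, unsigned frame_bytes)
{
	assert(link_bps > 0 && frame_bytes > 0);
	return (link_bps / (8.0 * (frame_bytes + ETHER_WIRE_OVERHEAD_BYTES)));
}

/*
 * Context switches per packet sent, with both arguments given as
 * per-second rates. Hypothetical helper, for illustration only.
 */
double
csw_per_packet(double csw_per_sec, double sent_pps)
{
	assert(sent_pps > 0);
	return (csw_per_sec / sent_pps);
}
```

Under the 64-byte assumption, `line_rate_pps(1e9, 64)` comes to roughly 1488 kpps, close to the 1485kpps sent in the "netblast bound, abdicate" row of the first table. `csw_per_packet(850e3, 1407e3)` is about 0.6 switches per packet for the unbound abdicate row, against roughly 0.011 to 0.013 for the rows at 16k/sec csw.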
The regression in average ping latency is actually from 73 usec to 84
usec. Anti-preemption might give latencies of 100 msec, but actually
makes no difference for ping latency.

However, I now remember that the maximum ping latency is often 2 msec.
That is too high. This is with itr=0, which is the main part of turning
off interrupt moderation. Even the default itr of 125 usec only increases
worst-case latencies by 125 usec.

Bruce

From owner-svn-src-head@freebsd.org Wed Dec 19 22:30:27 2018
Message-Id: <201812192230.wBJMUQI7039050@repo.freebsd.org>
From: Mateusz Guzik
Date: Wed, 19 Dec 2018 22:30:26 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r342247 - head/sys/security/mac

Author: mjg
Date: Wed Dec 19 22:30:26 2018
New Revision: 342247
URL: https://svnweb.freebsd.org/changeset/base/342247

Log:
  mac: reduce pessimization of sdt probe handling

  Prior to the change the code would branch on return value and then check
  if probes are enabled. Since vast majority of the time they are not, this
  is clearly wasteful. Check probes first.

  Sponsored by:	The FreeBSD Foundation

Modified:
  head/sys/security/mac/mac_internal.h

Modified: head/sys/security/mac/mac_internal.h
==============================================================================
--- head/sys/security/mac/mac_internal.h	Wed Dec 19 22:17:24 2018	(r342246)
+++ head/sys/security/mac/mac_internal.h	Wed Dec 19 22:30:26 2018	(r342247)
@@ -98,12 +98,14 @@ SDT_PROVIDER_DECLARE(mac_framework);	/* Entry points t
 	    "int", arg0);
 
 #define	MAC_CHECK_PROBE4(name, error, arg0, arg1, arg2, arg3)	do {	\
-	if (error) {							\
-		SDT_PROBE5(mac_framework, , name, mac__check__err,	\
-		    error, arg0, arg1, arg2, arg3);			\
-	} else {							\
-		SDT_PROBE5(mac_framework, , name, mac__check__ok,	\
-		    0, arg0, arg1, arg2, arg3);				\
+	if (SDT_PROBES_ENABLED()) {					\
+		if (error) {						\
+			SDT_PROBE5(mac_framework, , name, mac__check__err,\
+			    error, arg0, arg1, arg2, arg3);		\
+		} else {						\
+			SDT_PROBE5(mac_framework, , name, mac__check__ok,\
+			    0, arg0, arg1, arg2, arg3);			\
+		}							\
 	}								\
 } while (0)
 
@@ -122,12 +124,14 @@ SDT_PROVIDER_DECLARE(mac_framework);	/* Entry points t
 	    "int", arg0, arg1);
 
 #define	MAC_GRANT_PROBE2(name, error, arg0, arg1)	do {		\
-	if (error) {							\
-		SDT_PROBE3(mac_framework, , name, mac__grant__err,	\
-		    error, arg0, arg1);					\
-	} else {							\
-		SDT_PROBE3(mac_framework, , name, mac__grant__ok,	\
-		    error, arg0, arg1);					\
+	if (SDT_PROBES_ENABLED()) {					\
+		if (error) {						\
+			SDT_PROBE3(mac_framework, , name, mac__grant__err,\
+			    error, arg0, arg1);				\
+		} else {						\
+			SDT_PROBE3(mac_framework, , name, mac__grant__ok,\
+			    error, arg0, arg1);				\
+		}							\
 	}								\
 } while (0)
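
The shape of the change can be tried outside the kernel with a small userland sketch. Everything in it is a hypothetical stand-in: `probes_enabled` plays the role of `SDT_PROBES_ENABLED()`, the two counters replace the real probe sites, and `CHECK_PROBE` and `checked_op` are names invented for this illustration. It is not the sdt(9) implementation, only the same control flow: test the cheap global flag first, and branch on the error value only when some probe is enabled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's probe machinery. */
bool probes_enabled = false;	/* plays the role of SDT_PROBES_ENABLED() */
int fired_ok = 0;		/* counts simulated "ok" probe firings */
int fired_err = 0;		/* counts simulated "err" probe firings */

/*
 * Same shape as the patched macros: check the global flag first, and
 * only then branch on the error value to pick which probe to fire.
 */
#define	CHECK_PROBE(error) do {						\
	if (probes_enabled) {						\
		if (error)						\
			fired_err++;					\
		else							\
			fired_ok++;					\
	}								\
} while (0)

/* Hypothetical check routine that reports its result through the probe. */
int
checked_op(int error)
{
	CHECK_PROBE(error);
	return (error);
}
```

With `probes_enabled` left false, calls to `checked_op()` return the error value unchanged and neither counter moves, so the common path pays for one flag test instead of a branch on the return value followed by a per-probe check.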