Date: Wed, 19 Dec 2018 05:42:00 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Andrew Gallatin <gallatin@cs.duke.edu>, Slava Shwartsman <slavash@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r341578 - head/sys/dev/mlx5/mlx5_en
Message-ID: <20181219023918.E1895@besplex.bde.org>
In-Reply-To: <20181219001335.A1412@besplex.bde.org>
References: <201812051420.wB5EKwxr099242@repo.freebsd.org> <9e09a2f8-d9fd-7fde-8e5a-b7c566cdb6a9@cs.duke.edu> <20181218033137.Q2217@besplex.bde.org> <b81d9232-d703-2d4f-eec2-f9b48a0ccd3b@cs.duke.edu> <20181219001335.A1412@besplex.bde.org>

On Wed, 19 Dec 2018, Bruce Evans wrote:

> On Mon, 17 Dec 2018, Andrew Gallatin wrote:
>
>> On 12/17/18 2:08 PM, Bruce Evans wrote:
>* ...
>>> iflib uses queuing techniques to significantly pessimize em NICs with 1
>>> hardware queue.  On fast machines, it attempts to do 1 context switch per
> ...
>> This can happen even w/o contention when "abdicate" is enabled in mp
>> ring. I complained about this as well, and the default was changed in
>> mp ring to not always "abdicate" (eg, switch to the tq to handle the
>> packet). Abdication substantially pessimizes Netflix style web uncontended
>> workloads, but it generally helps small packet forwarding.
>>
>> It is interesting that you see the opposite. I should try benchmarking
>> with just a single ring.
>
> Hmm, I didn't remember "abdicated" and never knew about the sysctl for it
> (the sysctl is newer), but I notices the slowdown from near the first
> commit for it (r323954) and already used the folowing workaround for it:
> ...
> This essentialy just adds back the previous code with a flag to check
> both versions. Hopefully the sysctl can do the same thing.

It doesn't.  Setting tx_abdicate to 1 gives even more context switches
(almost twice as many, 800k/sec instead of 400k/sec, on i386 pessimized by
INVARIANTS, WITNESS, !WITNESS_SKIPSPIN, 4G KVA and more.  Without
pessimizations it does 1M/sec instead of 400k/sec).  The behaviour is easy
to understand by watching top -SH -m io with netblast bound to the same CPU
as the main tgq.  Then netblast does involuntary context switches at the
same rate that the tgq does voluntary context switches, and tx_abdicate=1
doubles this rate.  netblast only switches at the quantum rate (11 per
second) when not bound (I think it does null switches and it is a bug to
count these as switches, but even null switches do too much).

This is also without my usual default of !PREEMPTION && !IPI_PREEMPTION.
Binding the netblast to the same CPU as the tgq only stops the excessive
context switches when !PREEMPTION.  My hack might depend on this too.
Unfortunately, the hack is not in the same kernels as the sysctl, and I
already have too many combinations to test.

Another test with only 4G KVA (no INVARIANTS, etc., no PREEMPTION):
tx_abdicate=0: tgq switch rate  997-1017k/sec (16k/sec if netblast bound)
tx_abdicate=1: tgq switch rate 1300-1350k/sec (16k/sec if netblast bound)

Another test on amd64 to escape i386 4G KVA pessimizations:
tx_abdicate=0: tgq switch rate 1110-1220k/sec (16k/sec if netblast bound)
tx_abdicate=1: tgq switch rate 1360-1430k/sec (16k/sec if netblast bound)

When netblast is bound to the tgq's CPU, the tgq actually runs on another
CPU.  Apparently, the binding is weak or this is a bugfeature in my
scheduler.

When tx_abdicate=1, the switch rate is close to the packet rate.  Since the
NIC can't keep up, most packets are dropped.  On amd64 with tx_abdicate=1,
the packet rates are:
netblast bound:   313kpps sent, 1604kpps dropped
netblast unbound: 253kpps sent, 1153kpps dropped

253kpps sent is bad.  This indicates large latencies (not due to
!PREEMPTION or scheduler bugs AFAIK).  Most tests with netblast unbound
seemed to saturate the NIC at 280kpps (but the tests with netblast bound
show that the NIC can go a little faster).  Even an old 2GHz CPU can reach
280kpps.

This shows another problem with taskqueues.  It takes context switches just
to decide to drop packets.  Previous versions of iflib were much slower at
dropping packets.  Some had rates closer to the low send rate than the
1604kpps achieved above.  FreeBSD-5 running on a single 3 times slower CPU
can drop packets at 2124kpps, mainly by dropping them in ip_output() after
peeking at the software ifqs to see that there is no space.  IFF_MONITOR
gives better tests of the syscall overhead.
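
The amd64 tx_abdicate=1 figures above can be cross-checked with a little
arithmetic.  The following Python sketch assumes that the offered packet
rate is simply sent plus dropped (an assumption; the message does not
define it) and compares that with the quoted tgq switch rates for the
unbound case.  The function and variable names are illustrative only.

```python
# Figures copied from the amd64, tx_abdicate=1 runs quoted above (kpps).
RUNS = {
    "netblast bound": {"sent": 313, "dropped": 1604},
    "netblast unbound": {"sent": 253, "dropped": 1153},
}

# tgq switch rate range for amd64, tx_abdicate=1, netblast unbound (k/sec).
SWITCH_RATE_UNBOUND = (1360, 1430)


def offered(run):
    """Offered load in kpps, ASSUMED here to be sent + dropped."""
    return run["sent"] + run["dropped"]


def sent_fraction(run):
    """Fraction of the offered packets that were actually sent."""
    return run["sent"] / offered(run)


for name, run in RUNS.items():
    print(f"{name}: offered {offered(run)} kpps, "
          f"sent {100 * sent_fraction(run):.1f}%")

# Context switches per offered packet for the unbound case.
low, high = (rate / offered(RUNS["netblast unbound"])
             for rate in SWITCH_RATE_UNBOUND)
print(f"switches per packet (unbound): {low:.2f} to {high:.2f}")
```

Under that assumption the ratio comes out close to 1 switch per packet,
which is consistent with the statement that the switch rate is close to
the packet rate.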

Another test with amd64 and I218V instead of PRO1000:
netblast bound,   !abdicate: 1243kpps sent,   0kpps dropped (16k/sec csw)
netblast unbound, !abdicate: 1236kpps sent,   0kpps dropped (16k/sec csw)
netblast bound,    abdicate: 1485kpps sent, 243kpps dropped (16k/sec csw)
netblast unbound,  abdicate: 1407kpps sent, 1.7kpps dropped (850k/sec csw)

There is an i386 dependency after all!  !abdicate works on amd64 but not on
i386 to prevent the excessive context switches.  Unfortunately, it also
reduces kpps by almost 20% and leaves no spare CPU for dropping packets.

The best case of netblast bound, abdicate is competitive with FreeBSD-11 on
i386 with EM_MULTIQUEUE:

above result repeated:
netblast bound,    abdicate: 1485kpps sent, 243kpps dropped (16k/sec csw)
previous best result:
FBSD-11 SMP-8 1486+241 # no iflib, use EM_MULTIQUEUE (now saturate 1Gbps)

(this is without PREEMPTION* and without binding netblast).

The above for -current also has the lowest possible CPU use (100% of 1 CPU
for all threads, while netblast unbound takes 100% of 1 CPU for netblast
and 60% of another CPU for tgq), and I think the FBSD-11 case takes 100% of
1 CPU for netblast unbound and a tiny % of another CPU for the taskqueue
and a tiny unaccounted % of various CPUs for the fast interrupt handler.

The fast interrupt handler is not accounted for in all cases.  Since
interrupt moderation gives a rate of 8 kHz, the interrupt handler doesn't
take very long, but if it does a single PCI read then that might take
1 usec so 8 kHz costs 1% for that alone.

Why would tx_abdicate=0 give extra context switches for i386 but not for
amd64?  More interestingly, what does it do wrong to lose 20% in kpps sent
and more in kpps dropped?
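
The interrupt-handler estimate above is just rate times per-interrupt
cost.  A minimal Python sketch of that arithmetic follows; the function
name is illustrative, and the 1 usec PCI read cost is the rough guess from
the message, not a measurement.

```python
def interrupt_cpu_fraction(rate_hz, cost_seconds):
    """Fraction of one CPU used if every interrupt costs cost_seconds."""
    return rate_hz * cost_seconds


# 8 kHz comes from interrupt moderation; 1 usec is the message's rough
# guess for a single PCI read, not a measured value.
pci_read_share = interrupt_cpu_fraction(8_000, 1e-6)
print(f"{100 * pci_read_share:.1f}% of one CPU")
```

This gives a little under the 1% quoted above, so "1%" is a rounded-up
figure for the PCI read alone.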

Another test with PREEMPTION*:
netblast bound,   !abdicate: same as above
netblast unbound, !abdicate: same as above
netblast bound,    abdicate:  578kpps sent, 0kpps dropped (1160k/sec csw)
netblast unbound,  abdicate: 1106kpps sent, 0kpps dropped (850k/sec csw)

That is, abdicate with PREEMPTION to make it work as fully intended
destroys performance for the netblast bound case where it fixes most
performance problems without PREEMPTION; for the netblast unbound case it
only reduces performance by 30%.  It uses the same amount of CPU as
!PREEMPTION.

Another test with PREEMPTION* and SCHED_ULE instead of SCHED_4BSD
(PREEMPTION* works a little differently in different schedulers.  IIRC,
IPI_PREEMPTION is useless and is mostly ignored in 4BSD and entirely
ignored in ULE, and PREEMPTION loses a little more preemption in ULE than
in 4BSD):
netblast bound,   !abdicate: same as above
netblast unbound, !abdicate: same as above
netblast bound,    abdicate: same as above (very bad)
netblast unbound,  abdicate: 1485kpps sent, 0kpps dropped (850k/sec csw)

In the 2 !abdicate cases, the CPU use is below 1% for tgq.

The very bad case is independent of the scheduler.  This is probably
inherent (some sort of contention on the common bound CPU when preemption
is done technically correctly).  (I expected this case was much slower
before I tried it.)  But ULE doesn't have the 30% loss for PREEMPTION &&
netblast unbound && abdicate.

The case where ULE is better has large latency for 4BSD.  It can be mostly
fixed by binding netblast to any set of CPUs not containing the one of tgq
or the one in the same HTT pair as that.  The speed is then only 85kpps
slower than with ULE.  Otherwise, even my version of 4BSD sees no reason
not to run netblast on one of the CPUs on the same core as the tgq.  (My
4BSD changes affinity more often since in other benchmarks it is bad to
wait until the preferred CPU is available; binding of taskqueues and
ithreads too often steals preferred CPUs and causes this migration.)
When netblast decides to run on the same CPU as tgq, it contends with the
tgq.  When it decides to run on the HTT pair, both CPUs run 33% slower.

Anyway, I don't want the ~1M/sec context switches given by abdicate.

Context switching is especially bad for 4BSD.  It still uses sched_lock for
everything (it appears to use thread_lock(), but this just uses sched_lock
for 4BSD).  The slowness from this is remarkably small in most cases.

Bruce

From owner-svn-src-head@freebsd.org Tue Dec 18 18:52:13 2018
Return-Path: <owner-svn-src-head@freebsd.org>
Delivered-To: svn-src-head@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3AE8B133ADDB;
 Tue, 18 Dec 2018 18:52:13 +0000 (UTC)
 (envelope-from imp@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (4096 bits)
 client-digest SHA256)
 (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id D763882455;
 Tue, 18 Dec 2018 18:52:12 +0000 (UTC)
 (envelope-from imp@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id CA5E02F67;
 Tue, 18 Dec 2018 18:52:12 +0000 (UTC)
 (envelope-from imp@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
 by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id wBIIqCKG090178;
 Tue, 18 Dec 2018 18:52:12 GMT
 (envelope-from imp@FreeBSD.org)
Received: (from imp@localhost)
 by repo.freebsd.org (8.15.2/8.15.2/Submit) id wBIIqCI4090177;
 Tue, 18 Dec 2018 18:52:12 GMT
 (envelope-from imp@FreeBSD.org)
Message-Id: <201812181852.wBIIqCI4090177@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: imp set sender to imp@FreeBSD.org using -f
From: Warner Losh <imp@FreeBSD.org>
Date: Tue, 18 Dec 2018 18:52:12 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r342195 - head
X-SVN-Group: head
X-SVN-Commit-Author: imp
X-SVN-Commit-Paths: head
X-SVN-Commit-Revision: 342195
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Rspamd-Queue-Id: D763882455
X-Spamd-Bar: --
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-2.96 / 15.00];
 local_wl_from(0.00)[FreeBSD.org];
 NEURAL_HAM_MEDIUM(-1.00)[-0.998,0];
 NEURAL_HAM_SHORT(-0.96)[-0.964,0];
 ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US];
 NEURAL_HAM_LONG(-1.00)[-0.998,0]
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current <svn-src-head.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head/>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Dec 2018 18:52:13 -0000

Author: imp
Date: Tue Dec 18 18:52:12 2018
New Revision: 342195
URL: https://svnweb.freebsd.org/changeset/base/342195

Log:
  add pre-commit review request for drm*. Move dev/usb/wlan to
  sys/dev/usb/wlan as it was the odd-man-out.

Modified:
  head/MAINTAINERS

Modified: head/MAINTAINERS
==============================================================================
--- head/MAINTAINERS Tue Dec 18 17:31:31 2018 (r342194)
+++ head/MAINTAINERS Tue Dec 18 18:52:12 2018 (r342195)
@@ -44,7 +44,6 @@ contrib/llvm dim Pre-commit review preferred.
 contrib/llvm/tools/lldb emaste Pre-commit review preferred.
 contrib/netbsd-tests freebsd-testing,ngie Pre-commit review requested.
 contrib/pjdfstest freebsd-testing,asomers,ngie,pjd Pre-commit review requested.
-dev/usb/wlan adrian Pre-commit review requested, send to freebsd-wireless@freebsd.org
 *env(3) secteam Due to the problematic security history of this
   code, please have patches reviewed by secteam.
 etc/mail gshapiro Pre-commit review requested. Keep in sync with -STABLE.
@@ -110,5 +109,8 @@ autofs(5) trasz Pre-commit review recommended.
 iscsi(4) trasz Pre-commit review recommended.
 rctl(8) trasz Pre-commit review recommended.
 sys/dev/ofw nwhitehorn Pre-commit review recommended.
+sys/dev/drm* imp Pre-commit review requested in phabricator. Changes need to
+  be mirrored in gitub repo.
+sys/dev/usb/wlan adrian Pre-commit review requested, send to freebsd-wireless@freebsd.org
 sys/arm/allwinner manu Pre-commit review requested
 sys/arm64/rockchip manu Pre-commit review requested