Date:      Wed, 19 Dec 2018 05:42:00 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Andrew Gallatin <gallatin@cs.duke.edu>,  Slava Shwartsman <slavash@freebsd.org>, src-committers@freebsd.org,  svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r341578 - head/sys/dev/mlx5/mlx5_en
Message-ID:  <20181219023918.E1895@besplex.bde.org>
In-Reply-To: <20181219001335.A1412@besplex.bde.org>
References:  <201812051420.wB5EKwxr099242@repo.freebsd.org> <9e09a2f8-d9fd-7fde-8e5a-b7c566cdb6a9@cs.duke.edu> <20181218033137.Q2217@besplex.bde.org> <b81d9232-d703-2d4f-eec2-f9b48a0ccd3b@cs.duke.edu> <20181219001335.A1412@besplex.bde.org>

On Wed, 19 Dec 2018, Bruce Evans wrote:

> On Mon, 17 Dec 2018, Andrew Gallatin wrote:
>
>> On 12/17/18 2:08 PM, Bruce Evans wrote:
>* ...
>>> iflib uses queuing techniques to significantly pessimize em NICs with 1
>>> hardware queue.  On fast machines, it attempts to do 1 context switch per
> ...
>> This can happen even w/o contention when "abdicate" is enabled in mp
>> ring. I complained about this as well, and the default was changed in
>> mp ring to not always "abdicate" (eg, switch to the tq to handle the
>> packet). Abdication substantially pessimizes Netflix style web uncontended
>> workloads, but it generally helps small packet forwarding.
>>
>> It is interesting that you see the opposite.  I should try benchmarking
>> with just a single ring.
>
> Hmm, I didn't remember "abdicated" and never knew about the sysctl for it
> (the sysctl is newer), but I noticed the slowdown from near the first
> commit for it (r323954) and already used the following workaround for it:
> ...
> This essentially just adds back the previous code with a flag to check
> both versions.  Hopefully the sysctl can do the same thing.

It doesn't.  Setting tx_abdicate to 1 gives even more context switches (almost
twice as many, 800k/sec instead of 400k/sec, on i386 pessimized by
INVARIANTS, WITNESS, !WITNESS_SKIPSPIN, 4G KVA and more.  Without
pessimizations it does 1M/sec instead of 400k/sec).  The behaviour is easy
to understand by watching top -SH -m io with netblast bound to the same
CPU as the main tgq.  Then netblast does involuntary context switches at
the same rate that the tgq does voluntary context switches, and tx_abdicate=1
doubles this rate.  netblast only switches at the quantum rate (11 per second)
when not bound (I think it does null switches and it is a bug to count these
as switches, but even null switches do too much).
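
As a side note for reproducing the voluntary/involuntary split without
reading it off top: a minimal sketch using the getrusage(2) counters from
Python.  The helper names csw_counts and csw_rates and the sampling interval
are assumptions made for illustration, and this only observes the calling
process, not a kernel thread such as the tgq.

```python
import resource
import time


def csw_counts():
    """Return (voluntary, involuntary) context switch counts for this process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_nvcsw, ru.ru_nivcsw


def csw_rates(interval=1.0):
    """Sample the counters twice and return per-second rates over the interval."""
    v0, i0 = csw_counts()
    t0 = time.monotonic()
    time.sleep(interval)  # blocking here normally counts as a voluntary switch
    v1, i1 = csw_counts()
    elapsed = time.monotonic() - t0
    return (v1 - v0) / elapsed, (i1 - i0) / elapsed


if __name__ == "__main__":
    vol, invol = csw_rates()
    print(f"voluntary: {vol:.0f}/sec, involuntary: {invol:.0f}/sec")
```

A sender that is being preempted by another thread on the same CPU would be
expected to show up mainly in the involuntary counter.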

This is also without my usual default of !PREEMPTION && !IPI_PREEMPTION.
Binding the netblast to the same CPU as the tgq only stops the excessive
context switches when !PREEMPTION.  My hack might depend on this too.
Unfortunately, the hack is not in the same kernels as the sysctl, and I
already have too many combinations to test.

Another test with only 4G KVA (no INVARIANTS, etc., no PREEMPTION):
tx_abdicate=0: tgq switch rate  997-1017k/sec (16k/sec if netblast bound)
tx_abdicate=1: tgq switch rate 1300-1350k/sec (16k/sec if netblast bound)

Another test on amd64 to escape i386 4G KVA pessimizations:
tx_abdicate=0: tgq switch rate 1110-1220k/sec (16k/sec if netblast bound)
tx_abdicate=1: tgq switch rate 1360-1430k/sec (16k/sec if netblast bound)

When netblast is bound to the tgq's CPU, the tgq actually runs on another
CPU.  Apparently, the binding is weak or this is a bugfeature in my scheduler.

When tx_abdicate=1, the switch rate is close to the packet rate.  Since the
NIC can't keep up, most packets are dropped.  On amd64 with tx_abdicate=1,
the packet rates are:

netblast bound:   313kpps sent, 1604kpps dropped
netblast unbound: 253kpps sent, 1153kpps dropped
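
The relation between these packet rates and the switch rate can be checked
with a little arithmetic.  This is a sketch only: the numbers are copied from
the amd64 tx_abdicate=1 results above, and the helper names are made up for
illustration.  The unbound total (sent + dropped) comes to 1406kpps, which
lies inside the 1360-1430k/sec tgq switch rate range reported for
tx_abdicate=1 on amd64.

```python
# Rates in kpps, copied from the amd64 tx_abdicate=1 results above.
RESULTS = {
    "netblast bound": {"sent": 313, "dropped": 1604},
    "netblast unbound": {"sent": 253, "dropped": 1153},
}


def offered_kpps(r):
    """Total packets per second that netblast attempted (sent + dropped)."""
    return r["sent"] + r["dropped"]


def drop_fraction(r):
    """Fraction of attempted packets that were dropped."""
    return r["dropped"] / offered_kpps(r)


if __name__ == "__main__":
    for name, r in RESULTS.items():
        print(f"{name}: offered {offered_kpps(r)} kpps, "
              f"dropped {drop_fraction(r):.1%}")
```

In both cases more than 80% of the attempted packets are dropped.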

253kpps sent is bad.  This indicates large latencies (not due to !PREEMPTION
or scheduler bugs AFAIK).  Most tests with netblast unbound seemed to saturate
the NIC at 280kpps (but the tests with netblast bound show that the NIC can
go a little faster).  Even an old 2GHz CPU can reach 280kpps.

This shows another problem with taskqueues.  It takes context switches just
to decide to drop packets.  Previous versions of iflib were much slower at
dropping packets.  Some had rates closer to the low send rate than the 1604kpps
achieved above.  FreeBSD-5 running on a single 3 times slower CPU can drop
packets at 2124kpps, mainly by dropping them in ip_output() after peeking at
the software ifqs to see that there is no space.  IFF_MONITOR gives better
tests of the syscall overhead.
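
The idea of peeking at the queue and dropping in the caller's context can be
illustrated with a toy model.  This is not the FreeBSD-5 ip_output() code;
the SoftQueue class and its methods are hypothetical and exist only to show
that the drop decision needs no handoff to another thread.

```python
from collections import deque


class SoftQueue:
    """Toy bounded software send queue (hypothetical, for illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = deque()
        self.dropped = 0

    def has_space(self):
        """Peek at the queue without modifying it."""
        return len(self.packets) < self.capacity

    def try_send(self, packet):
        """Queue the packet if there is room; otherwise drop it immediately."""
        if not self.has_space():
            self.dropped += 1  # decided in the caller, no other thread involved
            return False
        self.packets.append(packet)
        return True


if __name__ == "__main__":
    q = SoftQueue(capacity=4)
    for n in range(10):
        q.try_send(n)
    print(f"queued {len(q.packets)}, dropped {q.dropped}")
```

Nothing drains the queue in this model, so every packet after the first four
is dropped by the sender itself.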

Another test with amd64 and I218V instead of PRO1000:

netblast bound, !abdicate:   1243kpps sent,   0kpps dropped  (16k/sec csw)
netblast unbound, !abdicate: 1236kpps sent,   0kpps dropped  (16k/sec csw)
netblast bound, abdicate:    1485kpps sent, 243kpps dropped  (16k/sec csw)
netblast unbound, abdicate:  1407kpps sent, 1.7kpps dropped (850k/sec csw)

There is an i386 dependency after all!  !abdicate works on amd64 but not
on i386 to prevent the excessive context switches.  Unfortunately, it also
reduces kpps by almost 20% and leaves no spare CPU for dropping packets.
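
The size of this difference depends on the baseline.  The sketch below works
it out from the I218V numbers above; the helper name rel_change is made up,
and which baseline the "almost 20%" refers to is an assumption.  Taking
!abdicate as the base, the bound abdicate case is about 19.5% faster; taking
abdicate as the base, !abdicate is about 16.3% slower.

```python
def rel_change(old, new):
    """Percentage change going from old to new."""
    return 100.0 * (new - old) / old


# kpps sent, copied from the I218V results above.
BOUND = {"no_abdicate": 1243, "abdicate": 1485}
UNBOUND = {"no_abdicate": 1236, "abdicate": 1407}

if __name__ == "__main__":
    for name, r in (("bound", BOUND), ("unbound", UNBOUND)):
        gain = rel_change(r["no_abdicate"], r["abdicate"])
        loss = rel_change(r["abdicate"], r["no_abdicate"])
        print(f"netblast {name}: abdicate {gain:+.1f}% vs !abdicate, "
              f"!abdicate {loss:+.1f}% vs abdicate")
```

For the unbound case the same calculation gives about 13.8% and 12.2%.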

The best case of netblast bound, abdicate is competitive with FreeBSD-11
on i386 with EM_MULTIQUEUE: above result repeated:

netblast bound, abdicate:    1485kpps sent, 243kpps dropped  (16k/sec csw)

previous best result:

FBSD-11     SMP-8 1486+241 # no iflib, use EM_MULTIQUEUE (now saturate 1Gbps)

(this is without PREEMPTION* and without binding netblast).

The above for -current also has the lowest possible CPU use (100% of 1 CPU
for all threads, while netblast unbound takes 100% of 1 CPU for netblast and
60% of another CPU for tgq), and I think the FBSD-11 case takes 100% of 1
CPU for netblast unbound and a tiny% of another CPU for the taskqueue and
a tiny unaccounted % of various CPUs for the fast interrupt handler.  The
fast interrupt handler is not accounted for in all cases.  Since interrupt
moderation gives a rate of 8 kHz, the interrupt handler doesn't take very
long, but if it does a single PCI read then that might take 1 usec so 8 kHz
costs 1% for that alone.
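
The interrupt cost estimate is a one-line calculation.  A sketch, using the
8 kHz rate and the guessed 1 usec PCI read from the paragraph above (the
function name is made up for illustration):

```python
def interrupt_cpu_fraction(rate_hz, cost_sec):
    """Fraction of one CPU spent if each interrupt costs cost_sec seconds."""
    return rate_hz * cost_sec


if __name__ == "__main__":
    frac = interrupt_cpu_fraction(8000, 1e-6)
    print(f"{frac:.1%} of one CPU")
```

This gives 0.8%, which rounds to the 1% quoted.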

Why would tx_abdicate=0 give extra context switches for i386 but not
for amd64?  More interestingly, what does it do wrong to lose 20% in
kpps sent and more in kpps dropped?

Another test with PREEMPTION*:

netblast bound, !abdicate:   same as above
netblast unbound, !abdicate: same as above
netblast bound, abdicate:     578kpps sent, 0kpps dropped (1160k/sec csw)
netblast unbound, abdicate:  1106kpps sent, 0kpps dropped  (850k/sec csw)

That is, abdicate with PREEMPTION to make it work as fully intended
destroys performance for the netblast bound case where it fixes most
performance problems without PREEMPTION; for the netblast unbound case it
only reduces performance by 30%.  It uses the same amount of CPU as
!PREEMPTION.

Another test with PREEMPTION* and SCHED_ULE instead of SCHED_4BSD
(PREEMPTION* works a little differently in different schedulers.  IIRC,
IPI_PREEMPTION is useless and is mostly ignored in 4BSD and entirely
ignored in ULE, and PREEMPTION loses a little more preemption in ULE
than in 4BSD):

netblast bound, !abdicate:   same as above
netblast unbound, !abdicate: same as above
netblast bound, abdicate:    same as above (very bad)
netblast unbound, abdicate:  1485kpps sent, 0kpps dropped  (850k/sec csw)

In the 2 !abdicate cases, the CPU use is below 1% for tgq.  The very
bad case is independent of the scheduler.  This is probably inherent
(some sort of contention on the common bound CPU when preemption is
done technically correctly).  (I expected this case to be much slower
before I tried it.)  But ULE doesn't have the 30% loss for PREEMPTION
&& netblast unbound && abdicate.

The case where ULE is better has large latency for 4BSD.  It can be mostly
fixed by binding netblast to any set of CPUs not containing the one of
tgq or the one in the same HTT pair as that.  The speed is then only 85kpps
slower than with ULE.  Otherwise, even my version of 4BSD sees no reason
not to run netblast on one of the CPUs on the same core as the tgq.
(My 4BSD changes affinity more often since in other benchmarks it is bad
to wait until the preferred CPU is available; binding of taskqueues and ithreads
too often steals preferred CPUs and causes this migration.)  When netblast
decides to run on the same CPU as tgq, it contends with the tgq.  When it
decides to run on the HTT pair, both CPUs run 33% slower.

Anyway, I don't want the ~1M/sec context switches given by abdicate.
Context switching is especially bad for 4BSD.  It still uses sched_lock
for everything (it appears to use thread_lock(), but this just uses
sched_lock for 4BSD).  The slowness from this is remarkably small in
most cases.

Bruce
From owner-svn-src-head@freebsd.org  Tue Dec 18 18:52:13 2018
From: Warner Losh <imp@FreeBSD.org>
Date: Tue, 18 Dec 2018 18:52:12 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-head@freebsd.org
Subject: svn commit: r342195 - head

Author: imp
Date: Tue Dec 18 18:52:12 2018
New Revision: 342195
URL: https://svnweb.freebsd.org/changeset/base/342195

Log:
  add pre-commit review request for drm*.
  Move dev/usb/wlan to sys/dev/usb/wlan as it was the odd-man-out.

Modified:
  head/MAINTAINERS

Modified: head/MAINTAINERS
==============================================================================
--- head/MAINTAINERS	Tue Dec 18 17:31:31 2018	(r342194)
+++ head/MAINTAINERS	Tue Dec 18 18:52:12 2018	(r342195)
@@ -44,7 +44,6 @@ contrib/llvm		dim	Pre-commit review preferred.
 contrib/llvm/tools/lldb	emaste	Pre-commit review preferred.
 contrib/netbsd-tests	freebsd-testing,ngie	Pre-commit review requested.
 contrib/pjdfstest	freebsd-testing,asomers,ngie,pjd	Pre-commit review requested.
-dev/usb/wlan	adrian	Pre-commit review requested, send to freebsd-wireless@freebsd.org
 *env(3)		secteam	Due to the problematic security history of this
 			code, please have patches reviewed by secteam.
 etc/mail	gshapiro	Pre-commit review requested.  Keep in sync with -STABLE.
@@ -110,5 +109,8 @@ autofs(5)	trasz	Pre-commit review recommended.
 iscsi(4)	trasz	Pre-commit review recommended.
 rctl(8)		trasz	Pre-commit review recommended.
 sys/dev/ofw	nwhitehorn	Pre-commit review recommended.
+sys/dev/drm*	imp	Pre-commit review requested in phabricator. Changes need to
+				be mirrored in gitub repo.
+sys/dev/usb/wlan adrian	Pre-commit review requested, send to freebsd-wireless@freebsd.org
 sys/arm/allwinner	manu	Pre-commit review requested
 sys/arm64/rockchip	manu	Pre-commit review requested


