Date: Sat, 4 Jan 2020 22:34:49 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
To: Mark Linimon <linimon@lonesome.com>
Cc: Jeff Roberson <jeff@FreeBSD.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r356348 - in head/sys: kern vm
Message-ID: <alpine.BSF.2.21.9999.2001042216480.1198@desktop>
In-Reply-To: <20200105013314.GA3681@lonesome.com>
References: <202001040315.0043FYhn047977@repo.freebsd.org> <20200105013314.GA3681@lonesome.com>

On Sun, 5 Jan 2020, Mark Linimon wrote:

> On Sat, Jan 04, 2020 at 03:15:34AM +0000, Jeff Roberson wrote:
>> Use a separate lock for the zone and keg.
>
> Out of curiosity, will there be measurable real-world speedups from
> this and similar work, or will this mostly apply to edge cases, or ... ?

It depends on which real world. A lot of workloads don't really show
much allocator activity. For very high speed networking, and especially
very high speed networking on big NUMA machines, the speedup is
considerable. Netflix reported the earlier round of work cut the time
spent in UMA by about 30%. For non-NUMA machines the last ~6 patches cut
another 30% off of that in my tests. Even for Netflix, UMA was not in
the top 5 of their profile before this work.

The major perf upshot was somewhere around an 8x improvement when
freeing on a different NUMA domain than you allocated from when the
allocation policy is first-touch. This is called a cross-domain or
'xdomain' free in the code. This made it possible to enable first-touch
for UMA by default on all NUMA machines.

I wrote a simple allocator perf test in which each core loops,
allocating a 2k mbuf and appending it to a random remote core's queue,
after which it drains its local queue. 10 million iterations across 32
cores in two NUMA domains gives 320,000,000 packets allocated and freed.
Within the same domain the run took about 4 seconds. Going to a
different domain took around 40 seconds before this patch series and
around 5 seconds after it. So only a ~25% penalty when doing 2 million
packets-per-second-per-core.

Many of the recent changes were really as much about code organization
and readability as performance. After 18 years of features coming and
going, reorganizations, etc. it was getting a bit crufty.

Jeff

>
> mcl
>
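
As a back-of-the-envelope check, the figures quoted above can be re-derived in a few lines. This is only a sketch of the arithmetic, not the perf test itself. The variable names are made up for illustration, and the timings are the approximate values given in the message.

```python
# Approximate figures quoted in the message; all names are illustrative.
ITERATIONS_PER_CORE = 10_000_000
CORES = 32
SAME_DOMAIN_SECONDS = 4.0       # alloc and free within one NUMA domain
XDOMAIN_BEFORE_SECONDS = 40.0   # cross-domain free, before the patch series
XDOMAIN_AFTER_SECONDS = 5.0     # cross-domain free, after the patch series

# Total packets allocated and freed across all cores.
total_packets = ITERATIONS_PER_CORE * CORES

# Per-core packet rate for the cross-domain case after the patches.
pps_per_core_after = ITERATIONS_PER_CORE / XDOMAIN_AFTER_SECONDS

# Remaining cost of crossing domains, relative to the same-domain run.
penalty_after = XDOMAIN_AFTER_SECONDS / SAME_DOMAIN_SECONDS - 1.0

# Improvement in the cross-domain case from the patch series.
xdomain_speedup = XDOMAIN_BEFORE_SECONDS / XDOMAIN_AFTER_SECONDS

print(f"total packets:        {total_packets:,}")
print(f"pps per core (after): {pps_per_core_after:,.0f}")
print(f"xdomain penalty:      {penalty_after:.0%}")
print(f"xdomain speedup:      {xdomain_speedup:.0f}x")
```

With these inputs the totals agree with the message: 320,000,000 packets, 2 million packets per second per core, a 25% cross-domain penalty, and an 8x improvement over the pre-patch cross-domain time.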