Date: Mon, 9 Sep 2019 15:34:31 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Mark Johnston
Cc: Andriy Gapon, src-committers@freebsd.org, svn-src-all@freebsd.org,
    svn-src-head@freebsd.org
Subject: Re: svn commit: r351673 - in head: lib/libmemstat share/man/man9
    sys/cddl/compat/opensolaris/kern sys/kern sys/vm
Message-ID: <20190909123430.GI3953@zxy.spb.ru>
In-Reply-To: <20190908201819.GA49837@raichu>

On Sun, Sep 08, 2019 at 04:18:19PM -0400, Mark Johnston wrote:
> On Sat, Sep 07, 2019 at 06:31:10PM +0300, Slawa Olhovchenkov wrote:
> > On Sat, Sep 07, 2019 at 10:50:34AM -0400, Mark Johnston wrote:
> > 
> > > On Wed, Sep 04, 2019 at 05:45:24PM +0300, Slawa Olhovchenkov wrote:
> > > > On Tue, Sep 03, 2019 at 06:01:06PM -0400, Mark Johnston wrote:
> > > > > > The main problem I see with this work -- vm_page_free() is
> > > > > > very slow.  Maybe it is faster now...
> > > > > 
> > > > > How did you determine this?
> > > > 
> > > > That was your guess:
> > > 
> > > So was the guess correct?
> > 
> > I just trusted you on that.
> > How can we check this guess?
> 
> You can try to measure time spent in pmap_remove() relative to the rest
> of _kmem_unback().
===
/*
 * Per-second sums, in nanoseconds:
 *   kts - total time spent in _kmem_unback()
 *   pts - pmap_remove() called from inside _kmem_unback()
 *   ats - pmap_remove() called from anywhere else
 */
fbt:kernel:_kmem_unback:entry
{
	self->kts = timestamp;
}

fbt:kernel:_kmem_unback:return
/self->kts/
{
	@ts["kts"] = sum(timestamp - self->kts);
	self->kts = 0;
}

fbt:kernel:pmap_remove:entry
/self->kts/
{
	self->pts = timestamp;
}

fbt:kernel:pmap_remove:entry
/self->kts == 0/
{
	self->ats = timestamp;
}

fbt:kernel:pmap_remove:return
/self->pts/
{
	@ts["pts"] = sum(timestamp - self->pts);
	self->pts = 0;
}

fbt:kernel:pmap_remove:return
/self->ats/
{
	@ts["ats"] = sum(timestamp - self->ats);
	self->ats = 0;
}

tick-1s
{
	printa("%-8s %20@d\n", @ts);
	printf("\n");
	clear(@ts);
}
===

pts      7166680
ats      8143193
kts     16721822

pts      3458647
ats      5393452
kts     10504523

kts            0
pts            0
ats      2758752

pts      3387748
ats      4322653
kts      8100282

pts      4002444
ats      5422748
kts     11761955

pts      1151713
kts      2742176
ats      8242958

pts        34441
kts       145683
ats      4822328

kts            0
pts            0
ats      5357808

pts        16008
kts       148203
ats      4947368

pts        26945
kts       156011
ats      8368502

pts       323620
kts      1154981
ats     10093137

pts        79504
kts       228135
ats      7131059

pts       734062
kts      1747364
ats      4619796

pts       401453
kts      1036605
ats      8751919

> > > If so, IMO the real solution is to avoid kmem_*
> > > for data buffers and use ABD instead.
> > 
> > What problem does this resolve?
> 
> To allocate buffers larger than PAGE_SIZE the kernel must allocate a
> number of physical pages and map them using the page tables.  The step
> of creating and destroying mappings is expensive and doesn't scale well
> to many CPUs.  With ABD, ZFS avoids this expense when its caches are
> shrunk.
> 
> > ABD is anyway slower than kmem_*.
> 
> Can we solve this problem instead?

Which problem: slow ABD, or the expense of creating and destroying
mappings?

Slow ABD: just don't use ABD and scatter/gather.

The expense of creating and destroying mappings:
- make the kernel more NUMA-aware, so that allocs and frees mostly hit
  the per-domain zone cache (currently an item can be allocated from
  dom1 and freed to dom0 -- I have seen this happen)
- calculate a sensible working-set size for the zone cache, so that
  trimming usually has nothing to do
- maybe batch the removals/unmappings and do the TLB shootdown only
  once, at the end of the batch?

> > > > ======
> > > > > 	while ((slab = SLIST_FIRST(&freeslabs)) != NULL) {
> > > > > 		SLIST_REMOVE(&freeslabs, slab, uma_slab, us_hlink);
> > > > > 		keg_free_slab(keg, slab, keg->uk_ipers);
> > > > > 	}
> > > > > 2019 Feb 2 19:49:54.800524364 zio_data_buf_1048576 1032605 cache_reclaim limit 100 dom 0 nitems 1672 imin 298
> > > > > 2019 Feb 2 19:49:54.800524364 zio_data_buf_1048576 1033736 cache_reclaim recla 149 dom 0 nitems 1672 imin 298
> > > > > 2019 Feb 2 19:49:54.802524468 zio_data_buf_1048576 3119710 cache_reclaim limit 100 dom 1 nitems 1 imin 0
> > > > > 2019 Feb 2 19:49:54.802524468 zio_data_buf_1048576 3127550 keg_drain2
> > > > > 2019 Feb 2 19:49:54.803524487 zio_data_buf_1048576 4444219 keg_drain3
> > > > > 2019 Feb 2 19:49:54.838524634 zio_data_buf_1048576 39553705 keg_drain4
> > > > > 2019 Feb 2 19:49:54.838524634 zio_data_buf_1048576 39565323 zone_reclaim:return
> > > > > 
> > > > > 35109.486 us for the last loop, 149 items freed.
> > > > 
> > > > 35ms to free 149MB (38144 4KB pages), so roughly 1us per page.  That
> > > > does seem like a lot, but freeing a page (vm_page_free(m)) is much
> > > > more expensive than freeing an item to UMA (i.e., uma_zfree()).
> > > > Most of that time will be spent in _kmem_unback().
> > > > ======
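
For comparison, here is an untested sketch along the same lines as the
script above.  It assumes fbt probes are available for _kmem_unback(),
pmap_remove() and vm_page_free(), and records per-call latency
distributions instead of per-second sums, to show whether pmap_remove()
or the per-page vm_page_free() loop dominates a typical _kmem_unback()
call:

===
/*
 * Untested sketch: per-call latency distributions (ns), assuming the
 * same fbt:kernel probes as above plus one on vm_page_free().
 */
fbt:kernel:_kmem_unback:entry
{
	self->uts = timestamp;
}

/* pmap_remove() calls made from inside _kmem_unback() */
fbt:kernel:pmap_remove:entry
/self->uts/
{
	self->rts = timestamp;
}

fbt:kernel:pmap_remove:return
/self->rts/
{
	@remove = quantize(timestamp - self->rts);
	self->rts = 0;
}

/* per-page vm_page_free() calls made from inside _kmem_unback() */
fbt:kernel:vm_page_free:entry
/self->uts/
{
	self->fts = timestamp;
}

fbt:kernel:vm_page_free:return
/self->fts/
{
	@pgfree = quantize(timestamp - self->fts);
	self->fts = 0;
}

fbt:kernel:_kmem_unback:return
/self->uts/
{
	@unback = quantize(timestamp - self->uts);
	self->uts = 0;
}

tick-10s
{
	printf("pmap_remove() under _kmem_unback() (ns):\n");
	printa(@remove);
	printf("vm_page_free() under _kmem_unback() (ns):\n");
	printa(@pgfree);
	printf("_kmem_unback() total (ns):\n");
	printa(@unback);
	clear(@remove);
	clear(@pgfree);
	clear(@unback);
}
===

If pmap_remove() dominates the distributions, that would argue for
unmapping less often or in larger batches; if vm_page_free() does, the
cost is mostly on the vm_page side rather than in the mapping teardown.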