Date:      Mon, 9 Sep 2019 15:34:31 +0300
From:      Slawa Olhovchenkov <slw@zxy.spb.ru>
To:        Mark Johnston <markj@freebsd.org>
Cc:        Andriy Gapon <avg@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r351673 - in head: lib/libmemstat share/man/man9 sys/cddl/compat/opensolaris/kern sys/kern sys/vm
Message-ID:  <20190909123430.GI3953@zxy.spb.ru>
In-Reply-To: <20190908201819.GA49837@raichu>
References:  <201909012222.x81MMh0F022462@repo.freebsd.org> <79c74018-1329-ee69-3480-e2f99821fa93@FreeBSD.org> <20190903161427.GA38096@zxy.spb.ru> <20190903220106.GB26733@raichu> <20190904144524.GD3953@zxy.spb.ru> <20190907145034.GB6523@spy> <20190907153110.GG3953@zxy.spb.ru> <20190908201819.GA49837@raichu>

On Sun, Sep 08, 2019 at 04:18:19PM -0400, Mark Johnston wrote:

> On Sat, Sep 07, 2019 at 06:31:10PM +0300, Slawa Olhovchenkov wrote:
> > On Sat, Sep 07, 2019 at 10:50:34AM -0400, Mark Johnston wrote:
> > 
> > > On Wed, Sep 04, 2019 at 05:45:24PM +0300, Slawa Olhovchenkov wrote:
> > > > On Tue, Sep 03, 2019 at 06:01:06PM -0400, Mark Johnston wrote:
> > > > > > Mostly problem I am see at this
> > > > > > work -- very slowly vm_page_free(). May be currenly this is more
> > > > > > speedy...
> > > > > 
> > > > > How did you determine this?
> > > > 
> > > > This is you guess:
> > > 
> > > So was the guess correct?
> > 
> > I am just trust to you.
> > How to check this guess?
> 
> You can try to measure time spent in pmap_remove() relative to the rest
> of _kmem_unback().

===
/* kts: total time spent in _kmem_unback() */
fbt:kernel:_kmem_unback:entry                  { self->kts = timestamp; }
fbt:kernel:_kmem_unback:return /self->kts/     { @ts["kts"] = sum(timestamp - self->kts); self->kts = 0; }

/* pts: pmap_remove() time inside _kmem_unback(); ats: pmap_remove() time elsewhere */
fbt:kernel:pmap_remove:entry   /self->kts/      { self->pts = timestamp; }
fbt:kernel:pmap_remove:entry   /self->kts == 0/ { self->ats = timestamp; }
fbt:kernel:pmap_remove:return  /self->pts/      { @ts["pts"] = sum(timestamp - self->pts); self->pts = 0; }
fbt:kernel:pmap_remove:return  /self->ats/      { @ts["ats"] = sum(timestamp - self->ats); self->ats = 0; }

/* print and reset the totals once per second */
tick-1s
{
  printa("%-8s %20@d\n", @ts);
  printf("\n");
  clear(@ts);
}
===

pts                   7166680
ats                   8143193
kts                  16721822

pts                   3458647
ats                   5393452
kts                  10504523

kts                         0
pts                         0
ats                   2758752

pts                   3387748
ats                   4322653
kts                   8100282

pts                   4002444
ats                   5422748
kts                  11761955

pts                   1151713
kts                   2742176
ats                   8242958

pts                     34441
kts                    145683
ats                   4822328

kts                         0
pts                         0
ats                   5357808

pts                     16008
kts                    148203
ats                   4947368

pts                     26945
kts                    156011
ats                   8368502

pts                    323620
kts                   1154981
ats                  10093137

pts                     79504
kts                    228135
ats                   7131059

pts                    734062
kts                   1747364
ats                   4619796

pts                    401453
kts                   1036605
ats                   8751919

> > > If so, IMO the real solution is to avoid kmem_*
> > > for data buffers and use ABD instead.
> > 
> > What problem resolve this?
> 
> To allocate buffers larger than PAGE_SIZE the kernel must allocate a
> number of physical pages and map them using the page tables.  The step
> of creating and destroying mappings is expensive and doesn't scale well
> to many CPUs.  With ABD, ZFS avoids this expense when its caches are
> shrunk.
> 
> > ABD any way is slowly vs kmem_*.
> 
> Can we solve this problem instead?
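For reference, the mapping-avoidance you describe can be sketched in userspace C. The names here (toy_abd etc.) are invented for illustration, loosely modeled on ZFS's abd_t: the buffer is kept as an array of discrete pages walked chunk by chunk, so no contiguous VA mapping ever has to be created or torn down:

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Toy analogue of ZFS's abd_t: the buffer is a set of discrete pages,
 * never mapped into one contiguous kernel VA range. */
struct toy_abd {
	size_t size;
	size_t npages;
	char **pages;		/* one pointer per PAGE_SIZE chunk */
};

static struct toy_abd *
toy_abd_alloc(size_t size)
{
	struct toy_abd *abd = malloc(sizeof(*abd));

	abd->size = size;
	abd->npages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
	abd->pages = malloc(abd->npages * sizeof(char *));
	for (size_t i = 0; i < abd->npages; i++)
		abd->pages[i] = malloc(PAGE_SIZE);	/* no mapping step */
	return (abd);
}

static void
toy_abd_free(struct toy_abd *abd)
{
	/* Pages go straight back to the allocator; there is no
	 * pmap_remove()/TLB-shootdown analogue to pay for. */
	for (size_t i = 0; i < abd->npages; i++)
		free(abd->pages[i]);
	free(abd->pages);
	free(abd);
}

/* Copy out through the page array, chunk by chunk (scatter/gather). */
static void
toy_abd_copy_to(const struct toy_abd *abd, char *dst, size_t off, size_t len)
{
	while (len > 0) {
		size_t pg = off / PAGE_SIZE, pgoff = off % PAGE_SIZE;
		size_t n = PAGE_SIZE - pgoff;

		if (n > len)
			n = len;
		memcpy(dst, abd->pages[pg] + pgoff, n);
		dst += n; off += n; len -= n;
	}
}
```

The cost is that every consumer has to iterate chunks instead of dereferencing one flat pointer, which is the overhead I complain about below.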

Slow ABD, or the expense of creating and destroying mappings?
Slow ABD: then just don't use ABD and scatter/gather.
The expense of creating and destroying mappings:

- make the kernel more NUMA-aware, so that most allocations and frees hit the
  per-domain zone cache (currently an allocation can come from dom1 while the
  free goes to dom0; I have observed this)
- calculate a sensible working-set size for the zone cache, and aim for trim
  to have nothing to do
- maybe batch the remove/unmap operations and issue the TLB shootdown only
  once at the end of the batch?
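The last point can be shown with a toy model in userspace C. All names here are invented for illustration (the real pmap interfaces differ); the point is only the count of cross-CPU invalidations per freed buffer:

```c
#include <stddef.h>

/* Toy model of batched unmap: count how many TLB shootdowns (IPIs to
 * the other CPUs) freeing npages pages costs in each style. */

static int shootdowns;		/* how many times we "IPI" the other CPUs */

static void
tlb_shootdown(void)
{
	shootdowns++;
}

/* Per-page style: remove one PTE, invalidate immediately. */
static void
unmap_page_now(void)
{
	/* ... remove the PTE ... */
	tlb_shootdown();
}

/* Batched style: remove all PTEs first, invalidate once at the end. */
static void
unmap_batch(size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		;		/* ... remove PTE i, no invalidation ... */
	tlb_shootdown();	/* one shootdown covers the whole batch */
}
```

For the 149 items freed in the trace quoted below, the per-page style would pay 149 shootdowns and the batched style one.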

> > > > ======
> > > > >         while ((slab = SLIST_FIRST(&freeslabs)) != NULL) {
> > > > >                 SLIST_REMOVE(&freeslabs, slab, uma_slab, us_hlink);
> > > > >                 keg_free_slab(keg, slab, keg->uk_ipers);
> > > > >         }
> > > > > 2019 Feb  2 19:49:54.800524364       zio_data_buf_1048576  1032605 cache_reclaim limit      100 dom 0 nitems     1672 imin      298
> > > > > 2019 Feb  2 19:49:54.800524364       zio_data_buf_1048576  1033736 cache_reclaim recla      149 dom 0 nitems     1672 imin      298
> > > > > 2019 Feb  2 19:49:54.802524468       zio_data_buf_1048576  3119710 cache_reclaim limit      100 dom 1 nitems        1 imin        0
> > > > > 2019 Feb  2 19:49:54.802524468       zio_data_buf_1048576  3127550 keg_drain2
> > > > > 2019 Feb  2 19:49:54.803524487       zio_data_buf_1048576  4444219 keg_drain3
> > > > > 2019 Feb  2 19:49:54.838524634       zio_data_buf_1048576 39553705 keg_drain4
> > > > > 2019 Feb  2 19:49:54.838524634       zio_data_buf_1048576 39565323 zone_reclaim:return
> > > > >
> > > > > 35109.486 ms for last loop, 149 items to freed.
> > > > 
> > > > 35ms to free 149MB (38144 4KB pages), so roughly 1us per page.  That
> > > > does seem like a lot, but freeing a page (vm_page_free(m)) is much
> > > > more expensive than freeing an item to UMA (i.e., uma_zfree()).
> > > > Most of that time will be spent in _kmem_unback().
> > > > ======


