Date:      Fri, 29 Aug 2014 20:51:03 +0100
From:      "Steven Hartland" <smh@freebsd.org>
To:        "Peter Wemm" <peter@wemm.org>, "Alan Cox" <alc@rice.edu>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Dmitry Morozovsky <marck@rinet.ru>, "Matthew D. Fuller" <fullermd@over-yonder.net>
Subject:   Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm
Message-ID:  <5A300D962A1B458B951D521EA2BE35E8@multiplay.co.uk>
References:  <201408281950.s7SJo90I047213@svn.freebsd.org> <4A4B2C2D36064FD9840E3603D39E58E0@multiplay.co.uk> <5400B052.6030103@rice.edu> <1592506.xpuae4IYcM@overcee.wemm.org>

> On Friday 29 August 2014 11:54:42 Alan Cox wrote:
snip...

> > > Others have also confirmed that even with r265945 they can still
> > > trigger the performance issue.
> > >
> > > In addition, without it we still have loads of RAM sat there unused;
> > > in my particular experience we have 40GB of 192GB sitting there
> > > unused, and that was with a stable build from last weekend.
> >
> > The Solaris code only imposed this limit on 32-bit machines where the
> > available kernel virtual address space may be much less than the
> > available physical memory.  Previously, FreeBSD imposed this limit on
> > both 32-bit and 64-bit machines.  Now, it imposes it on neither.  Why
> > continue to do this differently from Solaris?

My understanding is these limits were totally different on Solaris; see the
#ifdef sun block in arc_reclaim_needed() for details. I actually started out
matching the Solaris flow, but that had already been tested and proved not
to work as well as the current design.
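For anyone reading along without the source handy, the following is a loose
userland paraphrase of the kind of checks in that block (stand-in values and
a stand-alone program, not the actual Solaris or FreeBSD kernel code; the
real function also has i386-specific kva checks):

/*
 * Loose userland paraphrase only -- not the actual kernel code.  Variable
 * names follow the Solaris-style counters consulted by
 * arc_reclaim_needed(); the values are made up for illustration.
 */
#include <stdio.h>

static long freemem  = 20000;	/* free physical pages */
static long lotsfree = 16000;	/* pageout "plenty of memory" threshold */
static long desfree  = 8000;	/* pageout "desperate" threshold */
static long needfree = 0;	/* pages the pageout code is waiting on */

static int
arc_reclaim_needed_model(void)
{
	/*
	 * Trim the ARC once free memory approaches the pageout thresholds,
	 * i.e. before the page daemon has to start working hard.
	 */
	if (needfree > 0)
		return (1);
	if (freemem < lotsfree + needfree + desfree)
		return (1);
	return (0);
}

int
main(void)
{
	printf("reclaim needed: %d\n", arc_reclaim_needed_model());
	return (0);
}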

> Since the question was asked below, we don't have zfs machines in the
> cluster running i386.  We can barely get them to boot as it is due to
> kva pressure.  We have to reduce/cap physical memory and change the
> user/kernel virtual split from 3:1 to 2.5:1.5.
>
> We do run zfs on small amd64 machines with 2G of ram, but I can't
> imagine it working on the 10G i386 PAE machines that we have.
>
>
> > > With the patch we confirmed that both RAM usage and performance for
> > > those seeing that issue are resolved, with no reported regressions.
> > >
> > >> (I should know better than to fire a reply off before full fact
> > >> checking, but this commit worries me..)
> > >
> > > Not a problem, it's great to know people pay attention to changes and
> > > raise their concerns. Always better to have a discussion about
> > > potential issues than to wait for a problem to occur.
> > >
> > > Hopefully the above gives you some peace of mind, but if you still
> > > have any concerns I'm all ears.
> >
> > You didn't really address Peter's initial technical issue.  Peter
> > correctly observed that cache pages are just another flavor of free
> > pages.  Whenever the VM system is checking the number of free pages
> > against any of the thresholds, it always uses the sum of v_cache_count
> > and v_free_count.  So, to anyone familiar with the VM system, like
> > Peter, what you've done, which is to derive a threshold from
> > v_free_target but only compare v_free_count to that threshold, looks
> > highly suspect.
>
> I think I'd like to see something like this:
>
> Index: cddl/compat/opensolaris/kern/opensolaris_kmem.c
> ===================================================================
> --- cddl/compat/opensolaris/kern/opensolaris_kmem.c (revision 270824)
> +++ cddl/compat/opensolaris/kern/opensolaris_kmem.c (working copy)
> @@ -152,7 +152,8 @@
>  kmem_free_count(void)
>  {
>
> - return (vm_cnt.v_free_count);
> + /* "cache" is just a flavor of free pages in FreeBSD */
> + return (vm_cnt.v_free_count + vm_cnt.v_cache_count);
>  }
>
>  u_int

This has apparently already been tried and the response from Karl was:

- No, because memory in "cache" is subject to being either reallocated or
- freed.  When I was developing this patch that was my first impression as
- well and how I originally coded it, and it turned out to be wrong.
-
- The issue here is that you have two parts of the system contending for
- RAM -- the VM system generally, and the ARC cache.  If the ARC cache
- frees space before the VM system activates and starts pruning then you
- wind up with the ARC pinned at the minimum after some period of time,
- because it releases "early."

I've asked him if he would retest just to be sure.
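To make the two positions concrete, here is a small standalone sketch
(made-up page counts and a stand-in threshold, not the kernel code) of how
the two accountings can disagree about whether the ARC should shrink:

/*
 * Standalone sketch only -- made-up numbers, not the kernel code.  It
 * models the comparison under discussion: should the ARC reclaim test
 * count only free pages, or free + cache pages?
 */
#include <stdio.h>

/* Stand-ins for the vm_cnt fields and the ARC free target (in pages). */
static unsigned int v_free_count = 30000;
static unsigned int v_cache_count = 50000;
static unsigned int zfs_arc_free_target = 60000;

int
main(void)
{
	/*
	 * Counting only genuinely free pages: the ARC starts shrinking
	 * even though cache pages could still satisfy allocations --
	 * Peter's concern.
	 */
	int reclaim_free_only = (v_free_count < zfs_arc_free_target);

	/*
	 * Peter's proposed kmem_free_count(): cache pages are treated as
	 * another flavor of free, so the test fires later.  Karl reports
	 * that this is how he originally coded it and that it behaved
	 * worse in practice.
	 */
	int reclaim_free_plus_cache =
	    (v_free_count + v_cache_count < zfs_arc_free_target);

	printf("free only:    reclaim = %d\n", reclaim_free_only);
	printf("free + cache: reclaim = %d\n", reclaim_free_plus_cache);
	return (0);
}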

> The rest of the system looks at the "big picture": it would be happy to
> let the "free" pool run quite a way down so long as there are "cache"
> pages available to satisfy the free space requirements.  This would lead
> ZFS to mistakenly sacrifice ARC for no reason.  I'm not sure how big a
> deal this is, but I can't imagine many scenarios where I want ARC to be
> discarded in order to save some effectively free pages.

From Karl's response from the original PR (quoted above) it seems like this
causes unexpected behaviour due to the two systems being separate.

> > That said, I can easily believe that your patch works better than the
> > existing code, because it is closer in spirit to my interpretation of
> > what the Solaris code does.  Specifically, I believe that the Solaris
> > code starts trimming the ARC before the Solaris page daemon starts
> > writing dirty pages to secondary storage.  Now, you've made FreeBSD do
> > the same.  However, you've expressed it in a way that looks broken.
> >
> > To wrap up, I think that you can easily write this in a way that
> > simultaneously behaves like Solaris and doesn't look wrong to a VM
> > expert.
> >
> > > Out of interest, would it be possible to update machines in the
> > > cluster to see how their workload reacts to the change?
> > >
>
> I'd like to see the free vs cache thing resolved first, but it's going
> to be tricky to get a comparison.

Does Karl's explanation above of why this doesn't work change your mind?

> For the first few months of the year, things were really troublesome.
> It was quite easy to overtax the machines and run them into the ground.
>
> This is not the case now - things are working pretty well under pressure
> (prior to the commit).  It's got to the point that we feel comfortable
> thrashing the machines really hard again.  Getting a comparison when it
> already works well is going to be tricky.
>
> We don't have large memory machines that aren't already tuned for
> vfs.zfs.arc_max caps for tmpfs use.
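(For anyone unfamiliar with the tunable Peter mentions: the ARC can be
capped at boot time via /boot/loader.conf; the value below is purely an
example.)

# /boot/loader.conf -- example only: cap the ZFS ARC at 16 GB (in bytes)
vfs.zfs.arc_max="17179869184"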
>
> For context to the wider audience, we do not run -release or -pN in the
> freebsd cluster.  We mostly run -current, and some -stable.  I am well
> aware that there is significant discomfort in 10.0-R with zfs but we
> already have the fixes for that.



