From: "Steven Hartland"
To: "Peter Wemm", "Alan Cox"
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Dmitry Morozovsky, "Matthew D. Fuller"
Subject: Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm
Date: Fri, 29 Aug 2014 20:51:03 +0100

> On Friday 29 August 2014 11:54:42 Alan Cox wrote:
snip...
> > > Others have also confirmed that even with r265945 they can still
> > > trigger the performance issue.
> > >
> > > In addition, without it we still have loads of RAM sat there unused;
> > > in my particular experience we have 40GB of 192GB sitting there
> > > unused, and that was with a stable build from last weekend.
> >
> > The Solaris code only imposed this limit on 32-bit machines, where the
> > available kernel virtual address space may be much less than the
> > available physical memory.  Previously, FreeBSD imposed this limit on
> > both 32-bit and 64-bit machines.  Now, it imposes it on neither.  Why
> > continue to do this differently from Solaris?

My understanding is that these limits were totally different on Solaris;
see the #ifdef sun block in arc_reclaim_needed() for details.  I actually
started by matching the Solaris flow, but that had already been tested
and proved not to work as well as the current design.
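For anyone reading along without the tree to hand, the contrast boils
down to something like the fragment below.  This is a from-memory sketch,
not the literal code: the Solaris-side names (freemem, lotsfree, needfree,
desfree) are upstream paging thresholds with no direct FreeBSD
equivalents, and the FreeBSD side shows the new free-target comparison
introduced by r270759 (variable names approximate).

/*
 * Boiled-down sketch of the arc_reclaim_needed() split; illustrative
 * only, the real function has several more cases on both sides.
 */
#ifdef sun
	/* Solaris keys reclaim off its own paging thresholds. */
	if (needfree)
		return (1);
	if (freemem < lotsfree + needfree + desfree)
		return (1);
	/* ...plus availrmem/swapfs and 32-bit heap_arena checks... */
#else
	/* FreeBSD (post r270759) compares free pages against a target. */
	if (kmem_free_count() < zfs_arc_free_target)
		return (1);
#endif

The two branches are keyed off entirely different accounting, which is
why simply copying the Solaris flow across didn't translate.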
> Since the question was asked below, we don't have zfs machines in the
> cluster running i386.  We can barely get them to boot as it is due to
> kva pressure.  We have to reduce/cap physical memory and change the
> user/kernel virtual split from 3:1 to 2.5:1.5.
>
> We do run zfs on small amd64 machines with 2G of ram, but I can't
> imagine it working on the 10G i386 PAE machines that we have.
>
> > > With the patch we confirmed that both RAM usage and performance for
> > > those seeing that issue are resolved, with no reported regressions.
> > >
> > >> (I should know better than to fire a reply off before full fact
> > >> checking, but this commit worries me..)
> > >
> > > Not a problem, it's great to know people pay attention to changes,
> > > and raise their concerns.  Always better to have a discussion about
> > > potential issues than to wait for a problem to occur.
> > >
> > > Hopefully the above gives you some peace of mind, but if you still
> > > have any concerns I'm all ears.
> >
> > You didn't really address Peter's initial technical issue.  Peter
> > correctly observed that cache pages are just another flavor of free
> > pages.  Whenever the VM system is checking the number of free pages
> > against any of the thresholds, it always uses the sum of v_cache_count
> > and v_free_count.  So, to anyone familiar with the VM system, like
> > Peter, what you've done, which is to derive a threshold from
> > v_free_target but only compare v_free_count to that threshold, looks
> > highly suspect.
>
> I think I'd like to see something like this:
>
> Index: cddl/compat/opensolaris/kern/opensolaris_kmem.c
> ===================================================================
> --- cddl/compat/opensolaris/kern/opensolaris_kmem.c	(revision 270824)
> +++ cddl/compat/opensolaris/kern/opensolaris_kmem.c	(working copy)
> @@ -152,7 +152,8 @@
>  kmem_free_count(void)
>  {
>
> -	return (vm_cnt.v_free_count);
> +	/* "cache" is just a flavor of free pages in FreeBSD */
> +	return (vm_cnt.v_free_count + vm_cnt.v_cache_count);
>  }
>
>  u_int

This has apparently already been tried, and the response from Karl was:

- No, because memory in "cache" is subject to being either reallocated
- or freed.  When I was developing this patch that was my first
- impression as well, and how I originally coded it, and it turned out
- to be wrong.
-
- The issue here is that you have two parts of the system contending for
- RAM -- the VM system generally, and the ARC cache.  If the ARC cache
- frees space before the VM system activates and starts pruning then you
- wind up with the ARC pinned at the minimum after some period of time,
- because it releases "early."

I've asked him if he would retest just to be sure.

> The rest of the system looks at the "big picture"; it would be happy to
> let the "free" pool run quite a way down so long as there are "cache"
> pages available to satisfy the free space requirements.  This would lead
> ZFS to mistakenly sacrifice ARC for no reason.  I'm not sure how big a
> deal this is, but I can't imagine many scenarios where I want ARC to be
> discarded in order to save some effectively free pages.

From Karl's response in the original PR (above), it seems like this
causes unexpected behaviour due to the two systems being separate.
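To make that concrete, here is a tiny standalone model of the two
comparisons at issue: the VM's free+cache test versus the committed
free-only test.  The field names mirror struct vmmeter and the numbers
are invented; with plenty of reusable cache pages the VM system sees no
shortage, while a check against v_free_count alone already wants the ARC
trimmed.

#include <stdio.h>

int
main(void)
{
	unsigned int v_free_count = 20000;	/* truly free pages */
	unsigned int v_cache_count = 30000;	/* clean, reusable "cache" pages */
	unsigned int free_target = 40000;	/* threshold derived from v_free_target */

	/* VM view: cache pages count as free, so no shortage here. */
	int vm_sees_shortage = (v_free_count + v_cache_count < free_target);

	/* ARC check as committed: cache pages are ignored, so it trims. */
	int arc_sees_shortage = (v_free_count < free_target);

	printf("VM system sees a shortage:  %s\n", vm_sees_shortage ? "yes" : "no");
	printf("ARC check sees a shortage:  %s\n", arc_sees_shortage ? "yes" : "no");
	return (0);
}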
> > That said, I can easily believe that your patch works better than the
> > existing code, because it is closer in spirit to my interpretation of
> > what the Solaris code does.  Specifically, I believe that the Solaris
> > code starts trimming the ARC before the Solaris page daemon starts
> > writing dirty pages to secondary storage.  Now, you've made FreeBSD do
> > the same.  However, you've expressed it in a way that looks broken.
> >
> > To wrap up, I think that you can easily write this in a way that
> > simultaneously behaves like Solaris and doesn't look wrong to a VM
> > expert.
> >
> > > Out of interest, would it be possible to update machines in the
> > > cluster to see how their workload reacts to the change?
>
> I'd like to see the free vs cache thing resolved first, but it's going
> to be tricky to get a comparison.

Does Karl's explanation above as to why this doesn't work change your
mind?

> For the first few months of the year, things were really troublesome.
> It was quite easy to overtax the machines and run them into the ground.
>
> This is not the case now - things are working pretty well under
> pressure (prior to the commit).  It's got to the point that we feel
> comfortable thrashing the machines really hard again.  Getting a
> comparison when it already works well is going to be tricky.
>
> We don't have large memory machines that aren't already tuned with
> vfs.zfs.arc_max caps for tmpfs use.
>
> For context to the wider audience, we do not run -release or -pN in the
> freebsd cluster.  We mostly run -current, and some -stable.  I am well
> aware that there is significant discomfort in 10.0-R with zfs, but we
> already have the fixes for that.
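As an aside, for readers unfamiliar with the cap Peter mentions:
vfs.zfs.arc_max is a loader tunable, so machines that dedicate memory to
tmpfs typically pin the ARC in /boot/loader.conf along these lines (the
value below is purely illustrative, size it to the workload):

# /boot/loader.conf
vfs.zfs.arc_max="40G"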