From: "Steven Hartland"
To: "Peter Wemm", "Alan Cox"
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Dmitry Morozovsky, "Matthew D. Fuller"
Subject: Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm
Date: Fri, 29 Aug 2014 20:51:03 +0100

> On Friday 29 August 2014 11:54:42 Alan Cox wrote:
snip...
> > > Others have also confirmed that even with r265945 they can still
> > > trigger the performance issue.
> > >
> > > In addition, without it we still have loads of RAM sat there unused;
> > > in my particular experience we have 40GB of 192GB sitting there
> > > unused, and that was with a stable build from last weekend.
> >
> > The Solaris code only imposed this limit on 32-bit machines, where the
> > available kernel virtual address space may be much less than the
> > available physical memory.  Previously, FreeBSD imposed this limit on
> > both 32-bit and 64-bit machines.  Now, it imposes it on neither.  Why
> > continue to do this differently from Solaris?

My understanding is that these limits were totally different on Solaris;
see the #ifdef sun block in arc_reclaim_needed() for details.  I actually
started by matching the Solaris flow, but that had already been tested
and proved not to work as well as the current design.
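For anyone reading along without the tree to hand, the contrast boils
down to something like the fragment below.  This is a from-memory sketch,
not the literal code: the Solaris-side names (freemem, lotsfree, needfree,
desfree) are upstream paging thresholds with no direct FreeBSD
equivalents, and the FreeBSD side shows the new free-target comparison
introduced by r270759 (variable names approximate).

/*
 * Boiled-down sketch of the arc_reclaim_needed() split; illustrative
 * only, the real function has several more cases on both sides.
 */
#ifdef sun
	/* Solaris keys reclaim off its own paging thresholds. */
	if (needfree)
		return (1);
	if (freemem < lotsfree + needfree + desfree)
		return (1);
	/* ...plus availrmem/swapfs and 32-bit heap_arena checks... */
#else
	/* FreeBSD (post r270759) compares free pages against a target. */
	if (kmem_free_count() < zfs_arc_free_target)
		return (1);
#endif

The two branches are keyed off entirely different accounting, which is
why simply copying the Solaris flow across didn't translate.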
> Since the question was asked below, we don't have zfs machines in the
> cluster running i386.  We can barely get them to boot as it is due to
> kva pressure.  We have to reduce/cap physical memory and change the
> user/kernel virtual split from 3:1 to 2.5:1.5.
>
> We do run zfs on small amd64 machines with 2G of ram, but I can't
> imagine it working on the 10G i386 PAE machines that we have.
>
> > > With the patch we confirmed that both RAM usage and performance for
> > > those seeing that issue are resolved, with no reported regressions.
> > >
> > >> (I should know better than to fire a reply off before full fact
> > >> checking, but this commit worries me..)
> > >
> > > Not a problem, it's great to know people pay attention to changes,
> > > and raise their concerns.  Always better to have a discussion about
> > > potential issues than to wait for a problem to occur.
> > >
> > > Hopefully the above gives you some peace of mind, but if you still
> > > have any concerns I'm all ears.
> >
> > You didn't really address Peter's initial technical issue.  Peter
> > correctly observed that cache pages are just another flavor of free
> > pages.  Whenever the VM system is checking the number of free pages
> > against any of the thresholds, it always uses the sum of v_cache_count
> > and v_free_count.  So, to anyone familiar with the VM system, like
> > Peter, what you've done, which is to derive a threshold from
> > v_free_target but only compare v_free_count to that threshold, looks
> > highly suspect.
>
> I think I'd like to see something like this:
>
> Index: cddl/compat/opensolaris/kern/opensolaris_kmem.c
> ===================================================================
> --- cddl/compat/opensolaris/kern/opensolaris_kmem.c	(revision 270824)
> +++ cddl/compat/opensolaris/kern/opensolaris_kmem.c	(working copy)
> @@ -152,7 +152,8 @@
>  kmem_free_count(void)
>  {
>
> -	return (vm_cnt.v_free_count);
> +	/* "cache" is just a flavor of free pages in FreeBSD */
> +	return (vm_cnt.v_free_count + vm_cnt.v_cache_count);
>  }
>
>  u_int

This has apparently already been tried, and the response from Karl was:

- No, because memory in "cache" is subject to being either reallocated
- or freed.  When I was developing this patch that was my first
- impression as well, and how I originally coded it, and it turned out
- to be wrong.
-
- The issue here is that you have two parts of the system contending for
- RAM -- the VM system generally, and the ARC cache.  If the ARC cache
- frees space before the VM system activates and starts pruning then you
- wind up with the ARC pinned at the minimum after some period of time,
- because it releases "early."

I've asked him if he would retest just to be sure.

> The rest of the system looks at the "big picture"; it would be happy to
> let the "free" pool run quite a way down so long as there are "cache"
> pages available to satisfy the free space requirements.  This would lead
> ZFS to mistakenly sacrifice ARC for no reason.  I'm not sure how big a
> deal this is, but I can't imagine many scenarios where I want ARC to be
> discarded in order to save some effectively free pages.

From Karl's response in the original PR (above), it seems like this
causes unexpected behaviour due to the two systems being separate.
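To make that concrete, here is a tiny standalone model of the two
comparisons at issue: the VM's free+cache test versus the committed
free-only test.  The field names mirror struct vmmeter and the numbers
are invented; with plenty of reusable cache pages the VM system sees no
shortage, while a check against v_free_count alone already wants the ARC
trimmed.

#include <stdio.h>

int
main(void)
{
	unsigned int v_free_count = 20000;	/* truly free pages */
	unsigned int v_cache_count = 30000;	/* clean, reusable "cache" pages */
	unsigned int free_target = 40000;	/* threshold derived from v_free_target */

	/* VM view: cache pages count as free, so no shortage here. */
	int vm_sees_shortage = (v_free_count + v_cache_count < free_target);

	/* ARC check as committed: cache pages are ignored, so it trims. */
	int arc_sees_shortage = (v_free_count < free_target);

	printf("VM system sees a shortage:  %s\n", vm_sees_shortage ? "yes" : "no");
	printf("ARC check sees a shortage:  %s\n", arc_sees_shortage ? "yes" : "no");
	return (0);
}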
> > That said, I can easily believe that your patch works better than the
> > existing code, because it is closer in spirit to my interpretation of
> > what the Solaris code does.  Specifically, I believe that the Solaris
> > code starts trimming the ARC before the Solaris page daemon starts
> > writing dirty pages to secondary storage.  Now, you've made FreeBSD do
> > the same.  However, you've expressed it in a way that looks broken.
> >
> > To wrap up, I think that you can easily write this in a way that
> > simultaneously behaves like Solaris and doesn't look wrong to a VM
> > expert.
> >
> > > Out of interest, would it be possible to update machines in the
> > > cluster to see how their workload reacts to the change?
>
> I'd like to see the free vs cache thing resolved first, but it's going
> to be tricky to get a comparison.

Does Karl's explanation above as to why this doesn't work change your
mind?

> For the first few months of the year, things were really troublesome.
> It was quite easy to overtax the machines and run them into the ground.
>
> This is not the case now - things are working pretty well under
> pressure (prior to the commit).  It's got to the point that we feel
> comfortable thrashing the machines really hard again.  Getting a
> comparison when it already works well is going to be tricky.
>
> We don't have large memory machines that aren't already tuned with
> vfs.zfs.arc_max caps for tmpfs use.
>
> For context to the wider audience, we do not run -release or -pN in the
> freebsd cluster.  We mostly run -current, and some -stable.  I am well
> aware that there is significant discomfort in 10.0-R with zfs, but we
> already have the fixes for that.
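As an aside, for readers unfamiliar with the cap Peter mentions:
vfs.zfs.arc_max is a loader tunable, so machines that dedicate memory to
tmpfs typically pin the ARC in /boot/loader.conf along these lines (the
value below is purely illustrative, size it to the workload):

# /boot/loader.conf
vfs.zfs.arc_max="40G"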