Date: Fri, 29 Aug 2014 20:51:03 +0100
From: "Steven Hartland" <smh@freebsd.org>
To: "Peter Wemm" <peter@wemm.org>, "Alan Cox" <alc@rice.edu>
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
    src-committers@freebsd.org, Dmitry Morozovsky <marck@rinet.ru>,
    "Matthew D. Fuller" <fullermd@over-yonder.net>
Subject: Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern
    cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm
Message-ID: <5A300D962A1B458B951D521EA2BE35E8@multiplay.co.uk>
References: <201408281950.s7SJo90I047213@svn.freebsd.org>
    <4A4B2C2D36064FD9840E3603D39E58E0@multiplay.co.uk>
    <5400B052.6030103@rice.edu> <1592506.xpuae4IYcM@overcee.wemm.org>
> On Friday 29 August 2014 11:54:42 Alan Cox wrote:
snip...
> > > Others have also confirmed that even with r265945 they can still
> > > trigger the performance issue.
> > >
> > > In addition, without it we still have loads of RAM sat there
> > > unused; in my particular experience we have 40GB of 192GB sitting
> > > there unused, and that was with a stable build from last weekend.
> >
> > The Solaris code only imposed this limit on 32-bit machines where the
> > available kernel virtual address space may be much less than the
> > available physical memory.  Previously, FreeBSD imposed this limit on
> > both 32-bit and 64-bit machines.  Now, it imposes it on neither.  Why
> > continue to do this differently from Solaris?

My understanding is these limits were totally different on Solaris; see
the #ifdef sun block in arc_reclaim_needed() for details.

I actually started out matching the Solaris flow, but that had already
been tested and proved not to work as well as the current design.

> Since the question was asked below, we don't have zfs machines in the
> cluster running i386.  We can barely get them to boot as it is due to
> kva pressure.  We have to reduce/cap physical memory and change the
> user/kernel virtual split from 3:1 to 2.5:1.5.
>
> We do run zfs on small amd64 machines with 2G of ram, but I can't
> imagine it working on the 10G i386 PAE machines that we have.

> > > With the patch we confirmed that both RAM usage and performance
> > > for those seeing that issue are resolved, with no reported
> > > regressions.
> > >
> > >> (I should know better than to fire a reply off before full fact
> > >> checking, but this commit worries me..)
> > >
> > > Not a problem, it's great to know people pay attention to changes
> > > and raise their concerns.  Always better to have a discussion
> > > about potential issues than to wait for a problem to occur.
> > >
> > > Hopefully the above gives you some peace of mind, but if you still
> > > have any concerns I'm all ears.
> >
> > You didn't really address Peter's initial technical issue.  Peter
> > correctly observed that cache pages are just another flavor of free
> > pages.  Whenever the VM system is checking the number of free pages
> > against any of the thresholds, it always uses the sum of
> > v_cache_count and v_free_count.  So, to anyone familiar with the VM
> > system, like Peter, what you've done, which is to derive a threshold
> > from v_free_target but only compare v_free_count to that threshold,
> > looks highly suspect.
>
> I think I'd like to see something like this:
>
> Index: cddl/compat/opensolaris/kern/opensolaris_kmem.c
> ===================================================================
> --- cddl/compat/opensolaris/kern/opensolaris_kmem.c	(revision 270824)
> +++ cddl/compat/opensolaris/kern/opensolaris_kmem.c	(working copy)
> @@ -152,7 +152,8 @@
>  kmem_free_count(void)
>  {
>
> -	return (vm_cnt.v_free_count);
> +	/* "cache" is just a flavor of free pages in FreeBSD */
> +	return (vm_cnt.v_free_count + vm_cnt.v_cache_count);
>  }
>
>  u_int
This has apparently already been tried, and the response from Karl was:

- No, because memory in "cache" is subject to being either reallocated
- or freed.  When I was developing this patch that was my first
- impression as well, and how I originally coded it, and it turned out
- to be wrong.
-
- The issue here is that you have two parts of the system contending for
- RAM -- the VM system generally, and the ARC cache.  If the ARC cache
- frees space before the VM system activates and starts pruning then you
- wind up with the ARC pinned at the minimum after some period of time,
- because it releases "early."

I've asked him if he would retest just to be sure.

> The rest of the system looks at the "big picture"; it would be happy
> to let the "free" pool run quite a way down so long as there's "cache"
> pages available to satisfy the free space requirements.  This would
> lead ZFS to mistakenly sacrifice ARC for no reason.  I'm not sure how
> big a deal this is, but I can't imagine many scenarios where I want
> ARC to be discarded in order to save some effectively free pages.

From Karl's response to the original PR (above) it seems like this
causes unexpected behaviour due to the two systems being separate.

> > That said, I can easily believe that your patch works better than
> > the existing code, because it is closer in spirit to my
> > interpretation of what the Solaris code does.  Specifically, I
> > believe that the Solaris code starts trimming the ARC before the
> > Solaris page daemon starts writing dirty pages to secondary storage.
> > Now, you've made FreeBSD do the same.  However, you've expressed it
> > in a way that looks broken.
> >
> > To wrap up, I think that you can easily write this in a way that
> > simultaneously behaves like Solaris and doesn't look wrong to a VM
> > expert.
> >
> > > Out of interest would it be possible to update machines in the
> > > cluster to see how their workload reacts to the change?
>
> I'd like to see the free vs cache thing resolved first, but it's going
> to be tricky to get a comparison.

Does Karl's explanation above as to why this doesn't work change your
mind?

> For the first few months of the year, things were really troublesome.
> It was quite easy to overtax the machines and run them into the
> ground.
>
> This is not the case now - things are working pretty well under
> pressure (prior to the commit).  It's got to the point that we feel
> comfortable thrashing the machines really hard again.  Getting a
> comparison when it already works well is going to be tricky.
>
> We don't have large memory machines that aren't already tuned for
> vfs.zfs.arc_max caps for tmpfs use.
>
> For context to the wider audience, we do not run -release or -pN in
> the freebsd cluster.  We mostly run -current, and some -stable.  I am
> well aware that there is significant discomfort in 10.0-R with zfs but
> we already have the fixes for that.
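For anyone who wants to check how much memory is actually sitting in
"cache" vs "free" on a box before and after testing, something like the
following should do.  This is an untested sketch; the vm.stats.vm sysctl
names are the standard ones on 10.x/head:

/* Untested sketch: print free, cache and free_target page counts. */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

static u_int
read_vm_stat(const char *name)
{
	u_int val;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) != 0)
		err(1, "sysctlbyname(%s)", name);
	return (val);
}

int
main(void)
{
	u_int nfree = read_vm_stat("vm.stats.vm.v_free_count");
	u_int ncache = read_vm_stat("vm.stats.vm.v_cache_count");
	u_int target = read_vm_stat("vm.stats.vm.v_free_target");

	printf("free %u cache %u free+cache %u free_target %u (pages)\n",
	    nfree, ncache, nfree + ncache, target);
	return (0);
}

If cache is consistently near zero the two forms of the check behave the
same; if it is large, the difference Peter describes starts to matter.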