From owner-freebsd-hackers@FreeBSD.ORG Fri Aug 27 21:49:34 2010
Date: Fri, 27 Aug 2010 14:49:31 -0700
From: Artem Belevich <artemb@gmail.com>
To: Andriy Gapon
Cc: freebsd-hackers@freebsd.org, Jeff Roberson, "Robert N. M. Watson"
Subject: Re: uma: zone fragmentation

On Sun, Aug 22, 2010 at 1:45 PM, Andriy Gapon wrote:
> Unfortunately I don't have any conclusive results to report.
> The numbers seem to be better with the patch, but they are changing all the
> time depending on system usage.
> I couldn't think of any good test that would reflect real-world usage
> patterns, which I believe to be not entirely random.

I do see a measurable improvement on my system. Without this change my ARC
would grow to ~5200M and then oscillate close to that number. With your patch
applied the ARC reaches ~5900M. In both cases I end up with ~7000M worth of
wired memory. So in the normal case about 1800M of memory is lost to
fragmentation, and your patch reduces that to ~1100M -- a noticeable
improvement.

On a side note -- how hard would it be to voluntarily drain the UMA zones
used by the ARC? When I enable vfs.zfs.zio.use_uma I see a lot of memory in
those zones listed as 'free' that could be used for something else. In
absolute terms it is the large-item zones that waste the most memory in my
case: there are ~1000 free items sitting on rarely used ~100K-sized zones.
That memory will be released when the pagedaemon wakes up, but the same event
also back-pressures the ARC.
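A minimal sketch of the kind of voluntary drain I mean, assuming FreeBSD
8-era interfaces: uma_reclaim() is the same entry point the pagedaemon uses
on a lowmem event, and it is the only drain hook exported by <vm/uma.h>, so
this drains every zone rather than just the ZIO ones. The thread name and the
one-minute period are invented for illustration.

/*
 * Sketch only: periodically return cached free items and slabs to the
 * VM without waiting for a pagedaemon lowmem event.  uma_reclaim()
 * drains *all* UMA zones; draining just the ZIO zones would need a new
 * per-zone interface.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/kthread.h>
#include <sys/proc.h>

#include <vm/uma.h>

static void
zone_drain_thread(void *arg __unused)
{
        for (;;) {
                uma_reclaim();                  /* free unused slabs */
                pause("znedrn", 60 * hz);       /* sleep for a minute */
        }
}

static void
zone_drain_init(void *arg __unused)
{
        kproc_create(zone_drain_thread, NULL, NULL, 0, 0, "zonedrain");
}
SYSINIT(zonedrain, SI_SUB_KTHREAD_VM, SI_ORDER_ANY, zone_drain_init, NULL);

Note that a timer-driven drain like this throws away cached free items
system-wide, so it trades some allocation performance for the reclaimed
memory.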
With the ZIO UMA allocator my ARC size never grows above ~4800M -- and that's
*with* your patch applied. Without the patch it was even worse. My guess is
that if we drained those zones manually, the memory would find better use
elsewhere -- for instance, as ARC data.

--Artem

On Sun, Aug 22, 2010 at 1:45 PM, Andriy Gapon wrote:
>
> It seems that with the inclusion of ZFS, which is a significant UMA user
> even when it is not used for the ARC, zone fragmentation becomes an issue.
> For example, on my systems with 4GB of RAM I routinely observe several
> hundred megabytes in free items after zone draining (via the lowmem event).
>
> I wrote a one-liner (quite a long line though) for post-processing vmstat -z
> output, and here's an example:
> $ vmstat -z | sed -e 's/ /_/' -e 's/:_* / /' -e 's/,//g' | tail +3 | awk 'BEGIN
> { total = 0; } { total += $2 * $5; print $2 * $5, $1, $4, $5, $2;} END { print
> total, "total"; }' | sort -n | tail -10
> 6771456 256 7749 26451 256
> 10710144 128 173499 83673 128
> 13400424 VM_OBJECT 33055 62039 216
> 17189568 zfs_znode_cache 33259 48834 352
> 19983840 VNODE 33455 41633 480
> 30936464 arc_buf_hdr_t 145387 148733 208
> 57030400 dmu_buf_impl_t 82816 254600 224
> 57619296 dnode_t 78811 73494 784
> 62067712 512 71050 121226 512
> 302164776 total
>
> When UMA is used for the ARC, the "wasted" memory grows above 1GB,
> effectively making that setup unusable for me.
>
> I see that in OpenSolaris they developed a few measures to (try to) prevent
> fragmentation and perform defragmentation.
>
> First, they keep their equivalent of the partial slab list sorted by the
> number of used items, thus trying to fill up the most-used slab first.
> Second, they allow setting a 'move' callback for a zone and have a special
> monitoring thread that tries to compact slabs when zone fragmentation goes
> above a certain limit.
> The details can be found here (lengthy comment at the beginning and links in it):
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/kmem.c
>
> Not sure if we would want to implement anything like that or some
> alternative, but zone fragmentation seems to have become an issue, at least
> for ZFS.
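For reference, the move-callback interface described above looks roughly like
this -- a sketch reconstructed from the comment block in the kmem.c linked
above (kmem_cbrc_t and kmem_cache_set_move() are declared in the OpenSolaris
<sys/kmem.h>; the my_object_* helpers and bcopy() usage are illustrative;
check the linked source for the authoritative prototypes):

#include <sys/types.h>

/*
 * Return codes a client's 'move' callback reports back to the
 * allocator, per the kmem.c comment.
 */
typedef enum kmem_cbrc {
        KMEM_CBRC_YES,          /* object was copied to the new buffer */
        KMEM_CBRC_NO,           /* object cannot be moved */
        KMEM_CBRC_LATER,        /* busy right now; retry on a later pass */
        KMEM_CBRC_DONT_NEED,    /* object no longer needed; may be freed */
        KMEM_CBRC_DONT_KNOW     /* buffer is not a known live object */
} kmem_cbrc_t;

static int  my_object_is_pinned(const void *);          /* hypothetical */
static void my_object_fix_pointers(void *, void *);     /* hypothetical */

/*
 * The monitoring thread picks a sparsely used slab and asks the client
 * to relocate each live object out of it; once the slab is empty it can
 * be freed, which is what actually defragments the cache.
 */
static kmem_cbrc_t
my_object_move(void *old, void *new, size_t size, void *arg)
{
        if (my_object_is_pinned(old))
                return (KMEM_CBRC_LATER);
        bcopy(old, new, size);                  /* Solaris kernel bcopy */
        my_object_fix_pointers(old, new);
        return (KMEM_CBRC_YES);
}

/* Registered once per cache: kmem_cache_set_move(my_cache, my_object_move); */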
> I am testing the following primitive patch that tries to "lazily sort" (or
> pseudo-sort) the slab partial list.  A linked list is not the kind of data
> structure that's easy to keep sorted in an efficient manner.
>
> diff --git a/sys/vm/uma_core.c b/sys/vm/uma_core.c
> index 2dcd14f..ed07ecb 100644
> --- a/sys/vm/uma_core.c
> +++ b/sys/vm/uma_core.c
> @@ -2727,14 +2727,26 @@ zone_free_item(uma_zone_t zone, void *item, void *udata,
>         }
>         MPASS(keg == slab->us_keg);
>
> -       /* Do we need to remove from any lists? */
> +       /* Move to the appropriate list or re-queue further from the head. */
>         if (slab->us_freecount+1 == keg->uk_ipers) {
> +               /* Partial -> free. */
>                 LIST_REMOVE(slab, us_link);
>                 LIST_INSERT_HEAD(&keg->uk_free_slab, slab, us_link);
>         } else if (slab->us_freecount == 0) {
> +               /* Full -> partial. */
>                 LIST_REMOVE(slab, us_link);
>                 LIST_INSERT_HEAD(&keg->uk_part_slab, slab, us_link);
>         }
> +       else {
> +               /* Partial -> partial. */
> +               uma_slab_t tmp;
> +
> +               tmp = LIST_NEXT(slab, us_link);
> +               if (tmp != NULL && slab->us_freecount > tmp->us_freecount) {
> +                       LIST_REMOVE(slab, us_link);
> +                       LIST_INSERT_AFTER(tmp, slab, us_link);
> +               }
> +       }
>
>         /* Slab management stuff */
>         freei = ((unsigned long)item - (unsigned long)slab->us_data)
>
> Unfortunately I don't have any conclusive results to report.
> The numbers seem to be better with the patch, but they are changing all the
> time depending on system usage.
> I couldn't think of any good test that would reflect real-world usage
> patterns, which I believe to be not entirely random.
>
> --
> Andriy Gapon
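P.S. To convince myself that the single-swap re-queue in the patch actually
converges, I sketched a quick userland simulation of its "partial -> partial"
rule using the same <sys/queue.h> LIST macros the kernel uses (everything
here -- names, sizes, pass count -- is invented for illustration). After
enough frees, slabs with many free items have drifted toward the tail, so
allocating from the head fills the most-used slabs first:

/* Compile with: cc -o lazysort lazysort.c */
#include <sys/queue.h>
#include <stdio.h>
#include <stdlib.h>

struct slab {
        LIST_ENTRY(slab) link;
        int freecount;
};

static LIST_HEAD(, slab) head = LIST_HEAD_INITIALIZER(head);

int
main(void)
{
        struct slab *s, *tmp;
        int i, pass;

        /* Build a fake partial list with random free counts. */
        for (i = 0; i < 8; i++) {
                s = malloc(sizeof(*s));
                s->freecount = arc4random() % 100;
                LIST_INSERT_HEAD(&head, s, link);
        }

        /* Each pass plays a burst of frees touching every slab once. */
        for (pass = 0; pass < 100; pass++) {
                LIST_FOREACH(s, &head, link) {
                        tmp = LIST_NEXT(s, link);
                        if (tmp != NULL && s->freecount > tmp->freecount) {
                                /* The patch's one-position bubble. */
                                LIST_REMOVE(s, link);
                                LIST_INSERT_AFTER(tmp, s, link);
                        }
                }
        }

        /* Free counts now print in ascending order. */
        LIST_FOREACH(s, &head, link)
                printf("%d ", s->freecount);
        printf("\n");
        return (0);
}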