Date:      Tue, 28 Aug 2012 12:47:29 -0700
From:      Gezeala M. Bacuño II <gezeala@gmail.com>
To:        Alan Cox <alc@rice.edu>
Cc:        alc@freebsd.org, freebsd-performance@freebsd.org, Andrey Zonov <andrey@zonov.org>, kib@freebsd.org
Subject:   Re: vm.kmem_size_max and vm.kmem_size capped at 329853485875 (~307GB)
Message-ID:  <CAJKO3mX-0FZmSm98PvK0-RHq8EOsQxz_xghs7yA1iA2O4muCvw@mail.gmail.com>
In-Reply-To: <503D16FC.2080903@rice.edu>
References:  <CAJKO3mU8bfn=jmWNSpvAXOR1AWyAAM0Sio1D1PnOYg8P59V9cg@mail.gmail.com> <CAGH67wS=jue7+92jSCyaydOLHC=hPwtndV64FVtC7nhDsPvFng@mail.gmail.com> <CAGH67wTNfW45pgJ_+Vn_sX+P9M5B5wzPT9270dRmWjYF6KerrA@mail.gmail.com> <B74BE4AB-AB67-45BD-BFC3-9AE33A85751C@gmail.com> <502DEAD9.6050304@zonov.org> <CAJKO3mVWOFa9Cby_EWsf_OFHux7YBGSV7aGYSP2YANeJkqZtoQ@mail.gmail.com> <CAJKO3mU1NdkQwNSEDk3wWyLN700=dQ0_jSXt_sx-ABpywNjfsg@mail.gmail.com> <502EB081.3030801@rice.edu> <CAJKO3mWEXUvLtdSvmjgNhhyVqw4j0DuTYm9MqLd9=i9==WLAaA@mail.gmail.com> <502FE98E.40807@rice.edu> <CAJKO3mVUMRfkUpSuk0fDdnEMc3hr087iH5u8b5N60CnPs-gP1g@mail.gmail.com> <50325634.7090904@rice.edu> <CAJKO3mXPZVhLo=si+EoFPGD5R_m297xedRFY-0N__WOsZBaiCA@mail.gmail.com> <CAJKO3mXQ2_XrdxWgE6JRVOpMu_cEBa_=nJCxFDJ+J=f5_OUsPQ@mail.gmail.com> <503418C0.5000901@rice.edu> <CAJKO3mUkjEbY=t6K5MGphMQ_myxUHnScP8gy8v3J+ARFMf15=g@mail.gmail.com> <50367E5D.1020702@rice.edu> <CAJKO3mW+J55NFJiJS4sULi9Bq23ZCSj_oBxGN407YhJL=EqvWg@mail.gmail.com> <503D16FC.2080903@rice.edu>

On Tue, Aug 28, 2012 at 12:07 PM, Alan Cox <alc@rice.edu> wrote:
> On 08/27/2012 17:23, Gezeala M. Bacuño II wrote:
>>
>> On Thu, Aug 23, 2012 at 12:02 PM, Alan Cox<alc@rice.edu>  wrote:
>>>
>>> On 08/22/2012 12:09, Gezeala M. Bacuño II wrote:
>>>>
>>>> On Tue, Aug 21, 2012 at 4:24 PM, Alan Cox<alc@rice.edu>   wrote:
>>>>>
>>>>> On 8/20/2012 8:26 PM, Gezeala M. Bacuño II wrote:
>>>>>>
>>>>>> On Mon, Aug 20, 2012 at 9:07 AM, Gezeala M. Bacuño
>>>>>> II<gezeala@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, Aug 20, 2012 at 8:22 AM, Alan Cox<alc@rice.edu>   wrote:
>>>>>>>>
>>>>>>>> On 08/18/2012 19:57, Gezeala M. Bacuño II wrote:
>>>>>>>>>
>>>>>>>>> On Sat, Aug 18, 2012 at 12:14 PM, Alan Cox<alc@rice.edu>    wrote:
>>>>>>>>>>
>>>>>>>>>> On 08/17/2012 17:08, Gezeala M. Bacuño II wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 17, 2012 at 1:58 PM, Alan Cox<alc@rice.edu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> vm.kmem_size controls the maximum size of the kernel's heap,
>>>>>>>>>>>> i.e., the region where the kernel's slab and malloc()-like
>>>>>>>>>>>> memory allocators obtain their memory.  While this heap may
>>>>>>>>>>>> occupy the largest portion of the kernel's virtual address
>>>>>>>>>>>> space, it cannot occupy the entirety of the address space.
>>>>>>>>>>>> There are other things that must be given space within the
>>>>>>>>>>>> kernel's address space, for example, the file system buffer
>>>>>>>>>>>> map.
>>>>>>>>>>>>
>>>>>>>>>>>> ZFS does not, however, use the regular file system buffer
>>>>>>>>>>>> cache.  The ARC takes its place, and the ARC abuses the
>>>>>>>>>>>> kernel's heap like nothing else.  So, if you are running a
>>>>>>>>>>>> machine that only makes trivial use of a non-ZFS file system,
>>>>>>>>>>>> like you boot from UFS, but store all of your data in ZFS,
>>>>>>>>>>>> then you can dramatically reduce the size of the buffer map
>>>>>>>>>>>> via boot loader tuneables and proportionately increase
>>>>>>>>>>>> vm.kmem_size.
>>>>>>>>>>>>
>>>>>>>>>>>> Any further increases in the kernel virtual address space size
>>>>>>>>>>>> will, however, require code changes.  Small changes, but
>>>>>>>>>>>> changes nonetheless.
>>>>>>>>>>>>
>>>>>>>>>>>> Alan
>>>>>>>>>>>>
>>>>>>> <<snip>>
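
(To see where those limits currently sit on a given box, the usual
sysctl one-liners below should do; these are simply the OIDs already
discussed in this thread, nothing machine-specific:

  # Kernel heap (kmem) limits and buffer-map limits, all in bytes
  sysctl vm.kmem_size vm.kmem_size_max
  sysctl kern.maxbcache vfs.maxbufspace vfs.bufspace
  # The ARC ceiling, which has to fit inside vm.kmem_size
  sysctl vfs.zfs.arc_max
)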
>>>>>>>>>>
>>>>>>>>>> Your objective should be to reduce the value of "sysctl
>>>>>>>>>> vfs.maxbufspace".  You can do this by setting the loader.conf
>>>>>>>>>> tuneable "kern.maxbcache" to the desired value.
>>>>>>>>>>
>>>>>>>>>> What does your machine currently report for "sysctl
>>>>>>>>>> vfs.maxbufspace"?
>>>>>>>>>>
>>>>>>>>> Here you go:
>>>>>>>>> vfs.maxbufspace: 54967025664
>>>>>>>>> kern.maxbcache: 0
>>>>>>>>
>>>>>>>>
>>>>>>>> Try setting kern.maxbcache to two billion and adding 50 billion
>>>>>>>> to the setting of vm.kmem_size{,_max}.
>>>>>>>>
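
(Translated into /boot/loader.conf terms, that suggestion might look
roughly like the lines below; the byte values are illustrative only,
derived from the ~307 GB cap in the subject line plus the 50 billion
mentioned above, not something confirmed on this machine:

  kern.maxbcache="2000000000"        # ~2 GB cap on the buffer map
  vm.kmem_size="379853485875"        # illustrative: 329853485875 + 50e9
  vm.kmem_size_max="379853485875"    # keep the hard cap in step
)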
>>>>>> 2 : 50 ==>>   is this the ratio for further tuning
>>>>>> kern.maxbcache:vm.kmem_size? Is kern.maxbcache also in bytes?
>>>>>>
>>>>> No, this is not a ratio.  Yes, kern.maxbcache is in bytes.
>>>>> Basically, for every byte that you subtract from vfs.maxbufspace,
>>>>> through setting kern.maxbcache, you can add a byte to
>>>>> vm.kmem_size{,_max}.
>>>>>
>>>>> Alan
>>>>>
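
(Working that through with the numbers quoted earlier: vfs.maxbufspace
was 54967025664 bytes, so capping kern.maxbcache at 2000000000 gives
back 54967025664 - 2000000000 = 52967025664 bytes of kernel address
space, which is why adding a round 50 billion to vm.kmem_size{,_max}
stays comfortably inside what was freed.)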
>>>> Great! Thanks. Are there other sysctls aside from vfs.bufspace that I
>>>> should monitor for vfs.maxbufspace usage? I just want to make sure
>>>> that vfs.maxbufspace is sufficient for our needs.
>>>
>>>
>>> You might keep an eye on "sysctl vfs.bufdefragcnt".  If it starts rapidly
>>> increasing, you may want to increase vfs.maxbufspace.
>>>
>>> Alan
>>>
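
(A crude way to watch that over time; just a sketch using stock
sysctl(8), nothing specific to this setup:

  #!/bin/sh
  # Log the buffer-map defrag counter every minute; a steadily rising
  # vfs.bufdefragcnt would be the hint to raise vfs.maxbufspace.
  while true; do
      date
      sysctl vfs.bufdefragcnt vfs.bufspace vfs.maxbufspace
      sleep 60
  done
)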
>> We seem to max out vfs.bufspace in <24 hrs of uptime. It has been steady
>> at 1999273984 while vfs.bufdefragcnt stays at 0, which I presume is
>> good. Nevertheless, I will increase kern.maxbcache to 6GB and adjust
>> vm.kmem_size{,_max}, vfs.zfs.arc_max accordingly. On another machine
>> with vfs.maxbufspace auto-tuned to 7738671104 (~7.2GB), vfs.bufspace
>> is now at 5278597120 (uptime 129 days).
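
(If that plan goes ahead, the loader.conf side would presumably look
something like the lines below; the byte figures are placeholders for
whatever value of "6GB" we settle on, not settings already in place:

  kern.maxbcache="6442450944"    # 6 GB ceiling for the buffer map
  # vm.kmem_size{,_max} and vfs.zfs.arc_max then come down by the same
  # amount the buffer map grows, per the byte-for-byte trade-off above.
)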
>
>
> The buffer map is a kind of cache.  Like any cache, most of the time it will
> be full.  Don't worry.
>
> Moreover, even when the buffer map is full, the UFS file system is caching
> additional file data in physical memory pages that simply aren't mapped for
> instantaneous access.  Essentially, limiting the size of the buffer map is
> only limiting the amount of modified file data that hasn't been written back
> to disk, not the total amount of cached data.
>
> As long as you're making trivial use of UFS file systems, there really isn't
> a reason to increase the buffer map size.
>
> Alan
>
>

I see. Makes sense now. Thanks!
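
(For my own notes, a rough way to see that file data is still being
cached beyond the buffer map; I'm assuming these vm.stats counters are
present on a FreeBSD of this vintage, so treat the exact OID names as
my assumption rather than anything from this thread:

  # Buffer-map usage versus file pages cached in plain physical memory
  sysctl vfs.bufspace vfs.maxbufspace
  sysctl vm.stats.vm.v_inactive_count vm.stats.vm.v_cache_count
  sysctl hw.pagesize    # multiply the page counts by this to get bytes
)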

I forgot to mention that we also have smbfs mounts from another server.
Are writes/modifications to files on these mounts cached in the buffer
map as well? Does that apply to all non-ZFS file systems? Input/output
files are read from and written to these mounts.


