Date:      Fri, 17 Aug 2012 15:58:41 -0500
From:      Alan Cox <alc@rice.edu>
To:        "Gezeala M. Bacuño II" <gezeala@gmail.com>
Cc:        alc@freebsd.org, freebsd-performance@freebsd.org, Andrey Zonov <andrey@zonov.org>, kib@freebsd.org
Subject:   Re: vm.kmem_size_max and vm.kmem_size capped at 329853485875 (~307GB)
Message-ID:  <502EB081.3030801@rice.edu>
In-Reply-To: <CAJKO3mU1NdkQwNSEDk3wWyLN700=dQ0_jSXt_sx-ABpywNjfsg@mail.gmail.com>
References:  <CAJKO3mU8bfn=jmWNSpvAXOR1AWyAAM0Sio1D1PnOYg8P59V9cg@mail.gmail.com> <CAGH67wS=jue7%2B92jSCyaydOLHC=hPwtndV64FVtC7nhDsPvFng@mail.gmail.com> <CAGH67wTNfW45pgJ_%2BVn_sX%2BP9M5B5wzPT9270dRmWjYF6KerrA@mail.gmail.com> <B74BE4AB-AB67-45BD-BFC3-9AE33A85751C@gmail.com> <502DEAD9.6050304@zonov.org> <CAJKO3mVWOFa9Cby_EWsf_OFHux7YBGSV7aGYSP2YANeJkqZtoQ@mail.gmail.com> <CAJKO3mU1NdkQwNSEDk3wWyLN700=dQ0_jSXt_sx-ABpywNjfsg@mail.gmail.com>

vm.kmem_size controls the maximum size of the kernel's heap, i.e., the 
region where the kernel's slab and malloc()-like memory allocators 
obtain their memory.  While this heap may occupy the largest portion of 
the kernel's virtual address space, it cannot occupy the entirety of the 
address space.  There are other things that must be given space within 
the kernel's address space, for example, the file system buffer map.

ZFS does not, however, use the regular file system buffer cache. The ARC 
takes its place, and the ARC abuses the kernel's heap like nothing 
else.  So, if you are running a machine that makes only trivial use of a 
non-ZFS file system, for example, you boot from UFS but store all of your 
data in ZFS, then you can dramatically reduce the size of the buffer map 
via boot loader tunables and proportionately increase vm.kmem_size.

Any further increases in the kernel virtual address space size will, 
however, require code changes.  Small changes, but changes nonetheless.

Alan

On 8/17/2012 3:16 PM, Gezeala M. Bacuño II wrote:
> On Fri, Aug 17, 2012 at 7:38 AM, Gezeala M. Bacuño II <gezeala@gmail.com> wrote:
>> On Thu, Aug 16, 2012 at 11:55 PM, Andrey Zonov <andrey@zonov.org> wrote:
>>> On 8/17/12 7:15 AM, Marie Bacuno II wrote:
>>>>
>>>> On Aug 16, 2012, at 18:47, Garrett Cooper <yanegomi@gmail.com> wrote:
>>>>
>>>>> On Thu, Aug 16, 2012 at 6:44 PM, Garrett Cooper <yanegomi@gmail.com>
>>>>> wrote:
>>>>>> On Thu, Aug 16, 2012 at 5:46 PM, Gezeala M. Bacuño II
>>>>>> <gezeala@gmail.com> wrote:
>>>>>>> Hello fellow listers,
>>>>>>>
>>>>>>> On a server with 512GB RAM it appears that vm.kmem_size_max is not
>>>>>>> being auto-tuned to use >329853485875 (~307GB).
>>>>>>>
>>>>>>> On this machine vm.kmem_size is equal to vm.kmem_size_max
>>>>>>>
>>>>>>> # from sysctl
>>>>>>> vm.kmem_size_max: 329853485875
>>>>>>> vm.kmem_size: 329853485875
>>>>>>>
>>>>>>> On a machine with 1GB of RAM, I have successfully set vm.kmem_size_max
>>>>>>> to 330GB and vm.kmem_size automatically adjusts to 1GB even if I
>>>>>>> manually set it in /boot/loader.conf.
>>>>>>>
>>>>>>> But on the machine with 512GB of RAM it just resets. For the machine
>>>>>>> to boot, we need to go to the loader prompt and issue:
>>>>>>>
>>>>>>> OK set vm.kmem_size_max="300G"
>>>>>>> OK boot
>>>>>>>
>>>>>>> On all PCBSD (8,9) or FreeBSD (8.1,8.2,9) machines we have,
>>>>>>> vm.kmem_size_max is always set to 329853485875.
>>>>>>>
>>>>>>> How can I increase vm.kmem_size_max to use at least 500GB? And how is
>>>>>>> 329853485875 determined (formula)? I need to increase vm.kmem_size_max
>>>>>>> and vm.kmem_size so I can set vfs.zfs.arc_max (ZFS ARC) to use say
>>>>>>> 490GB.
>>>>>>>
>>>>>>> I'm browsing thru the source code at
>>>>>>> http://fxr.watson.org/fxr/search?v=FREEBSD9&string=vm.kmem_size_max
>>>>>>> and I'm still trying to make sense of how vm.kmem_size_max is
>>>>>>> computed.
>>>>>>>
>>>>>>> I have posted the same topic on forums.freebsd.org but I'm not getting
>>>>>>> any recommendations.
>>>>>>>
>>>>>>> Please see the link for additional details:
>>>>>>> http://forums.freebsd.org/showthread.php?t=33977
>>>>>>
>>>>>> Have you tried defining VM_KMEM_SIZE_MAX to your target value?
>>>>>>
>>>>>> It's architecture-specific, BTW... see
>>>>>> sys/<architecture>/include/vmparam.h -- look for `VM_KMEM_SIZE_MAX`.
>>>>>
>>>>> Also, it's a tunable, not a sysctl... so you need to set the value in
>>>>> /boot/loader.conf .
>>>>> -Garrett
>>>>
>>>> Thanks for the quick reply.
>>>>
>>>> Yes, I had it set in /boot/loader.conf and, by trial and error, at the
>>>> loader prompt.
>>>>
>>>> We were able to bump it to 400G successfully. Tried 500G and 450G and the
>>>> machine just spews out garbage on the screen.
>>>>
>>>> The latest output from "zfs-stats -a" with vm.kmem_size_max="400G" is in
>>>> the forum: http://forums.freebsd.org/showthread.php?t=33977
>>>>
>>>> About the code, I am looking into the amd64 arch. Still checking the values
>>>> of the variables; I can't just retrieve them using getconf. If you can point
>>>> me to Doxygen-like documentation, I'd appreciate it a lot.
>>>>
>>>> Where does the constant value 329853485875 come from?
>>>>
>>> It comes from this macro:
>>>
>>> #define VM_KMEM_SIZE_MAX        ((VM_MAX_KERNEL_ADDRESS - \
>>>      VM_MIN_KERNEL_ADDRESS + 1) * 3 / 5)
>>>
>>> ((1<<39) * 3 / 5) = 329853488332, approximately; the exact span is
>>> (1<<39) - 4095 bytes, which gives 329853485875.
>>>
>>> AFAIK, VM_MAX_KERNEL_ADDRESS is limited to 512GB.  Maybe it's time to
>>> increase it again.  I would ask kib@ or alc@ about that.
>>>
>>> --
>>> Andrey Zonov
> Thanks! That's great (great for deriving 512GB) but looks like bad
> news for us, we've really hit some limits there (FreeBSD auto-tuning
> wise). As I've stated above, we have tried setting vm.kmem_size_max to
> 500GB/450GB unsuccessfully so there may be some part of the code
> that's breaking. Is there any thread or discussion where you can point
> me as to why they used only 60% (hard-coded) of 512GB?
>
> Some relevant code I've gathered on this machine:
>
> /usr/src/sys/amd64/include/vmparam.h
> /*
>   * Virtual addresses of things.  Derived from the page directory and
>   * page table indexes from pmap.h for precision.
>   *
>   * 0x0000000000000000 - 0x00007fffffffffff   user map
>   * 0x0000800000000000 - 0xffff7fffffffffff   does not exist (hole)
>   * 0xffff800000000000 - 0xffff804020100fff   recursive page table (512GB slot)
>   * 0xffff804020101000 - 0xfffffdffffffffff   unused
>   * 0xfffffe0000000000 - 0xfffffeffffffffff   1TB direct map
>   * 0xffffff0000000000 - 0xffffff7fffffffff   unused
>   * 0xffffff8000000000 - 0xffffffffffffffff   512GB kernel map
>   *
>   * Within the kernel map:
>   *
>   * 0xffffffff80000000                        KERNBASE
>   */
>
> #define VM_MAX_KERNEL_ADDRESS   KVADDR(KPML4I, NPDPEPG-1, NPDEPG-1, NPTEPG-1)
> #define VM_MIN_KERNEL_ADDRESS   KVADDR(KPML4I, NPDPEPG-512, 0, 0)
>
> /usr/src/sys/amd64/include/pmap.h
> /*
>   * Pte related macros.  This is complicated by having to deal with
>   * the sign extension of the 48th bit.
>   */
> #define KVADDR(l4, l3, l2, l1) ( \
>          ((unsigned long)-1 << 47) | \
>          ((unsigned long)(l4) << PML4SHIFT) | \
>          ((unsigned long)(l3) << PDPSHIFT) | \
>          ((unsigned long)(l2) << PDRSHIFT) | \
>          ((unsigned long)(l1) << PAGE_SHIFT))
>
> /usr/src/sys/amd64/include/param.h
> #define PAGE_SHIFT      12              /* LOG2(PAGE_SIZE) */
> #define PDRSHIFT        21              /* LOG2(NBPDR) */
> #define PDPSHIFT        30              /* LOG2(NBPDP) */
> #define PML4SHIFT       39              /* LOG2(NBPML4) */
>
> Yet to derive: KPML4I, NPDPEPG-1, NPDEPG-1, NPTEPG-1. Checking pmap.c,
> vm_machdep.c etc.
>
> Additional Info:
> 1] Installed using PCBSD-9 Release amd64.
>
> 2] uname -a
> FreeBSD fmt-iscsi-stg1.musicreports.com 9.0-RELEASE FreeBSD
> 9.0-RELEASE #3: Tue Dec 27 14:14:29 PST 2011
> root@build9x64.pcbsd.org:/usr/obj/builds/amd64/pcbsd-build90/fbsd-source/9.0/sys/GENERIC
>   amd64
>
> 3] first few lines from /var/run/dmesg.boot:
> FreeBSD 9.0-RELEASE #3: Tue Dec 27 14:14:29 PST 2011
>      root@build9x64.pcbsd.org:/usr/obj/builds/amd64/pcbsd-build90/fbsd-source/9.0/sys/GENERIC
> amd64
> CPU: Intel(R) Xeon(R) CPU E7- 8837  @ 2.67GHz (2666.82-MHz K8-class CPU)
>    Origin = "GenuineIntel"  Id = 0x206f2  Family = 6  Model = 2f  Stepping = 2
>    Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>    Features2=0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI>
>    AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>    AMD Features2=0x1<LAHF>
>    TSC: P-state invariant, performance statistics
> real memory  = 549755813888 (524288 MB)
> avail memory = 530339893248 (505771 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table: <ALASKA A M I>
> FreeBSD/SMP: Multiprocessor System Detected: 64 CPUs
> FreeBSD/SMP: 8 package(s) x 8 core(s)
>
> 4] relevant sysctl's with manual tuning:
> kern.maxusers: 384
> kern.maxvnodes: 8222162
> vfs.numvnodes: 675740
> vfs.freevnodes: 417524
> kern.ipc.somaxconn: 128
> kern.openfiles: 5238
> vfs.zfs.arc_max: 428422987776
> vfs.zfs.arc_min: 53552873472
> vfs.zfs.arc_meta_used: 3167391088
> vfs.zfs.arc_meta_limit: 107105746944
> vm.kmem_size_max: 429496729600    ==>> manually tuned
> vm.kmem_size: 429496729600    ==>> manually tuned
> vm.kmem_map_free: 107374727168
> vm.kmem_map_size: 144625156096
> vfs.wantfreevnodes: 2055540
> kern.minvnodes: 2055540
> kern.maxfiles: 197248    ==>> manually tuned
> vm.vmtotal:
> System wide totals computed every five seconds: (values in kilobytes)
> ===============================================
> Processes:              (RUNQ: 1 Disk Wait: 1 Page Wait: 0 Sleep: 150)
> Virtual Memory:         (Total: 1086325716K Active: 12377876K)
> Real Memory:            (Total: 144143408K Active: 803432K)
> Shared Virtual Memory:  (Total: 81384K Active: 37560K)
> Shared Real Memory:     (Total: 32224K Active: 27548K)
> Free Memory Pages:      365565564K
>
> hw.availpages: 134170294
> hw.physmem: 549561524224
> hw.usermem: 391395241984
> hw.realmem: 551836188672
> vm.kmem_size_scale: 1
> kern.ipc.nmbclusters: 2560000    ==>> manually tuned
> kern.ipc.maxsockbuf: 2097152
> net.inet.tcp.sendbuf_max: 2097152
> net.inet.tcp.recvbuf_max: 2097152
> kern.maxfilesperproc: 18000
> net.inet.ip.intr_queue_maxlen: 256
> kern.maxswzone: 33554432
> kern.ipc.shmmax: 10737418240    ==>> manually tuned
> kern.ipc.shmall: 2621440    ==>> manually tuned
> vfs.zfs.write_limit_override: 0
> vfs.zfs.prefetch_disable: 0
> hw.pagesize: 4096
> hw.availpages: 134170294
> kern.ipc.maxpipekva: 8586895360
> kern.ipc.shm_use_phys: 1    ==>> manually tuned
> vfs.vmiodirenable: 1
> debug.numcache: 632148
> vfs.ncsizefactor: 2
> vm.kvm_size: 549755809792
> vm.kvm_free: 54456741888
> kern.ipc.semmni: 256
> kern.ipc.semmns: 512
> kern.ipc.semmnu: 256
>
>
> Thanks!
>



