Date:      Sat, 18 Aug 2012 17:57:50 -0700
From:      Gezeala M. Bacuño II <gezeala@gmail.com>
To:        Alan Cox <alc@rice.edu>
Cc:        alc@freebsd.org, freebsd-performance@freebsd.org, Andrey Zonov <andrey@zonov.org>, kib@freebsd.org
Subject:   Re: vm.kmem_size_max and vm.kmem_size capped at 329853485875 (~307GB)
Message-ID:  <CAJKO3mVUMRfkUpSuk0fDdnEMc3hr087iH5u8b5N60CnPs-gP1g@mail.gmail.com>
In-Reply-To: <502FE98E.40807@rice.edu>
References:  <CAJKO3mU8bfn=jmWNSpvAXOR1AWyAAM0Sio1D1PnOYg8P59V9cg@mail.gmail.com> <CAGH67wS=jue7%2B92jSCyaydOLHC=hPwtndV64FVtC7nhDsPvFng@mail.gmail.com> <CAGH67wTNfW45pgJ_%2BVn_sX%2BP9M5B5wzPT9270dRmWjYF6KerrA@mail.gmail.com> <B74BE4AB-AB67-45BD-BFC3-9AE33A85751C@gmail.com> <502DEAD9.6050304@zonov.org> <CAJKO3mVWOFa9Cby_EWsf_OFHux7YBGSV7aGYSP2YANeJkqZtoQ@mail.gmail.com> <CAJKO3mU1NdkQwNSEDk3wWyLN700=dQ0_jSXt_sx-ABpywNjfsg@mail.gmail.com> <502EB081.3030801@rice.edu> <CAJKO3mWEXUvLtdSvmjgNhhyVqw4j0DuTYm9MqLd9=i9==WLAaA@mail.gmail.com> <502FE98E.40807@rice.edu>

On Sat, Aug 18, 2012 at 12:14 PM, Alan Cox <alc@rice.edu> wrote:
> On 08/17/2012 17:08, Gezeala M. Bacuño II wrote:
>>
>> On Fri, Aug 17, 2012 at 1:58 PM, Alan Cox <alc@rice.edu> wrote:
>>>
>>> vm.kmem_size controls the maximum size of the kernel's heap, i.e., the
>>> region where the kernel's slab and malloc()-like memory allocators obtain
>>> their memory.  While this heap may occupy the largest portion of the
>>> kernel's virtual address space, it cannot occupy the entirety of the
>>> address space.  There are other things that must be given space within the
>>> kernel's address space, for example, the file system buffer map.
>>>
>>> ZFS does not, however, use the regular file system buffer cache. The ARC
>>> takes its place, and the ARC abuses the kernel's heap like nothing else.
>>> So, if you are running a machine that only makes trivial use of a non-ZFS
>>> file system, like you boot from UFS, but store all of your data in ZFS,
>>> then you can dramatically reduce the size of the buffer map via boot loader
>>> tuneables and proportionately increase vm.kmem_size.
>>>
>>> Any further increases in the kernel virtual address space size will,
>>> however, require code changes.  Small changes, but changes nonetheless.
>>>
>>> Alan
>>>
>>>
>> <<snip>>
>>
>>>> Additional Info:
>>>> 1] Installed using PCBSD-9 Release amd64.
>>>>
>>>> 2] uname -a
>>>> FreeBSD fmt-iscsi-stg1.musicreports.com 9.0-RELEASE FreeBSD
>>>> 9.0-RELEASE #3: Tue Dec 27 14:14:29 PST 2011
>>>>
>>>>
>>>> root@build9x64.pcbsd.org:/usr/obj/builds/amd64/pcbsd-build90/fbsd-source/9.0/sys/GENERIC
>>>>    amd64
>>>>
>>>> 3] first few lines from /var/run/dmesg.boot:
>>>> FreeBSD 9.0-RELEASE #3: Tue Dec 27 14:14:29 PST 2011
>>>>
>>>>
>>>> root@build9x64.pcbsd.org:/usr/obj/builds/amd64/pcbsd-build90/fbsd-source/9.0/sys/GENERIC
>>>> amd64
>>>> CPU: Intel(R) Xeon(R) CPU E7- 8837  @ 2.67GHz (2666.82-MHz K8-class CPU)
>>>>     Origin = "GenuineIntel"  Id = 0x206f2  Family = 6  Model = 2f  Stepping = 2
>>>>     Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>>>>     Features2=0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI>
>>>>     AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>>>>     AMD Features2=0x1<LAHF>
>>>>     TSC: P-state invariant, performance statistics
>>>> real memory  = 549755813888 (524288 MB)
>>>> avail memory = 530339893248 (505771 MB)
>>>> Event timer "LAPIC" quality 600
>>>> ACPI APIC Table:<ALASKA A M I>
>>>> FreeBSD/SMP: Multiprocessor System Detected: 64 CPUs
>>>> FreeBSD/SMP: 8 package(s) x 8 core(s)
>>>>
>>>> 4] relevant sysctl's with manual tuning:
>>>> kern.maxusers: 384
>>>> kern.maxvnodes: 8222162
>>>> vfs.numvnodes: 675740
>>>> vfs.freevnodes: 417524
>>>> kern.ipc.somaxconn: 128
>>>> kern.openfiles: 5238
>>>> vfs.zfs.arc_max: 428422987776
>>>> vfs.zfs.arc_min: 53552873472
>>>> vfs.zfs.arc_meta_used: 3167391088
>>>> vfs.zfs.arc_meta_limit: 107105746944
>>>> vm.kmem_size_max: 429496729600    ==>>  manually tuned
>>>> vm.kmem_size: 429496729600    ==>>  manually tuned
>>>> vm.kmem_map_free: 107374727168
>>>> vm.kmem_map_size: 144625156096
>>>> vfs.wantfreevnodes: 2055540
>>>> kern.minvnodes: 2055540
>>>> kern.maxfiles: 197248    ==>>  manually tuned
>>>> vm.vmtotal:
>>>> System wide totals computed every five seconds: (values in kilobytes)
>>>> ================================================
>>>> Processes:              (RUNQ: 1 Disk Wait: 1 Page Wait: 0 Sleep: 150)
>>>> Virtual Memory:         (Total: 1086325716K Active: 12377876K)
>>>> Real Memory:            (Total: 144143408K Active: 803432K)
>>>> Shared Virtual Memory:  (Total: 81384K Active: 37560K)
>>>> Shared Real Memory:     (Total: 32224K Active: 27548K)
>>>> Free Memory Pages:      365565564K
>>>>
>>>> hw.availpages: 134170294
>>>> hw.physmem: 549561524224
>>>> hw.usermem: 391395241984
>>>> hw.realmem: 551836188672
>>>> vm.kmem_size_scale: 1
>>>> kern.ipc.nmbclusters: 2560000    ==>>  manually tuned
>>>> kern.ipc.maxsockbuf: 2097152
>>>> net.inet.tcp.sendbuf_max: 2097152
>>>> net.inet.tcp.recvbuf_max: 2097152
>>>> kern.maxfilesperproc: 18000
>>>> net.inet.ip.intr_queue_maxlen: 256
>>>> kern.maxswzone: 33554432
>>>> kern.ipc.shmmax: 10737418240    ==>>  manually tuned
>>>> kern.ipc.shmall: 2621440    ==>>  manually tuned
>>>> vfs.zfs.write_limit_override: 0
>>>> vfs.zfs.prefetch_disable: 0
>>>> hw.pagesize: 4096
>>>> hw.availpages: 134170294
>>>> kern.ipc.maxpipekva: 8586895360
>>>> kern.ipc.shm_use_phys: 1    ==>>  manually tuned
>>>> vfs.vmiodirenable: 1
>>>> debug.numcache: 632148
>>>> vfs.ncsizefactor: 2
>>>> vm.kvm_size: 549755809792
>>>> vm.kvm_free: 54456741888
>>>> kern.ipc.semmni: 256
>>>> kern.ipc.semmns: 512
>>>> kern.ipc.semmnu: 256
>>>>
>> Thanks. It will be mainly used for PostgreSQL and Java. We have a huge
>> db (3TB and growing) and we need to keep as much of it as we can in
>> the ZFS ARC. All data resides on zpools while root is on UFS. On our 8.2
>> and 9 machines, vm.kmem_size is always auto-tuned to almost the same size
>> as the installed RAM. What I've tuned on those machines is to lower
>> vfs.zfs.arc_max to 50% or 75% of vm.kmem_size; that has worked
>> well for us, and the machines do not swap out. On this machine, I
>> do think that I need to adjust my formula for tuning vfs.zfs.arc_max;
>> 25% for other stuff is probably overkill.
>>
>> We were able to successfully bump vm.kmem_size_max and vm.kmem_size to
>> 400GB:
>> vm.kmem_size_max: 429496729600    ==>>  manually tuned
>> vm.kmem_size: 429496729600    ==>>  manually tuned
>> vfs.zfs.arc_max: 428422987776  ==>>  auto-tuned (vm.kmem_size - 1G)
>> vfs.zfs.arc_min: 53552873472  ==>>  auto-tuned
>>
>> Which other tuneables do I need to set in /boot/loader.conf so we can
>> boot the machine with vm.kmem_size > 400G? As I don't know which part
>> of the boot-up process is failing with vm.kmem_size/_max set to 450G
>> or 500G, I have no idea which to tune next.
>
>
>
> Your objective should be to reduce the value of "sysctl vfs.maxbufspace".
> You can do this by setting the loader.conf tuneable "kern.maxbcache" to the
> desired value.
>
> What does your machine currently report for "sysctl vfs.maxbufspace"?
>

Here you go:
vfs.maxbufspace: 54967025664
kern.maxbcache: 0
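
For scale, the numbers involved can be put side by side in GiB. This is
only a rough sketch: the byte counts are copied from the sysctl output
in this thread, but the 1 GiB kern.maxbcache target and the helper names
are made up for illustration, and it assumes vfs.maxbufspace would end up
close to whatever kern.maxbcache is set to, which has not been verified
here.

```python
# Byte values copied from the sysctl output quoted in this thread.
GIB = 1024 ** 3

sysctls = {
    "vm.kvm_size": 549755809792,
    "vm.kvm_free": 54456741888,
    "vm.kmem_size": 429496729600,
    "vfs.maxbufspace": 54967025664,
}


def to_gib(nbytes):
    """Convert a byte count to GiB."""
    return nbytes / GIB


def freed_by_maxbcache(current_maxbufspace, maxbcache):
    """Estimate the kernel address space released by a smaller buffer map.

    Assumption: vfs.maxbufspace shrinks to roughly kern.maxbcache.
    """
    return max(current_maxbufspace - maxbcache, 0)


for name, value in sysctls.items():
    print(f"{name:16s} {to_gib(value):8.2f} GiB")

# Hypothetical target, chosen only to illustrate the arithmetic.
hypothetical_maxbcache = 1 * GIB
freed = freed_by_maxbcache(sysctls["vfs.maxbufspace"], hypothetical_maxbcache)
print(f"freed by a 1 GiB kern.maxbcache: {to_gib(freed):.2f} GiB")
```

Whether all of the freed address space can then be handed to
vm.kmem_size is exactly the open question in this thread, so the last
figure is an upper bound under the stated assumption, not a
recommendation.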

Other (probably) relevant values:
vfs.hirunningspace: 16777216
vfs.lorunningspace: 11206656
vfs.bufdefragcnt: 0
vfs.buffreekvacnt: 2
vfs.bufreusecnt: 320149
vfs.hibufspace: 54966370304
vfs.lobufspace: 54966304768
vfs.maxmallocbufspace: 2748318515
vfs.bufmallocspace: 0
vfs.bufspace: 10490478592
vfs.runningbufspace: 0
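
As a quick check on how much of that reservation is in use, the buffer
counters above can be compared directly. A minimal sketch, assuming
vfs.bufspace reflects the space currently used by buffers; the function
name is illustrative and the byte values are the ones listed above.

```python
# Byte values copied from the vfs.* sysctls listed above.
maxbufspace = 54967025664
hibufspace = 54966370304
lobufspace = 54966304768
bufspace = 10490478592


def percent_used(used, limit):
    """Return used as a percentage of limit."""
    return 100.0 * used / limit


usage = percent_used(bufspace, maxbufspace)
print(f"vfs.bufspace is {usage:.1f}% of vfs.maxbufspace")

# Gaps between the thresholds, in bytes.
print("maxbufspace - hibufspace:", maxbufspace - hibufspace)
print("hibufspace - lobufspace:", hibufspace - lobufspace)
```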

Let me know if you need other tuneables or sysctl values. Thanks a lot
for looking into this.
