From: Gezeala M. Bacuño II <gezeala@gmail.com>
Date: Sat, 18 Aug 2012 17:57:50 -0700
To: Alan Cox
Cc: alc@freebsd.org, freebsd-performance@freebsd.org, Andrey Zonov, kib@freebsd.org
Subject: Re: vm.kmem_size_max and vm.kmem_size capped at 329853485875 (~307GB)

On Sat, Aug 18, 2012 at 12:14 PM, Alan Cox wrote:
> On 08/17/2012 17:08, Gezeala M. Bacuño II wrote:
>>
>> On Fri, Aug 17, 2012 at 1:58 PM, Alan Cox wrote:
>>>
>>> vm.kmem_size controls the maximum size of the kernel's heap, i.e., the
>>> region where the kernel's slab and malloc()-like memory allocators
>>> obtain their memory. While this heap may occupy the largest portion of
>>> the kernel's virtual address space, it cannot occupy the entirety of
>>> the address space. There are other things that must be given space
>>> within the kernel's address space, for example, the file system buffer
>>> map.
>>>
>>> ZFS does not, however, use the regular file system buffer cache. The
>>> ARC takes its place, and the ARC abuses the kernel's heap like nothing
>>> else. So, if you are running a machine that only makes trivial use of
>>> a non-ZFS file system, like you boot from UFS, but store all of your
>>> data in ZFS, then you can dramatically reduce the size of the buffer
>>> map via boot loader tuneables and proportionately increase
>>> vm.kmem_size.
>>>
>>> Any further increases in the kernel virtual address space size will,
>>> however, require code changes. Small changes, but changes nonetheless.
>>>
>>> Alan
>>>
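A quick way to see how that carve-up looks on a given machine is to compare
the total kernel address space, the heap, and the buffer map side by side,
for example:

    sysctl vm.kvm_size vm.kmem_size vfs.maxbufspace

vm.kvm_size is the whole kernel virtual address space, vm.kmem_size is the
heap within it, and vfs.maxbufspace is the space set aside for the buffer
map.
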
>> <>
>>
>>>> Additional Info:
>>>> 1] Installed using PCBSD-9 Release amd64.
>>>>
>>>> 2] uname -a
>>>> FreeBSD fmt-iscsi-stg1.musicreports.com 9.0-RELEASE FreeBSD
>>>> 9.0-RELEASE #3: Tue Dec 27 14:14:29 PST 2011
>>>> root@build9x64.pcbsd.org:/usr/obj/builds/amd64/pcbsd-build90/fbsd-source/9.0/sys/GENERIC
>>>> amd64
>>>>
>>>> 3] first few lines from /var/run/dmesg.boot:
>>>> FreeBSD 9.0-RELEASE #3: Tue Dec 27 14:14:29 PST 2011
>>>> root@build9x64.pcbsd.org:/usr/obj/builds/amd64/pcbsd-build90/fbsd-source/9.0/sys/GENERIC
>>>> amd64
>>>> CPU: Intel(R) Xeon(R) CPU E7- 8837  @ 2.67GHz (2666.82-MHz K8-class CPU)
>>>>   Origin = "GenuineIntel"  Id = 0x206f2  Family = 6  Model = 2f  Stepping = 2
>>>>   Features=0xbfebfbff
>>>>   Features2=0x29ee3ff
>>>>   AMD Features=0x2c100800
>>>>   AMD Features2=0x1
>>>>   TSC: P-state invariant, performance statistics
>>>> real memory  = 549755813888 (524288 MB)
>>>> avail memory = 530339893248 (505771 MB)
>>>> Event timer "LAPIC" quality 600
>>>> ACPI APIC Table:
>>>> FreeBSD/SMP: Multiprocessor System Detected: 64 CPUs
>>>> FreeBSD/SMP: 8 package(s) x 8 core(s)
>>>>
>>>> 4] relevant sysctl's with manual tuning:
>>>> kern.maxusers: 384
>>>> kern.maxvnodes: 8222162
>>>> vfs.numvnodes: 675740
>>>> vfs.freevnodes: 417524
>>>> kern.ipc.somaxconn: 128
>>>> kern.openfiles: 5238
>>>> vfs.zfs.arc_max: 428422987776
>>>> vfs.zfs.arc_min: 53552873472
>>>> vfs.zfs.arc_meta_used: 3167391088
>>>> vfs.zfs.arc_meta_limit: 107105746944
>>>> vm.kmem_size_max: 429496729600  ==>> manually tuned
>>>> vm.kmem_size: 429496729600  ==>> manually tuned
>>>> vm.kmem_map_free: 107374727168
>>>> vm.kmem_map_size: 144625156096
>>>> vfs.wantfreevnodes: 2055540
>>>> kern.minvnodes: 2055540
>>>> kern.maxfiles: 197248  ==>> manually tuned
>>>> vm.vmtotal:
>>>> System wide totals computed every five seconds: (values in kilobytes)
>>>> ===============================================
>>>> Processes:  (RUNQ: 1 Disk Wait: 1 Page Wait: 0 Sleep: 150)
>>>> Virtual Memory:  (Total: 1086325716K Active: 12377876K)
>>>> Real Memory:  (Total: 144143408K Active: 803432K)
>>>> Shared Virtual Memory:  (Total: 81384K Active: 37560K)
>>>> Shared Real Memory:  (Total: 32224K Active: 27548K)
>>>> Free Memory Pages: 365565564K
>>>>
>>>> hw.availpages: 134170294
>>>> hw.physmem: 549561524224
>>>> hw.usermem: 391395241984
>>>> hw.realmem: 551836188672
>>>> vm.kmem_size_scale: 1
>>>> kern.ipc.nmbclusters: 2560000  ==>> manually tuned
>>>> kern.ipc.maxsockbuf: 2097152
>>>> net.inet.tcp.sendbuf_max: 2097152
>>>> net.inet.tcp.recvbuf_max: 2097152
>>>> kern.maxfilesperproc: 18000
>>>> net.inet.ip.intr_queue_maxlen: 256
>>>> kern.maxswzone: 33554432
>>>> kern.ipc.shmmax: 10737418240  ==>> manually tuned
>>>> kern.ipc.shmall: 2621440  ==>> manually tuned
>>>> vfs.zfs.write_limit_override: 0
>>>> vfs.zfs.prefetch_disable: 0
>>>> hw.pagesize: 4096
>>>> hw.availpages: 134170294
>>>> kern.ipc.maxpipekva: 8586895360
>>>> kern.ipc.shm_use_phys: 1  ==>> manually tuned
>>>> vfs.vmiodirenable: 1
>>>> debug.numcache: 632148
>>>> vfs.ncsizefactor: 2
>>>> vm.kvm_size: 549755809792
>>>> vm.kvm_free: 54456741888
>>>> kern.ipc.semmni: 256
>>>> kern.ipc.semmns: 512
>>>> kern.ipc.semmnu: 256
>>>>
>> Thanks. It will be mainly used for PostgreSQL and Java. We have a huge
>> db (3TB and growing) and we need to have as much of it as we can in the
>> ZFS ARC. All data resides on zpools while root is on UFS. On 8.2 and 9
>> machines vm.kmem_size is always auto-tuned to almost the same size as
>> our installed RAM. What I've tuned on those machines is to lower
>> vfs.zfs.arc_max to 50% or 75% of vm.kmem_size; that has worked well for
>> us and the machines do not swap. Now on this machine, I do think that I
>> need to adjust my formula for tuning vfs.zfs.arc_max, since reserving
>> 25% for other stuff is probably overkill.
>>
>> We were able to successfully bump vm.kmem_size_max and vm.kmem_size to
>> 400GB:
>> vm.kmem_size_max: 429496729600  ==>> manually tuned
>> vm.kmem_size: 429496729600  ==>> manually tuned
>> vfs.zfs.arc_max: 428422987776  ==>> auto-tuned (vm.kmem_size - 1G)
>> vfs.zfs.arc_min: 53552873472  ==>> auto-tuned
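To spell out the formula with round numbers rather than the exact figures
above: on a box where vm.kmem_size comes out at 400G, capping the ARC at
75% of it would mean something like this in /boot/loader.conf (values are
only illustrative):

    vm.kmem_size="400G"
    vm.kmem_size_max="400G"
    vfs.zfs.arc_max="300G"    # ~75% of vm.kmem_size; the rest is headroom for everything else
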
>>
>> Which other tuneables do I need to set in /boot/loader.conf so that we
>> can boot the machine with vm.kmem_size > 400G? Since I don't know which
>> part of the boot-up process is failing with vm.kmem_size/_max set to
>> 450G or 500G, I have no idea which knob to tune next.
>
>
> Your objective should be to reduce the value of "sysctl vfs.maxbufspace".
> You can do this by setting the loader.conf tuneable "kern.maxbcache" to
> the desired value.
>
> What does your machine currently report for "sysctl vfs.maxbufspace"?
>

Here you go:
vfs.maxbufspace: 54967025664
kern.maxbcache: 0

Other (probably) relevant values:
vfs.hirunningspace: 16777216
vfs.lorunningspace: 11206656
vfs.bufdefragcnt: 0
vfs.buffreekvacnt: 2
vfs.bufreusecnt: 320149
vfs.hibufspace: 54966370304
vfs.lobufspace: 54966304768
vfs.maxmallocbufspace: 2748318515
vfs.bufmallocspace: 0
vfs.bufspace: 10490478592
vfs.runningbufspace: 0

Let me know if you need other tuneables or sysctl values. Thanks a lot for
looking into this.
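If I understand the kern.maxbcache suggestion correctly, the sort of
/boot/loader.conf change to try would be roughly the following (the
kern.maxbcache figure is only a guess on my side, not a value anyone has
recommended here):

    kern.maxbcache="1G"       # pull vfs.maxbufspace down from the current ~51GB
    vm.kmem_size="450G"
    vm.kmem_size_max="450G"

and then, after a reboot, confirm the effect with:

    sysctl vfs.maxbufspace vm.kmem_size vfs.zfs.arc_max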