Date: Mon, 26 Jul 2010 14:30:59 -0500
From: Alan Cox <alan.l.cox@gmail.com>
To: Andriy Gapon <avg@freebsd.org>
Cc: Matthew Fleming <mdf356@gmail.com>, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?
Message-ID: <AANLkTimWcXHAz=K1UM6ECa=6xR5KuS-sf_nDhbFEgehq@mail.gmail.com>
In-Reply-To: <4C4DD1AA.3050906@freebsd.org>
References: <4C4DB2B8.9080404@freebsd.org> <AANLkTikY%2BnPTgBtDWcphNkOrW-Aif5TRSCuCn8BsK3p7@mail.gmail.com> <4C4DD1AA.3050906@freebsd.org>

On Mon, Jul 26, 2010 at 1:19 PM, Andriy Gapon <avg@freebsd.org> wrote:

> on 26/07/2010 20:04 Matthew Fleming said the following:
> > On Mon, Jul 26, 2010 at 9:07 AM, Andriy Gapon <avg@freebsd.org> wrote:
> >> Anyone knows any reason why VM_KMEM_SIZE_SCALE on amd64 should not
> >> be set to 1?
> >> I mean things potentially breaking, or some unpleasant surprise for an
> >> administrator/user...
> >
> > As I understand it, it's merely a resource usage issue. amd64 needs
> > page table entries for the expected virtual address space, so allowing
> > more than e.g. 1/3 of physical memory means needing more PTEs. But
> > the memory overhead isn't all that large IIRC: each 4k physical memory
> > devoted to PTEs maps 512 4k virtual addresses, or 2MB, so e.g. it
> > takes about 4MB reserved as PTE pages to map 2GB of kernel virtual
> > address space.
>
> My understanding is that paging entries are only allocated when actual
> (physical) memory allocation is done. But I am not sure.
>
> > Having cut my OS teeth on AIX/PowerPC where virtual address space is
> > free and has no relation to the size of the hardware page table, the
> > FreeBSD architecture limiting the size of the kernel virtual space
> > seemed weird to me. However, since FreeBSD also does not page kernel
> > data to disk, there's a good reason to limit the size of the kernel's
> > virtual space, since that also limits the kernel's physical space.
> >
> > In other words, setting it to 1 could lead to the system being out of
> > memory but not trying to fail kernel malloc requests. I'm not
> > entirely sure this is a new problem since one could also chew through
> > physical memory with sub-page uma allocations as well on amd64.
>
> Well, personally I would prefer kernel eating a lot of memory over
> getting "kmem_map too small" panic. Unexpectedly large memory usage by
> kernel can be detected and diagnosed, and then proper limits and
> (auto-)tuning could be put in place. Panic at some random allocation
> is not that helpful.
> Besides, presently there are more and more workloads that require a
> lot of kernel memory - e.g. ZFS is gaining popularity.
>

Like what exactly? Since I increased the size of the kernel address
space for amd64 to 512GB, and thus the size of the kernel heap was no
longer limited by virtual address space size, but only by the
auto-tuning based upon physical memory size, I am not aware of any
"kmem_map too small" panics that are not ZFS/ARC related.

> Hence, the question/suggestion.
>
> Of course, the things can be tuned by hand, but I think that
> VM_KMEM_SIZE_SCALE=1 would be a more reasonable default than current
> value.
>

Even this would not eliminate the ZFS/ARC panics. I have heard that
some people must configure the kmem_map to 1.5 times a machine's
physical memory size to avoid panics. The reason is that unlike the
traditional FreeBSD way of caching file data, the ZFS/ARC wants to have
every page of cached data *mapped* (and wired) in the kernel address
space. Over time, the available, unused space in the kmem_map becomes
fragmented, and even though the ARC thinks that it has not reached its
size limit, kmem_malloc() cannot find contiguous space to satisfy the
allocation request. To see this described in great detail, do a web
search for an e-mail by Ben Kelly with the subject "[patch] zfs kmem
fragmentation".

As far as eliminating or reducing the manual tuning that many ZFS users
do, I would love to see someone tackle the overly conservative hard
limit that we place on the number of vnode structures. The current
hard limit was put in place when we had just introduced mutexes into
many structures and a mutex was much larger than it is today.

Alan
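
The fragmentation failure described in the message can be illustrated
with a toy model. What follows is a minimal Python sketch, not FreeBSD
code: the function first_fit and the 16-page free_map are hypothetical
names invented for this illustration, and the first-fit search is an
assumption made for simplicity, not a statement about how
kmem_malloc() actually searches the kmem_map. It shows only that a map
can be half free and still have no room for a two-page contiguous
request.

```python
def first_fit(free_map, npages):
    """Return the start index of the first run of npages free pages, or None."""
    run = 0
    for i, is_free in enumerate(free_map):
        # Extend the current run of free pages, or reset it on a used page.
        run = run + 1 if is_free else 0
        if run == npages:
            return i - npages + 1
    return None


# Hypothetical 16-page address-space map; True means the page is free.
free_map = [True] * 16

# Fill the whole map with single-page allocations ...
for i in range(16):
    free_map[i] = False

# ... then free every other page, leaving the map checkerboarded.
for i in range(0, 16, 2):
    free_map[i] = True

free_pages = sum(free_map)        # count of free pages after the frees
single = first_fit(free_map, 1)   # a one-page request
double = first_fit(free_map, 2)   # a request for two contiguous pages

print(f"free pages: {free_pages}, 1-page fit at: {single}, 2-page fit: {double}")
```

In this model half of the map is free, yet the two-page request cannot
be satisfied because no two free pages are adjacent. Enlarging the map
only delays the point at which this happens, which is consistent with
the remark that VM_KMEM_SIZE_SCALE=1 alone would not eliminate the
ZFS/ARC panics.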