In-Reply-To: <4C4DD1AA.3050906@freebsd.org>
References: <4C4DB2B8.9080404@freebsd.org> <4C4DD1AA.3050906@freebsd.org>
Date: Mon, 26 Jul 2010 14:30:59 -0500
From: Alan Cox
To: Andriy Gapon
Cc: Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?
Reply-To: alc@freebsd.org

On Mon, Jul 26, 2010 at 1:19 PM, Andriy Gapon wrote:

> on 26/07/2010 20:04 Matthew Fleming said the following:
> > On Mon, Jul 26, 2010 at 9:07 AM, Andriy Gapon wrote:
> >> Anyone knows any reason why VM_KMEM_SIZE_SCALE on amd64 should not be
> >> set to 1?
> >> I mean things potentially breaking, or some unpleasant surprise for an
> >> administrator/user...
> >
> > As I understand it, it's merely a resource usage issue. amd64 needs
> > page table entries for the expected virtual address space, so allowing
> > more than e.g. 1/3 of physical memory means needing more PTEs. But
> > the memory overhead isn't all that large IIRC: each 4k physical memory
> > devoted to PTEs maps 512 4k virtual addresses, or 2MB, so e.g. it
> > takes about 4MB reserved as PTE pages to map 2GB of kernel virtual
> > address space.
>
> My understanding is that paging entries are only allocated when actual
> (physical) memory allocation is done. But I am not sure.
>
> > Having cut my OS teeth on AIX/PowerPC where virtual address space is
> > free and has no relation to the size of the hardware page table, the
> > FreeBSD architecture limiting the size of the kernel virtual space
> > seemed weird to me. However, since FreeBSD also does not page kernel
> > data to disk, there's a good reason to limit the size of the kernel's
> > virtual space, since that also limits the kernel's physical space.
> >
> > In other words, setting it to 1 could lead to the system being out of
> > memory but not trying to fail kernel malloc requests. I'm not
> > entirely sure this is a new problem since one could also chew through
> > physical memory with sub-page uma allocations as well on amd64.
>
> Well, personally I would prefer kernel eating a lot of memory over getting
> "kmem_map too small" panic. Unexpectedly large memory usage by kernel can
> be detected and diagnosed, and then proper limits and (auto-)tuning could
> be put in place. Panic at some random allocation is not that helpful.
> Besides, presently there are more and more workloads that require a lot of
> kernel memory - e.g. ZFS is gaining popularity.
>

Like what exactly? Since I increased the size of the kernel address space
for amd64 to 512GB, and thus the size of the kernel heap was no longer
limited by virtual address space size, but only by the auto-tuning based
upon physical memory size, I am not aware of any "kmem_map too small"
panics that are not ZFS/ARC related.

> Hence, the question/suggestion.
>
> Of course, the things can be tuned by hand, but I think that
> VM_KMEM_SIZE_SCALE=1 would be a more reasonable default than current value.
>

Even this would not eliminate the ZFS/ARC panics. I have heard that some
people must configure the kmem_map to 1.5 times a machine's physical memory
size to avoid panics. The reason is that unlike the traditional FreeBSD way
of caching file data, the ZFS/ARC wants to have every page of cached data
*mapped* (and wired) in the kernel address space. Over time, the available,
unused space in the kmem_map becomes fragmented, and even though the ARC
thinks that it has not reached its size limit, kmem_malloc() cannot find
contiguous space to satisfy the allocation request. To see this described
in great detail, do a web search for an e-mail by Ben Kelly with the
subject "[patch] zfs kmem fragmentation".

As far as eliminating or reducing the manual tuning that many ZFS users do,
I would love to see someone tackle the overly conservative hard limit that
we place on the number of vnode structures. The current hard limit was put
in place when we had just introduced mutexes into many structures and a
mutex was much larger than it is today.

Alan