From: Andriy Gapon <avg@freebsd.org>
Date: Mon, 26 Jul 2010 19:07:20 +0300
To: freebsd-arch@freebsd.org
Subject: amd64: change VM_KMEM_SIZE_SCALE to 1?

Does anyone know of any reason why VM_KMEM_SIZE_SCALE on amd64 should not
be set to 1?  I mean things potentially breaking, or some unpleasant
surprise for an administrator/user...

-- 
Andriy Gapon


From: Matthew Fleming <mdf356@gmail.com>
Date: Mon, 26 Jul 2010 10:04:58 -0700
To: Andriy Gapon
Cc: freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?
On Mon, Jul 26, 2010 at 9:07 AM, Andriy Gapon wrote:
>
> Does anyone know of any reason why VM_KMEM_SIZE_SCALE on amd64 should not
> be set to 1?  I mean things potentially breaking, or some unpleasant
> surprise for an administrator/user...

As I understand it, it's merely a resource usage issue.  amd64 needs page
table entries for the expected virtual address space, so allowing more than
e.g. 1/3 of physical memory means needing more PTEs.  But the memory
overhead isn't all that large, IIRC: each 4kB page of physical memory
devoted to PTEs maps 512 4kB pages of virtual address space, or 2MB, so
e.g. it takes about 4MB reserved as PTE pages to map 2GB of kernel virtual
address space.

Having cut my OS teeth on AIX/PowerPC, where virtual address space is free
and has no relation to the size of the hardware page table, the FreeBSD
architecture limiting the size of the kernel virtual space seemed weird to
me.  However, since FreeBSD also does not page kernel data to disk, there's
a good reason to limit the size of the kernel's virtual space, since that
also limits the kernel's physical space.

In other words, setting it to 1 could lead to the system running out of
memory without kernel malloc requests ever failing.  I'm not entirely sure
this is a new problem, since one could also chew through physical memory
with sub-page uma allocations on amd64.

Corrections to the above gratefully accepted.  This is just my current
understanding of it.

Thanks,
matthew
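(As a quick check of the arithmetic above, here is a small standalone
program - illustrative only, not from the tree.  The constants are the
standard amd64 ones: 4kB pages and 8-byte PTEs, so one PTE page maps 2MB.)

#include <stdio.h>

/*
 * Compute how much physical memory goes to leaf page-table pages when
 * mapping a given amount of KVA with 4kB pages.  Each 4kB PTE page
 * holds 512 8-byte PTEs and therefore maps 512 * 4kB = 2MB.
 */
int
main(void)
{
	unsigned long kva = 2UL << 30;			/* 2GB of kernel VA */
	unsigned long ptes_per_page = 4096 / 8;		/* 512 */
	unsigned long bytes_per_pt = ptes_per_page * 4096; /* 2MB */
	unsigned long pt_pages = kva / bytes_per_pt;	/* 1024 */

	/* Prints: mapping 2048 MB of KVA takes 1024 PTE pages (4 MB). */
	printf("mapping %lu MB of KVA takes %lu PTE pages (%lu MB)\n",
	    kva >> 20, pt_pages, (pt_pages * 4096) >> 20);
	return (0);
}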
From: Andriy Gapon <avg@freebsd.org>
Date: Mon, 26 Jul 2010 21:19:22 +0300
To: Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

on 26/07/2010 20:04 Matthew Fleming said the following:
> As I understand it, it's merely a resource usage issue.  amd64 needs page
> table entries for the expected virtual address space, so allowing more
> than e.g. 1/3 of physical memory means needing more PTEs.  But the memory
> overhead isn't all that large, IIRC: each 4kB page of physical memory
> devoted to PTEs maps 512 4kB pages of virtual address space, or 2MB, so
> e.g. it takes about 4MB reserved as PTE pages to map 2GB of kernel
> virtual address space.

My understanding is that paging entries are only allocated when an actual
(physical) memory allocation is done.  But I am not sure.

> Having cut my OS teeth on AIX/PowerPC, where virtual address space is
> free and has no relation to the size of the hardware page table, the
> FreeBSD architecture limiting the size of the kernel virtual space seemed
> weird to me.  However, since FreeBSD also does not page kernel data to
> disk, there's a good reason to limit the size of the kernel's virtual
> space, since that also limits the kernel's physical space.
>
> In other words, setting it to 1 could lead to the system running out of
> memory without kernel malloc requests ever failing.  I'm not entirely
> sure this is a new problem, since one could also chew through physical
> memory with sub-page uma allocations on amd64.

Well, personally I would prefer the kernel eating a lot of memory over
getting a "kmem_map too small" panic.  Unexpectedly large memory usage by
the kernel can be detected and diagnosed, and then proper limits and
(auto-)tuning could be put in place.  A panic at some random allocation is
not that helpful.
Besides, presently there are more and more workloads that require a lot of
kernel memory - e.g. ZFS is gaining popularity.

Hence the question/suggestion.

Of course, things can be tuned by hand, but I think that
VM_KMEM_SIZE_SCALE=1 would be a more reasonable default than the current
value.
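(For readers unfamiliar with the knob: below is a rough sketch of how
VM_KMEM_SIZE_SCALE feeds the sizing, modelled loosely on the kmeminit()
logic in sys/kern/kern_malloc.c of that era.  It is an assumption-laden
illustration, not the actual code; the real function also folds in
VM_KMEM_SIZE_MIN/MAX, the vm.kmem_size tunables, and a clamp against the
kernel map size.)

/*
 * Illustrative only: the default kmem_map size is physical memory
 * divided by VM_KMEM_SIZE_SCALE.  With the then-current amd64 default
 * of 3 this caps kmem at ~1/3 of RAM; a scale of 1 would allow the
 * kmem_map to grow to roughly all of RAM.
 */
static unsigned long
kmem_size_estimate(unsigned long physical_pages, unsigned long page_size,
    unsigned long scale)
{
	unsigned long sz;

	sz = (physical_pages / scale) * page_size;
	/* ... clamping against tunables and the kernel map omitted ... */
	return (sz);
}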
-- 
Andriy Gapon


From: Alan Cox <alan.l.cox@gmail.com>
Reply-To: alc@freebsd.org
Date: Mon, 26 Jul 2010 13:48:48 -0500
To: Matthew Fleming
Cc: Andriy Gapon, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Mon, Jul 26, 2010 at 12:04 PM, Matthew Fleming wrote:
> As I understand it, it's merely a resource usage issue.  amd64 needs page
> table entries for the expected virtual address space, so allowing more
> than e.g. 1/3 of physical memory means needing more PTEs.  But the memory
> overhead isn't all that large, IIRC: each 4kB page of physical memory
> devoted to PTEs maps 512 4kB pages of virtual address space, or 2MB, so
> e.g. it takes about 4MB reserved as PTE pages to map 2GB of kernel
> virtual address space.
>
> Having cut my OS teeth on AIX/PowerPC, where virtual address space is
> free and has no relation to the size of the hardware page table, the
> FreeBSD architecture limiting the size of the kernel virtual space seemed
> weird to me.  However, since FreeBSD also does not page kernel data to
> disk, there's a good reason to limit the size of the kernel's virtual
> space, since that also limits the kernel's physical space.

This last answer is the one that I would give as well.  As you say, the
page table memory isn't that significant.
> In other words, setting it to 1 could lead to the system running out of
> memory without kernel malloc requests ever failing.  I'm not entirely
> sure this is a new problem, since one could also chew through physical
> memory with sub-page uma allocations on amd64.

Yes, on both counts.  However, many of the things that we might allocate
with uma_small_alloc() have caps, e.g., vnode structures, mitigating the
risk somewhat.

Alan


From: Peter Wemm <peter@wemm.org>
Date: Mon, 26 Jul 2010 12:29:05 -0700
To: Andriy Gapon
Cc: Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Mon, Jul 26, 2010 at 11:19 AM, Andriy Gapon wrote:
>>> Does anyone know of any reason why VM_KMEM_SIZE_SCALE on amd64 should
>>> not be set to 1?  I mean things potentially breaking, or some
>>> unpleasant surprise for an administrator/user...

The amd64 kernel has a fixed-size limit on kva, inside which the kmem_map
must fit.  Most consumers of malloc() use the free direct map region, but
there are some notable abusers of malloc (zfs being the prime offender)
that prevent the use of the free direct map region for their allocations.

I'm not familiar with how VM_KMEM_SIZE_SCALE's calculations work, but I
think it would be a crying shame to waste a huge chunk of finite kva space
on systems that aren't handicapped by ZFS's abuse of malloc().  We've run
out of kva space on amd64 in the past.

To recap: the amd64 kernel has a place to do temporary mappings.  This
space is finite - 6G on newer systems, 2G on older ones.  It is most often
used to remap discontiguous pages into virtually contiguous address space.
The kernel also sets up a 1:1 virtual<->physical map region so it can get
to any page on the system without requiring a kva mapping.

If it's clear that changing VM_KMEM_SIZE_SCALE makes sense for the common
case then that's different.  Of course, with machines with 128G / 256G of
physical ram either already here or just around the corner, it's time to
start thinking hard about physical-ram-based scaling calculations again.
That hard limit of 512G of physical ram doesn't seem so distant anymore..
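(To illustrate the 1:1 region Peter mentions: on amd64 the direct map
gives the kernel a permanent virtual alias for every physical page, so
single-page access costs no KVA at all.  A hypothetical in-kernel
fragment, assuming only the stock PHYS_TO_DMAP() and VM_PAGE_TO_PHYS()
macros; it is a sketch, not code from the tree.)

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_page.h>
#include <machine/vmparam.h>

static void *
page_va_via_dmap(vm_page_t m)
{
	/*
	 * No KVA allocation and no pmap_enter(): the direct map already
	 * maps this page.  What the direct map cannot provide is a
	 * multi-page *virtually contiguous* buffer for physically
	 * scattered pages - those allocations (ZFS's malloc() use being
	 * the example above) must carve space out of the kmem_map.
	 */
	return ((void *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)));
}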
-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell


From: Alan Cox <alan.l.cox@gmail.com>
Reply-To: alc@freebsd.org
Date: Mon, 26 Jul 2010 14:30:59 -0500
To: Andriy Gapon
Cc: Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Mon, Jul 26, 2010 at 1:19 PM, Andriy Gapon wrote:
> on 26/07/2010 20:04 Matthew Fleming said the following:
> > As I understand it, it's merely a resource usage issue.  amd64 needs
> > page table entries for the expected virtual address space, so allowing
> > more than e.g. 1/3 of physical memory means needing more PTEs.  But the
> > memory overhead isn't all that large, IIRC: each 4kB page of physical
> > memory devoted to PTEs maps 512 4kB pages of virtual address space, or
> > 2MB, so e.g. it takes about 4MB reserved as PTE pages to map 2GB of
> > kernel virtual address space.
>
> My understanding is that paging entries are only allocated when an actual
> (physical) memory allocation is done.  But I am not sure.
> > Having cut my OS teeth on AIX/PowerPC, where virtual address space is
> > free and has no relation to the size of the hardware page table, the
> > FreeBSD architecture limiting the size of the kernel virtual space
> > seemed weird to me.  However, since FreeBSD also does not page kernel
> > data to disk, there's a good reason to limit the size of the kernel's
> > virtual space, since that also limits the kernel's physical space.
> >
> > In other words, setting it to 1 could lead to the system running out
> > of memory without kernel malloc requests ever failing.  I'm not
> > entirely sure this is a new problem, since one could also chew through
> > physical memory with sub-page uma allocations on amd64.
>
> Well, personally I would prefer the kernel eating a lot of memory over
> getting a "kmem_map too small" panic.  Unexpectedly large memory usage by
> the kernel can be detected and diagnosed, and then proper limits and
> (auto-)tuning could be put in place.  A panic at some random allocation
> is not that helpful.
> Besides, presently there are more and more workloads that require a lot
> of kernel memory - e.g. ZFS is gaining popularity.

Like what exactly?  Since I increased the size of the kernel address space
for amd64 to 512GB, and thus the size of the kernel heap was no longer
limited by the virtual address space size, but only by the auto-tuning
based upon physical memory size, I am not aware of any "kmem_map too small"
panics that are not ZFS/ARC related.

> Hence the question/suggestion.
>
> Of course, things can be tuned by hand, but I think that
> VM_KMEM_SIZE_SCALE=1 would be a more reasonable default than the current
> value.

Even this would not eliminate the ZFS/ARC panics.  I have heard that some
people must configure the kmem_map to 1.5 times a machine's physical memory
size to avoid panics.  The reason is that, unlike the traditional FreeBSD
way of caching file data, the ZFS/ARC wants to have every page of cached
data *mapped* (and wired) in the kernel address space.  Over time, the
available, unused space in the kmem_map becomes fragmented, and even though
the ARC thinks that it has not reached its size limit, kmem_malloc() cannot
find contiguous space to satisfy the allocation request.  To see this
described in great detail, do a web search for an e-mail by Ben Kelly with
the subject "[patch] zfs kmem fragmentation".

As far as eliminating or reducing the manual tuning that many ZFS users do,
I would love to see someone tackle the overly conservative hard limit that
we place on the number of vnode structures.  The current hard limit was put
in place when we had just introduced mutexes into many structures and a
mutex was much larger than it is today.
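(The fragmentation failure mode Alan describes is easy to reproduce in
miniature.  The toy program below - purely illustrative, with no relation
to the kernel code - models a kmem_map as a page bitmap with a first-fit
allocator: after interleaved allocations and a cache trim, most of the
map is free, yet a larger request still fails for lack of a contiguous
run.)

#include <stdio.h>
#include <string.h>

#define NPAGES 1024

static char map[NPAGES];	/* 0 = free, 1 = allocated */

/* First-fit allocator over the page bitmap; returns start page or -1. */
static int
alloc_pages(int n)
{
	int i, run = 0;

	for (i = 0; i < NPAGES; i++) {
		run = (map[i] == 0) ? run + 1 : 0;
		if (run == n) {
			memset(&map[i - n + 1], 1, n);
			return (i - n + 1);
		}
	}
	return (-1);
}

int
main(void)
{
	int big[NPAGES], i, nb = 0, freepg = 0;

	/* Interleave 1-page and 4-page allocations until the map fills. */
	for (;;) {
		if (alloc_pages(1) < 0)
			break;
		if ((big[nb] = alloc_pages(4)) < 0)
			break;
		nb++;
	}
	/* Free every 4-page chunk, as if a cache were trimmed. */
	for (i = 0; i < nb; i++)
		memset(&map[big[i]], 0, 4);
	for (i = 0; i < NPAGES; i++)
		freepg += (map[i] == 0);
	/* ~80% free, but the largest free run is 4 pages: 8 fails. */
	printf("free pages: %d/%d, 8-page allocation %s\n", freepg, NPAGES,
	    alloc_pages(8) < 0 ? "FAILS" : "succeeds");
	return (0);
}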
Alan


From: Alan Cox <alan.l.cox@gmail.com>
Reply-To: alc@freebsd.org
Date: Mon, 26 Jul 2010 14:35:13 -0500
To: Peter Wemm
Cc: Matthew Fleming, Andriy Gapon, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

Peter,

In FreeBSD >= 7.3, the kernel address space limit is no longer 6GB.  It is
now 512GB.
Alan


From: Andriy Gapon <avg@freebsd.org>
Date: Mon, 26 Jul 2010 22:43:17 +0300
To: alc@freebsd.org
Cc: Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

on 26/07/2010 22:30 Alan Cox said the following:
> On Mon, Jul 26, 2010 at 1:19 PM, Andriy Gapon wrote:
>> Well, personally I would prefer the kernel eating a lot of memory over
>> getting a "kmem_map too small" panic.
>> Unexpectedly large memory usage by the kernel can be detected and
>> diagnosed, and then proper limits and (auto-)tuning could be put in
>> place.  A panic at some random allocation is not that helpful.
>> Besides, presently there are more and more workloads that require a lot
>> of kernel memory - e.g. ZFS is gaining popularity.
>
> Like what exactly?  Since I increased the size of the kernel address
> space for amd64 to 512GB, and thus the size of the kernel heap was no
> longer limited by the virtual address space size, but only by the
> auto-tuning based upon physical memory size, I am not aware of any
> "kmem_map too small" panics that are not ZFS/ARC related.

Well, I meant exactly these.

>> Hence the question/suggestion.
>>
>> Of course, things can be tuned by hand, but I think that
>> VM_KMEM_SIZE_SCALE=1 would be a more reasonable default than the
>> current value.
>
> Even this would not eliminate the ZFS/ARC panics.  I have heard that some
> people must configure the kmem_map to 1.5 times a machine's physical
> memory size to avoid panics.  The reason is that, unlike the traditional
> FreeBSD way of caching file data, the ZFS/ARC wants to have every page of
> cached data *mapped* (and wired) in the kernel address space.  Over time,
> the available, unused space in the kmem_map becomes fragmented, and even
> though the ARC thinks that it has not reached its size limit,
> kmem_malloc() cannot find contiguous space to satisfy the allocation
> request.  To see this described in great detail, do a web search for an
> e-mail by Ben Kelly with the subject "[patch] zfs kmem fragmentation".

Yes, I am aware of the fragmentation issue.  But I haven't hit that panic
myself since setting vm.kmem_size_scale="1" in loader.conf.  Of course,
what I propose would not fix the fragmentation issue.  But... it's
something that ZFS users (especially serious ZFS users like file servers)
would want to do anyway, and it won't cause any harm for others.

> As far as eliminating or reducing the manual tuning that many ZFS users
> do, I would love to see someone tackle the overly conservative hard limit
> that we place on the number of vnode structures.  The current hard limit
> was put in place when we had just introduced mutexes into many structures
> and a mutex was much larger than it is today.

I agree.  But that's a slightly different topic.
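(For reference, the hand tuning being discussed looks like this in
/boot/loader.conf; the first line is the setting Andriy mentions, and the
commented alternatives are other knobs commonly suggested in ZFS tuning
guides of the time - shown as illustration, with made-up values.)

# /boot/loader.conf
vm.kmem_size_scale="1"		# let kmem grow to ~all of physical RAM
#vm.kmem_size="4096M"		# or: pin the kmem size outright
#vfs.zfs.arc_max="2048M"	# and/or: cap the ARC itself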
-- 
Andriy Gapon


From: Scott Long <scottl@samsco.org>
Date: Mon, 26 Jul 2010 13:42:47 -0600
To: alc@freebsd.org
Cc: Matthew Fleming, Andriy Gapon, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Jul 26, 2010, at 1:35 PM, Alan Cox wrote:
> Peter,
>
> In FreeBSD >= 7.3, the kernel address space limit is no longer 6GB.  It
> is now 512GB.
>

Ok, I mistakenly thought that it was still 2GB/6GB as well.  So to be
clear, KVA maxes out at ?  and kmem maxes out at ?

Scott


From: Andriy Gapon <avg@freebsd.org>
Date: Mon, 26 Jul 2010 23:55:34 +0300
To: Scott Long
Cc: alc@freebsd.org, Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?
on 26/07/2010 22:42 Scott Long said the following:
> Ok, I mistakenly thought that it was still 2GB/6GB as well.  So to be
> clear, KVA maxes out at ?

As Alan said - 512GB.

> and kmem maxes out at ?

There is a formula with a bunch of tunables, but normally it's 1/3 of
available physical memory.  Unless I am mistaken.

-- 
Andriy Gapon


From: Peter Jeremy <peterjeremy@acm.org>
Date: Tue, 27 Jul 2010 07:06:19 +1000
To: Peter Wemm
Cc: Matthew Fleming, Andriy Gapon, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On 2010-Jul-26 12:29:05 -0700, Peter Wemm wrote:
>That hard limit of 512G of physical ram doesn't seem so distant anymore..

You can put 512GB of 16GB DIMMs into one of these:
http://www.supermicro.com/a_images/products/Aplus/MB/H8QGi-F_spec.jpg

And, I don't have the link, but at least one of Dell's higher-end boxes
allows you to select 1TB RAM in the configurator.
-- 
Peter Jeremy


From: John Baldwin <jhb@freebsd.org>
Date: Tue, 27 Jul 2010 09:35:52 -0400
To: freebsd-arch@freebsd.org, alc@freebsd.org
Cc: Matthew Fleming, Andriy Gapon
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Monday, July 26, 2010 3:30:59 pm Alan Cox wrote:
> As far as eliminating or reducing the manual tuning that many ZFS users
> do, I would love to see someone tackle the overly conservative hard limit
> that we place on the number of vnode structures.  The current hard limit
> was put in place when we had just introduced mutexes into many structures
> and a mutex was much larger than it is today.

I have a strawman of that (relative to 7).  It simply adjusts the hardcoded
maximum to instead be a function of the amount of physical memory.

Index: vfs_subr.c
===================================================================
--- vfs_subr.c	(revision 210934)
+++ vfs_subr.c	(working copy)
@@ -288,6 +288,7 @@
 static void
 vntblinit(void *dummy __unused)
 {
+	int vnodes;
 
 	/*
 	 * Desiredvnodes is a function of the physical memory size and
@@ -299,10 +300,19 @@
 	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
 	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
 	if (desiredvnodes > MAXVNODES_MAX) {
+
+		/*
+		 * If there is a lot of physical memory, allow the cap
+		 * on vnodes to expand to using a little under 1% of
+		 * available RAM.
+		 */
+		vnodes = max(MAXVNODES_MAX, cnt.v_page_count * (PAGE_SIZE /
+		    128) / (sizeof(struct vm_object) + sizeof(struct vnode)));
+		KASSERT(vnodes < desiredvnodes, ("capped vnodes too big"));
 		if (bootverbose)
 			printf("Reducing kern.maxvnodes %d -> %d\n",
-			    desiredvnodes, MAXVNODES_MAX);
-		desiredvnodes = MAXVNODES_MAX;
+			    desiredvnodes, vnodes);
+		desiredvnodes = vnodes;
 	}
 	wantfreevnodes = desiredvnodes / 4;
 	mtx_init(&mntid_mtx, "mntid", NULL, MTX_DEF);

-- 
John Baldwin


From: Bruce Evans <brde@optusnet.com.au>
Date: Wed, 28 Jul 2010 02:45:31 +1000 (EST)
To: alc@FreeBSD.org
Cc: Matthew Fleming, Andriy Gapon, freebsd-arch@FreeBSD.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Mon, 26 Jul 2010, Alan Cox wrote:
> On Mon, Jul 26, 2010 at 1:19 PM, Andriy Gapon wrote:
>> on 26/07/2010 20:04 Matthew Fleming said the following:
>>>> Does anyone know of any reason why VM_KMEM_SIZE_SCALE on amd64 should
>>>> not be set to 1?  I mean things potentially breaking, or some
>>>> unpleasant surprise for an administrator/user...

Shouldn't it be a fraction (of about 1/(2**32)) so that you can map things
sparsely into about 2**64 bytes of KVA?  Actually mapping 2**64 bytes of
KVA would take too many resources, but does it take too many resources to
reserve that amount and to be prepared to actually map lots more than now?

>>> As I understand it, it's merely a resource usage issue.  amd64 needs
>>> page table entries for the expected virtual address space, so allowing
>>> more than e.g. 1/3 of physical memory means needing more PTEs.  But the
>>> memory overhead isn't all that large, IIRC: each 4kB page of physical
>>> memory devoted to PTEs maps 512 4kB pages of virtual address space, or
>>> 2MB, so e.g. it takes about 4MB reserved as PTE pages to map 2GB of
>>> kernel virtual address space.

That's not small, but isn't it 1024 times less due to 4MB pages in the
kernel?  But I guess 4MB pages are no good for sparse mappings.

>> ...
>> Well, personally I would prefer the kernel eating a lot of memory over
>> getting a "kmem_map too small" panic.  Unexpectedly large memory usage
>> by the kernel can be detected and diagnosed, and then proper limits and
>> (auto-)tuning could be put in place.
>> A panic at some random allocation is not that helpful.
>> Besides, presently there are more and more workloads that require a lot
>> of kernel memory - e.g. ZFS is gaining popularity.
>
> Like what exactly?  Since I increased the size of the kernel address
> space for amd64 to 512GB, and thus the size of the kernel heap was no
> longer limited by the virtual address space size, but only by the
> auto-tuning based upon physical memory size, I am not aware of any
> "kmem_map too small" panics that are not ZFS/ARC related.
>
>> Hence the question/suggestion.
>>
>> Of course, things can be tuned by hand, but I think that
>> VM_KMEM_SIZE_SCALE=1 would be a more reasonable default than the
>> current value.
>
> Even this would not eliminate the ZFS/ARC panics.  I have heard that some
> people must configure the kmem_map to 1.5 times a machine's physical
> memory size to avoid panics.

2**32 times larger would avoid this even better (up to 4GB physical
memory) :-).  With 512GB virtual and 4GB physical, 128 times larger
(VM_KMEM_SIZE_SCALE=1/128.0) is almost possible, and 32 times larger seems
practical (leave 3/4 for other things).

However, it seems wrong to scale by physical memory at all.  If you are
prepared to map 512GB, why not allow a significant fraction of that (say
1/4) to be used for kmem?  The only problem that I see is that there will
be more rounds of physical memory and disk sizes increasing faster than
virtual memory limits; on every round, algorithms based on sparse mappings
break.

> The reason is that, unlike the traditional FreeBSD way of caching file
> data, the ZFS/ARC wants to have every page of cached data *mapped* (and
> wired) in the kernel address space.

Traditional BSD (Net/2 at least, and perhaps even FreeBSD-1) mapped and
wired every page of cached data (all ~2MB of it) sparsely into the buffer
map part of the kernel address space (all ~16MB or 32MB of it in 386BSD or
FreeBSD-early, but 256MB in FreeBSD-1.1.5).  I like the simplicity of
this.  It would have worked perfectly in FreeBSD-1.1.5, since physical
memory and disk sizes were still much smaller than the i386 address space.
It would work adequately even now (since nbuf now only needs to be large
enough to limit thrashing of VMIO mappings).

> Over time, the available, unused space in the kmem_map becomes
> fragmented, and even though the ARC thinks that it has not reached its
> size limit, kmem_malloc() cannot find contiguous space to satisfy the
> allocation request.  To see this described in great detail, do a web
> search for an e-mail by Ben Kelly with the subject "[patch] zfs kmem
> fragmentation".

This is exactly what happened several times with the buffer map(s) in
FreeBSD-1, -2, and -[3-4?], except with memory sizes scaled by 3, then 2,
then 1 decimal orders of magnitude.  In FreeBSD-1, plain malloc() was used
for buffers, and kmem_map was far too small (16MB) for this to work well.
In FreeBSD-[2-current], a much more complicated method is used to allocate
buffers (and to map VMIO pages into buffers).  This is essentially a
private version of malloc() with lots of specialization for buffers, and a
separate map so that it doesn't have to fight with other users of
malloc().  Despite its specialization, this still had problems with
fragmentation.  It wasn't until FreeBSD-4 that the specialization became
complicated enough to mostly avoid these problems.
Bruce


From: Robert Watson <rwatson@FreeBSD.org>
Date: Wed, 28 Jul 2010 23:45:52 +0100 (BST)
To: freebsd-net@FreeBSD.org
Cc: freebsd-arch@FreeBSD.org
Subject: Future of netnatm: volunteer wanted -- and/or -- removal notice

Dear all:

When the new link layer framework was introduced in 8.0, one of our ATM
stacks, netnatm, was left behind.  As a result, it neither compiles nor
runs in 8.x and 9.x.  This e-mail serves two purposes:

(1) To solicit a volunteer who can work on the netnatm stack in 9.x, with
    potential merge to 8.x, to get it back to functionality before 9.0
    ships.  This is the preferred course of action.

(2) To serve as notice that, if we can't find a volunteer to do this, we
    will remove netnatm and associated parts from the tree in 9.0, since
    they will have gone one major version neither compiling nor running.
    This is the fallback plan.

I'm in no great rush to remove netnatm, having spent quite a bit of time
making it work in our MPSAFE world order a couple of years ago.  However,
the code is bitrotting and requires urgent attention if it's going to work
again easily (the stack is changing around it, and because netnatm doesn't
build, it will get only cursory and likely incorrect updates).  I'm happy
to help funnel changes into the tree from non-committers, as well as
answer questions about the network stack, but I have no hardware
facilities for debugging or testing netnatm changes myself, nor,
unfortunately, the time to work on the code.

In order to provide further motivation for potentially interested parties,
here's the proposed six-month removal schedule:

  28 July 2010    - Notice of proposed removal
  28 October 2010 - Repeat of the notice of proposed removal
  28 January 2011 - Proposed removal date

This schedule may be updated as the 9.0 release schedule becomes more
clear, or if there are obvious signs of improvement and just a couple more
months would get it fixed :-).  And, if worst comes to worst and we can't
find a volunteer, the code will live on in the source repository history
if there's a desire to rejuvenate it in the future.
Thanks,

Robert

Robert N M Watson
Computer Laboratory
University of Cambridge


From: Tijl Coosemans <tijl@coosemans.org>
Date: Thu, 29 Jul 2010 17:18:03 +0200
To: freebsd-arch@freebsd.org
Cc: pluknet
Subject: Support for cc -m32

Hi,

I've put the initial version of some patches online to support cross
compilation of 32-bit binaries on amd64.  It's modelled after how NetBSD
does this.

With these patches, something like "cc -m32 -o test test.c -pthread -lm"
generates a program that runs on FreeBSD/i386.

http://people.freebsd.org/~tijl/cc-m32-1.diff
http://people.freebsd.org/~tijl/cc-m32-2.diff
http://people.freebsd.org/~tijl/cc-m32-3.diff

*cc-m32-1.diff* : Let ld and cc find 32-bit libraries.

*cc-m32-2.diff* : Install i386 headers on amd64.

With this patch, headers for a particular $arch are always installed under
/usr/include/$arch, and /usr/include/machine becomes a symlink.  A question
I have here is how best to clean up the old machine directory.  The patch
currently uses 'rm -rf'.

Another problem I encountered was that during the build of usr.bin/kdump,
all headers are searched for definitions of ioctl requests, and a C source
code file is generated that includes all those headers.  This fails when
both i386 and amd64 headers are installed, because they can't both be
included at the same time.  For now the patch simply blacklists
/usr/include/i386, but actually all $arch should be excluded.  The ioctl
requests can still be found through the machine symlink.  If someone has a
better idea...

*cc-m32-3.diff* : Modify amd64 headers to include i386 headers when
__i386__ is defined.

This patch modifies the amd64 headers to follow this format:

#ifndef _AMD64_HEADER_H
#define _AMD64_HEADER_H
#ifdef __i386__
#include <i386/header.h>
#else
...
#endif /* __i386__ */
#endif /* !_AMD64_HEADER_H */

This way, including <machine/header.h> works for -m32.
There are a few i386 headers which don't exist for amd64:

apm_segments.h bootinfo.h cserial.h elan_mmcr.h if_wl_wavelan.h
ioctl_bt848.h ioctl_meteor.h npx.h pcaudioio.h pcb_ext.h perfmon.h
privatespace.h smapi.h speaker.h vm86.h xbox.h

Theoretically, a dummy amd64 header should be created for each of them
that just includes the i386 header.  The patch does this for npx.h.  The
other headers seem to be really i386-specific or even outdated.  If it
were ever necessary to cross-compile code that uses them, it would be easy
to modify that code to directly include the i386 header.

Feel free to test the patches and to comment on any part of them.
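(The npx.h shim mentioned above is presumably along these lines; this is a
guess based on the description, not an excerpt from cc-m32-3.diff, and the
guard name is invented for the illustration.)

/* sys/amd64/include/npx.h -- dummy wrapper so that code including
 * <machine/npx.h> keeps working with -m32. */
#ifndef _MACHINE_NPX_H_
#define _MACHINE_NPX_H_

#include <i386/npx.h>

#endif /* !_MACHINE_NPX_H_ */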
http://people.freebsd.org/~mdf/bsd-memguard.diff The gist of the new implementation is to reserve a lot of KVA for memguard(9) to use, and then to avoid re-using KVA as long as possible. Rather than keep the physical pages around, though, on free(9) the pages are returned to the system. The KVA is allocated using vm_map_findspace() from a current pointer into the memguard_map, which is incremented until the end of the map is encountered, at which time it wraps. This is a "free" way to avoid re-use of KVA as long as possible; any other scheme requires more than O(1) data to track what has been used. I've limited the KVA to 2x ram size, and also limited the physical memory that memguard(9) can take to vm_memguard_divisor fraction of physical memory (instead of limiting both KVA and physical to vm_memguard_divisor as the original code did). This patch also allows for tweaking which malloc type is guarded at run time, will randomly guard allocations of any type if requested, has a knob to always guard allocations of PAGE_SIZE or larger since it won't waste any memory, will optionally add guard pages of unmapped KVA at the beginning and end of the allocation to catch overruns more easily, and also can impose minimum allocation sizes on guarded memory so that the page promotions don't waste too much space. Assuming alc@ is happy with the VM changes and no one has any further suggestions, I'd like to commit this some time next week. I'd also like to MFC to stable/8 and stable/7 since this patch doesn't introduce any KBI/ABI/KPI/API changes. Apart from the general desire to have production systems run as fast as possible, I'd really like more tools like memguard(9) to be always-on, to help catch bugs the first time instead of requiring multiple recreates. Thanks, matthew From owner-freebsd-arch@FreeBSD.ORG Thu Jul 29 22:27:18 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B07C91065676 for ; Thu, 29 Jul 2010 22:27:18 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from mail.icecube.wisc.edu (trout.icecube.wisc.edu [128.104.255.119]) by mx1.freebsd.org (Postfix) with ESMTP id 86CB88FC1F for ; Thu, 29 Jul 2010 22:27:18 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.icecube.wisc.edu (Postfix) with ESMTP id ABC1D582C7; Thu, 29 Jul 2010 17:27:17 -0500 (CDT) X-Virus-Scanned: amavisd-new at icecube.wisc.edu Received: from mail.icecube.wisc.edu ([127.0.0.1]) by localhost (trout.icecube.wisc.edu [127.0.0.1]) (amavisd-new, port 10030) with ESMTP id r3HHItGN-leA; Thu, 29 Jul 2010 17:27:17 -0500 (CDT) Received: from wanderer.tachypleus.net (adsl-75-50-88-235.dsl.mdsnwi.sbcglobal.net [75.50.88.235]) by mail.icecube.wisc.edu (Postfix) with ESMTP id 2F1DE582C4; Thu, 29 Jul 2010 17:27:17 -0500 (CDT) Message-ID: <4C520044.5020002@freebsd.org> Date: Fri, 30 Jul 2010 00:27:16 +0200 From: Nathan Whitehorn User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.11) Gecko/20100727 Thunderbird/3.0.6 MIME-Version: 1.0 To: Tijl Coosemans References: <201007291718.12687.tijl@coosemans.org> In-Reply-To: <201007291718.12687.tijl@coosemans.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: pluknet , freebsd-arch@freebsd.org Subject: Re: Support for cc -m32 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 22:27:18 -0000 On 07/29/10 17:18, Tijl Coosemans wrote: > Hi, > > I've put the initial version of some patches online to support cross > compilation of 32 bit binaries on amd64. It's modelled after how NetBSD > does this. > > With these patches something like "cc -m32 -o test test.c -pthread -lm" > generates a program that runs on FreeBSD/i386. > > http://people.freebsd.org/~tijl/cc-m32-1.diff > http://people.freebsd.org/~tijl/cc-m32-2.diff > http://people.freebsd.org/~tijl/cc-m32-3.diff > > *cc-m32-1.diff* : Let ld and cc find 32 bit libraries. > > *cc-m32-2.diff* : Install i386 headers on amd64. > Why not use the GCC multilib code for what patch 1 does? There is already code in cc_tools/Makefile to handle this for powerpc64 (where cc -m32 already works). -Nathan From owner-freebsd-arch@FreeBSD.ORG Fri Jul 30 08:53:30 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A4C61106567C for ; Fri, 30 Jul 2010 08:53:30 +0000 (UTC) (envelope-from tijl@coosemans.org) Received: from mailrelay004.isp.belgacom.be (mailrelay004.isp.belgacom.be [195.238.6.170]) by mx1.freebsd.org (Postfix) with ESMTP id 16E128FC19 for ; Fri, 30 Jul 2010 08:53:29 +0000 (UTC) X-Belgacom-Dynamic: yes X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAAYwUkxbscBD/2dsb2JhbACgF3K+aYU5BA Received: from 67.192-177-91.adsl-dyn.isp.belgacom.be (HELO kalimero.tijl.coosemans.org) ([91.177.192.67]) by relay.skynet.be with ESMTP; 30 Jul 2010 10:53:28 +0200 Received: from kalimero.tijl.coosemans.org (kalimero.tijl.coosemans.org [127.0.0.1]) by kalimero.tijl.coosemans.org (8.14.4/8.14.4) with ESMTP id o6U8rRvK002422; Fri, 30 Jul 2010 10:53:28 +0200 (CEST) (envelope-from tijl@coosemans.org) From: Tijl Coosemans To: Nathan Whitehorn Date: Fri, 30 Jul 2010 10:53:17 +0200 User-Agent: KMail/1.13.5 (FreeBSD/8.1-PRERELEASE; KDE/4.4.5; i386; ; ) References: <201007291718.12687.tijl@coosemans.org> <4C520044.5020002@freebsd.org> In-Reply-To: <4C520044.5020002@freebsd.org> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart2641494.0OStBKXvGs"; protocol="application/pgp-signature"; micalg=pgp-sha256 Content-Transfer-Encoding: 7bit Message-Id: <201007301053.27407.tijl@coosemans.org> Cc: pluknet , freebsd-arch@freebsd.org Subject: Re: Support for cc -m32 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 08:53:30 -0000 --nextPart2641494.0OStBKXvGs Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit On Friday 30 July 2010 00:27:16 Nathan Whitehorn wrote: > On 07/29/10 17:18, Tijl Coosemans wrote: >> I've put the initial version of some patches online to support cross >> compilation of 32 bit binaries on amd64. It's modelled after how NetBSD >> does this. >> >> With these patches something like "cc -m32 -o test test.c -pthread -lm" >> generates a program that runs on FreeBSD/i386. >> >> http://people.freebsd.org/~tijl/cc-m32-1.diff >> http://people.freebsd.org/~tijl/cc-m32-2.diff >> http://people.freebsd.org/~tijl/cc-m32-3.diff >> >> *cc-m32-1.diff* : Let ld and cc find 32 bit libraries. > > Why not use the GCC multilib code for what patch 1 does? 
There is > already code in cc_tools/Makefile to handle this for powerpc64 (where > cc -m32 already works). Thanks, it's indeed better to specify this per architecture so I've updated the patch. It changes the output of -print-search-dirs though. With the previous patch "cc -m32 -print-search-dirs" printed: install: /usr/libexec/ programs: =/usr/bin/:/usr/bin/:/usr/libexec/:/usr/libexec/:/usr/libexec/ libraries: =/usr/lib32/:/usr/lib32/ And now it prints: install: /usr/libexec/ programs: =/usr/bin/:/usr/bin/:/usr/libexec/:/usr/libexec/:/usr/libexec/ libraries: =/usr/lib/32/:/usr/lib/../lib32/:/usr/lib/:/usr/lib/ That works, but it's not entirely correct. --nextPart2641494.0OStBKXvGs Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.15 (FreeBSD) iF4EABEIAAYFAkxSkwcACgkQfoCS2CCgtisPOAD/QduCN05QUX07YjqhZfH3FTKc tCUmX/svoR98579BkDIA+wbjmTP5n5LnT7E3B6JktpYe9ByYjB8nL2rhBwH4s5SY =SAB4 -----END PGP SIGNATURE----- --nextPart2641494.0OStBKXvGs-- From owner-freebsd-arch@FreeBSD.ORG Fri Jul 30 12:11:17 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C3443106566B for ; Fri, 30 Jul 2010 12:11:17 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from mail.icecube.wisc.edu (trout.icecube.wisc.edu [128.104.255.119]) by mx1.freebsd.org (Postfix) with ESMTP id 86B618FC08 for ; Fri, 30 Jul 2010 12:11:17 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.icecube.wisc.edu (Postfix) with ESMTP id B6EA6582C7; Fri, 30 Jul 2010 07:11:16 -0500 (CDT) X-Virus-Scanned: amavisd-new at icecube.wisc.edu Received: from mail.icecube.wisc.edu ([127.0.0.1]) by localhost (trout.icecube.wisc.edu [127.0.0.1]) (amavisd-new, port 10030) with ESMTP id ppjpAzczgcxs; Fri, 30 Jul 2010 07:11:16 -0500 (CDT) Received: from wanderer.tachypleus.net (adsl-75-50-88-235.dsl.mdsnwi.sbcglobal.net [75.50.88.235]) by mail.icecube.wisc.edu (Postfix) with ESMTP id 297DD582C2; Fri, 30 Jul 2010 07:11:16 -0500 (CDT) Message-ID: <4C52C163.9010601@freebsd.org> Date: Fri, 30 Jul 2010 14:11:15 +0200 From: Nathan Whitehorn User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.11) Gecko/20100727 Thunderbird/3.0.6 MIME-Version: 1.0 To: Tijl Coosemans References: <201007291718.12687.tijl@coosemans.org> <4C520044.5020002@freebsd.org> <201007301053.27407.tijl@coosemans.org> In-Reply-To: <201007301053.27407.tijl@coosemans.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: pluknet , freebsd-arch@freebsd.org Subject: Re: Support for cc -m32 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 12:11:17 -0000 On 07/30/10 10:53, Tijl Coosemans wrote: > On Friday 30 July 2010 00:27:16 Nathan Whitehorn wrote: > >> On 07/29/10 17:18, Tijl Coosemans wrote: >> >>> I've put the initial version of some patches online to support cross >>> compilation of 32 bit binaries on amd64. It's modelled after how NetBSD >>> does this. >>> >>> With these patches something like "cc -m32 -o test test.c -pthread -lm" >>> generates a program that runs on FreeBSD/i386. 
>>> >>> http://people.freebsd.org/~tijl/cc-m32-1.diff >>> http://people.freebsd.org/~tijl/cc-m32-2.diff >>> http://people.freebsd.org/~tijl/cc-m32-3.diff >>> >>> *cc-m32-1.diff* : Let ld and cc find 32 bit libraries. >>> >> Why not use the GCC multilib code for what patch 1 does? There is >> already code in cc_tools/Makefile to handle this for powerpc64 (where >> cc -m32 already works). >> > Thanks, it's indeed better to specify this per architecture so I've > updated the patch. It changes the output of -print-search-dirs though. > > With the previous patch "cc -m32 -print-search-dirs" printed: > > install: /usr/libexec/ > programs: =/usr/bin/:/usr/bin/:/usr/libexec/:/usr/libexec/:/usr/libexec/ > libraries: =/usr/lib32/:/usr/lib32/ > > And now it prints: > > install: /usr/libexec/ > programs: =/usr/bin/:/usr/bin/:/usr/libexec/:/usr/libexec/:/usr/libexec/ > libraries: =/usr/lib/32/:/usr/lib/../lib32/:/usr/lib/:/usr/lib/ > > That works, but it's not entirely correct. > That's just an artifact of the way multilib works, I'm afraid. Is there a reason it could be harmful? -Nathan From owner-freebsd-arch@FreeBSD.ORG Fri Jul 30 19:20:17 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 530A0106566B; Fri, 30 Jul 2010 19:20:17 +0000 (UTC) (envelope-from alc@cs.rice.edu) Received: from mail.cs.rice.edu (mail.cs.rice.edu [128.42.1.31]) by mx1.freebsd.org (Postfix) with ESMTP id 28D0A8FC0A; Fri, 30 Jul 2010 19:20:17 +0000 (UTC) Received: from mail.cs.rice.edu (localhost.localdomain [127.0.0.1]) by mail.cs.rice.edu (Postfix) with ESMTP id A7F142C2ACE; Fri, 30 Jul 2010 13:50:09 -0500 (CDT) X-Virus-Scanned: by amavis-2.4.0 at mail.cs.rice.edu Received: from mail.cs.rice.edu ([127.0.0.1]) by mail.cs.rice.edu (mail.cs.rice.edu [127.0.0.1]) (amavisd-new, port 10024) with LMTP id RJ6E4GYb6DLF; Fri, 30 Jul 2010 13:50:02 -0500 (CDT) Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net (adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.cs.rice.edu (Postfix) with ESMTP id 9F7332C2B32; Fri, 30 Jul 2010 13:50:00 -0500 (CDT) Message-ID: <4C531ED7.9010601@cs.rice.edu> Date: Fri, 30 Jul 2010 13:49:59 -0500 From: Alan Cox User-Agent: Thunderbird 2.0.0.24 (X11/20100501) MIME-Version: 1.0 To: John Baldwin References: <4C4DB2B8.9080404@freebsd.org> <4C4DD1AA.3050906@freebsd.org> <201007270935.52082.jhb@freebsd.org> In-Reply-To: <201007270935.52082.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: alc@freebsd.org, Matthew Fleming , Andriy Gapon , freebsd-arch@freebsd.org Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 19:20:17 -0000 John Baldwin wrote: > On Monday, July 26, 2010 3:30:59 pm Alan Cox wrote: > >> As far as eliminating or reducing the manual tuning that many ZFS users do, >> I would love to see someone tackle the overly conservative hard limit that >> we place on the number of vnode structures. The current hard limit was put >> in place when we had just introduced mutexes into many structures and more a >> mutex was much larger than it is today. 
>> > > I took a look at the history of the "desiredvnodes" computation. Prior to r115266, in May of 2003, the computation was based on physical memory and there was no MAXVNODES_MAX limit. It was simply:

desiredvnodes = maxproc + cnt.v_page_count / 4;

r115266 introduced the min() that also took into account the virtual address space limit on the heap. As I recall, it was to stop "kmem_map too small" panics. In fact, I was asked to make this change by re@. Finally, in August 2004, r133038 introduced MAXVNODES_MAX. The commit message doesn't say, but I think the motivation was again to stop "kmem_map too small" panics. In effect, the virtual address space limit introduced by r115266 wasn't working. Enough history, here are some data points for the "desiredvnodes" computation on amd64 and i386 above and below the point where MAXVNODES_MAX has an effect. "phys" is the number of vnodes that would be allowed based upon physical memory size, and "virt" is the number of vnodes that would be allowed based upon virtual memory size.

amd64:
  2GB    phys: 132668   virt: 397057
  1.5GB  phys: 100862   virt: 297228
  1GB    phys:  69056   virt: 197398
  512MB  phys:  35106   virt:  97569

i386:
  2GB    phys: 134106   virt: 328965
  1.5GB  phys: 101916   virt: 328965
  1GB    phys:  69725   virt: 328965
  512MB  phys:  35576   virt: 168875

For both architectures, the "phys" limit is the limiting factor until we reach about 1.5GB of physical memory. MAXVNODES_MAX is only a factor on machines with more than 1.5GB of RAM. So, whatever change we might make to MAXVNODES_MAX shouldn't affect the small embedded systems that are running FreeBSD. Even though "virt" is never a factor on amd64, it's worth noticing that in both absolute and relative terms "virt" grows faster than "phys". On i386, "virt" starts out larger than on amd64 because a vnode and a vm_object are smaller relative to vm_kmem_size, but "virt" reaches its maximum by 1GB of RAM because vm_kmem_size has already reached its maximum by then. Nonetheless, even on i386, "virt" is never a factor. (For what it's worth, if I extrapolate, an i386/PAE machine with greater than 5GB of RAM will have a larger "phys" than "virt".) > I have a strawman of that (relative to 7). It simply adjusts the hardcoded > maximum to instead be a function of the amount of physical memory. > > Unless I'm misreading this patch, it would allow "desiredvnodes" to grow (slowly) on i386/PAE starting at 5GB of RAM until we reach the (too high) "virt" limit of about 329,000. Yes? For example, an 8GB i386/PAE machine would have 60% more vnodes than was allowed by MAXVNODE_MAX, and it would not stop there. I think that we should be concerned about that, because MAXVNODE_MAX came about because the "virt" limit wasn't working. As the numbers above show, we could more than halve the growth rate for "virt" and it would have no effect on either amd64 or i386 machines with up to 1.5GB of RAM. They would have just as many vnodes. Then, with that slower growth rate, we could simply eliminate MAXVNODES_MAX (or at least configure it to some absurdly large value), thereby relieving the fixed cap on amd64, where it isn't needed. With that in mind, the following patch slows the growth of "virt" from 2/5 of vm_kmem_size to 1/7. This has no effect on amd64. However, on i386, it allows desiredvnodes to grow slowly for machines with 1.5GB to about 2.5GB of RAM, ultimately exceeding the old desiredvnodes cap by about 17%.
Once we exceed the old cap, we increase desiredvnodes at a marginal rate that is almost the same as your patch, about 1% of physical memory. It's just computed differently. Using 1/8 instead of 1/7, amd64 machines with less than about 1.5GB lose about 7% of their vnodes, but they catch up and pass the old limit by 1.625GB. Perhaps, more importantly, i386 machines only exceed the old cap by 3%. Thoughts?

Index: kern/vfs_subr.c
===================================================================
--- kern/vfs_subr.c	(revision 210504)
+++ kern/vfs_subr.c	(working copy)
@@ -284,21 +284,29 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA
  * Initialize the vnode management data structures.
  */
 #ifndef	MAXVNODES_MAX
-#define	MAXVNODES_MAX	100000
+#define	MAXVNODES_MAX	8388608	/* Reevaluate when physmem exceeds 512GB. */
 #endif
 static void
 vntblinit(void *dummy __unused)
 {
+	int physvnodes, virtvnodes;
 
 	/*
-	 * Desiredvnodes is a function of the physical memory size and
-	 * the kernel's heap size.  Specifically, desiredvnodes scales
-	 * in proportion to the physical memory size until two fifths
-	 * of the kernel's heap size is consumed by vnodes and vm
-	 * objects.
+	 * Desiredvnodes is a function of the physical memory size and the
+	 * kernel's heap size.  Generally speaking, it scales with the
+	 * physical memory size.  The ratio of desiredvnodes to physical pages
+	 * is one to four until desiredvnodes exceeds 96K.  Thereafter, the
+	 * marginal ratio of desiredvnodes to physical pages is one to sixteen.
+	 * However, desiredvnodes is limited by the kernel's heap size.  The
+	 * memory required by desiredvnodes vnodes and vm objects may not
+	 * exceed one seventh of the kernel's heap size.
	 */
-	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
-	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
+	physvnodes = maxproc + cnt.v_page_count / 16 + 3 * min(393216,
+	    cnt.v_page_count) / 16;
+	virtvnodes = vm_kmem_size / (7 * (sizeof(struct vm_object) +
+	    sizeof(struct vnode)));
+	printf("physvnodes = %d\nvirtvnodes = %d\n", physvnodes, virtvnodes);
+	desiredvnodes = min(physvnodes, virtvnodes);
 	if (desiredvnodes > MAXVNODES_MAX) {
 		if (bootverbose)
 			printf("Reducing kern.maxvnodes %d -> %d\n",

> Index: vfs_subr.c
> ===================================================================
> --- vfs_subr.c	(revision 210934)
> +++ vfs_subr.c	(working copy)
> @@ -288,6 +288,7 @@
>  static void
>  vntblinit(void *dummy __unused)
>  {
> +	int vnodes;
>
>  	/*
>  	 * Desiredvnodes is a function of the physical memory size and
> @@ -299,10 +300,19 @@
>  	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
>  	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
>  	if (desiredvnodes > MAXVNODES_MAX) {
> +
> +		/*
> +		 * If there is a lot of physical memory, allow the cap
> +		 * on vnodes to expand to using a little under 1% of
> +		 * available RAM.
> +		 */
> +		vnodes = max(MAXVNODES_MAX, cnt.v_page_count * (PAGE_SIZE /
> +		    128) / (sizeof(struct vm_object) + sizeof(struct vnode)));
> +		KASSERT(vnodes < desiredvnodes, ("capped vnodes too big"));
>  		if (bootverbose)
>  			printf("Reducing kern.maxvnodes %d -> %d\n",
> -			    desiredvnodes, MAXVNODES_MAX);
> -		desiredvnodes = MAXVNODES_MAX;
> +			    desiredvnodes, vnodes);
> +		desiredvnodes = vnodes;
>  	}
>  	wantfreevnodes = desiredvnodes / 4;
>  	mtx_init(&mntid_mtx, "mntid", NULL, MTX_DEF);
>
>
From owner-freebsd-arch@FreeBSD.ORG Fri Jul 30 21:19:08 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 583211065677; Fri, 30 Jul 2010 21:19:08 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 271C68FC0A; Fri, 30 Jul 2010 21:19:08 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 2876946B7F; Fri, 30 Jul 2010 17:19:07 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 250A18A03C; Fri, 30 Jul 2010 17:19:06 -0400 (EDT) From: John Baldwin To: Alan Cox Date: Fri, 30 Jul 2010 16:14:40 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100217; KDE/4.4.5; amd64; ; ) References: <4C4DB2B8.9080404@freebsd.org> <201007270935.52082.jhb@freebsd.org> <4C531ED7.9010601@cs.rice.edu> In-Reply-To: <4C531ED7.9010601@cs.rice.edu> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201007301614.40768.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Fri, 30 Jul 2010 17:19:06 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: alc@freebsd.org, Matthew Fleming , Andriy Gapon , freebsd-arch@freebsd.org Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 21:19:08 -0000 On Friday, July 30, 2010 2:49:59 pm Alan Cox wrote: > John Baldwin wrote: > > I have a strawman of that (relative to 7). It simply adjusts the hardcoded > > maximum to instead be a function of the amount of physical memory. > > > > > > Unless I'm misreading this patch, it would allow "desiredvnodes" to grow > (slowly) on i386/PAE starting at 5GB of RAM until we reach the (too > high) "virt" limit of about 329,000. Yes? For example, an 8GB i386/PAE > machine would have 60% more vnodes than was allowed by MAXVNODE_MAX, and > it would not stop there. I think that we should be concerned about > that, because MAXVNODE_MAX came about because the "virt" limit wasn't > working. Agreed. > As the numbers above show, we could more than halve the growth rate for > "virt" and it would have no effect on either amd64 or i386 machines with > up to 1.5GB of RAM. They would have just as many vnodes.
Then, with > that slower growth rate, we could simply eliminate MAXVNODES_MAX (or at > least configure it to some absurdly large value), thereby relieving the > fixed cap on amd64, where it isn't needed. > > With that in mind, the following patch slows the growth of "virt" from > 2/5 of vm_kmem_size to 1/7. This has no effect on amd64. However, on > i386. it allows desiredvnodes to grow slowly for machines with 1.5GB to > about 2.5GB of RAM, ultimately exceeding the old desiredvnodes cap by > about 17%. Once we exceed the old cap, we increase desiredvnodes at a > marginal rate that is almost the same as your patch, about 1% of > physical memory. It's just computed differently. > > Using 1/8 instead of 1/7, amd64 machines with less than about 1.5GB lose > about 7% of their vnodes, but they catch up and pass the old limit by > 1.625GB. Perhaps, more importantly, i386 machines only exceed the old > cap by 3%. > > Thoughts? I think this is much better. My strawman was rather hackish in that it was layering a hack on top of the existing calculations. I prefer your approach. I do not think penalizing amd64 machines with less than 1.5GB is a big worry as most x86 machines with a small amount of memory are probably running as i386 anyway. Given that, I would probably lean towards 1/8 instead of 1/7, but I would be happy with either one. > Index: kern/vfs_subr.c > =================================================================== > --- kern/vfs_subr.c (revision 210504) > +++ kern/vfs_subr.c (working copy) > @@ -284,21 +284,29 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA > * Initialize the vnode management data structures. > */ > #ifndef MAXVNODES_MAX > -#define MAXVNODES_MAX 100000 > +#define MAXVNODES_MAX 8388608 /* Reevaluate when physmem > exceeds 512GB. */ > #endif How is this value computed? I would prefer something like: '512 * 1024 * 1024 * 1024 / (sizeof(struct vnode) + sizeof(struct vm_object) / N' if that is how it is computed. A brief note about the magic number of 393216 would also be nice to have (and if it could be a constant with a similar formula value that would be nice, too.). > static void > vntblinit(void *dummy __unused) > { > + int physvnodes, virtvnodes; > > /* > - * Desiredvnodes is a function of the physical memory size and > - * the kernel's heap size. Specifically, desiredvnodes scales > - * in proportion to the physical memory size until two fifths > - * of the kernel's heap size is consumed by vnodes and vm > - * objects. > + * Desiredvnodes is a function of the physical memory size and the > + * kernel's heap size. Generally speaking, it scales with the > + * physical memory size. The ratio of desiredvnodes to physical > pages > + * is one to four until desiredvnodes exceeds 96K. Thereafter, the > + * marginal ratio of desiredvnodes to physical pages is one to > sixteen. > + * However, desiredvnodes is limited by the kernel's heap size. The > + * memory required by desiredvnodes vnodes and vm objects may not > + * exceed one seventh of the kernel's heap size. 
> */ > - desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * > vm_kmem_size / > - (5 * (sizeof(struct vm_object) + sizeof(struct vnode)))); > + physvnodes = maxproc + cnt.v_page_count / 16 + 3 * min(393216, > + cnt.v_page_count) / 16; > + virtvnodes = vm_kmem_size / (7 * (sizeof(struct vm_object) + > + sizeof(struct vnode))); > + printf("physvnodes = %d\nvirtvnodes = %d\n", physvnodes, > virtvnodes); > + desiredvnodes = min(physvnodes, virtvnodes); > if (desiredvnodes > MAXVNODES_MAX) { > if (bootverbose) > printf("Reducing kern.maxvnodes %d -> %d\n", > > -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Sat Jul 31 05:36:32 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 335271065676 for ; Sat, 31 Jul 2010 05:36:32 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from mail11.syd.optusnet.com.au (mail11.syd.optusnet.com.au [211.29.132.192]) by mx1.freebsd.org (Postfix) with ESMTP id B879F8FC18 for ; Sat, 31 Jul 2010 05:36:31 +0000 (UTC) Received: from server.vk2pj.dyndns.org (c211-30-160-13.belrs4.nsw.optusnet.com.au [211.30.160.13]) by mail11.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o6V5aOWb018725 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 31 Jul 2010 15:36:26 +1000 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.4/8.14.4) with ESMTP id o6V5aNqc027887; Sat, 31 Jul 2010 15:36:23 +1000 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.4/8.14.4/Submit) id o6V5aLrS027886; Sat, 31 Jul 2010 15:36:21 +1000 (EST) (envelope-from peter) Date: Sat, 31 Jul 2010 15:36:21 +1000 From: Peter Jeremy To: Tijl Coosemans Message-ID: <20100731053621.GA27772@server.vk2pj.dyndns.org> References: <201007291718.12687.tijl@coosemans.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ReaqsoxgOBHFXBhH" Content-Disposition: inline In-Reply-To: <201007291718.12687.tijl@coosemans.org> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.20 (2009-06-14) Cc: pluknet , freebsd-arch@freebsd.org Subject: Re: Support for cc -m32 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Jul 2010 05:36:32 -0000 --ReaqsoxgOBHFXBhH Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2010-Jul-29 17:18:03 +0200, Tijl Coosemans wrote: >I've put the initial version of some patches online to support cross >compilation of 32 bit binaries on amd64. It's modelled after how NetBSD >does this. I presume you are aware of gnu/112215 (and maybe others). >With these patches something like "cc -m32 -o test test.c -pthread -lm" >generates a program that runs on FreeBSD/i386. That's an improvement on my patches (in 112215) - they resulted in the i386 binaries having references to /libexec/ld-elf32.so.1 and /usr/lib32/*.so - so they would run in i386 compatibility mode on amd64 but not on native i386. >Feel free to test the patches and to comment on any part of them. I hope to get some time to do this in a few days. 
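
For when I do, a trivial test.c along the lines of the compile command quoted above (just a sketch, nothing official; any program that touches both libm and libpthread would do):

#include <math.h>
#include <pthread.h>
#include <stdio.h>

static void *
worker(void *arg)
{
	/* sizeof(long) should print 4 when built with -m32. */
	printf("sizeof(long) = %zu, sqrt(2) = %f\n",
	    sizeof(long), sqrt(2.0));
	return (NULL);
}

int
main(void)
{
	pthread_t t;

	if (pthread_create(&t, NULL, worker, NULL) != 0)
		return (1);
	return (pthread_join(t, NULL));
}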
--=20 Peter Jeremy --ReaqsoxgOBHFXBhH Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.15 (FreeBSD) iEYEARECAAYFAkxTtlUACgkQ/opHv/APuIfUPwCgq6fgWy1GnMAcCZzFSW/CqvoR 5zMAn0JcWX8kNLllX+WA9oQsijaUanNP =r5tm -----END PGP SIGNATURE----- --ReaqsoxgOBHFXBhH-- From owner-freebsd-arch@FreeBSD.ORG Sat Jul 31 21:39:50 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 721D21065670; Sat, 31 Jul 2010 21:39:50 +0000 (UTC) (envelope-from alc@cs.rice.edu) Received: from mail.cs.rice.edu (mail.cs.rice.edu [128.42.1.31]) by mx1.freebsd.org (Postfix) with ESMTP id 3BE2C8FC1E; Sat, 31 Jul 2010 21:39:49 +0000 (UTC) Received: from mail.cs.rice.edu (localhost.localdomain [127.0.0.1]) by mail.cs.rice.edu (Postfix) with ESMTP id 825672C2B32; Sat, 31 Jul 2010 16:39:49 -0500 (CDT) X-Virus-Scanned: by amavis-2.4.0 at mail.cs.rice.edu Received: from mail.cs.rice.edu ([127.0.0.1]) by mail.cs.rice.edu (mail.cs.rice.edu [127.0.0.1]) (amavisd-new, port 10024) with LMTP id bt9g5GN86489; Sat, 31 Jul 2010 16:39:41 -0500 (CDT) Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net (adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.cs.rice.edu (Postfix) with ESMTP id 579342C2ACA; Sat, 31 Jul 2010 16:39:41 -0500 (CDT) Message-ID: <4C54981B.9080209@cs.rice.edu> Date: Sat, 31 Jul 2010 16:39:39 -0500 From: Alan Cox User-Agent: Thunderbird 2.0.0.24 (X11/20100501) MIME-Version: 1.0 To: John Baldwin References: <4C4DB2B8.9080404@freebsd.org> <201007270935.52082.jhb@freebsd.org> <4C531ED7.9010601@cs.rice.edu> <201007301614.40768.jhb@freebsd.org> In-Reply-To: <201007301614.40768.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: alc@freebsd.org, freebsd-arch@freebsd.org Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Jul 2010 21:39:50 -0000 John Baldwin wrote: > On Friday, July 30, 2010 2:49:59 pm Alan Cox wrote: > >> John Baldwin wrote: >> >>> I have a strawman of that (relative to 7). It simply adjusts the hardcoded >>> maximum to instead be a function of the amount of physical memory. >>> >>> >>> >> Unless I'm misreading this patch, it would allow "desiredvnodes" to grow >> (slowly) on i386/PAE starting at 5GB of RAM until we reach the (too >> high) "virt" limit of about 329,000. Yes? For example, an 8GB i386/PAE >> machine would have 60% more vnodes than was allowed by MAXVNODE_MAX, and >> it would not stop there. I think that we should be concerned about >> that, because MAXVNODE_MAX came about because the "virt" limit wasn't >> working. >> > > Agreed. > > >> As the numbers above show, we could more than halve the growth rate for >> "virt" and it would have no effect on either amd64 or i386 machines with >> up to 1.5GB of RAM. They would have just as many vnodes. Then, with >> that slower growth rate, we could simply eliminate MAXVNODES_MAX (or at >> least configure it to some absurdly large value), thereby relieving the >> fixed cap on amd64, where it isn't needed. 
>> >> With that in mind, the following patch slows the growth of "virt" from >> 2/5 of vm_kmem_size to 1/7. This has no effect on amd64. However, on >> i386, it allows desiredvnodes to grow slowly for machines with 1.5GB to >> about 2.5GB of RAM, ultimately exceeding the old desiredvnodes cap by >> about 17%. Once we exceed the old cap, we increase desiredvnodes at a >> marginal rate that is almost the same as your patch, about 1% of >> physical memory. It's just computed differently. >> >> Using 1/8 instead of 1/7, amd64 machines with less than about 1.5GB lose >> about 7% of their vnodes, but they catch up and pass the old limit by >> 1.625GB. Perhaps, more importantly, i386 machines only exceed the old >> cap by 3%. >> >> Thoughts? >> > > I think this is much better. My strawman was rather hackish in that it was > layering a hack on top of the existing calculations. I prefer your approach. > I do not think penalizing amd64 machines with less than 1.5GB is a big worry > as most x86 machines with a small amount of memory are probably running as > i386 anyway. Given that, I would probably lean towards 1/8 instead of 1/7, > but I would be happy with either one. > > I've looked a bit at an i386/PAE system with 8GB. I don't think that a default configuration, e.g., no changes to the mbuf limits, is at risk with 1/7.

>> Index: kern/vfs_subr.c
>> ===================================================================
>> --- kern/vfs_subr.c (revision 210504)
>> +++ kern/vfs_subr.c (working copy)
>> @@ -284,21 +284,29 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA
>>  * Initialize the vnode management data structures.
>>  */
>> #ifndef MAXVNODES_MAX
>> -#define MAXVNODES_MAX 100000
>> +#define MAXVNODES_MAX 8388608 /* Reevaluate when physmem
>> exceeds 512GB. */
>> #endif
>>
>
> How is this value computed? I would prefer something like: > > '512 * 1024 * 1024 * 1024 / (sizeof(struct vnode) + sizeof(struct vm_object) / N' > > if that is how it is computed. A brief note about the magic number of 393216 > would also be nice to have (and if it could be a constant with a similar > formula value that would be nice, too.). > >

I've tried to explain this computation below.

Index: kern/vfs_subr.c
===================================================================
--- kern/vfs_subr.c	(revision 210702)
+++ kern/vfs_subr.c	(working copy)
@@ -282,23 +282,34 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA
 /*
  * Initialize the vnode management data structures.
+ *
+ * Reevaluate the following cap on the number of vnodes after the physical
+ * memory size exceeds 512GB.  In the limit, as the physical memory size
+ * grows, the ratio of physical pages to vnodes approaches sixteen to one.
  */
 #ifndef	MAXVNODES_MAX
-#define	MAXVNODES_MAX	100000
+#define	MAXVNODES_MAX	(512 * (1024 * 1024 * 1024 / PAGE_SIZE / 16))
 #endif
 static void
 vntblinit(void *dummy __unused)
 {
+	int physvnodes, virtvnodes;
 
 	/*
-	 * Desiredvnodes is a function of the physical memory size and
-	 * the kernel's heap size.  Specifically, desiredvnodes scales
-	 * in proportion to the physical memory size until two fifths
-	 * of the kernel's heap size is consumed by vnodes and vm
-	 * objects.
+	 * Desiredvnodes is a function of the physical memory size and the
+	 * kernel's heap size.  Generally speaking, it scales with the
+	 * physical memory size.  The ratio of desiredvnodes to physical pages
+	 * is one to four until desiredvnodes exceeds 98,304.  Thereafter, the
+	 * marginal ratio of desiredvnodes to physical pages is one to
+	 * sixteen.  However, desiredvnodes is limited by the kernel's heap
+	 * size.  The memory required by desiredvnodes vnodes and vm objects
+	 * may not exceed one seventh of the kernel's heap size.
	 */
-	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
-	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
+	physvnodes = maxproc + cnt.v_page_count / 16 + 3 * min(98304 * 4,
+	    cnt.v_page_count) / 16;
+	virtvnodes = vm_kmem_size / (7 * (sizeof(struct vm_object) +
+	    sizeof(struct vnode)));
+	desiredvnodes = min(physvnodes, virtvnodes);
 	if (desiredvnodes > MAXVNODES_MAX) {
 		if (bootverbose)
 			printf("Reducing kern.maxvnodes %d -> %d\n",