From owner-freebsd-hackers@FreeBSD.ORG Tue Feb 5 14:37:08 2013
Message-ID: <1360075025.71615.YahooMailNeo@web142505.mail.bf1.yahoo.com>
Date: Tue, 5 Feb 2013 06:37:05 -0800 (PST)
From: "Dr. Baud"
Subject: mutex_owner
To: "freebsd-hackers@freebsd.org"

All,

    Anyone use mutex_owner in a dtrace script, as the obvious does not work for me:

    Content of spin.d:

#!/usr/sbin/dtrace -qs

:::*spin
{
self->mutex = (kmutex_t *) arg0;
self->mutex_owner = mutex_owner((kmutex_t *) :self->mutex);
}

# dtrace -s spin.d
dtrace: failed to compile script spin.d: line 5: mutex_owner( ) argument #1 is incompatible with prototype:
        prototype: struct mtx *
         argument: kmutex_t *

    Dr.

From owner-freebsd-hackers@FreeBSD.ORG Tue Feb 5 16:38:51 2013
Message-ID: <51113591.8050709@rice.edu>
Date: Tue, 05 Feb 2013 10:38:41 -0600
From: Alan Cox
To: mdf@FreeBSD.org
Subject: Re: dynamically calculating NKPT [was: Re: huge ktr buffer]
References: <20130205151413.GL2522@kib.kiev.ua>
Cc: davide@freebsd.org, alc@freebsd.org, avg@freebsd.org, rank1seeker@gmail.com, hackers@freebsd.org, Konstantin Belousov, Neel Natu

On 02/05/2013 09:45, mdf@FreeBSD.org wrote:
> On Tue, Feb 5, 2013 at 7:14 AM, Konstantin Belousov wrote:
>> On Mon, Feb 04, 2013 at 03:05:15PM -0800, Neel Natu wrote:
>>> Hi,
>>>
>>> I have a patch to dynamically calculate NKPT for amd64 kernels. This
>>> should fix the various issues that people pointed out in the email
>>> thread.
>>>
>>> Please review and let me know if there are any objections to committing this.
>>>
>>> Also, thanks to Alan (alc@) for reviewing and providing feedback on
>>> the initial version of the patch.
>>>
>>> Patch (also available at http://people.freebsd.org/~neel/patches/nkpt_diff.txt):
>>>
>>> Index: sys/amd64/include/pmap.h
>>> ===================================================================
>>> --- sys/amd64/include/pmap.h (revision 246277)
>>> +++ sys/amd64/include/pmap.h (working copy)
>>> @@ -113,13 +113,7 @@
>>>  	((unsigned long)(l2) << PDRSHIFT) | \
>>>  	((unsigned long)(l1) << PAGE_SHIFT))
>>>
>>> -/* Initial number of kernel page tables. */
>>> -#ifndef NKPT
>>> -#define NKPT 32
>>> -#endif
>>> -
>>>  #define NKPML4E 1 /* number of kernel PML4 slots */
>>> -#define NKPDPE howmany(NKPT, NPDEPG)/* number of kernel PDP slots */
>>>
>>>  #define NUPML4E (NPML4EPG/2) /* number of userland PML4 pages */
>>>  #define NUPDPE (NUPML4E*NPDPEPG)/* number of userland PDP pages */
>>> @@ -181,6 +175,7 @@
>>>  #define PML4map ((pd_entry_t *)(addr_PML4map))
>>>  #define PML4pml4e ((pd_entry_t *)(addr_PML4pml4e))
>>>
>>> +extern int nkpt; /* Initial number of kernel page tables */
>>>  extern u_int64_t KPDPphys; /* physical address of kernel level 3 */
>>>  extern u_int64_t KPML4phys; /* physical address of kernel level 4 */
>>>
>>> Index: sys/amd64/amd64/minidump_machdep.c
>>> ===================================================================
>>> --- sys/amd64/amd64/minidump_machdep.c (revision 246277)
>>> +++ sys/amd64/amd64/minidump_machdep.c (working copy)
>>> @@ -232,7 +232,7 @@
>>>  	/* Walk page table pages, set bits in vm_page_dump */
>>>  	pmapsize = 0;
>>>  	pdp = (uint64_t *)PHYS_TO_DMAP(KPDPphys);
>>> -	for (va = VM_MIN_KERNEL_ADDRESS; va < MAX(KERNBASE + NKPT * NBPDR,
>>> +	for (va = VM_MIN_KERNEL_ADDRESS; va < MAX(KERNBASE + nkpt * NBPDR,
>>>  	    kernel_vm_end); ) {
>>>  		/*
>>>  		 * We always write a page, even if it is zero. Each
>>> @@ -364,7 +364,7 @@
>>>  	/* Dump kernel page directory pages */
>>>  	bzero(fakepd, sizeof(fakepd));
>>>  	pdp = (uint64_t *)PHYS_TO_DMAP(KPDPphys);
>>> -	for (va = VM_MIN_KERNEL_ADDRESS; va < MAX(KERNBASE + NKPT * NBPDR,
>>> +	for (va = VM_MIN_KERNEL_ADDRESS; va < MAX(KERNBASE + nkpt * NBPDR,
>>>  	    kernel_vm_end); va += NBPDP) {
>>>  		i = (va >> PDPSHIFT) & ((1ul << NPDPEPGSHIFT) - 1);
>>>
>>> Index: sys/amd64/amd64/pmap.c
>>> ===================================================================
>>> --- sys/amd64/amd64/pmap.c (revision 246277)
>>> +++ sys/amd64/amd64/pmap.c (working copy)
>>> @@ -202,6 +202,10 @@
>>>  vm_offset_t virtual_avail; /* VA of first avail page (after kernel bss) */
>>>  vm_offset_t virtual_end; /* VA of last avail page (end of kernel AS) */
>>>
>>> +int nkpt;
>>> +SYSCTL_INT(_machdep, OID_AUTO, nkpt, CTLFLAG_RD, &nkpt, 0,
>>> +    "Number of kernel page table pages allocated on bootup");
>>> +
>>>  static int ndmpdp;
>>>  static vm_paddr_t dmaplimit;
>>>  vm_offset_t kernel_vm_end = VM_MIN_KERNEL_ADDRESS;
>>> @@ -495,17 +499,42 @@
>>>
>>>  CTASSERT(powerof2(NDMPML4E));
>>>
>>> +/* number of kernel PDP slots */
>>> +#define NKPDPE(ptpgs) howmany((ptpgs), NPDEPG)
>>> +
>>>  static void
>>> +nkpt_init(vm_paddr_t addr)
>>> +{
>>> +	int pt_pages;
>>> +
>>> +#ifdef NKPT
>>> +	pt_pages = NKPT;
>>> +#else
>>> +	pt_pages = howmany(addr, 1 << PDRSHIFT);
>>> +	pt_pages += NKPDPE(pt_pages);
>>> +
>>> +	/*
>>> +	 * Add some slop beyond the bare minimum required for bootstrapping
>>> +	 * the kernel.
>>> +	 *
>>> +	 * This is quite important when allocating KVA for kernel modules.
>>> +	 * The modules are required to be linked in the negative 2GB of
>>> +	 * the address space. If we run out of KVA in this region then
>>> +	 * pmap_growkernel() will need to allocate page table pages to map
>>> +	 * the entire 512GB of KVA space which is an unnecessary tax on
>>> +	 * physical memory.
>>> +	 */
>>> +	pt_pages += 4; /* 8MB additional slop for kernel modules */
>> 8MB might be too low. I just checked one of my machines with fully
>> modularized kernel, it takes slightly more than 6 MB to load 50 modules.
>> I think that 16MB would be safer, but it probably needs to be scaled
>> down based on the available phys memory. amd64 kernel could be booted
>> on 128MB machine still.
> Is there no way to not map the entire 512GB? Otherwise this patch
> could really hose some vendors. E.g. the kernel module for the OneFS
> file system is around 8MB all by itself.

Mapping the entire 512 GB from the start would require the preallocation
of 1 GB of memory for page table pages.

> I found when we moved from FreeBSD 6 to 7 that the NKPT of 32 was
> insufficient for our system to even boot so I put it back to 240 (I
> didn't want to spend a lot of time playing). At that time our module
> was loaded by the boot loader; now we do it during init to save some
> seconds on boot. But we're probably not the only ones with a large
> kernel module.

This patch should make life easier for people who are loading modules
through the boot loader. It will account for the size of these modules
in sizing NKPT (or now nkpt).