From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Sebastian Kuzminsky
Subject: Re: 1 gig superpages
Date: Thu, 20 Nov 2014 11:18:55 -0500
Message-Id: <201411201118.56050.jhb@freebsd.org>

On Monday, November 10, 2014 11:15:09 am Sebastian Kuzminsky wrote:
> Hello hackers, I'm announcing the availability of a branch adding support for 1 GB superpages to FreeBSD.
>
> https://github.com/Seb-LineRate/freebsd/commits/seb/stable-10/1-gig-pages
>
> The branch is based on work done by Line Rate Systems and F5 Networks, and used in our LROS load-balancing product.
>
> Our product is based on FreeBSD 9.1; the branch I linked to above is our 1 gig page support rebased onto stable/10. I probably messed something up in the rebase, as lots of things changed in both pmap and vm since 9.1. There are also a handful of commits that I haven't gotten to yet, but they are less consequential: just performance improvements to the buddy allocator. I hope to push those over the next few days.
>
> It should be relatively easy to rebase the branch onto Current.
>
> This is a work in progress, and I would appreciate feedback and comments.

My initial impression, having looked at this only briefly, is that it is a bit hackish. In particular, the reservation system already supports multiple levels of reservations, so you could have a separate reservation layer for 1GB pages. However, that alone doesn't get you exactly what you want, which is a guarantee of a specific page size. That would be nice to have for 2MB pages as well, and I have talked a bit about it with Alan in the past.

What I do think would be useful is a new mmap flag that requests that a mapping not use demotion/promotion but fully use any reservations it makes. You could also have it fail if it can't get reservations for the entire range. Alan suggests calling this MAP_HUGETLB to match Linux, since it would provide similar semantics.

If you then add 1GB pages as a second reservation level on amd64 and define MAP_HUGETLB to use the largest reservations possible for the mapping size (so a 1G request uses a 1G page instead of 2M pages), that would give you what you want without scattering various 1G-specific functions through MI code. It would also be more useful for other platforms going forward, some of which support multiple page sizes that aren't just the "trim a tree layer" PSE-style you have on x86.

--
John Baldwin
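To make the proposed MAP_HUGETLB size rule concrete, here is a minimal user-space sketch in C of how a mapping might be split into reservations. It assumes amd64 superpage sizes (2 MB and 1 GB), natural alignment, and a greedy "largest page that fits" policy. The names `hugetlb_plan_range` and `struct hugetlb_plan` are hypothetical and exist in neither FreeBSD nor Linux. The sketch models only the two rules in the message (prefer the largest reservation, fail if the whole range can't be covered), not the reservation system itself.

```c
#include <stdint.h>

/* amd64 superpage sizes; 1 GB would be the proposed second reservation level. */
#define PAGE_2M (UINT64_C(1) << 21)
#define PAGE_1G (UINT64_C(1) << 30)

/* Number of pages of each size used to back a mapping (hypothetical type). */
struct hugetlb_plan {
	uint64_t n1g;
	uint64_t n2m;
};

/*
 * Hypothetical helper, not a FreeBSD or Linux API.  Greedily cover
 * [addr, addr + len) with the largest naturally aligned superpage that
 * fits at each step, so a 1 GB-aligned 1 GB request uses one 1 GB page
 * rather than 512 2 MB pages.  Returns -1 (all or nothing) if any part
 * of the range would need base 4 KB pages; returns 0 on success.
 */
static int
hugetlb_plan_range(uint64_t addr, uint64_t len, struct hugetlb_plan *plan)
{
	uint64_t end;

	plan->n1g = 0;
	plan->n2m = 0;
	if (len == 0 || addr % PAGE_2M != 0 || len % PAGE_2M != 0)
		return (-1);
	end = addr + len;	/* assumes the range does not wrap */
	while (addr < end) {
		if (addr % PAGE_1G == 0 && end - addr >= PAGE_1G) {
			plan->n1g++;
			addr += PAGE_1G;
		} else {
			plan->n2m++;
			addr += PAGE_2M;
		}
	}
	return (0);
}
```

In this greedy model a 1 GB + 2 MB request starting on a 1 GB boundary gets one 1 GB page and one 2 MB page, while a 1 GB request starting 2 MB past a 1 GB boundary is backed entirely by 512 2 MB pages, because no aligned 1 GB slot fits inside it. A request that is not 2 MB aligned is refused outright rather than silently falling back to 4 KB pages.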