Date:      Mon, 15 Dec 2014 15:42:56 -0800
From:      Peter Wemm <peter@wemm.org>
To:        freebsd-stable@freebsd.org
Cc:        Alfred Perlstein <bright@mu.org>, Ian Lepore <ian@freebsd.org>
Subject:   Re: i386 PAE kernel works fine on 10-stable
Message-ID:  <1641407.80FsgLC8bS@overcee.wemm.org>
In-Reply-To: <847BD158-0867-4F5F-83A9-1651E77D29EF@mu.org>
References:  <1418579278.2026.9.camel@freebsd.org> <1418580756.2026.12.camel@freebsd.org> <847BD158-0867-4F5F-83A9-1651E77D29EF@mu.org>


On Sunday, December 14, 2014 10:53:14 AM Alfred Perlstein wrote:
> On Dec 14, 2014, at 10:12 AM, Ian Lepore wrote:
> > On Sun, 2014-12-14 at 10:09 -0800, Alfred Perlstein wrote:
> >> On Dec 14, 2014, at 9:47 AM, Ian Lepore wrote:
> >>> This is an out of the blue FYI post to let people know that despite all
> >>> the misinformation you'll run across if you search for information on
> >>> FreeBSD PAE support, it (still) works just fine.  I've been using it
> >>> (for reasons related to our build system and products at $work) since
> >>> 2006, and I can say unequivocally that it works fine on 6.x, 8.x, and
> >>> now 10.x (and presumably on the odd-numbered releases too but I've never
> >>> tried those).
> >>> 
> >>> In my most recent testing with 10-stable, I found it was compatible with
> >>> drm2 and radeonkms drivers and I was able to run Xorg and gnome just
> >>> fine.  All my devices, and apps, and even the linuxulator worked just
> >>> fine.
> >>> 
> >>> One thing that changed somewhere between 8.4 and 10.1 is that I had to
> >>> add a kernel tuning option to my kernel config:
> >>> 
> >>> options KVA_PAGES=768      # Default is 512
> >>> 
> >>> I suspect that the most frequent use of PAE is on laptops that have 4gb
> >>> and the default tuning is adequate for that.  My desktop machine has
> >>> 12gb and I needed to bump up that value to avoid errors related to being
> >>> unable to create new kernel stacks.
> >> 
> >> There already is a #define that is bifurcated based on PAE in pmap.h:
> >> 
> >> #ifndef KVA_PAGES
> >> #ifdef PAE
> >> #define KVA_PAGES       512
> >> #else
> >> #define KVA_PAGES       256
> >> #endif
> >> #endif
> >> 
> >> Do you think it will harm things to apply your suggested default to this
> >> file?
> > I would have to defer to someone who actually understands just what that
> > parm is tuning.  It was purely speculation on my part that the current
> > default is adequate for less memory than I have, and I don't know what
> > that downside might be for setting it too high.
> 
> KVA pages is the amount of pages reserved for kernel address space:
> 
>  * Size of Kernel address space.  This is the number of page table pages
>  * (4MB each) to use for the kernel.  256 pages == 1 Gigabyte.
>  * This **MUST** be a multiple of 4 (eg: 252, 256, 260, etc).
>  * For PAE, the page table page unit size is 2MB.  This means that 512 pages
>  * is 1 Gigabyte.  Double everything.  It must be a multiple of 8 for PAE.
> 
> It appears that our default for PAE leaves 1GB for kernel address to play
> with?  That's an interesting default.  Wonder if it really makes sense for
> PAE since the assumption is that you'll have >4GB ram in the box, wiring
> down 1.5GB for kernel would seem to make sense…  Probably make sense to ask
> Peter or Alan on this.

It's always been a 1GB/3GB split.  It was never a problem until certain 
scaling defaults were changed to scale solely based on physical ram without 
regard for kva limits.

With the current settings and layout of the userland address space between the 
zero-memory hole, the reservation for maxdsiz, followed by the ld-elf.so.1 
space and shared libraries, there's just enough room to mmap a 2GB file and 
have a tiny bit of wiggle room left.

With the kernel/user split changed to 1.5/2.5, userland is more 
restricted and typically ends up with around 1.8-1.9GB of usable address 
space.

You can get a large memory PAE system to boot with default settings by 
seriously scaling things down like kern.maxusers, mbufs limits, etc.
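(That sort of scaling-down is done with loader tunables.  kern.maxusers
and kern.ipc.nmbclusters are real tunables, but the values below are
purely illustrative, not recommendations.)

```
# /boot/loader.conf -- illustrative values only
kern.maxusers="64"              # caps many auto-sized kernel tables
kern.ipc.nmbclusters="32768"    # limits mbuf cluster allocation
```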

However, we have run ref11-i386 and ref10-i386 in the cluster for 18+ months 
with a 1.5/2.5 split and even then we've run out of kva and we've hit a few 
pmap panics and things that appear to be fallout of bounce buffer problems.

While yes, you can make it work, I am personally not convinced that it is 
reliable.

My last i386 PAE machine died earlier this year with a busted scsi backplane 
for the drives.  It went to the great server crusher.

> Also wondering how bad it would be to make these tunables, I see they
> trickle down quite a bit into the system, hopefully not defining some
> static arrays, but I haven't dived down that far.

They cause extensive compile time macro expansion variations that are exported 
to assembler code via genassym.  KVA_PAGES is not a good candidate for a 
runtime tunable unless you like the pain of i386/locore.s and friends.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won’t do…
