From: Alfred Perlstein
To: Peter Wemm
Cc: "freebsd-stable@freebsd.org", Ian Lepore
Subject: Re: i386 PAE kernel works fine on 10-stable
Date: Mon, 15 Dec 2014 16:33:04 -0800
In-Reply-To: <1641407.80FsgLC8bS@overcee.wemm.org>
References: <1418579278.2026.9.camel@freebsd.org>
 <1418580756.2026.12.camel@freebsd.org>
 <847BD158-0867-4F5F-83A9-1651E77D29EF@mu.org>
 <1641407.80FsgLC8bS@overcee.wemm.org>

> On Dec 15, 2014, at 3:42 PM, Peter Wemm wrote:
>
>> On Sunday, December 14, 2014 10:53:14 AM Alfred Perlstein wrote:
>>> On Dec 14, 2014, at 10:12 AM, Ian Lepore wrote:
>>>> On Sun, 2014-12-14 at 10:09 -0800, Alfred Perlstein wrote:
>>>>> On Dec 14, 2014, at 9:47 AM, Ian Lepore wrote:
>>>>> This is an out of the blue FYI post to let people know that despite all
>>>>> the misinformation you'll run across if you search for information on
>>>>> FreeBSD PAE
>>>>> support, it (still) works just fine. I've been using it
>>>>> (for reasons related to our build system and products at $work) since
>>>>> 2006, and I can say unequivocally that it works fine on 6.x, 8.x, and
>>>>> now 10.x (and presumably on the odd-numbered releases too but I've never
>>>>> tried those).
>>>>>
>>>>> In my most recent testing with 10-stable, I found it was compatible with
>>>>> drm2 and radeonkms drivers and I was able to run Xorg and gnome just
>>>>> fine. All my devices, and apps, and even the linuxulator worked just
>>>>> fine.
>>>>>
>>>>> One thing that changed somewhere between 8.4 and 10.1 is that I had to
>>>>> add a kernel tuning option to my kernel config:
>>>>>
>>>>> option KVA_PAGES=768 # Default is 512
>>>>>
>>>>> I suspect that the most frequent use of PAE is on laptops that have 4gb
>>>>> and the default tuning is adequate for that. My desktop machine has
>>>>> 12gb and I needed to bump up that value to avoid errors related to being
>>>>> unable to create new kernel stacks.
>>>>
>>>> There already is a #define that is bifurcated based on PAE in pmap.h:
>>>>
>>>> #ifndef KVA_PAGES
>>>> #ifdef PAE
>>>> #define KVA_PAGES 512
>>>> #else
>>>> #define KVA_PAGES 256
>>>> #endif
>>>> #endif
>>>>
>>>> Do you think it will harm things to apply your suggested default to this
>>>> file?
>
>>> I would have to defer to someone who actually understands just what that
>>> parm is tuning. It was purely speculation on my part that the current
>>> default is adequate for less memory than I have, and I don't know what
>>> that downside might be for setting it too high.
>>
>> KVA pages is the amount of pages reserved for kernel address space:
>>
>> * Size of Kernel address space. This is the number of page table pages
>> * (4MB each) to use for the kernel. 256 pages == 1 Gigabyte.
>> * This **MUST** be a multiple of 4 (eg: 252, 256, 260, etc).
>> * For PAE, the page table page unit size is 2MB.
>> * This means that 512 pages
>> * is 1 Gigabyte. Double everything. It must be a multiple of 8 for PAE.
>>
>> It appears that our default for PAE leaves 1GB for kernel address to play
>> with? That's an interesting default. Wonder if it really makes sense for
>> PAE since the assumption is that you'll have >4GB ram in the box, wiring
>> down 1.5GB for kernel would seem to make sense… Probably make sense to ask
>> Peter or Alan on this.
>
> It's always been a 1GB/3GB split. It was never a problem until certain
> scaling defaults were changed to scale solely based on physical ram without
> regard for kva limits.

Hmm, the original patch I gave for that only changed scaling for machines with 64-bit pointers. Why was the 32-bit stuff made to change?

>
> With the current settings and layout of the userland address space between the
> zero-memory hole, the reservation for maxdsiz, followed by the ld-elf.so.1
> space and shared libraries, there's just enough room to mmap a 2GB file and
> have a tiny bit of wiggle room left.
>
> With changing the kernel/user split to 1.5/2.5 then userland is more
> restricted and is typically around the 1.8/1.9GB range.
>
> You can get a large memory PAE system to boot with default settings by
> seriously scaling things down like kern.maxusers, mbufs limits, etc.
>
> However, we have run ref11-i386 and ref10-i386 in the cluster for 18+ months
> with a 1.5/2.5 split and even then we've run out of kva and we've hit a few
> pmap panics and things that appear to be fallout of bounce buffer problems.
>
> While yes, you can make it work, I am personally not convinced that it is
> reliable.
>
> My last i386 PAE machine died earlier this year with a busted scsi backplane
> for the drives. It went to the great server crusher.

Oh, I made the dumb assumption that PAE was 4/4, basically not split.
Ok thanks.

>
>> Also wondering how bad it would be to make these tunables, I see they
>> trickle down quite a bit into the system, hopefully not defining some
>> static arrays, but I haven't dived down that far.
>
> They cause extensive compile time macro expansion variations that are exported
> to assembler code via genassym. KVA_PAGES is not a good candidate for a
> runtime tunable unless you like the pain of i386/locore.s and friends.

Ouch. Ok.

-Alfred.
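
For reference, the KVA_PAGES arithmetic discussed in this thread can be sketched in a few lines of C. This is only an illustration, not kernel code: the function names kva_mbytes and user_mbytes are hypothetical, and the only inputs assumed are the ones stated in the quoted pmap.h comment (each page table page covers 4MB without PAE and 2MB with PAE, and the count must be a multiple of 4 or 8 respectively), plus the 4GB size of a 32-bit virtual address space.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a FreeBSD API): megabytes of kernel virtual
 * address space reserved by a given KVA_PAGES value.  Per the quoted
 * pmap.h comment, one page table page covers 4MB without PAE and 2MB
 * with PAE.
 */
unsigned
kva_mbytes(unsigned kva_pages, int pae)
{
	/* The comment requires a multiple of 4 (non-PAE) or 8 (PAE). */
	assert(kva_pages % (pae ? 8u : 4u) == 0);
	return (kva_pages * (pae ? 2u : 4u));
}

/*
 * Hypothetical helper: megabytes left for userland, assuming the whole
 * 4GB (4096MB) 32-bit virtual address space is split between kernel
 * and user with nothing else reserved.
 */
unsigned
user_mbytes(unsigned kva_pages, int pae)
{
	return (4096u - kva_mbytes(kva_pages, pae));
}
```

Under those assumptions, kva_mbytes(256, 0) and kva_mbytes(512, 1) both come to 1024, the 1GB/3GB default split, while kva_mbytes(768, 1) comes to 1536 and user_mbytes(768, 1) to 2560, the 1.5/2.5 split mentioned above.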