Date:      Mon, 18 Aug 2014 17:35:52 -0500
From:      Alan Cox <alan.l.cox@gmail.com>
To:        Warner Losh <imp@bsdimp.com>
Cc:        "freebsd-arch@freebsd.org" <arch@freebsd.org>, Peter Grehan <grehan@freebsd.org>
Subject:   Re: superpages for UMA
Message-ID:  <CAJUyCcM_4-jiJ5PqnmT6H-2qg63nEXmpZ69vGGb6SR0Trp8e0Q@mail.gmail.com>
In-Reply-To: <257A0976-7C5E-4029-AF32-BFB3A6C60832@bsdimp.com>
References:  <53F215A9.8010708@FreeBSD.org> <20140818183925.GP2737@kib.kiev.ua> <CAJUyCcM7ZipmYu8OLxT2TCPjS%2BCSTGPRnotdKgchoNQH8s8ndA@mail.gmail.com> <53F25E60.5050109@freebsd.org> <257A0976-7C5E-4029-AF32-BFB3A6C60832@bsdimp.com>

On Mon, Aug 18, 2014 at 3:26 PM, Warner Losh <imp@bsdimp.com> wrote:

>
> On Aug 18, 2014, at 2:13 PM, Peter Grehan <grehan@freebsd.org> wrote:
>
> >> Newer Intel CPUs have more entries, and AMD CPUs have long (since
> >> Barcelona) had more.  In particular, they allow 2 MB page mappings to be
> >> cached in a larger L2 TLB.  Nowadays, the trouble is with the 1 GB pages.
> >> A lot of CPUs still only support an 8-entry, 1-level TLB for 1 GB pages.
> >
> > There are new(ish) ones effectively without 1GB pages. From the
> > "Software Optimization Guide for AMD Family 16h Processors":
> >
> > "Smashing"
> >  ...
> > "when the Family 16h processor encounters a 1-Gbyte page size, it will
> > smash translations of that 1-Gbyte region into 2-Mbyte TLB entries, each
> > of which translates a 2-Mbyte region of the 1-Gbyte page."
>
> “we’ll emulate this feature designed to make things go faster in hardware
> in software by doing the very thing that makes it go slow in hardware.”
>
> Fun times. Performance Smashing!
>
>

I'm guessing that these are low-power processors, where they don't want to
have another CAM consuming power.  Under those circumstances, it's still
better to support 1 GB page mappings in the page table even if the TLB
doesn't support them than not to support 1 GB page mappings at all.  With
the hierarchical page tables on x86, you get a 512x reduction in page table
size with each increase in page size (4 KB to 2 MB to 1 GB).  So, on a TLB
miss, the page table
walk is more likely to be all L2 data cache hits, rather than misses that
go all the way to DRAM.

One feature that I always liked about the AMD performance counters was that
they allowed you to count L2 cache misses caused by page table walks on a
TLB miss.  This was often a better predictor of whether large pages were
going to be beneficial than counting TLB misses.


