Date: Mon, 23 Sep 2013 17:16:44 -0700
From: Adrian Chadd <adrian@freebsd.org>
To: Sebastian Kuzminsky <S.Kuzminsky@f5.com>
Cc: Patrick Dung <patrick_dkt@yahoo.com.hk>,
    "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
    "ivoras@freebsd.org" <ivoras@freebsd.org>
Subject: Re: About Transparent Superpages and Non-transparent superapges
Message-ID: <CAJ-Vmokq0NpBcA-CpWE7NDoRAbQoa0s3%2BLTTKkvyg0PuGpBvzg@mail.gmail.com>
In-Reply-To: <EF6D6322-D661-48BB-B814-CD1152DCF157@f5.com>
References: <mailman.2681.1379448875.363.freebsd-hackers@freebsd.org>
    <1379520488.49964.YahooMailNeo@web193502.mail.sg3.yahoo.com>
    <22E7E628-E997-4B64-B229-92E425D85084@f5.com>
    <1379649991.82562.YahooMailNeo@web193502.mail.sg3.yahoo.com>
    <B3A1DB16-7919-4BFA-893C-5E8502F16C17@f5.com>
    <CAJ-Vmom8PRhJgVXzg=ASckJ4rOqrrFpxfiwM2cnD0WvYwBgm8w@mail.gmail.com>
    <EF6D6322-D661-48BB-B814-CD1152DCF157@f5.com>
On 23 September 2013 14:30, Sebastian Kuzminsky <S.Kuzminsky@f5.com> wrote:

> On Sep 23, 2013, at 15:24 , Adrian Chadd wrote:
>
>> On 20 September 2013 08:20, Sebastian Kuzminsky <S.Kuzminsky@f5.com> wrote:
>>
>>> It's transparent for the kernel: all of UMA and
>>> kmem_malloc()/kmem_free() is backed by 1 gig superpages.
>>
>> .. not entirely true, as I've found out at work. :(
>
> Can you expand on this, Adrian?
>
> Did you compile & boot the github branch I pointed to, and run into a
> situation where kmem_malloc() returned memory not backed by 1 gig pages,
> on hardware that supports it?

I haven't done that yet, sorry.

So, the direct map is backed by 1GB pages, except where it can't be:

* the first 1GB - because of the low memory hole(s);
* the 4th GB - because of the PCI I/O hole(s);
* the end of RAM - because of the memory remapping done so you don't lose
  hundreds of megabytes of RAM behind said memory/I/O/ROM holes, the end
  of RAM doesn't land on a 1GB boundary.

Those regions end up mapped with smaller (2MB/4KB) pages.

I'm still tinkering with this. I'd like to hack things up to (a) get all
the VM structures into the last gig of aligned RAM, so they fall inside a
single 1GB direct-mapped page, and (b) prefer that 1GB page for kernel
allocations, so things like mbufs, vm_page entries, etc. all end up
coming from the same 1GB direct-map page.

I _think_ I have an idea of what to do: I'll create a couple of 1GB-sized
freelists covering the last two 1GB direct-mapped regions at the end of
RAM, then hack up the vm_phys allocator to prefer allocating from those.

The VM structures are a bit more annoying - they get allocated from the
top of RAM early during boot, so unless the last region of RAM on your
machine ends exactly on a 1GB boundary, they'll be backed by 4KB/2MB
pages. I tested this by setting hw.physmem to force things to be rounded
to a 1GB boundary, and it helped for a while.

Unfortunately, because everything else gets allocated from random places
in physical memory, I'm thrashing the TLB: there are only four 1GB TLB
slots on a Sandy Bridge Xeon (4GB of coverage against a 64GB direct map),
and with 64GB of RAM I'm seeing a 10-12% TLB miss load when serving lots
of traffic from SSD (all of it mbuf and VM structure allocations).

So, if I can claw back that ~10% of CPU cycles spent walking page tables,
I'll be happy.

Note: I'm a newbie here in the physical mapping code. :-)



-adrian
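P.S. To make the rounding problem concrete, here's a quick userland toy of
the arithmetic (the segment addresses below are made up purely for
illustration; the real ones come out of the firmware memory map): only the
1GB-aligned middle of each physical segment can be backed by 1GB direct-map
pages, and the unaligned head and tail have to fall back to 2MB/4KB
mappings.

/*
 * Toy model of the direct-map rounding problem.  For each physical
 * segment, only the 1GB-aligned middle can be backed by 1GB pages;
 * the unaligned head and tail need 2MB/4KB mappings instead.
 * The segment addresses below are made up, purely for illustration.
 */
#include <stdio.h>
#include <stdint.h>

#define	GB1	(1ULL << 30)

static void
report(uint64_t start, uint64_t end)
{
	uint64_t lo = (start + GB1 - 1) & ~(GB1 - 1);	/* round start up */
	uint64_t hi = end & ~(GB1 - 1);			/* round end down */

	if (lo >= hi) {
		printf("%#jx-%#jx: no 1GB pages possible\n",
		    (uintmax_t)start, (uintmax_t)end);
		return;
	}
	printf("%#jx-%#jx: %ju x 1GB pages, %ju MB left over for 2MB/4KB\n",
	    (uintmax_t)start, (uintmax_t)end,
	    (uintmax_t)((hi - lo) / GB1),
	    (uintmax_t)(((lo - start) + (end - hi)) >> 20));
}

int
main(void)
{
	report(0x0000000000100000ULL, 0x00000000bfe00000ULL); /* RAM below the 4GB hole */
	report(0x0000000100000000ULL, 0x00000010b3000000ULL); /* remapped RAM above 4GB */
	return (0);
}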
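The allocator change I'm describing is, at its core, just "try the
preferred pool first". The sketch below is not the real vm_phys code - it's
a toy userland model of that policy, with made-up names - but it's the
shape of what I want the allocator to do for kernel metadata pages.

/*
 * Toy model of the allocation policy only, with made-up names -- this is
 * not the real vm_phys interface.  Pages on the "preferred" list stand in
 * for pages inside the 1GB direct-mapped regions; we fall back to the
 * default list when the preferred one runs dry.
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct toy_page {
	struct toy_page	*next;
	uint64_t	 paddr;		/* physical address this page models */
};

struct toy_freelist {
	struct toy_page	*head;
};

static struct toy_freelist preferred_1g;	/* inside the 1GB-mapped regions */
static struct toy_freelist fallback;		/* everything else */

static struct toy_page *
toy_freelist_pop(struct toy_freelist *fl)
{
	struct toy_page *p = fl->head;

	if (p != NULL)
		fl->head = p->next;
	return (p);
}

/*
 * Allocate a page for kernel metadata (mbufs, vm_page entries, ...):
 * prefer the 1GB-backed pool so the hot structures share TLB entries.
 */
static struct toy_page *
toy_alloc_kernel_page(void)
{
	struct toy_page *p;

	p = toy_freelist_pop(&preferred_1g);
	if (p == NULL)
		p = toy_freelist_pop(&fallback);
	return (p);
}

int
main(void)
{
	static struct toy_page pages[4];
	int i;

	/* Seed two pages into each pool (addresses are made up). */
	pages[0].paddr = 0xf40000000ULL; pages[0].next = &pages[1];
	pages[1].paddr = 0xf40200000ULL; pages[1].next = NULL;
	preferred_1g.head = &pages[0];

	pages[2].paddr = 0x000100000ULL; pages[2].next = &pages[3];
	pages[3].paddr = 0x000200000ULL; pages[3].next = NULL;
	fallback.head = &pages[2];

	/* The first two allocations come from the preferred pool, then spill. */
	for (i = 0; i < 4; i++)
		printf("got page at %#jx\n",
		    (uintmax_t)toy_alloc_kernel_page()->paddr);
	return (0);
}

The real work, of course, is carving those last two 1GB regions into their
own freelists at boot and teaching the existing allocation paths to pass
that preference down.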
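And for reference, the hw.physmem experiment is just a loader tunable;
something like this in /boot/loader.conf (the value is illustrative - pick
the largest 1GB multiple below your real top of RAM, and accept losing
whatever sits above the clamp):

# /boot/loader.conf
# Clamp usable physical memory to a 1GB multiple (illustrative value).
hw.physmem="63G"

With that in place the last chunk of RAM ends on a 1GB boundary, so the
early-boot VM structure allocations can land inside a 1GB direct-map page.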