Date:      Thu, 06 Jun 2002 05:04:56 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Miguel Mendez <flynn@energyhq.homeip.net>
Cc:        Stephen Montgomery-Smith <stephen@math.missouri.edu>, freebsd-hackers@freebsd.org
Subject:   Re: allocating memory
Message-ID:  <3CFF4FE8.86C44C31@mindspring.com>
References:  <3CFEEB99.AEDC5DB9@math.missouri.edu> <3CFF2780.FAD81226@mindspring.com> <20020606122702.A81113@energyhq.homeip.net>

Miguel Mendez wrote:
> > As for topping out at ~2.5G: yes: that's what's expected.  If you
> > really need more memory than that, you will need to drop ~US$10K
> > on a 64 bit Itanium machine, and petition Peter Wemm for the correct
> > dead chicken to wave over the thing.
> 
> Just out of curiosity, why do you advocate the Itanium so much?

Didn't think I was... telling someone that they would have to drop
~US$10K is hardly advocacy in my book.


> It's, by far, the worst 64bit arch I've ever seen. Maybe in 3 years
> or so it will be used for serious work; until then, you're much better
> off with a POWER4 or Sparc box, although newer Sun hardware seems to
> be pretty disappointing in the engineering department, so go IBM :)

FreeBSD doesn't run on these boxes yet; there's a SPARC 64 bit port
in progress, as well as a PPC port in progress (32 bit only, AFAIK).

The SPARC 64 port is probably not going to see a lot of use until the
number of 64 bit SPARC machines available second-hand comes anywhere
close to 10% of the 32 bit SPARC machines that can be had out there;
in terms of price/performance, they are still way too expensive.  I
guess if I didn't limit your remarks to FreeBSD, and included Solaris,
AIX, or NetBSD as options, the field would be larger.

You didn't mention the Alpha, but Alpha is end of life these days,
now that HP owns what Intel didn't already own.  The FreeBSD Alpha
port (which *is* complete) unfortunately can't handle more than 2G
of RAM (apparently, this has to do with the I/O architecture issues
that have yet to be resolved completely, by way of code changes
that *should* happen, but haven't, because the i386 doesn't need
them to happen).


> Or he could go the PAE way, and get an x86 box with 8 or 16GB of memory.
> However, I don't know how good the support for that in FreeBSD is.

It's not there; there would have been a big announcement, I think,
like "Hyperthreading" (really, SMT).  Peter Wemm was reported by
Alfred Perlstein to have been working on it.  If Peter is on it, it
will happen eventually, but I don't think it will be useful unless
your problem is swap-bound systems, not CPU or I/O bound systems;
IMO, most systems end up I/O bound because of system clock halving...
er... CPU clock doubling... er... whatever.  Peter is with Yahoo,
and Yahoo runs a lot of large user programs simultaneously on a given
machine, so they actually have a real use for this.


[ ...Skip to the end if you love PAE, and don't want to
     read anyone "insulting" it... ]

Here's my unvarnished opinion of PAE: it sucks to the point of
near non-utility.

Here's why:

The problem with PAE is that, while it extends the amount of
physical RAM you can access, it doesn't extend the amount of
physical RAM you can access *simultaneously*.  It also doesn't
increase your kernel or user virtual address space: the total
of the two still can't exceed 4G, even when using PAE.
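
[ A minimal sketch of the arithmetic behind that claim.  The 36 bit
  physical width is the classic PAE figure; the 1G kernel share of the
  virtual space is an assumption chosen for illustration, as are all
  of the names below. ]

```python
# Address-space arithmetic for a 32 bit x86 with PAE enabled.
# Assumption: 36 physical address bits (the classic PAE width).
GIB = 1 << 30

VIRTUAL_BITS = 32        # unchanged by PAE
PAE_PHYSICAL_BITS = 36   # what PAE actually widens

virtual_space = 1 << VIRTUAL_BITS
physical_space = 1 << PAE_PHYSICAL_BITS


def max_user_space(kernel_space):
    """User virtual space left once the kernel's share is taken out of 4G."""
    return virtual_space - kernel_space


print("virtual  :", virtual_space // GIB, "G")    # kernel + user combined
print("physical :", physical_space // GIB, "G")   # reachable, not all at once
# Hypothetical 1G kernel share, for illustration only.
print("user max :", max_user_space(1 * GIB) // GIB, "G")
```

However much physical RAM is installed, a single process still sees
no more than the user share of the 4G virtual space.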

It's basically the 32 bit version of the 16 bit bank selection
in the Commodore 64, or if you want to dig deeper, the bank
selection that most of us eventually ran on our 8 bit SWTP
6800's or IMSAI 8080's, back in the mid 1970's, so we could
cram more than 256b of RAM into our little S-100 bus boxes;
those of us with infinitely deep wallets sometimes had up to
an unbelievable 4K in the buggers...

What do you get out of PAE?  Faster, very low granularity swap,
that eventually becomes much more expensive to LRU out to real
swap, once it's all filled up (guaranteed: your least recently
used data is not in the bank that's currently selected in!).

In other words, if you need X G of RAM, then 4 times that much
is not going to save you, and you need to reconsider your code.
Unless, of course, you don't expect your business or application or
data set to grow, over time, and your system load remains constant.

You can't DMA into the thing.  If you have a 64 bit PCI, and your
motherboard is designed just right, you might be able to swing it;
I've actually seen an Intel Server Products Division MB that would
let you DMA into banked-out memory (but it still flushed your L2
and L1 at the memory location mod 4G, meaning you should limit
yourself to Amiga-like "FastRAM", otherwise known as "bounce
buffers").  Mostly, you have to pretend you are a DEC Alpha,
and limit DMA buffers to the low 4G.  That's mbufs, disk DMAs,
everything.
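
[ A toy model of the bounce buffer idea, not FreeBSD code: physical
  memory is a Python dict, and the names needs_bounce, dma_into and
  bounce_base are hypothetical.  It assumes a device that can only
  address the low 4G. ]

```python
GIB = 1 << 30
DMA_LIMIT = 4 * GIB  # assumption: the device only reaches the low 4G


def needs_bounce(phys_addr, length, limit=DMA_LIMIT):
    """True if any byte of [phys_addr, phys_addr + length) is at or above limit."""
    return phys_addr + length > limit


def dma_into(memory, dest, data, bounce_base, limit=DMA_LIMIT):
    """Simulate a device writing `data` to physical address `dest`.

    `memory` maps physical address -> byte value.  Returns True when the
    transfer had to go through the bounce buffer at `bounce_base`.
    """
    if not needs_bounce(dest, len(data), limit):
        # The device can reach the destination directly.
        for i, byte in enumerate(data):
            memory[dest + i] = byte
        return False

    if needs_bounce(bounce_base, len(data), limit):
        raise ValueError("bounce buffer must itself sit below the DMA limit")

    # Step 1: the device DMAs into low memory.
    for i, byte in enumerate(data):
        memory[bounce_base + i] = byte
    # Step 2: the CPU copies the data up to where it was wanted.
    for i in range(len(data)):
        memory[dest + i] = memory[bounce_base + i]
    return True


mem = {}
print(dma_into(mem, 5 * GIB, b"ab", bounce_base=1 * GIB))  # high target
print(dma_into(mem, 2 * GIB, b"x", bounce_base=1 * GIB))   # low target
```

The second copy in the bounced path is the cost: every transfer aimed
at high memory is paid for twice.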

Cool.  Start at 1G.  Add 1G.  Get more mbufs and buffer cache.
Add 1G.  Get more mbufs and buffer cache.   Add 1G.  Get more
mbufs and buffer cache.  Add 1G.  Lose 2G worth of mbufs and
buffer cache.

There's also the VM problem.  In a 4G system, you are likely
to be able to only grab a random chunk of 2G -- half the RAM
physically in the machine -- from the available "extra" memory.
What this boils down to is that the physical RAM ends up getting
used up for housekeeping of the physical RAM.  You can push this
up closer to 3G; but to do that, you have to make modifications
to pmap.c and machdep.c.

Adding PAE into the mix means you are probably going to spend,
at a minimum, 1/4 of the physical RAM on housekeeping.  For a
16G machine, that's 4G.  And guess what?  It can't all be in
core (bank selected in) at the same time, or you end up having
no KVA space left to bank-select in the RAM that it's the
housekeeping for.  So the only real approach is to go to a
MIPS-like software lookaside (hierarchical) so that you can
take each bank, and take the 1/4 out of the bank itself.  This
works, but it's incredibly expensive, overall.
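
[ The housekeeping numbers, worked through as a sketch.  The 1/4
  figure is the estimate given above; the 1G of kernel virtual address
  space is an assumption for illustration, and the function names are
  hypothetical. ]

```python
from fractions import Fraction

GIB = 1 << 30
VIRTUAL_SPACE = 4 * GIB                  # 32 bit virtual space, kernel + user
HOUSEKEEPING_FRACTION = Fraction(1, 4)   # estimate from the text above
ASSUMED_KVA = 1 * GIB                    # assumption: kernel's virtual share


def housekeeping_bytes(physical_ram):
    """RAM spent describing RAM, at the 1/4 estimate."""
    return int(physical_ram * HOUSEKEEPING_FRACTION)


def fits_in_kva(physical_ram, kva=ASSUMED_KVA):
    """Can all of the housekeeping be mapped into the kernel at once?"""
    return housekeeping_bytes(physical_ram) <= kva


for ram_g in (4, 8, 16):
    ram = ram_g * GIB
    print(ram_g, "G RAM ->", housekeeping_bytes(ram) // GIB, "G housekeeping,",
          "fits" if fits_in_kva(ram) else "does not fit", "in assumed KVA")
```

At 16G the housekeeping alone equals the entire 4G virtual space, so
it has to be banked in and out like everything else.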

Basically ...the memory is only good for programs themselves,
not for DMA, mmap'ed files, more mbufs... anything really useful.

Now add in that most new, fast memory is actually limited to
2G in a system because of the memory modules being used (e.g.
the new 450MHz memory busses can't handle more than 2G).

[ ...You can stop skipping now... ]

FreeBSD uses the standard recursive mapping, and it uses soft
switching, which means that it has a small TSS profile (Intel
processors are limited to 1024 of these, which is why Linux had
to move to a similar model: before that, they were intrinsically
limited to 1024 processes, as is any OS that uses one per process,
per the Intel programming manual recommendations -- yet another
glaring piece of evidence that software engineers are rarely
consulted before hardware is designed).

The easiest method of dealing with this would be to grab a
group of these (FreeBSD uses only a few of them, and tends to
use only one for all processes on the system, unless VM86 gets
involved), and then map them up to the banks, so that when
you do switching to a process, you can lazy-bind the PAE bank
selection at the same time.  Probably you would have to add in
scheduler mojo (like Linux) so that given a list of equal
priority processes, you prefer the ones that are in the same
bank as the current process.
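
[ A sketch of that scheduler preference, under stated assumptions: it
  is not the FreeBSD or the Linux scheduler, the Proc record and
  pick_next are hypothetical, and a larger priority number is taken to
  mean "more urgent" purely for illustration. ]

```python
from collections import namedtuple

# Hypothetical process record: which PAE bank its pages live in.
Proc = namedtuple("Proc", ["pid", "priority", "bank"])


def pick_next(run_queue, current_bank):
    """Pick the next process to run.

    Priority wins outright.  Among processes tied at the best priority,
    prefer one already in the currently selected bank, so the switch
    does not also force a bank change.
    """
    if not run_queue:
        return None
    best = max(p.priority for p in run_queue)
    candidates = [p for p in run_queue if p.priority == best]
    for proc in candidates:
        if proc.bank == current_bank:
            return proc
    # Nobody at this priority is in the current bank: take the first one
    # and pay for the bank switch.
    return candidates[0]


queue = [Proc(10, 5, bank=2), Proc(11, 5, bank=0), Proc(12, 3, bank=0)]
print(pick_next(queue, current_bank=0))  # equal priority, same bank wins
print(pick_next(queue, current_bank=3))  # no match: falls back to queue order
```

The preference only breaks ties, so it never lets a lower priority
process jump ahead just because it shares a bank.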

Probably, you will want to end up double-caching things like
code pages from Apache, or making sure that shared code pages
end up outside the bank selection region, but if you are adding
significant attribution to the VM system to handle PAE in the
first place, then it's negligible additional overhead.

[ ...Skip again, if you don't want a negative bottom line... ]

The bottom line is that, in order to have a *usable* amount of
physical RAM over 4G, you pretty much have to go to a 64 bit
platform, and if you are a user, now that Alpha is dead and no
one looks to be making quick progress on the Alpha 2G barrier,
that pretty much means "Itanium".

-- Terry
