Date: Thu, 06 Jun 2002 05:04:56 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Miguel Mendez <flynn@energyhq.homeip.net>
Cc: Stephen Montgomery-Smith <stephen@math.missouri.edu>, freebsd-hackers@freebsd.org
Subject: Re: allocating memory
Message-ID: <3CFF4FE8.86C44C31@mindspring.com>
References: <3CFEEB99.AEDC5DB9@math.missouri.edu> <3CFF2780.FAD81226@mindspring.com> <20020606122702.A81113@energyhq.homeip.net>
Miguel Mendez wrote:
> > As for topping out at ~2.5G: yes: that's what's expected.  If you
> > really need more memory than that, you will need to drop ~US$10K
> > on a 64 bit Itanium machine, and petition Peter Wemm for the
> > correct dead chicken to wave over the thing.
>
> Just out of curiosity, why do you advocate the Itanium so much?

Didn't think I was... telling someone that they would have to drop
~US$10K is hardly advocacy in my book.

> It's, by far, the worst 64bit arch I've ever seen.  Maybe in 3 years
> or so it will be usable for serious work; until then, you're much
> better off with a POWER4 or Sparc box, although newer Sun hardware
> seems to be pretty disappointing in the engineering department, so
> go IBM :)

FreeBSD doesn't run on those boxes yet; there's a SPARC 64 bit port
in progress, as well as a PPC port in progress (32 bit only, AFAIK).
The SPARC 64 port is probably not going to see a lot of use until
the number of 64 bit SPARC machines available second-hand comes
anywhere close to 10% of the 32 bit SPARC machines that can be had
out there; in terms of price/performance, they are still way too
expensive.

I guess if we didn't limit the discussion to FreeBSD, and included
Solaris, AIX, or NetBSD as options, the field would be larger.

You didn't mention the Alpha, but Alpha is end of life these days,
now that HP owns what Intel didn't already own.  The FreeBSD Alpha
port (which *is* complete) unfortunately can't handle more than 2G
of RAM (apparently, this has to do with I/O architecture issues that
have yet to be resolved completely, by way of code changes that
*should* happen, but haven't, because the i386 doesn't need them to
happen).

> Or he could go the PAE way, and get an x86 box with 8 or 16GB of
> memory.  However, I don't know how good the support for that in
> FreeBSD is.

It's not there; there would have been a big announcement, I think,
like "Hyperthreading" (really, SMT).  Peter Wemm was reported by
Alfred Perlstein to have been working on it.  If Peter is on it, it
will happen eventually, but I don't think it will be useful unless
your problem is swap-bound systems, rather than CPU or I/O bound
systems; IMO, most systems end up I/O bound because of system clock
halving... er... CPU clock doubling... er... whatever.  Peter is
with Yahoo, and Yahoo runs a lot of large user programs
simultaneously on a given machine, so they actually have a real use
for this.

[ ...Skip to the end if you love PAE, and don't want to read anyone
  "insulting" it... ]

Here's my unvarnished opinion of PAE: it sucks to the point of near
non-utility.  Here's why:

The problem with PAE is that, while it extends the amount of
physical RAM you can access, it doesn't extend the amount of
physical RAM you can access *simultaneously*.  It also doesn't
increase your kernel or user virtual address space: the total of the
two still can't exceed 4G, even when using PAE.

It's basically the 32 bit version of the 16 bit bank selection in
the Commodore 64, or, if you want to dig deeper, the bank selection
that most of us eventually ran on our 8 bit SWTP 6800's or IMSAI
8080's, back in the mid 1970's, so we could cram more than 256 bytes
of RAM into our little S-100 bus boxes; those of us with infinitely
deep wallets sometimes had up to an unbelievable 4K in the buggers...

What do you get out of PAE?  Faster, very low granularity swap, that
eventually becomes much more expensive to LRU out to real swap, once
it's all filled up (guaranteed: your least recently used data is not
in the bank that's currently selected in!).
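To make that cost concrete, here's a toy user-space model of bank
selection (an illustration only, nothing to do with the actual i386
pmap code; all of the names and sizes here are made up).  It fakes a
large "physical" memory visible only through a fixed window, and
counts how many times the window has to be re-pointed:

/* Toy model of PAE-style bank selection; any C compiler will do. */
#include <stdio.h>
#include <stdint.h>

#define BANK_SIZE (1u << 20)	/* 1M visible window (hypothetical) */
#define NUM_BANKS 16		/* 16M of "physical" memory         */

static uint8_t phys_mem[NUM_BANKS][BANK_SIZE]; /* the real storage */
static uint8_t *window;		/* what the CPU can "see"           */
static int current_bank = -1;
static long switches;		/* count of the expensive remaps    */

/*
 * Selecting a bank is the expensive part: on real hardware it means
 * rewriting page tables and eating the TLB and cache misses.
 */
static void
select_bank(int bank)
{
	if (bank != current_bank) {
		window = phys_mem[bank];
		current_bank = bank;
		switches++;
	}
}

/* Every access to "extended" memory pays the selection check. */
static uint8_t
read_extended(uint32_t phys_addr)
{
	select_bank(phys_addr / BANK_SIZE);
	return (window[phys_addr % BANK_SIZE]);
}

int
main(void)
{
	uint32_t a, page;
	int bank;

	/* Sequential touch: cheap, one switch per bank. */
	for (a = 0; a < NUM_BANKS * BANK_SIZE; a += 4096)
		(void)read_extended(a);
	printf("sequential: %ld bank switches\n", switches);

	/*
	 * Strided touch: every access lands in a different bank, so
	 * every access pays for a remap.  This is the LRU problem:
	 * the data you want is never in the selected bank.
	 */
	switches = 0;
	for (page = 0; page < 4096; page++)
		for (bank = 0; bank < NUM_BANKS; bank++)
			(void)read_extended(bank * BANK_SIZE + page * 16);
	printf("strided:    %ld bank switches\n", switches);
	return (0);
}

Run it: the strided case does 65536 remaps where the sequential one
does 16.  Now imagine each "remap" is a page table reload plus a TLB
flush, and you have the picture.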
In other words, if you need X G of RAM, then 4 times that much is
not going to save you, and you need to reconsider your code.
Unless, of course, you don't expect your business or application or
data set to grow over time, and your system load remains constant.

You can't DMA into the thing.  If you have a 64 bit PCI, and your
motherboard is designed just right, you might be able to swing it;
I've actually seen an Intel Server Products Division MB that would
let you DMA into banked-out memory (but it still flushed your L2 and
L1 at the memory location mod 4G, meaning you should limit yourself
to Amiga-like "FastRAM", otherwise known as "bounce buffers").
Mostly, you have to pretend you are a DEC Alpha, and limit DMA
buffers to the low 4G.  That's mbufs, disk DMAs, everything.

Cool.  Start at 1G.  Add 1G.  Get more mbufs and buffer cache.  Add
1G.  Get more mbufs and buffer cache.  Add 1G.  Get more mbufs and
buffer cache.  Add 1G.  Lose 2G worth of mbufs and buffer cache.

There's also the VM problem.  In a 4G system, you are likely to be
able to only grab a random chunk of 2G -- half the RAM physically in
the machine -- from the available "extra" memory.  What this boils
down to is that the physical RAM ends up getting used up for
housekeeping of the physical RAM.  You can push this up closer to
3G, but to do that, you have to make modifications to pmap.c and
machdep.c.

Adding PAE into the mix means you are probably going to spend, at a
minimum, 1/4 of the physical RAM on housekeeping.  For a 16G
machine, that's 4G.  And guess what?  It can't all be in core (bank
selected in) at the same time, or you end up having no KVA space
left to bank-select in the RAM that it's the housekeeping for.  So
the only real approach is to go to a MIPS-like software lookaside
(hierarchical), so that you can take each bank, and take the 1/4 out
of the bank itself.  This works, but it's incredibly expensive,
overall.

Basically, the memory is only good for programs themselves, not for
DMA, mmap'ed files, more mbufs... anything really useful.

Now add in that most new, fast memory is actually limited to 2G in a
system because of the memory modules being used (e.g. the new 450MHz
memory busses can't handle more than 2G).

[ ...You can stop skipping now... ]

FreeBSD uses the standard recursive mapping, and it uses soft
switching, which means that it has a small TSS profile (Intel
processors are limited to 1024 of these, which is why Linux had to
move to a similar model: before that, they were intrinsically
limited to 1024 processes, as is any OS that uses one per process,
per the Intel programming manual recommendations -- yet another
glaring piece of evidence that software engineers are rarely
consulted before hardware is designed).

The easiest method of dealing with this would be to grab a group of
these (FreeBSD uses only a few of them, and tends to use only one
for all processes on the system, unless VM86 gets involved), and
then map them up to the banks, so that when you switch to a process,
you can lazy-bind the PAE bank selection at the same time.  Probably
you would have to add in scheduler mojo (like Linux) so that, given
a list of equal priority processes, you prefer the ones that are in
the same bank as the current process; a sketch of the idea follows.
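In very rough outline, something like this (none of this is real
FreeBSD code; struct proc, the field names, and the run queue here
are all hypothetical stand-ins):

#include <stddef.h>

struct proc {
	struct proc *p_next;	/* run queue link                 */
	int p_bank;		/* PAE bank this process lives in */
	/* ... registers, VM state, etc. ... */
};

static int hw_current_bank = -1;

/*
 * The expensive operation: reprogram the page tables so that a
 * different physical bank is visible.  Stubbed out here.
 */
static void
hw_select_bank(int bank)
{
	hw_current_bank = bank;
}

/*
 * Lazy binding: only pay for the bank switch when the incoming
 * process actually lives in a different bank.
 */
static void
cpu_switch_to(struct proc *p)
{
	if (p->p_bank != hw_current_bank)
		hw_select_bank(p->p_bank);
	/* ... load registers, switch stacks, return to user ... */
}

/*
 * The scheduler mojo: among equal priority runnable processes,
 * prefer one in the currently selected bank, falling back to the
 * head of the queue if no same-bank process is runnable.
 */
static struct proc *
sched_pick_next(struct proc *runq)
{
	struct proc *p;

	for (p = runq; p != NULL; p = p->p_next)
		if (p->p_bank == hw_current_bank)
			return (p);
	return (runq);
}

int
main(void)
{
	struct proc a = { NULL, 0 }, b = { &a, 1 };

	hw_select_bank(0);
	cpu_switch_to(sched_pick_next(&b)); /* picks a: same bank, no remap */
	return (0);
}

Of course, a real scheduler would have to bound the same-bank
preference somehow, or the processes in the other banks would
starve.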
Probably, you will want to end up double-caching things like code
pages from Apache, or making sure that shared code pages end up
outside the bank selection region, but if you are adding significant
attribution to the VM system to handle PAE in the first place, then
it's negligible additional overhead.

[ ...Skip again, if you don't want a negative bottom line... ]

The bottom line is that, in order to have a *usable* amount of
physical RAM over 4G, you pretty much have to go to a 64 bit
platform, and if you are a user, now that Alpha is dead and no one
looks to be making quick progress on the Alpha 2G barrier, that
pretty much means "Itanium".

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message