Date: Thu, 06 Jun 2002 07:57:43 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Miguel Mendez <flynn@energyhq.homeip.net>
Cc: freebsd-hackers@freebsd.org
Subject: Re: allocating memory
Message-ID: <3CFF7867.4F7193E2@mindspring.com>
References: <3CFEEB99.AEDC5DB9@math.missouri.edu> <3CFF2780.FAD81226@mindspring.com> <20020606122702.A81113@energyhq.homeip.net> <3CFF4FE8.86C44C31@mindspring.com> <20020606152458.A81446@energyhq.homeip.net>
Miguel Mendez wrote:
> On Thu, Jun 06, 2002 at 05:04:56AM -0700, Terry Lambert wrote:
>
> How come? A Sun Blade 100 is about $1,000. That's not what I call
> expensive. It's not an E4500, but not a bad box once you load it with a
> bit more RAM and a SCSI controller. You can get Ultra 10 boxen pretty
> cheap these days too.

A Sun Blade 100 is limited to 2G of RAM. A Sun Blade 2000 (limited to
8G of RAM) is ~US$10K. The lowest cost Ultra workstation (the 60) is
also limited to 2G, and costs ~US$7K. The V120 rack mount is ~US$2.5K;
it's the lowest end system that can do 4G. To get to 8G, you need to
go to the 280R; also ~US$10K.

> > guess if I didn't limit your remarks to FreeBSD, and included Solaris,
> > AIX, or NetBSD as options, the field would be larger.
>
> Yes, I'd include those OSes, as FreeBSD is not, and won't be, production
> ready for a while on those platforms.

I guess you could post about that to the "solaris-hackers@sun.com"
mailing list, if such a thing existed... ;^).

> > port (which *is* complete) unfortunately can't handle more than 2G
> > of RAM (apparently, this has to do with the I/O architecture issues
> > that have yet to be resolved completely, by way of code changes
> > that *should* happen, but haven't, because the i386 doesn't need
> > them to happen).
>
> It seems to me most developers have lost interest in it and moved
> already towards more exciting targets, like the sparc port.

Uh, there are some things that are transportable like that, but most
things aren't. "I used to hack Alpha assembly code, but now I think I
will go hack SPARC64 assembly code" doesn't really happen in the real
world (unless you are this crazy guy I know).

> > It's not there; there would have been a big announcement, I think,
> > like "Hyperthreading" (really, SMT). Peter Wemm was reported by
> > Alfred Perlstein to have been working on it. If Peter is on it, it
>
> Well, now *that* would be interesting to see, as a hacker exercise.

Peter is a commercial, professional programmer, not just a hacker.
You don't get that kind of depth of effort out of volunteers who are
not nuts. 8-).

> Assume a (software based) 64-bit address space, by means of using long
> long for pointers. Of course you can only access a 4GB chunk at a time,
> but programs need not know about that. Do they want to malloc or mmap
> 8GB? You let them. If the program is doing random access all the time,
> it will spend a lot of time in the kernel, as not only pages, but
> segments have to be taken into account when accessing a memory
> location. It would work pretty well for programs doing consecutive
> accesses to their dataset (or staying within the 4GB boundary). Doing
> some MMU magic you can have a transparent system that allows programs
> to use more than 4GB.

Virtual addressing is handled in hardware, which is limited to 32
bits. To make this idea work, you would have to take a fault on every
memory access, and then do a fixup that (maybe) included a bank
selection process as well (similar to how write faults are emulated
for i386 in supervisor mode, since they do not result in faults, and
you want to avoid people using copy-on-write on a read to a bogus
address to spam kernel memory as a means of hacking a higher privilege
level by reading, say, a uid of 0 into the current process's cred).
Handling this would be so incredibly expensive that you might as well
give up and just add swap to the system in question.
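To make the per-access cost concrete, here is a userland caricature of
that scheme (all the names are hypothetical; a real version would flip
MMU state from a trap handler instead of calling a function, and would
remap rather than copy, but the test-and-switch on every single
dereference is exactly the cost I mean):

/*
 * Userland caricature of software-banked "long long" pointers.
 * Hypothetical names throughout; error checks omitted for brevity.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BANK_SHIFT      22                      /* 4M banks, for the demo */
#define BANK_SIZE       (1UL << BANK_SHIFT)

static uint8_t *window;                         /* the one visible bank */
static uint8_t **backing;                       /* stand-in for physical RAM */
static uint64_t current_bank = UINT64_MAX;      /* no bank selected yet */

static void
select_bank(uint64_t bank)
{
    if (bank == current_bank)
        return;
    if (current_bank != UINT64_MAX)             /* write back the old bank */
        memcpy(backing[current_bank], window, BANK_SIZE);
    memcpy(window, backing[bank], BANK_SIZE);   /* the "fault + fixup" */
    current_bank = bank;
}

/* Every dereference of a 64-bit software pointer funnels through here. */
static uint8_t
fetch8(uint64_t swptr)
{
    select_bank(swptr >> BANK_SHIFT);
    return (window[swptr & (BANK_SIZE - 1)]);
}

int
main(void)
{
    uint64_t nbanks = 4, i, sum = 0;

    window = malloc(BANK_SIZE);
    backing = malloc(nbanks * sizeof(*backing));
    for (i = 0; i < nbanks; i++)
        backing[i] = calloc(1, BANK_SIZE);

    /* A "16M" object walked through 64-bit software pointers. */
    for (i = 0; i < nbanks * BANK_SIZE; i += 4096)
        sum += fetch8(i);
    printf("sum = %llu\n", (unsigned long long)sum);
    return (0);
}

Sequential access amortizes the switches, as you say; random access
pays the full fixup on nearly every load, which is where it dies.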
Really, the only way to deal with it adequately is by abusing hardware
at a task granularity, where you have work to do at task management
time anyway, and the cost can be amortized over a lot of CPU time.

> Maybe if/when that hardware becomes affordable I'll try such a hack
> myself :)

You should (in theory, from the documentation -- I don't have a PAE
board with 3G of RAM lying around to check) be able to bank select
even without the extra hardware, as long as PAE is supported by the
processor. You just need enough RAM to fit two windows' worth of
contents in the low granularity.
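The userland cousin of that windowing idea is runnable today on a
32-bit box: slide a fixed-size mmap() window across a dataset bigger
than your address space, instead of trying to map the whole thing. A
minimal sketch (the 64M window size is arbitrary, and the error
handling is thinned out):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define WINDOW  (64UL * 1024 * 1024)    /* 64M view at a time */

int
main(int argc, char *argv[])
{
    struct stat st;
    off_t off;                          /* 64 bits on FreeBSD, so >4G works */
    unsigned long long sum = 0;
    int fd;

    if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
        fprintf(stderr, "usage: %s bigfile\n", argv[0]);
        return (1);
    }
    fstat(fd, &st);

    for (off = 0; off < st.st_size; off += WINDOW) {
        size_t len = (st.st_size - off < (off_t)WINDOW) ?
            (size_t)(st.st_size - off) : WINDOW;
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED,
            fd, off);
        if (p == MAP_FAILED) {
            perror("mmap");
            return (1);
        }
        for (size_t i = 0; i < len; i++)    /* touch the window */
            sum += p[i];
        munmap(p, len);                     /* slide the window */
    }
    printf("checksum %llu\n", sum);
    close(fd);
    return (0);
}

The working view stays small and fixed while the data does not, which
is the whole trick; the difference is that here the remap cost is paid
per window, not per access.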
> > In other words, if you need X G of RAM, then 4 times that much
> > is not going to save you, and you need to reconsider your code.
>
> Databases, for one, love to have huge amounts of memory. It's not
> uncommon to have e.g. Informix processes using 16GB of RAM on Sun
> big iron.

Good reason to buy 64 bit iron, IMO, instead of trying to pretend, by
emulating your 1GHz Pentium with 32M of RAM on your PC-XT and swapping
to the old ST506 to simulate RAM.

> > What this boils down to is that the physical RAM ends up getting
> > used up for housekeeping of the physical RAM. You can push this
> > up closer to 3G; but to do that, you have to make modifications
> > to pmap.c and machdep.c.
>
> Such an enhancement needs a lot of modifications to the VM subsystem.

Not as many as you might think, actually. The PPC and Alpha memory
management somewhat resembles the work that would need to be done in
software, and the task switching has to happen anyway. Most of the
problem is in the bank selection, and in limiting device drivers to
not using banked memory. Even so, I don't think it's worthwhile. The
modifications that *are* needed are fugly, and unlikely to be
committed by anyone polite, IMO.

> > housekeeping for. So the only real approach is to go to a
> > MIPS-like software lookaside (hierarchical) so that you can
> > take each bank, and take the 1/4 out of the bank itself. This
> > works, but it's incredibly expensive, overall.
>
> Hmm, yes. So what does Windows 2000 Datacenter do wrt that problem?
> Waste memory like there's no tomorrow?

It's always fun to try to poke at Windows with a sharp stick, but I'll
take your comment literally, instead of as a sideways jab at Windows:
actually, I have no idea. I know how I would do it in Windows 98 and
in Windows NT 3.5 and 4.0 SP2, if it were my job to do, but as to what
they actually do in a more "modern" version of Windows, I don't know,
since I haven't had the pleasure of grovelling through the code of a
more modern Windows. The closest I could come would be some educated
guesses. There are at least three places you would have to hack in
VMM32.VXD, and about six other places in the IFSmgr and networking
code, and I'm probably forgetting some esoteric code path I never had
to crawl through with WinICE. Probably the MS people got some input
into the design, so it's close enough to what they were already doing
that their overhead would be lower.

The Linux overhead is pretty low, too, since in their VM they do a lot
of stuff in software that FreeBSD does in hardware, in order to make
it more naturally easy to port. The design is less Intel-centric,
making it a bit slower on Intel than it could be if they were running
closer to the glass.
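To put rough numbers on the housekeeping point above: every 4K
physical page needs a per-page record, and those records have to live
in kernel virtual space, which PAE does nothing to enlarge. The
64-byte record and 1G KVA budget below are assumed, illustrative
figures (the real struct vm_page size varies by version):

#include <stdio.h>

int
main(void)
{
    const unsigned long long page = 4096, rec = 64;   /* assumed sizes */
    const unsigned long long kva = 1ULL << 30;        /* ~1G KVA budget */
    unsigned long long gb;

    for (gb = 4; gb <= 64; gb *= 2) {
        unsigned long long pages = (gb << 30) / page;
        unsigned long long overhead = pages * rec;
        printf("%3lluG RAM: %llu pages, %lluM of records "
            "(%.0f%% of a %lluM KVA budget)\n",
            gb, pages, overhead >> 20,
            100.0 * overhead / kva, kva >> 20);
    }
    return (0);
}

At 64G of physical RAM the records alone eat the entire budget, which
is why the RAM ends up "used up for housekeeping of the physical RAM".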
> > Basically ...the memory is only good for programs themselves,
> > not for DMA, mmap'ed files, more mbufs... anything really useful.
>
> Of course, it's the applications demanding memory we are talking
> about. For the OS itself, it's just a half-assed solution, not
> practical at all.

I'd really have to go out of my way to design a real pig of an
application in order to make it need this. Almost everything I do
these days ends up I/O bound, where the ability to move data in and
out of memory ends up being the bottleneck. With rare exceptions, even
going Gigabit, I have a hard time pushing an 800MHz CPU over 60%. The
PAE increases the amount of copying, and so it doesn't save me bus
cycles off my memory bus. Even if all I did was use the memory as a
soft "L3 cache", I've got a lot more copy overhead, which means that
if my problem is my memory bus bandwidth, all I'm doing is shooting
myself in my knee so as to avoid hitting my foot.

I have applications that would really like the extra memory, but they
are all network applications, and I can't use the extra memory as
mbufs, because I can't DMA into or out of it without adding an extra
copy in both directions; I'd have to add a copy, both in and out, in
most cases where I don't have one today. So unless I did something
dumb, like run a whole bunch of virtual servers (I'd be inclined to
bank select between servers, and then time slice on the same
boundary), I'd be hard pressed to find a situation where it was a win
(dumb because I might as well build more 1U boxes: they're less
expensive, faster, and one crashing doesn't kill everyone else).

> > Now add in that most new, fast memory is actually limited to
> > 2G in a system because of the memory modules being used (e.g.
> > the new 450MHz memory busses can't handle more than 2G).
>
> Add more memory buses :)

There is one motherboard that I know of with 2 (limit 4G). The problem
is that there is good evidence that most people who build chipsets
couldn't build one that could walk and chew gum at the same time
without causing problems. I would have a very hard time trusting
something like that. The AMD Hammer stuff with HyperTransport, I
think, will be OK, if they ever start selling boxes this century. They
are already 8 months behind their original "tape out" date of last
November. Right now, it's just so much vapor.

> > priority processes, you prefer the ones that are in the same
> > bank as the current process.
>
> I'd keep all .text pages in the low 4GB of the machine. The
> probability that a program's code is bigger than that is, imho, null.

At this point, you are redesigning it into an application-specific OS,
rather than a general purpose OS. If you do that, you really can't
expect that anyone would be willing to accept the penalty, or maintain
the code if they didn't. This might work well for your specific need
(the ability to have up to 4 times the current memory limit without
buying a 64 bit processor, at the expense of really expensive 32 bit
hardware), but the marginal returns mean that the IRR on the
investment is going to satisfy maybe 5% of the people who need that
much RAM, who I would argue are, at most, 5% of the users. That's just
under 3 tenths of a percent, aggregate, for the user base. No wonder
it is not already supported. 8-).

> > The bottom line is that, in order to have a *usable* amount of
> > physical RAM over 4G, you pretty much have to go to a 64 bit
> > platform, and if you are a user, now that Alpha is dead and no
> > one looks to be making quick progress on the Alpha 2G barrier,
> > that pretty much means "Itanium".
>
> Except Itanium is nowhere near production ready, so you probably need
> something else, e.g. sparc or ppc. MIPS is also a nice arch to work
> with, btw; unfortunately, SGI hardware is extremely expensive.

Production :== I can buy one at Fry's and load FreeBSD on it, and it
will work. So it counts as "production", I think.

If you want a MIPS box that supports a lot of RAM, buy a SiByte card.
Chris Demetriou is one of the guys who worked on it, so it runs
NetBSD, and it plugs into a PCI slot. It's supposed to be a "network
processor". Be warned that the CPU speed on the MIPS cores is pretty
freakishly slow compared to the original product announcement, but if
you are willing to entertain the idea of PAE, then "freakishly slow"
obviously doesn't bother you. ;^).

Personally, I think that's a lot of effort, just to make political
noises about Itanium.

-- 
Terry