Date: Thu, 30 Jan 2003 16:57:32 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Peter Wemm <peter@wemm.org>
Cc: David Schultz <dschultz@uclink.Berkeley.EDU>,
	"Andrew R. Reiter" <arr@watson.org>,
	Scott Long <scott_long@btc.adaptec.com>,
	arch@FreeBSD.ORG
Subject: Re: PAE (was Re: bus_dmamem_alloc_size())
Message-ID: <3E39C9FC.3EAF3345@mindspring.com>
References: <20030131003323.B42622A8A1@canning.wemm.org>
Peter Wemm wrote:
> I beg to differ about PSE36.  Since it still runs on 32 bit page tables,
> all PSE36 does is enable 4MB mappings that are targeted at above the 4G
> boundary.  It does this by shifting the PTD entries for 4MB pages across
> by 4 bits in order to squeeze the extra bits in.

Actually, they're 2M.  They eat a bit there.  8-).

> For some things this would be useful.  But remember you can *only* use
> it in 4MB chunks.  Our VM system isn't geared for that and we'd have
> to come up with an infrastructure to somehow get it to within reach of
> userland.  Maybe it could be used to provide backing store for things
> like System V shared memory, but the lack of size granularity would
> make it interesting.  And since it's 4MB chunks, forget paging and
> mmap etc.  PSE36 really treats memory above 4G as second-class.

That's pretty much my point: the memory above 4G *is* second class, in
that it requires making memory below 4G *unavailable* in order to make
itself available, even if you use PAE.  The problem is one of
simultaneous access by multiple processes, and PSE36 at least allows
that, if badly, whereas PAE doesn't.

You're right about the VM system not being geared for it.  Going to 2M
instead of 4M "PSE pages" would be rather a pain, and that's just one
of a half dozen issues.

As to paging of 2M pages, I've actually always thought the VM system
needed to be fixed so that large pages could be supported directly via
paging.  It's not unreasonable to want to page at a ratio of 1:32,768,
which is what you would be getting.  Compare 4K pages on a 4G system,
which is a 1:1,048,576 ratio: that's only really a deflation of 32
times in the number of pageable objects mapping an entire address
space.

> On the other hand, PAE treats all memory as "first class" and is
> usable everywhere.  The cost is that you need to do 64 bit idempotent
> writes to the page tables if you ever want to use it on SMP.  But at
> least it can be used for page cache, generic process data, malloc
> etc etc.
It's usable, but not simultaneously.

A really good example here would be buffer cache entries and mbufs,
for something like a "sendfile" operation.  If you have an FTP server
with this arrangement, and it's loaded enough to actually use the RAM,
then you will end up with FTP clients that stall each other at the
driver level.

You could *maybe* get around it by making sure that the network cards
all did checksum offloading, were all capable of doing 64 bit
addressing, and then pre-creating the mbuf list for the entire "wired"
region of the file, well in excess of the sendspace limit.  I've done
that in a product or two (jacked around with ignoring the sendspace
limit, and putting huge chains of mbufs on a list).

But the cost of doing that is moving your mbufs to a 64 bit address
space, separate from the rest of the kernel.  If you don't separate
inbound and outbound mbuf pools into 32 bit and 64 bit pools, then you
have to face the possibility of dealing with the simultaneous access
issue for, for example, every mbuf in an mbuf chain in an m_pullup
operation.  The overhead for several TCP streams where you are doing
that would be killer.

I think it's probably better to acknowledge that the memory above 4G
*is* second class, and then treat it as an L3 cache, and (maybe) a DMA
target for transfers *into* it, but not for transfers out.  It gets
ugly fast, because of the cross-boundary stalls.

To me, PAE is more like the segments in Windows 3.11; the OS has to be
built from the ground up to expect them, and use them properly.  8-(.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message