FreeBSD Mail Archives

Date:      Thu, 30 Jan 2003 17:22:11 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Scott Long <scott_long@btc.adaptec.com>
Cc:        Julian Elischer <julian@elischer.org>, David Schultz <dschultz@uclink.berkeley.edu>, "Andrew R. Reiter" <arr@watson.org>, arch@freebsd.org
Subject:   Re: PAE (was Re: bus_dmamem_alloc_size())
Message-ID:  <3E39CFC3.9EF4A67E@mindspring.com>
References:  <Pine.BSF.4.21.0301301059310.35796-100000@InterJet.elischer.org> <3E39B52E.E46AF9EA@mindspring.com> <3E39C764.3070500@btc.adaptec.com>

Scott Long wrote:
> > Using the memory by declaring a small copy window that's accessed
> > via PAE in the kernel, and not really supporting PAE at all, can
> > make this work...
> 
> [...]
> 
> This troll is totally unnecessary.  Making peripheral devices work with
> PAE is a matter handled between the device driver and the busdma system.
>   Drivers that cannot pass 64 bit bus addresses to their hardware will
> have the data bounced by busdma, just like what happens in the ISA
> world.  The whole point of the busdma push that Robert and Maxime
> started a few months ago is to prepare drivers for the possible coming
> of PAE.

This is a great idea, until you realize that scatter/gather for network
cards won't work very well in the context of the mbuf system, as
it exists today, unless you are willing to split incoming and
outgoing mbufs into two different pools, or you're willing to add
a copy operation to everything.
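
To make the bounce case concrete, here is a minimal sketch in C of the
decision being discussed: a buffer whose physical address lies above what
a 32-bit DMA engine can reach has to be copied into low memory first.
This is not the busdma API; phys_addr_t, needs_bounce() and prepare_tx()
are hypothetical names, and the bounce buffer is simply assumed to live
below 4G.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only; this is not the FreeBSD busdma API. */
typedef uint64_t phys_addr_t;            /* wide enough for 36-bit addresses */

#define DEV_MAXADDR_32BIT 0xffffffffULL  /* highest address a 32-bit DMA engine reaches */

/* True if any byte of [pa, pa + len) lies above what the device can address. */
static bool
needs_bounce(phys_addr_t pa, size_t len, phys_addr_t dev_maxaddr)
{
	assert(len > 0);
	return (pa + len - 1 > dev_maxaddr);
}

/*
 * Simulated transmit path: if the buffer is out of the device's reach,
 * copy it into a bounce buffer that is assumed to sit below 4G.
 * Returns the number of bytes copied (0 when no bounce was needed).
 */
static size_t
prepare_tx(phys_addr_t pa, const void *data, size_t len,
    void *bounce, size_t bounce_len, phys_addr_t dev_maxaddr)
{
	if (!needs_bounce(pa, len, dev_maxaddr))
		return (0);
	assert(len <= bounce_len);
	memcpy(bounce, data, len);
	return (len);
}
```

The nonzero return value is the extra copy: every buffer that lands above
the device's limit pays for one memcpy() before the hardware ever sees it.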

> Honestly, though, if you're going to spend the money on a PAE-capable
> motherboard and all the memory to go along with it, are you really going
> to put a Realtek nic and an Advansys scsi card into it?

I'm going to have whatever the manufacturer put on the motherboard,
most likely, which may or may not be 64bit capable.

If I'm spending all the money building it up from "to spec"
components in the first place, I'm more likely to just buy a
64bit machine, instead.  My biggest cost is going to end up
going to 3rd party 64 bit capable cards, and RAM, anyway.

> Also, the PAE work that might happen is not going to affect the vast
> majority of FreeBSD/i386 users at all; I can only imagine that it will
> be a config(8) option that will most likely default to 'off'.

This would result in potentially significant duplicate sections
of code in the VM system, separated by #ifdefs, if true, unless
all the VM references that needed to switch between 32 and 36 bits
were macrotized, and certain parts rewritten from scratch.  That's
always possible, I suppose.
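
As a rough sketch of what hiding the 32-versus-36-bit difference behind
macros and a typedef could look like, consider the fragment below.  The
names (PAE_SKETCH, phys_addr_t, PHYS_ADDR_BITS, pfn_to_phys) are invented
for illustration and are not taken from the FreeBSD VM code; the point is
only that callers use one type and one helper, and the #ifdef lives in a
single place.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; a sketch of the macro approach, not actual VM code. */
#ifdef PAE_SKETCH
typedef uint64_t phys_addr_t;    /* 36-bit physical addresses need a 64-bit type */
#define PHYS_ADDR_BITS 36
#else
typedef uint32_t phys_addr_t;    /* classic i386: 32-bit physical addresses */
#define PHYS_ADDR_BITS 32
#endif

#define PAGE_SHIFT_SKETCH 12     /* 4K pages */
#define PHYS_ADDR_MAX ((uint64_t)((1ULL << PHYS_ADDR_BITS) - 1))

/*
 * Page frame number to physical address.  The cast happens before the
 * shift so that the PAE build does not truncate the result to 32 bits.
 */
static phys_addr_t
pfn_to_phys(uint32_t pfn)
{
	assert(((uint64_t)pfn << PAGE_SHIFT_SKETCH) <= PHYS_ADDR_MAX);
	return ((phys_addr_t)pfn << PAGE_SHIFT_SKETCH);
}
```

Any code that stores a physical address in a plain 32-bit integer instead
of the typedef would silently truncate in the PAE build, which is why
every such reference would have to be found and converted.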

> There is nothing to bikeshed here.  Please respect that there are people
> who need PAE, understand PAE, and will happily accept PAE.  Those who do
> not need, understand, or accept it can go along with their lives
> blissfully happy with it turned off.

Realize that I've personally built a system with 4G of memory,
based on FreeBSD, that could handle 1.6M simultaneous connections,
for a proxy caching company.  We had a lot of reason to look into
PAE, because the number of simultaneous connections and the number of
mbufs available for caching data are inversely related (obviously).
Using PAE was one potential way to take the "add more RAM" approach
of throwing resources, rather than intelligence, at the problem.

The problem with using PAE for this application is that the mbuf
chains can not be simultaneously available in the inbound and
outbound space, without copying.

Now it doesn't matter whether the inbound space is from a network
card, or from a disk controller: if there is host processing that
has to take place, then you have to span several of these PAE
pages simultaneously.
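
A toy model in C may help show why the copy appears.  It assumes a single
small window that can map only one region ("bank") of extended memory at
a time; the bank sizes, names and the staging buffer are all invented for
the example, and nothing here reflects real page-table manipulation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBANKS    4     /* pretend regions of memory above the window */
#define BANK_SIZE 64

static unsigned char phys[NBANKS][BANK_SIZE];
static int window_bank = -1;    /* which bank the single window maps now */

/* Point the one window at a bank; any previous mapping is lost. */
static unsigned char *
window_map(int bank)
{
	assert(bank >= 0 && bank < NBANKS);
	window_bank = bank;
	return (phys[bank]);
}

/*
 * Move len bytes from one bank to another.  Since only one bank is
 * visible at a time, the data goes through a staging buffer in
 * always-mapped memory.  Returns the number of copies performed.
 */
static int
bank_to_bank(int src, int dst, size_t len)
{
	unsigned char staging[BANK_SIZE];
	int copies = 0;

	assert(len <= BANK_SIZE);
	memcpy(staging, window_map(src), len);
	copies++;
	memcpy(window_map(dst), staging, len);
	copies++;
	return (copies);
}
```

With more than one window the staging copy could be avoided, but then the
windows themselves eat into the same limited kernel address space.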

As Peter rightly points out, the regions are large enough to be
problematic for paging.  Effectively, you have to disassociate the
VM and buffer cache, or find some way of supporting paging of
much-larger-than-4K units.

I have yet to see someone suggest a real application for PAE that
wasn't tantamount to an L3 cache and/or a RAMdisk.  It does not
increase your UVA or KVA size above the 4G limit: all the pointers
your compiler generates are still 32 bits, and you are still limited
to 4G.
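
The arithmetic behind the limit is short enough to write down.  The
helper names below are made up for the example; the only inputs are the
32-bit pointer width and the 36-bit physical address width that PAE
provides.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with the given number of address bits (bits < 64). */
static uint64_t
addressable_bytes(unsigned bits)
{
	assert(bits < 64);
	return (1ULL << bits);
}

/* The same quantity in gigabytes (2^30 bytes); 0 if below 1G. */
static uint64_t
addressable_gbytes(unsigned bits)
{
	return (addressable_bytes(bits) >> 30);
}
```

addressable_gbytes(32) is 4 and addressable_gbytes(36) is 64: the machine
can hold up to 64G of RAM, but any one address space still sees at most
4G of it at a time.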

What *would* have been useful is if the Intel guys had gone 64bit,
like the AMD folks did, so that the UVA or KVA or both could be
made larger than 4G.

Frankly, the most useful thing that might come out of this is a
change to the copyin/copyout/copyinstr/etc. code to separate the
UVA and KVA spaces, making them both 4G.  At *that* point, it
could be useful to make programs larger.  But you could have that
*without* PAE, and with PAE, you would *still* need to split the
VM and buffer cache apart to create a copy boundary for kernel
vs. user data.

At least with an explicit coherency requirement, and the code to
implement it, we could expect FS stacking to start working like
it was designed to work, ten years ago.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3E39CFC3.9EF4A67E>
