From: Terry Lambert <tlambert2@mindspring.com>
Date: Sat, 16 Nov 2002 13:16:22 -0800
To: Gary Thorpe
Cc: freebsd-hackers@freebsd.org
Subject: Re: bus_dmamem_alloc failing
Message-ID: <3DD6B5A6.1E867697@mindspring.com>
References: <20021116185204.61345.qmail@web41211.mail.yahoo.com>

Gary Thorpe wrote:
> > Really, there's a lot of the kernel which could be pageable,
> > which would help this.  But for this to work, all the code
> > in the paging path has to be marked non-pageable.
> >
> > The way Windows handles this is to have separate ELF sections
> > for pageable vs. unpageable vs. init vs. other code/data.  At
> > present, FreeBSD only supports the concept of code, data, and
> > BSS sections, so you would need to change the loader if you
> > wanted to do this.
>
> Does UNIX have the ability to specify "wired" pages that will not be
> paged out and will always remain in memory?

It can set these attributes on pages which already exist, but cannot
enforce them.
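(As an aside: the userland face of the same wiring machinery is mlock(2)/munlock(2), which pin existing pages in physical memory.  A minimal sketch, assuming POSIX mlock semantics -- the wrapper names here are made up for illustration, and this is not the kernel's vm_page_wire() path:)

```c
#include <stddef.h>
#include <sys/mman.h>

/* Wire `len` bytes at `buf` into physical memory so the pager will
 * not evict them; returns 0 on success.  Userland analogue of the
 * kernel-side vm_page_wire() -- wrapper names are hypothetical. */
int wire_buffer(void *buf, size_t len)
{
    return mlock(buf, len);
}

/* Release the wiring so the pages are pageable again. */
int unwire_buffer(void *buf, size_t len)
{
    return munlock(buf, len);
}
```

Note that, exactly as described above, this only operates on pages that already exist in the address space; there is no way to ask the loader to wire a section as it is mapped in.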
Specifically, there are functions for doing it (vm_page_wire(),
vm_page_unwire()), but it's not something you can tell the loader to
do for you when a page is loaded into the KVA space (at this time),
and it's not a defined section attribute in ELF (except in the
Microsoft PELDR specification).

> > It's still possible to do without kernel paging, but it will
> > be a lot harder.
>
> Since most DMA controllers do not know about virtual memory but only
> physical memory, wouldn't it be unworkable to put memory used in DMA
> transfers into virtual (paged) memory?

Yes.  Memory which is potentially a target of a DMA transfer is
usually wired.  There was a recent discussion about whether this was
in fact necessary (I think it simplifies matters a lot for it to be
wired).

As a rule, DMA transfers go to physical memory, not KVA space, and
the reason for having the pages in KVA space at all is to permit them
to be cached; unless the kernel knows about the pages, they may as
well be lost.

The error in the idea that pages used for DMA could be left unmapped
in the KVA is that the VM and buffer cache are unified in FreeBSD.
The savings, in that case, are non-existent: what is being attempted
is to DMA to user buffers directly, which are mapped into a process
address space, and not into the KVA space.  The idea is that by doing
this you are "saving" copies.  This isn't true on a unified VM and
buffer cache system, where what you are really doing is unsharing
cached data.  It really makes no sense for these pages to be absent
from the KVA space -- doing that forfeits the ability to mark any
modified pages dirty for writeback, which is a kernel function.

> An x86 DMA transfer also requires that this memory is physically
> contiguous, right?  I *think* other architectures can do DMA to
> virtual memory (MIPS, SPARC???).

The motherboard DMA engine generally does not support scatter/gather,
if that's what you are asking.
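(The shape of a scatter/gather list is worth spelling out.  A toy model, simulated in userland -- the descriptor layout and names are hypothetical, not any particular card's DMA descriptor format; real hardware would be handed physical addresses:)

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scatter/gather descriptor: a list of discontiguous
 * segments that DMA hardware walks as if they were one contiguous
 * buffer.  On real hardware, addr would be a physical address. */
struct sg_entry {
    void   *addr;   /* segment start */
    size_t  len;    /* segment length in bytes */
};

/* Simulate the card's view: read `total` bytes out of the segment
 * list into `dst`, crossing segment boundaries transparently. */
size_t sg_read(const struct sg_entry *sg, size_t nseg,
               void *dst, size_t total)
{
    size_t done = 0;
    for (size_t i = 0; i < nseg && done < total; i++) {
        size_t n = sg[i].len;
        if (n > total - done)
            n = total - done;
        memcpy((char *)dst + done, sg[i].addr, n);
        done += n;
    }
    return done;   /* bytes actually transferred */
}
```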
Most card hardware these days has built-in support for it, but you
have to give it a list of buffers to pretend are virtually
contiguous.  The motherboard DMA engine is not used for much these
days; old floppy controllers, etc., maybe.

Technically, you could use the AGP hardware to remap physical memory
from a virtual address into a fixed physical window; this was also
discussed recently: abusing AGP to make life easier in the face of
PSE36 and/or PAE for more than 4G of RAM.  It turns out that you
would need the newer version of the AGP specification to support the
use of more than a single window.  Another issue is that the window
is "committed" for the duration of the controller's "ownership" of
the buffers -- between the time the request is made and the time the
operation completes.

The AGP approach is similar to the window mapping approach used in
the Alpha architecture.  There was a recent "TODO" discussion on the
-alpha list which referenced this... in retrospect, it's probably
better to bounce the buffers than it is to stall requests until
completion so the window location can be remapped.  This is similar
to the single-window AGP case; it's really a software designer's
desire to get around hardware that's not designed the way software
people want to use it, which leads to the desire to "abuse" AGP this
way.

> Even if kernel paging is not necessarily the solution in this case,
> would it make some things easier?

Yes.  If the kernel is pageable, then you can relocate a virtual page
in physical memory in order to clear a contiguous run of physical
memory, by treating it as a page-out of the memory you want to clear
followed by an immediate page-in to a different physical page.
Unlike I/O paging, this also works with 4M pages (or 2M pages, in
PAE mode).  The issue here is that you have to accumulate physical
memory into an immutable kernel allocation while you do this, so
that any other requestor doing the same thing doesn't step on your
range allocation.
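(A toy model of that relocation trick -- every name here is hypothetical, and real code would go through the pmap and vm_page layers rather than plain arrays; this just shows the bookkeeping of vacating a contiguous frame run by moving occupied frames elsewhere and fixing the mapping:)

```c
#include <stdbool.h>
#include <stddef.h>

#define NFRAMES 16

/* Toy state: virt_to_phys[v] maps virtual page v to a physical
 * frame (-1 = unmapped); frame_used[] tracks occupied frames. */
static int  virt_to_phys[NFRAMES];
static bool frame_used[NFRAMES];

void vm_init(void)
{
    for (size_t i = 0; i < NFRAMES; i++) {
        virt_to_phys[i] = -1;
        frame_used[i] = false;
    }
}

/* Find a free frame outside the run being vacated. */
static int find_free_frame(size_t lo, size_t hi)
{
    for (size_t f = 0; f < NFRAMES; f++)
        if (!frame_used[f] && (f < lo || f >= hi))
            return (int)f;
    return -1;
}

/* Clear frames [lo, hi): each occupied frame in the run is "paged
 * out and immediately paged back in" to a free frame elsewhere,
 * i.e. relocated, with the mapping updated.  Returns true if the
 * whole run could be vacated. */
bool vacate_run(size_t lo, size_t hi)
{
    for (size_t f = lo; f < hi; f++) {
        if (!frame_used[f])
            continue;
        int dst = find_free_frame(lo, hi);
        if (dst < 0)
            return false;           /* nowhere to relocate to */
        for (size_t v = 0; v < NFRAMES; v++)
            if (virt_to_phys[v] == (int)f)
                virt_to_phys[v] = dst;   /* fix the mapping */
        frame_used[f] = false;
        frame_used[dst] = true;
    }
    return true;
}
```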
Fragmentation is still possible, with the allocation and deallocation
of these sections leaving "too small" areas -- regions capable of
being relocated, trapped between physical allocations which can't be
-- but that takes a long time to happen: there's a difference in the
persistence of allocations, and physical allocations are generally
all "long term", so they're unlikely to be freed back over a short
enough time that another physical allocation frags the physical
address space.  Physical allocations are also very rare, and tend to
be one-time events.

> However, wouldn't it also make the system much less reliable since
> FreeBSD overallocates memory?

Not necessarily.

> What do you do when the kernel triggers a page fault and there is
> nowhere to put the incoming page (and no swap space to page out the
> page it will replace)?  Reliable in this case means random processes
> will not be killed.

Kernel paging does not necessarily have to participate in overcommit;
in other words, "some pigs are more equal than others".  In the case
of kernel paging, you could commit to the physical RAM plus swap
store necessary to handle the kernel pages, with any left over going
to the user processes.

Note that right now, it's possible to have an overcommit of a KVA
space allocation without the ability to obtain physical pages to back
the allocation.  This is in fact what leads to the failure of Jeff's
new allocator when you run out of kmem_map space.  But even before
Jeff's changes there, it was possible to overcommit KVA space
relative to the available physical memory.

What happens then is that the zalloc() request simply fails.  You
print an "out of mbufs" warning to the console, give the mbuf that
came from the interrupt back to the network card (for example) --
which is the same thing as dropping the packet, only you have to eat
overhead to do it -- and then increment the "denied" counter that
shows up in "netstat -m"... not a problem.
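(That graceful-failure path can be sketched in userland -- malloc() stands in for the zone allocator, and the function and counter names are made up for illustration, not the actual mbuf code:)

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Counter analogous to the "denied" field reported by netstat -m. */
static unsigned long mbuf_denied;

/* Hypothetical receive path: try to get a replacement buffer for
 * the one the card just filled.  On failure, the caller re-posts
 * the old buffer to the card -- i.e. the packet is dropped -- and
 * we bump the counter.  No panic, no killed processes. */
bool rx_refill(void **slot, size_t len)
{
    void *fresh = malloc(len);   /* stand-in for zalloc() */
    if (fresh == NULL) {
        mbuf_denied++;           /* dropped packet, counted */
        return false;
    }
    *slot = fresh;
    return true;
}
```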
> > Meanwhile, a common approach is to have a separate partial
> > driver to do the memory allocation on your behalf, and hold
> > the references, but do nothing with them, so that you can load
> > and unload the actual driver mechanics while you are developing
> > the driver in the first place.  Then you reference the memory
> > allocated by the first driver from your module, instead of
> > allocating and deallocating it each time.
>
> This would not free the resources used by the driver and would
> effectively mean that a driver can be enabled and disabled but never
> really unloaded.

Not exactly.  You only care about the resources in the bus_space, and
the reason you care about them is your inability to allocate them,
then free and reallocate them at will.  So your development driver
can be loaded and unloaded all you want, but your stub can't.

Since the case we care about here is development (in practice, the
driver will be loaded once at boot time, once it's complete), there
is no problem with fragmentation or other issues for the driver we
are discussing.  If this were something like a video capture driver,
which would be loaded each time a capture is requested and unloaded
on completion, to conserve kernel resources... then that's a
different matter entirely.

In any case, we are not talking about deploying a driver that has an
allocation stub; we are merely talking about using an allocation stub
during development to avoid having to fix the underlying problem,
which is that the physical address space can become fragmented, and
FreeBSD is currently incapable of dealing with this situation by
defragging physical memory out from under the KVA.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message