From: Terry Lambert <tlambert2@mindspring.com>
Date: Sat, 16 Nov 2002 13:16:22 -0800
To: Gary Thorpe
Cc: freebsd-hackers@freebsd.org
Subject: Re: bus_dmamem_alloc failing
Message-ID: <3DD6B5A6.1E867697@mindspring.com>
References: <20021116185204.61345.qmail@web41211.mail.yahoo.com>

Gary Thorpe wrote:
> > Really, there's a lot of the kernel which could be pageable,
> > which would help this.  But for this to work, all the code
> > in the paging path has to be marked non-pageable.
> >
> > The way Windows handles this is to have separate ELF sections
> > for pageable vs. unpageable vs. init vs. other code/data.  At
> > present, FreeBSD only supports the concept of code, data, and
> > BSS sections, so you would need to change the loader if you
> > wanted to do this.
>
> Does UNIX have the ability to specify "wired" pages that will not be
> paged out and will always remain in memory?

It can set these attributes on pages which already exist, but cannot
enforce them.
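(As an aside: the userland face of the same wiring machinery is mlock(2)/munlock(2), which pin existing pages in physical memory.  A minimal sketch, assuming POSIX mlock semantics -- the wrapper names here are made up for illustration, and this is not the kernel's vm_page_wire() path:)

```c
#include <stddef.h>
#include <sys/mman.h>

/* Wire `len` bytes at `buf` into physical memory so the pager will
 * not evict them; returns 0 on success.  Userland analogue of the
 * kernel-side vm_page_wire() -- wrapper names are hypothetical. */
int wire_buffer(void *buf, size_t len)
{
    return mlock(buf, len);
}

/* Release the wiring so the pages are pageable again. */
int unwire_buffer(void *buf, size_t len)
{
    return munlock(buf, len);
}
```

Note that, exactly as described above, this only operates on pages that already exist in the address space; there is no way to ask the loader to wire a section as it is mapped in.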
Specifically, there are functions for doing it (vm_page_wire(),
vm_page_unwire()), but it's not something you can tell the loader to
do for you when a page is loaded into the KVA space (at this time),
and it's not a defined section attribute in ELF (except in the
Microsoft PELDR specification).

> > It's still possible to do without kernel paging, but it will
> > be a lot harder.
>
> Since most DMA controllers do not know about virtual memory but only
> physical memory, wouldn't it be unworkable to put memory used in DMA
> transfers into virtual (paged) memory?

Yes.  Memory which is potentially a target of a DMA transfer is
usually wired.  There was a recent discussion about whether this was
in fact necessary (I think it simplifies matters a lot for it to be
wired).

As a rule, DMA transfers go to physical memory, not KVA space, and
the reason for having the pages in KVA space at all is to permit them
to be cached; unless the kernel knows about the pages, they may as
well be lost.

The error in the idea that pages used for DMA could be left unmapped
in the KVA is that the VM and buffer cache are unified in FreeBSD.
The savings, in that case, are non-existent: what is being attempted
is to DMA to user buffers directly, which are mapped into a process
address space, and not into the KVA space.  The idea is that by doing
this you are "saving" copies.  This isn't true on a unified VM and
buffer cache system, where what you are really doing is unsharing
cached data.  It really makes no sense for these pages to be absent
from the KVA space -- doing that forfeits the ability to mark any
modified pages dirty for writeback, which is a kernel function.

> An x86 DMA transfer also requires that this memory is physically
> contiguous, right?  I *think* other architectures can do DMA to
> virtual memory (MIPS, SPARC???).

The motherboard DMA engine generally does not support scatter/gather,
if that's what you are asking.
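(The shape of a scatter/gather list is worth spelling out.  A toy model, simulated in userland -- the descriptor layout and names are hypothetical, not any particular card's DMA descriptor format; real hardware would be handed physical addresses:)

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scatter/gather descriptor: a list of discontiguous
 * segments that DMA hardware walks as if they were one contiguous
 * buffer.  On real hardware, addr would be a physical address. */
struct sg_entry {
    void   *addr;   /* segment start */
    size_t  len;    /* segment length in bytes */
};

/* Simulate the card's view: read `total` bytes out of the segment
 * list into `dst`, crossing segment boundaries transparently. */
size_t sg_read(const struct sg_entry *sg, size_t nseg,
               void *dst, size_t total)
{
    size_t done = 0;
    for (size_t i = 0; i < nseg && done < total; i++) {
        size_t n = sg[i].len;
        if (n > total - done)
            n = total - done;
        memcpy((char *)dst + done, sg[i].addr, n);
        done += n;
    }
    return done;   /* bytes actually transferred */
}
```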
Most card hardware these days has built-in support for it, but you
have to give it a list of buffers to pretend are virtually
contiguous.  The motherboard DMA engine is not used for much these
days; old floppy controllers, etc., maybe.

Technically, you could use the AGP hardware to remap physical memory
from a virtual address into a fixed physical window; this was also
discussed recently: abusing AGP to make life easier in the face of
PSE36 and/or PAE for more than 4G of RAM.  It turns out that you
would need the newer version of the AGP specification to support the
use of more than a single window.  Another issue is that the window
is "committed" for the duration of the controller's "ownership" of
the buffers -- between the time the request is made and the time the
operation completes.

The AGP approach is similar to the window mapping approach used in
the Alpha architecture.  There was a recent "TODO" discussion on the
-alpha list which referenced this... in retrospect, it's probably
better to bounce the buffers than it is to stall requests until
completion so the window location can be remapped.  This is similar
to the single-window AGP case; it's really a software designer's
desire to get around hardware that's not designed the way software
people want to use it, which leads to the desire to "abuse" AGP this
way.

> Even if kernel paging is not necessarily the solution in this case,
> would it make some things easier?

Yes.  If the kernel is pageable, then you can relocate a virtual page
in physical memory in order to clear a contiguous run of physical
memory, by treating it as a page-out of the memory you want to clear
followed by an immediate page-in to a different physical page.
Unlike I/O paging, this also works with 4M pages (or 2M pages, in
PAE mode).  The issue here is that you have to accumulate physical
memory into an immutable kernel allocation while you do this, so
that any other requestor doing the same thing doesn't step on your
range allocation.
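(A toy model of that relocation trick -- every name here is hypothetical, and real code would go through the pmap and vm_page layers rather than plain arrays; this just shows the bookkeeping of vacating a contiguous frame run by moving occupied frames elsewhere and fixing the mapping:)

```c
#include <stdbool.h>
#include <stddef.h>

#define NFRAMES 16

/* Toy state: virt_to_phys[v] maps virtual page v to a physical
 * frame (-1 = unmapped); frame_used[] tracks occupied frames. */
static int  virt_to_phys[NFRAMES];
static bool frame_used[NFRAMES];

void vm_init(void)
{
    for (size_t i = 0; i < NFRAMES; i++) {
        virt_to_phys[i] = -1;
        frame_used[i] = false;
    }
}

/* Find a free frame outside the run being vacated. */
static int find_free_frame(size_t lo, size_t hi)
{
    for (size_t f = 0; f < NFRAMES; f++)
        if (!frame_used[f] && (f < lo || f >= hi))
            return (int)f;
    return -1;
}

/* Clear frames [lo, hi): each occupied frame in the run is "paged
 * out and immediately paged back in" to a free frame elsewhere,
 * i.e. relocated, with the mapping updated.  Returns true if the
 * whole run could be vacated. */
bool vacate_run(size_t lo, size_t hi)
{
    for (size_t f = lo; f < hi; f++) {
        if (!frame_used[f])
            continue;
        int dst = find_free_frame(lo, hi);
        if (dst < 0)
            return false;           /* nowhere to relocate to */
        for (size_t v = 0; v < NFRAMES; v++)
            if (virt_to_phys[v] == (int)f)
                virt_to_phys[v] = dst;   /* fix the mapping */
        frame_used[f] = false;
        frame_used[dst] = true;
    }
    return true;
}
```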
Fragmentation is still possible, with the allocation and deallocation
of these sections leaving "too small" areas -- regions capable of
being relocated, trapped between physical allocations which can't be
-- but that takes a long time to happen: there's a difference in the
persistence of allocations, and physical allocations are generally
all "long term", so they're unlikely to be freed back over a short
enough time that another physical allocation frags the physical
address space.  Physical allocations are also very rare, and tend to
be one-time events.

> However, wouldn't it also make the system much less reliable since
> FreeBSD overallocates memory?

Not necessarily.

> What do you do when the kernel triggers a page fault and there is
> nowhere to put the incoming page (and no swap space to page out the
> page it will replace)?  Reliable in this case means random processes
> will not be killed.

Kernel paging does not necessarily have to participate in overcommit;
in other words, "some pigs are more equal than others".  In the case
of kernel paging, you could commit to the physical RAM plus swap
store necessary to handle the kernel pages, with any left over going
to the user processes.

Note that right now, it's possible to have an overcommit of a KVA
space allocation without the ability to obtain physical pages to back
the allocation.  This is in fact what leads to the failure of Jeff's
new allocator when you run out of kmem_map space.  But even before
Jeff's changes there, it was possible to overcommit KVA space
relative to the available physical memory.

What happens then is that the zalloc() request simply fails.  You
print an "out of mbufs" warning to the console, give the mbuf that
came from the interrupt back to the network card (for example) --
which is the same thing as dropping the packet, only you have to eat
overhead to do it -- and then increment the "denied" counter that
shows up in "netstat -m"... not a problem.
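(That graceful-failure path can be sketched in userland -- malloc() stands in for the zone allocator, and the function and counter names are made up for illustration, not the actual mbuf code:)

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Counter analogous to the "denied" field reported by netstat -m. */
static unsigned long mbuf_denied;

/* Hypothetical receive path: try to get a replacement buffer for
 * the one the card just filled.  On failure, the caller re-posts
 * the old buffer to the card -- i.e. the packet is dropped -- and
 * we bump the counter.  No panic, no killed processes. */
bool rx_refill(void **slot, size_t len)
{
    void *fresh = malloc(len);   /* stand-in for zalloc() */
    if (fresh == NULL) {
        mbuf_denied++;           /* dropped packet, counted */
        return false;
    }
    *slot = fresh;
    return true;
}
```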
> > Meanwhile, a common approach is to have a separate partial
> > driver to do the memory allocation on your behalf, and hold
> > the references, but do nothing with them, so that you can load
> > and unload the actual driver mechanics while you are developing
> > the driver in the first place.  Then you reference the memory
> > allocated by the first driver from your module, instead of
> > allocating and deallocating it each time.
>
> This would not free the resources used by the driver and would
> effectively mean that a driver can be enabled and disabled but never
> really unloaded.

Not exactly.  You only care about the resources in the bus_space, and
the reason you care about them is your inability to allocate them,
then free and reallocate them at will.  So your development driver
can be loaded and unloaded all you want, but your stub can't.

Since the case we care about here is development (in practice, the
driver will be loaded once at boot time, once it's complete), there
is no problem with fragmentation or other issues for the driver we
are discussing.  If this were something like a video capture driver,
which would be loaded each time a capture is requested and unloaded
on completion, to conserve kernel resources... then that's a
different matter entirely.

In any case, we are not talking about deploying a driver that has an
allocation stub; we are merely talking about using an allocation stub
during development to avoid having to fix the underlying problem,
which is that the physical address space can become fragmented, and
FreeBSD is currently incapable of dealing with this situation by
defragging physical memory out from under the KVA.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message