Date: Sat, 30 Nov 1996 17:13:53 +0100
From: Poul-Henning Kamp <phk@critter.tfs.com>
To: "John S. Dyson" <toor@dyson.iquest.net>
Cc: msmith@atrad.adelaide.edu.au (Michael Smith), imp@village.org, platforms@FreeBSD.org, dyson@FreeBSD.org
Subject: Re: FreeBSD/MIPS anybody
Message-ID: <5734.849370433@critter.tfs.com>
In-Reply-To: Your message of "Sat, 30 Nov 1996 09:57:18 EST." <199611301457.JAA10579@dyson.iquest.net>
>I started a VM doc, but lost it in a disk crash. I might be able
>to start one again, but it would be a week or so at least to
>write (to complete.) Can't promise anything in the next week
>or so though, as bugs like the MSDOS problem appear regularly.
Would this by any chance be a help ?
Poul-Henning
Return-Path: toor@dyson.iquest.net
From: "John S. Dyson" <toor@dyson.iquest.net>
Message-Id: <199606170513.AAA07241@dyson.iquest.net>
Subject: Re: cvs commit: src/sys/vm pmap.h vm_page.c vm_pageout.c src/sys/i386/i386 pmap.c
To: phk@freebsd.org (Poul-Henning Kamp)
Date: Mon, 17 Jun 1996 00:13:10 -0500 (EST)
>
> > 5) The pmap code has been sorely in need of re-organization, and I
> > have taken a first (of probably many) steps. Please tell me
> > if you have any ideas.
>
> Since this is some of the first code to become a roadblock for a different
> architecture, and since our VM system is unique (in the most positive
> meaning of the word): Stick a comment before each function called from
> the arch-independent VM system and explain what the interface is and what
> the code shall do.
>
I don't disagree, and have been thinking about it. I have some info,
also (a first cut at some vm system docs). This is FYI, and remember
that my language skills are very very poor, also this is not fully
up to date now:
Here is a very rough description of the FreeBSD MACH-based VM system
internals... This document is not definitive, but meant as a quick reference
or overview. The source code is currently the ONLY definitive documentation.
If there is enough positive feedback from this document, I might be motivated
enough to fill this in with more detail. Routines or symbols that will
probably be supported forever in one way or another have a "SUPPORTED" notation.
Those routines that could be at risk sometime in the future have no such
notation.
Definitions:
Data Structure Equivalent
vm_map ............... Address space
vm_map_entry ......... Portion of address space pointing to only
one vm_object or another vm_map
vm_object ............ Repository for data
vm_page .............. Indivisible amount of data
pmap ................. Physical representation of Address space
(Note that the names above, with "_t" appended refer to pointers
as opposed to the structure itself... e.g. vm_map_t is a pointer to a
struct vm_map.)
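As a rough illustration of the "_t" naming convention, here is a small
user-space sketch. The structure fields and the helper toy_map_add_entry
are invented for the example; only the pointer-typedef pattern is taken
from the note above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; the fields are invented. */
struct vm_map { size_t nentries; };
struct vm_object { size_t size_in_pages; };

/* The "_t" names are pointer types, not the structures themselves. */
typedef struct vm_map *vm_map_t;
typedef struct vm_object *vm_object_t;

/*
 * Hypothetical helper: the handle is passed by value, yet the caller's
 * structure is modified, because the handle is really a pointer.
 */
static void toy_map_add_entry(vm_map_t map)
{
    assert(map != NULL);
    map->nentries++;
}
```

So a routine documented as taking a vm_map_t wants the address of a
struct vm_map, which is why the handles listed further down are written
as &curproc->p_vmspace->vm_map.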
Terminology Equivalent
pager ................ A sort-of class that is described by
information in the vm_object and the type
of access to external data
vnode_pager .......... Code in the vm system that knows how to do
paging I/O with filesystem files.
swap_pager ........... Code in the vm system that knows how to do
paging I/O with swap partitions and files.
Anonymous data in the system (e.g. bss) is
paged using this.
default_pager ........ A logical "placeholder" pager that takes
few system resources until paging is needed.
The default_pager is currently used only for
vm_object's that might need to be paged using
the swap_pager. When pageouts are needed,
objects that are marked "default" are converted
to "swap" with the associated allocation of
swap data structures.
device_pager ......... Code in the vm system that can provide memory
mapped I/O with memory mapped devices. Most
common use of this is X-Windows.
kva .................. Kernel virtual address, usually of type
vm_offset_t or caddr_t.
sva,eva,va ........... Virtual address(es). sva - start virtual
address, eva - end virtual address.
pa ................... Physical address.
m,p .................. Usually used for vm_page_t.
offset ............... Offsets into objects are usually vm_ooffset_t,
>>>> ^^^^^^^^^^^^
which translates into a long long.
wired ................ Not pageable.
clean ................ (As in "the page is clean"), means it is in
sync with the backing store.
Useful "handles" describing address spaces:
(warning -- in the most general case, you should make sure that
there is a "curproc"!!!). If your code is being called from
a system call or I/O initiation routine, you should be safe.
Current process address space (vm_map_t): (SUPPORTED)
&curproc->p_vmspace->vm_map
Current process pmap (pmap_t): (SUPPORTED)
&curproc->p_vmspace->vm_pmap
Kernel address space (vm_map_t): (SUPPORTED)
kernel_map
Kernel pmap (pmap_t): (SUPPORTED)
kernel_pmap, (best referred to as vm_map_pmap(kernel_map))
Less commonly used "handles":
Address spaces (submaps of kernel_map, unless noted otherwise):
never checked for modification: clean_map
buffer cache: buffer_map (submap of clean_map)
pager and cluster buffers: pager_map (submap of clean_map)
used for bounce buffers: io_map (submap of clean_map)
malloc and mbuf cluster area: kmem_map
mbuf clusters: mb_map (submap of kmem_map)
args during exec: exec_map
temporary mapping of exec hdr: exech_map
UPAGES per process: upage_map
Important macros:
pa = VM_PAGE_TO_PHYS(m); (SUPPORTED)
returns the physical address for a vm_page_t.
m = PHYS_TO_VM_PAGE(pa);
returns the vm_page_t associated with a physical address.
(try to avoid PHYS_TO_VM_PAGE -- it doesn't always work,
because not every physical address has a page, and it
usually implies a design flaw, or a quick work-around
that needs to be corrected in the future.)
PAGE_WAKEUP(m); (SUPPORTED)
This is used to release the lock on a page as represented
by the PG_BUSY bit. Other processes that are waiting
on that page are woken up. In order to wait on a page
the following could be done:
s = splhigh();
while ((m->flags & PG_BUSY) || m->busy) {
m->flags |= PG_WANTED;
tsleep(m, PVM, "xxxxxx", 0);
}
splx(s);
>>>> should a macro be made for this ? ^^^^^
You would do that normally after a vm_page_lookup.
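The handshake between a waiter and PAGE_WAKEUP can be modelled in user
space as below. This is only a sketch: struct toy_page, both toy_
functions and the flag values are assumptions made for the example, and
the sleep and wakeup calls are reduced to return values so the logic
can be followed without a kernel.

```c
#include <assert.h>

/* Illustrative flag values; the kernel headers are authoritative. */
#define PG_BUSY   0x01
#define PG_WANTED 0x02

struct toy_page {
    int flags;   /* PG_BUSY, PG_WANTED */
    int busy;    /* count of operations in progress on the page */
};

/*
 * One pass through the wait loop's test.  Returns 1 if the caller would
 * have to sleep (and records that by setting PG_WANTED), 0 if the page
 * can be used right now.
 */
static int toy_page_must_sleep(struct toy_page *m)
{
    if ((m->flags & PG_BUSY) || m->busy) {
        m->flags |= PG_WANTED;
        return 1;
    }
    return 0;
}

/*
 * Model of the wakeup side: drop PG_BUSY and, if somebody asked to be
 * notified, clear PG_WANTED and report that a wakeup would be issued.
 */
static int toy_page_wakeup(struct toy_page *m)
{
    m->flags &= ~PG_BUSY;
    if (m->flags & PG_WANTED) {
        m->flags &= ~PG_WANTED;
        return 1;
    }
    return 0;
}
```

The waiter re-tests the condition after every wakeup, which is why the
original fragment uses a while loop rather than a single if.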
VM_WAIT; (SUPPORTED)
Use this if a vm_page_alloc has failed in non-interrupt state.
This blocks your process and wakes up the pageout daemon.
When this returns, there will likely be some memory available,
so vm_page_alloc can be retried.
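The allocate, wait, retry pattern might look roughly like this in a
user-space model. Everything here is hypothetical (the pool, the
counters and all toy_ names); toy_vm_wait simply pretends the pageout
daemon has freed one page.

```c
#include <assert.h>
#include <stddef.h>

struct pool_page { int in_use; };

static struct pool_page toy_pool[4];
static int toy_free_count = 0;   /* pages currently allocatable */
static int toy_wait_calls = 0;   /* how many times the caller blocked */

/* Non-blocking, like vm_page_alloc: returns NULL when nothing is free. */
static struct pool_page *toy_page_alloc(void)
{
    if (toy_free_count == 0)
        return NULL;
    toy_free_count--;
    toy_pool[toy_free_count].in_use = 1;
    return &toy_pool[toy_free_count];
}

/* Stand-in for VM_WAIT: pretend the pageout daemon freed one page. */
static void toy_vm_wait(void)
{
    toy_wait_calls++;
    toy_free_count = 1;
}

/* The retry pattern; only valid where the caller is allowed to sleep. */
static struct pool_page *toy_page_alloc_wait(void)
{
    struct pool_page *m;

    while ((m = toy_page_alloc()) == NULL)
        toy_vm_wait();
    assert(m->in_use);
    return m;
}
```

The loop matters because another process may take the freed page before
the waiter runs again, so a single retry is not guaranteed to succeed.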
Likely return values from most of the vm routines <vm/vm_param.h>:
KERN_SUCCESS, KERN_INVALID_ADDRESS, KERN_PROTECTION_FAILURE,
KERN_NO_SPACE, KERN_INVALID_ARGUMENT, KERN_FAILURE,
KERN_RESOURCE_SHORTAGE, KERN_NOT_RECEIVER?, KERN_NO_ACCESS?
Important X86 tidbit:
The kernel_pmap is always effectively mapped into the user's pmap.
When referring to kernel space, one should use the kernel_pmap, and
all processes will see the change in the kernel.
Memory queues:
vm_page_queue_free -- free pages
vm_page_queue_zero -- free pages that are zero
vm_page_queue_cache -- free pages that still have info
may NOT be BUSY or mapped.
vm_page_queue_active -- active pages
vm_page_queue_inactive -- inactive pages
Commonly needed VM system routines:
int vm_map_find(map, object, offset, addr, length, find_space,
prot, max, cow);
(SUPPORTED)
This finds AND allocates virtual space from the specified map
(Address space). The user can optionally specify a vm object
to map into the space (e.g. mapped file.)
The parameters associated with the address space include:
map -- The specific vm_map_t involved with the op
addr -- Ptr to the address in the vm_map
length -- Length of the mapping in bytes
Note that the address (addr) above is equivalent to
the address in a process or in the kernel. If the address
is >= VM_MIN_KERNEL_ADDRESS you MUST use kernel_map, and
not &curproc->p_vmspace->vm_map!!! Secondary note, unless
you *really* know what you are doing, do not do a vm_map_find
in the kernel map. Please use kmem_alloc instead.
If you specify an initial value for addr, and find_space
is zero, then the allocation request will succeed only if
there is enough memory available at the specified address.
The parameters associated with the vm object:
object -- Optional VM object -- if NULL, a default
pager object will be created as needed
offset -- Offset into the object (long long, vm_ooffset_t)
Additional parameters modifying the operation of the routine:
find_space -- If there is no space at 'addr', space is
found after that place.
prot,max -- R/W permissions to address space:
VM_PROT_READ, VM_PROT_WRITE, VM_PROT_EXEC
cow -- Copy-on-write, original obj is NOT modified.
Error returns:
KERN_SUCCESS -- Operation completed
KERN_INVALID_ADDRESS -- Address specified is invalid
KERN_NO_SPACE -- No space in the map
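To make the role of find_space concrete, here is a toy user-space model
of the address-range search. It is a sketch under stated assumptions:
the map is a small fixed array, objects and protections are ignored,
hints outside the map are simply rejected, and every toy_ name and the
TOY_ return codes are placeholders rather than the kernel's own.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder return codes; the real ones come from <vm/vm_param.h>. */
enum { TOY_SUCCESS = 0, TOY_INVALID_ADDRESS = 1, TOY_NO_SPACE = 3 };

#define TOY_MAX_ENTRIES 8

struct toy_entry { unsigned long start, end; };   /* [start, end) */

struct toy_map {
    unsigned long min, max;                       /* bounds of the map */
    struct toy_entry entries[TOY_MAX_ENTRIES];
    size_t nentries;
};

/* Returns the end of an entry overlapping [start, end), or 0 if none. */
static unsigned long toy_overlap_end(const struct toy_map *map,
    unsigned long start, unsigned long end)
{
    size_t i;

    for (i = 0; i < map->nentries; i++)
        if (start < map->entries[i].end && map->entries[i].start < end)
            return map->entries[i].end;
    return 0;
}

/*
 * Reserve 'length' bytes.  *addr is the hint on entry and the chosen
 * address on success.  With find_space == 0 only the hint is tried.
 */
static int toy_map_find(struct toy_map *map, unsigned long *addr,
    unsigned long length, int find_space)
{
    unsigned long start = *addr, next;

    if (start < map->min || start > map->max || length == 0)
        return TOY_INVALID_ADDRESS;
    for (;;) {
        if (length > map->max - start || map->nentries == TOY_MAX_ENTRIES)
            return TOY_NO_SPACE;
        next = toy_overlap_end(map, start, start + length);
        if (next == 0)
            break;                 /* the range is free */
        if (!find_space)
            return TOY_NO_SPACE;   /* hint is taken and we may not move */
        start = next;              /* skip past the blocking entry */
    }
    map->entries[map->nentries].start = start;
    map->entries[map->nentries].end = start + length;
    map->nentries++;
    *addr = start;
    return TOY_SUCCESS;
}
```

Note that addr is passed by pointer: on success the caller reads back
the address that was actually chosen, which may differ from the hint
when find_space is set.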
int vm_map_remove(map, start, end);
(SUPPORTED)
This routine deallocates the virtual space between start and end.
All objects that are backing this space are deallocated as appropriate.
This is sort-of inverse of vm_map_find above. Always returns
KERN_SUCCESS.
int vm_map_protect(map, start, end, new_prot, set_max);
(SUPPORTED)
Changes the access permissions for a virtual address range
in the specified map. This routine also makes all necessary modifications
to the pmap associated with the map.
int vm_map_pageable(map, start, end, new_pageable);
(SUPPORTED)
Allows sections of a map to be wired or unwired into memory.
int vm_map_check_protection(map, start, end, protection);
int vm_map_lookup(map, addr, fault_type, out_entry, object, pindex,
out_prot, wired, single_use);
vm_page_t vm_page_alloc(object, pindex, flags);
(SUPPORTED)
flags -- VM_ALLOC_NORMAL normal process allocation
VM_ALLOC_SYSTEM preferential allocation
VM_ALLOC_INTERRUPT allocate interrupt-safely
VM_ALLOC_ZERO normal process with priority
to zero pages
NON-BLOCKING.
This is the lowest level page allocation routine. A NULL is returned
if the allocation cannot currently be satisfied. The pages are
returned to the user with the PG_BUSY bit set and are not on any
queue. After allocating the page, it is a good idea to issue
a PAGE_WAKEUP(m) on the page, and at least wire the page.
void vm_page_free(m);
(SUPPORTED)
This is the lowest level page free routine. This routine does NOT
remove ANY mappings associated with the page. Chaos will ensue if
the page is not properly removed from all pmap's. A normally used
page can be removed from all pmap's by a
vm_page_protect(m,VM_PROT_NONE);
However, kernel mappings must be removed one-by-one.
void vm_page_activate(m);
void vm_page_deactivate(m);
void vm_page_cache(m);
void vm_page_wire(m);
(SUPPORTED)
NON-BLOCKING.
These are the queue manipulation routines. These are used to affect
the policy of the paging and allocation system. If a page is activated,
it is not likely to be freed soon. If it is deactivated, it is more
likely to be reused. Cached pages are similar to freed pages, available
for allocation, but still have their identity for quick reuse.
If a page is not in one of the other states for a long time, it is
best to wire it so the system can at least account for it. A page
that is wired is "hidden" from the pageout daemon.
vm_object_t vm_object_allocate(type, size);
type -- OBJT_DEFAULT, default -- converts to swap
OBJT_VNODE, vnode object
OBJT_SWAP, swap object
OBJT_DEVICE, device object
This is the routine that creates an object. The user should only
normally create objects of type OBJT_DEFAULT. Note that the
size is in units of pages.
vm_object_t vm_pager_allocate(type, handle, size, prot, foff);
(SUPPORTED)
type -- OBJT_DEFAULT, default -- converts to swap
OBJT_VNODE, vnode object
OBJT_SWAP, swap object
OBJT_DEVICE, device object
This is the routine that creates an object and associates the
object with a file. If the object already exists, the reference
count for the object will be incremented. In the case of a
vnode object, the handle is the vnode pointer, and the foff and prot
are both ignored. In the case of a swap object the handle is a
unique 32bit number (probably address), and the foff and prot are
both ignored. The handle for a device object is likely the
device vnode, the prot is the protection that the memory device
can support, and the foff is the offset into the device.
vm_object_deallocate(object);
(SUPPORTED)
This routine decrements the reference count for the object, potentially
freeing it.
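The reference counting shared by vm_pager_allocate and
vm_object_deallocate can be pictured with a small user-space model.
This is a sketch only: the lookup table, struct toy_object and both
toy_ functions are invented here, and a real object carries far more
state than a handle and a count.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_OBJECTS 4

struct toy_object {
    const void *handle;   /* e.g. a vnode pointer; NULL means slot free */
    int ref_count;
};

static struct toy_object toy_objects[TOY_MAX_OBJECTS];

/* Return the object for 'handle', creating it on first use. */
static struct toy_object *toy_pager_allocate(const void *handle)
{
    struct toy_object *spare = NULL;
    size_t i;

    assert(handle != NULL);
    for (i = 0; i < TOY_MAX_OBJECTS; i++) {
        if (toy_objects[i].handle == handle) {
            toy_objects[i].ref_count++;   /* already exists: new ref */
            return &toy_objects[i];
        }
        if (spare == NULL && toy_objects[i].handle == NULL)
            spare = &toy_objects[i];
    }
    if (spare != NULL) {
        spare->handle = handle;
        spare->ref_count = 1;
    }
    return spare;                         /* NULL if the table is full */
}

/* Drop one reference; returns 1 if that freed the object. */
static int toy_object_deallocate(struct toy_object *object)
{
    assert(object->ref_count > 0);
    if (--object->ref_count > 0)
        return 0;
    object->handle = NULL;
    return 1;
}
```

Each successful allocate call therefore needs exactly one matching
deallocate, whether or not it was the call that created the object.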
vm_page_protect(m, prot);
(SUPPORTED)
Used to turn off permissions for pages mapped into processes.
vm_page_protect(m, VM_PROT_READ) helps implement COW, and
vm_page_protect(m, VM_PROT_NONE) is an important step in freeing pages.
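What "turning off permissions" means across several address spaces can
be sketched as follows. The per-page mapping array, struct cow_page and
the toy_ functions are assumptions made for this example; the point is
only that permissions are masked, never added, in every mapping.

```c
#include <assert.h>
#include <stddef.h>

#define VM_PROT_NONE  0x00
#define VM_PROT_READ  0x01
#define VM_PROT_WRITE 0x02

#define TOY_MAX_MAPPINGS 4

/* One entry per address space the page is mapped into. */
struct toy_mapping { int valid; int prot; };

struct cow_page {
    struct toy_mapping mappings[TOY_MAX_MAPPINGS];
};

/*
 * Restrict every mapping of the page to at most 'prot'.  Permissions
 * are only ever removed; VM_PROT_NONE removes the mappings themselves.
 */
static void toy_page_protect(struct cow_page *m, int prot)
{
    size_t i;

    for (i = 0; i < TOY_MAX_MAPPINGS; i++) {
        if (!m->mappings[i].valid)
            continue;
        m->mappings[i].prot &= prot;
        if (m->mappings[i].prot == VM_PROT_NONE)
            m->mappings[i].valid = 0;
    }
}

/* Number of address spaces that still map the page. */
static int toy_page_mapping_count(const struct cow_page *m)
{
    size_t i;
    int n = 0;

    for (i = 0; i < TOY_MAX_MAPPINGS; i++)
        n += m->mappings[i].valid;
    return n;
}
```

With VM_PROT_READ the next write through any mapping faults, which is
the hook that COW needs; with VM_PROT_NONE no mapping is left, so the
page can be freed safely.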
vm_fault(map, vaddr, fault_type, change_wiring);
(SUPPORTED)
Does the things necessary to bring a page into a process's
address space. The most common use of this routine is in the
trap code to implement demand-paging. Most normal driver
or system use would be as follows:
vm_fault(map, vaddr, VM_PROT_READ or (VM_PROT_READ|VM_PROT_WRITE), 0);
KMEM series of operations (meant to be used on kernel_map or submaps of
kernel_map). They always return page-aligned addresses.
kva = kmem_alloc(map, size);
kva = kmem_alloc_pageable(map, size);
(SUPPORTED)
kmem_alloc and kmem_alloc_pageable each allocate space from the
kernel_map (or any of its submaps except kmem_map). If memory
is being allocated (instead of just virtual space), you should
generally use kmem_alloc. kmem_alloc_pageable does not do all
of the correct things in all cases for the setup of the underlying
kernel_object offset. It is best to use kmem_alloc_pageable when
you plug the pages directly into the kernel address space.
kmem_free(map,addr,size);
(SUPPORTED)
Use kmem_free to give back the kernel address space as allocated
by kmem_alloc or kmem_malloc. Be careful to remove any mappings
specifically created by pmap_enter before freeing the address range.
kva = kmem_malloc(map, size, waitflag);
Use this special form of kmem_alloc for kmem_map or mb_map. Except
for current usage, it is best not to use kmem_malloc in new kernel
extensions. It is best to use malloc/free for things that you CAN
use kmem_malloc for.
MALLOC/FREE (refer to /sys/sys/malloc.h for available types.) These return
aligned memory, but not necessarily on page boundaries.
kva = malloc(size, type, flags);
(SUPPORTED)
flags = M_NOWAIT (call like this from interrupt level.)
= M_KERNEL (preferential allocation of memory.)
= M_WAITOK (normal call for non-interrupt level.)
malloc is callable using M_NOWAIT from both splbio and splimp
interrupt levels.
(void) free(kva, type);
(SUPPORTED)
The kva specified to free must be identical to the one returned
by malloc. The type likewise should be the same, otherwise malloc
usage accounting will not work correctly.
>>>> more like "..., otherwise the system panics."
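Why the type passed to free has to match can be seen in a user-space
model of per-type accounting. This is not the kernel allocator: the
type names, the toy_inuse counters and the size header are all invented
for the sketch, and the model only miscounts where a real kernel may
well do something worse.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical malloc types standing in for those in malloc.h. */
enum { TOY_M_DEVBUF, TOY_M_TEMP, TOY_M_NTYPES };

/* Bytes currently charged to each type. */
static long toy_inuse[TOY_M_NTYPES];

/* Header kept in front of each block; the union keeps data aligned. */
union toy_hdr { size_t size; long double align; void *ptr; };

static void *toy_malloc(size_t size, int type)
{
    union toy_hdr *h;

    assert(type >= 0 && type < TOY_M_NTYPES);
    h = malloc(sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    toy_inuse[type] += (long)size;    /* charge the allocating type */
    return h + 1;
}

static void toy_free(void *addr, int type)
{
    union toy_hdr *h = (union toy_hdr *)addr - 1;

    assert(type >= 0 && type < TOY_M_NTYPES);
    toy_inuse[type] -= (long)h->size; /* credit whatever type was given */
    free(h);
}
```

Freeing a TOY_M_DEVBUF block as TOY_M_TEMP leaves one counter
permanently too high and drives the other negative, so any limit or
statistic based on those counters is wrong from then on.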
PMAP routines. These routines are the lowest level defined interface to
the processor memory management hardware. Provided the virtual addresses
have been set up correctly, pmap can be kernel_pmap, the current process's
pmap or, in some cases, another process's pmap.
void pmap_enter(pmap, va, pa, prot, wired);
(SUPPORTED)
map a single page into the physical address space.
void pmap_remove(pmap, sva, eva);
(SUPPORTED)
remove a range of pages from the physical address space.
pa = pmap_extract(pmap, va);
(SUPPORTED)
get the physical address associated with the specified
mapped page.
pa = pmap_kextract(pmap, va);
(SUPPORTED)
same as pmap_extract, except is much more efficient and
works only for the kernel_pmap.
>>>> or submaps ? why else pmap arg ?
va = pmap_map(va, startp, endp, prot);
(SUPPORTED)
map a contiguous range of pages from physical address startp
through endp at virtual address va. The returned address
points to the next address that can be used for mapping.
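The "returned address" convention of pmap_map is easy to get wrong, so
here is a toy model of it. The flat toy_ptes table, the 4096-byte page
size and the function toy_pmap_map are assumptions of the sketch, and
the prot argument is left out.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096UL   /* assumed page size */
#define TOY_MAX_PTES  16

/* A flat "page table": slot i maps virtual page i to a physical address. */
static unsigned long toy_ptes[TOY_MAX_PTES];

/*
 * Map physical range [startp, endp) at va, one page at a time, and
 * return the first virtual address past the new mappings.
 */
static unsigned long toy_pmap_map(unsigned long va, unsigned long startp,
    unsigned long endp)
{
    while (startp < endp) {
        assert(va / TOY_PAGE_SIZE < TOY_MAX_PTES);
        toy_ptes[va / TOY_PAGE_SIZE] = startp;
        va += TOY_PAGE_SIZE;
        startp += TOY_PAGE_SIZE;
    }
    return va;
}
```

Because the result is the next usable address, calls can be chained:
feeding the return value of one call in as the va of the next lays the
ranges out back to back.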
pmap_protect(pmap, sva, eva, prot);
(SUPPORTED)
Removes permissions from page protections on pages in the
specified range. It does NOT remove protections for other
pmaps on the pages.
pmap_qenter(va, m, count);
pmap_qremove(va, count);
(PROBABLY SUPPORTED)
pmap_qenter/pmap_qremove are used for fast kernel mappings
of vm_page's allocated from the VM system. The implied pmap
is kernel_pmap, and the va's must be
>= VM_MIN_KERNEL_ADDRESS. Usually one would use addresses
that were returned by kmem_alloc_pageable.
The second argument to pmap_qenter is a pointer to an array
of pages. This is used often in the buffer cache code for
quick mapping of vm_page_t's.
pmap_kenter(va, pa);
pmap_kremove(va);
(PROBABLY SUPPORTED)
pmap_kenter/pmap_kremove are used for fast kernel mappings.
The implied pmap is kernel_pmap, and the va's must be
>= VM_MIN_KERNEL_ADDRESS. Usually one would use addresses
that were returned by kmem_alloc_pageable.
pmap_growkernel(topaddr);
This routine supports the creation of additional pagetable
pages to encompass the address "topaddr". Kind-of the
equiv of sbrk for the kernel. FreeBSD does not need to
preallocate all of the needed kernel pagetables up-front
because of this routine.
pmap_destroy(pmap);
Decrements pmap ref-count, and if zero, destroys it.
pmap_reference(pmap);
Increments pmap ref-count.
pmap_pinit(pmap)
Creates a pmap.
pmap_object_init_pt(pmap, addr, object, pindex, size);
Prefaults pages into a process's pmap. If the pages are
in memory, they are placed directly into the process's address
space. This is called at mmap time.
pmap_prefault(pmap, addra, entry, object);
Prefaults pages into a process's pmap. This only places
pages that are in a region around the specified address.
This is called at vm_fault time.
pmap_change_wiring(pmap, va, wired);
This notates the page as being wired. This DOES NOT
actually wire the page.
pmap_copy(dst_pmap, src_pmap, dst_addr, len, src_addr);
This is a routine that might be used to short-circuit
faulting pages into an address space from another. It
is currently NOT used.
pmap_zero_page(dstpa);
(SUPPORTED)
This is the routine that is used to zero a page for demand
zero.
pmap_copy_page(srcpa,dstpa);
(SUPPORTED)
This is the routine that is used to copy a page for COW.
pmap_pageable(pmap, sva, eva, pageable);
This notates a range of pages as being pageable and is
informational only. It is currently NOT used.
pmap_page_protect(dstpa, prot);
Decreases the protection for a given page. It is used
to remove a page from all address spaces (for example,
prior to being freed), or to write-protect (for example,
for setting up an address space for COW.) This routine
should not normally be used, vm_page_protect is vastly
superior.
The pte bit routines below are much more complicated than they
appear, because they have to check the pte's for each page in
every pmap in which the page is mapped.
pmap_is_referenced(srcpa);
(SUPPORTED)
Senses the reference bit on a given page.
pmap_is_modified(srcpa);
(SUPPORTED)
Senses the modified bit on a given page.
pmap_clear_modify(dstpa);
(SUPPORTED)
Clears the modified bit for a given page.
pmap_clear_reference(dstpa);
(SUPPORTED)
Clears the reference bit for a given page.
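The reason these four routines are costly shows up even in a toy model:
a physical page has one pte per pmap that maps it, and each of them has
to be visited. The bit values, struct toy_phys_page and all toy_
functions below are invented for the sketch; a real pmap would walk
whatever per-page mapping list the architecture keeps.

```c
#include <assert.h>
#include <stddef.h>

/* Invented pte bits; real layouts are architecture specific. */
#define TOY_PTE_VALID      0x1u
#define TOY_PTE_REFERENCED 0x2u
#define TOY_PTE_MODIFIED   0x4u

#define TOY_MAX_PMAPS 4

/* One pte per pmap that could map this physical page. */
struct toy_phys_page {
    unsigned int ptes[TOY_MAX_PMAPS];
};

/* True if 'bit' is set in any valid mapping of the page. */
static int toy_test_bit(const struct toy_phys_page *pg, unsigned int bit)
{
    size_t i;

    for (i = 0; i < TOY_MAX_PMAPS; i++)
        if ((pg->ptes[i] & TOY_PTE_VALID) && (pg->ptes[i] & bit))
            return 1;
    return 0;
}

/* Clear 'bit' in every valid mapping of the page. */
static void toy_clear_bit(struct toy_phys_page *pg, unsigned int bit)
{
    size_t i;

    assert(bit != TOY_PTE_VALID);
    for (i = 0; i < TOY_MAX_PMAPS; i++)
        if (pg->ptes[i] & TOY_PTE_VALID)
            pg->ptes[i] &= ~bit;
}

static int toy_is_modified(const struct toy_phys_page *pg)
{
    return toy_test_bit(pg, TOY_PTE_MODIFIED);
}

static int toy_is_referenced(const struct toy_phys_page *pg)
{
    return toy_test_bit(pg, TOY_PTE_REFERENCED);
}

static void toy_clear_modify(struct toy_phys_page *pg)
{
    toy_clear_bit(pg, TOY_PTE_MODIFIED);
}

static void toy_clear_reference(struct toy_phys_page *pg)
{
    toy_clear_bit(pg, TOY_PTE_REFERENCED);
}
```

A page counts as modified if any one mapping has dirtied it, so the
test can stop at the first hit, while the clear has to touch every
mapping.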
kva = pmap_mapdev(pa, size);
(SUPPORTED)
Maps device memory into the kernel. kva space is allocated, and
the physical device is mapped directly into the kernel_pmap ptes.
This allows full memory access to the device from the kernel.
Additional miscellaneous routines that are useful to kernel developers;
refer to the source for their details. They most likely will be around for a
"long time."
vmspace_alloc(min, max, pageable);
vmspace_free(vm);
vm_map_reference(map);
vm_map_deallocate(map);
vm_map_insert(map, object, offset, start, end, prot, max, cow);
vm_map_findspace(map, start, length, addr);
vm_map_lookup(map, address, entry);
vm_map_inherit(map, start, end, new_inheritance);
vm_map_clean(map, start, end, syncio, invalidate);
--
Poul-Henning Kamp | phk@FreeBSD.ORG FreeBSD Core-team.
http://www.freebsd.org/~phk | phk@login.dknet.dk Private mailbox.
whois: [PHK] | phk@tfs.com TRW Financial Systems, Inc.
Power and ignorance is a disgusting cocktail.
