Date: Sat, 26 Dec 1998 16:07:29 -0800 (PST)
From: Matthew Dillon
Message-Id: <199812270007.QAA33903@apollo.backplane.com>
To: cvs-all@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
Subject: new swap system work

After a number of long conversations with John Dyson regarding memory deadlock issues and the related kernel hacks used to get around them, I've decided to embark on a major project to revamp the vm/pager_* code to allow paging to occur in zero-free-memory situations. This work is going to be mostly concentrated in vm/swap_pager.c but will have a ripple effect throughout the core VM system (vm/*.c) and the memory subsystem.

The work is going to be broken down into two parts:

    * rewriting the swap_pager.

    * filesystem / VOP_STRATEGY work to guarantee that all VOP_STRATEGY() calls for all filesystems and devices will operate without a memory deadlock occurring in zero-free-memory situations.

I'm working on the first part now and expect to commit it sometime post-3.0.1. We'll see how long it takes me to get it solid.

Basically, fixing the swap system requires moving the allocation of the swap metadata structures out of the pageout code. To accomplish this, vm_page_t will get a new field, called 'swapblk'.
All swap-backed memory-resident pages will have their swap blocks stored in the vm_page_t rather than in the swap metadata structure. Swap blocks assigned to resident pages do not have to be moved into the object swap metadata structures until the page is actually freed (at which point there is free memory available to allocate the swap metadata structure, hence the ability to operate in a zero-free-page environment).

The side effects of doing this are all beneficial. The VM system becomes more swap-aware and doesn't have to worry about free memory as much. A great deal of simplification can be done all over the place. These simplifications will take longer to accomplish since my goal is to get the thing working first, but I think the long-term prospects are very good. Eventually we should be able to page out swap metadata associated with active processes (but that's a long way off).

The raw swap allocation / deallocation code (the rlist stuff) will also eventually be rewritten to remove the memory blocking constraints that rlist_free currently has and to make it possible to remove swap.

    -

I'll start work on the second part after I finish the first part. Fixing VOP_STRATEGY basically involves giving each device or filesystem its own guaranteed pool of N private pages (e.g. 5 or so per active device or mount). The device drivers will then be modified so that they can guarantee operation without memory deadlock when operating solely out of their private pool (i.e. when no system-global free pages are available). So, for example, a VOP_*() call could still block on memory, but the use of the private pool means it would be guaranteed to unblock sometime later as other I/Os in progress on that device complete and free private pool memory. The system's global free pool could then be reduced accordingly, an overall wash. Still, I think the use of private pools will actually make low-memory FreeBSD configurations more efficient.
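
To make the private-pool idea concrete, here is a minimal userland sketch in C. It is not kernel code and not the planned implementation: the names private_pool, io_page_alloc, io_page_free, global_free_pages and the value of POOL_PAGES are all assumptions made up for illustration. It only shows the accounting: an I/O path draws on the global free count first, falls back to its device's own reserve when the global count is zero, and can always make progress again once an in-flight I/O on that device returns a reserved page.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_PAGES 5            /* hypothetical N ("5 or so" per device or mount) */

/* Hypothetical per-device (or per-mount) reserve of pages. */
struct private_pool {
    int free_pages;             /* reserved pages not currently in use */
};

/* Toy stand-in for the system-wide free page count. */
int global_free_pages;

/* Where an allocated page came from; PAGE_NONE means "caller must wait". */
enum page_source { PAGE_NONE, PAGE_GLOBAL, PAGE_PRIVATE };

void pool_init(struct private_pool *pp)
{
    pp->free_pages = POOL_PAGES;
}

/*
 * Take a page for an I/O on this device.  The global pool is tried first;
 * the private reserve is used only when the global pool is empty.
 * PAGE_NONE models the point where a real driver would sleep until one
 * of its own in-flight I/Os completes.
 */
enum page_source io_page_alloc(struct private_pool *pp)
{
    if (global_free_pages > 0) {
        global_free_pages--;
        return PAGE_GLOBAL;
    }
    if (pp->free_pages > 0) {
        pp->free_pages--;
        return PAGE_PRIVATE;
    }
    return PAGE_NONE;
}

/* Return a page to wherever it came from when the I/O completes. */
void io_page_free(struct private_pool *pp, enum page_source src)
{
    if (src == PAGE_PRIVATE) {
        assert(pp->free_pages < POOL_PAGES);    /* reserve never overfills */
        pp->free_pages++;
    } else if (src == PAGE_GLOBAL) {
        global_free_pages++;
    }
}
```

With global_free_pages set to zero, io_page_alloc() still succeeds POOL_PAGES times for a given device, and a later PAGE_NONE result clears as soon as io_page_free() hands a reserved page back. That is the "guaranteed to unblock" property in miniature; a real driver would also need locking and a sleep/wakeup mechanism, which this sketch leaves out.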
Fixing VOP_STRATEGY() and the swapper will together allow reliable paging to files and remove memory deadlock issues related to VFS layering (e.g. mounting a vn partition on top of NFS and then mounting a filesystem through that), though even so a number of deadlock issues will remain in the VFS layering department.

-Matt

Matthew Dillon Engineering, HiWay Technologies, Inc. & BEST Internet Communications & God knows what else.
(Please include original email in any response)
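
As an illustration of the swapblk change described in the first part of the message, here is a minimal userland sketch in C. Only the field name swapblk comes from the message; struct page, struct swap_meta, SWAPBLK_NONE, page_assign_swap and page_free_to_meta are hypothetical names invented for the example, and the flat array is a toy layout rather than the real swap metadata. The point it shows is the ordering: assigning a swap block at pageout time allocates nothing, and the metadata allocation is deferred to the moment the page is freed.

```c
#include <assert.h>
#include <stdlib.h>

#define SWAPBLK_NONE (-1L)      /* hypothetical "no swap block assigned" marker */

/* Toy stand-in for a resident page; only 'swapblk' is named in the message. */
struct page {
    size_t pindex;              /* index of the page within its object */
    long   swapblk;             /* swap block backing the page, or SWAPBLK_NONE */
};

/* Toy per-object swap metadata: one slot per page index, allocated lazily. */
struct swap_meta {
    long  *blocks;              /* NULL until the first page is freed */
    size_t npages;
};

/* Pageout path: record the block in the page itself.  No allocation here. */
void page_assign_swap(struct page *p, long blk)
{
    p->swapblk = blk;
}

/*
 * Free path: the page is being released, so memory is becoming available
 * and the metadata can be allocated now.  Moves the block number from the
 * page into the object's metadata.  Returns 0 on success, -1 if the
 * metadata could not be allocated.
 */
int page_free_to_meta(struct page *p, struct swap_meta *sm)
{
    if (p->swapblk == SWAPBLK_NONE)
        return 0;               /* nothing to record */
    assert(p->pindex < sm->npages);
    if (sm->blocks == NULL) {
        sm->blocks = malloc(sm->npages * sizeof *sm->blocks);
        if (sm->blocks == NULL)
            return -1;
        for (size_t i = 0; i < sm->npages; i++)
            sm->blocks[i] = SWAPBLK_NONE;
    }
    sm->blocks[p->pindex] = p->swapblk;
    p->swapblk = SWAPBLK_NONE;
    return 0;
}
```

In this toy version the only call that can allocate is page_free_to_meta(), which mirrors the claim that the pageout code itself no longer needs free memory for swap metadata.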