From owner-freebsd-hackers Sat Dec 26 17:14:24 1998
Return-Path:
Received: (from majordom@localhost)
	by hub.freebsd.org (8.8.8/8.8.8) id RAA21927
	for freebsd-hackers-outgoing; Sat, 26 Dec 1998 17:14:24 -0800 (PST)
	(envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from implode.root.com (root.com [208.221.12.98])
	by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id RAA21920;
	Sat, 26 Dec 1998 17:14:23 -0800 (PST)
	(envelope-from root@implode.root.com)
Received: from implode.root.com (localhost [127.0.0.1])
	by implode.root.com (8.8.8/8.8.5) with ESMTP id RAA03548;
	Sat, 26 Dec 1998 17:10:48 -0800 (PST)
Message-Id: <199812270110.RAA03548@implode.root.com>
To: Matthew Dillon
cc: cvs-all@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
Subject: Re: new swap system work
In-reply-to: Your message of "Sat, 26 Dec 1998 16:07:29 PST."
	<199812270007.QAA33903@apollo.backplane.com>
From: David Greenman
Reply-To: dg@root.com
Date: Sat, 26 Dec 1998 17:10:47 -0800
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> The first part I'm working on now and expect to commit sometime POST
> 3.0.1.  We'll see how long it takes me to get it solid.

You will not commit anything like this without careful review by at least
myself and perhaps others.

> Basically, fixing the swap system requires moving the allocation of the
> swap metadata structures out of the pageout code.  To accomplish this,
> vm_page_t will get a new field, called 'swapblk'.  All swap-backed
> memory-resident pages will have their swap blocks stored in the vm_page_t
> rather than the swap-metadata structure.  Swap blocks assigned to
> resident pages do not have to be moved into the object swap metadata
> structures until the page is actually freed (at which point there is
> free memory available to allocate the swap metadata structure, hence
> the ability to operate in a zero-free-page environment).

This seems to assume that all pages are backed by swap, which is definitely
not the case.  On many systems it is not even 'most'.  I could almost
swallow this if it were abstracted into a pager-private struct.

> The side effects of doing this are all beneficial.

I don't agree.  I can think of at least two negatives: it bloats the
vm_page struct and it makes a mess out of the layering.

> The VM system becomes more swap-aware and doesn't have to worry about
> free memory as much.

I don't think this is a significant advantage.  Most of the problems we've
seen in the past are actually on the vnode pager side and not the swap
pager side.

> A great deal of simplification can be done all over the place.

I'm not convinced of this.  I'm sure the code will be different, but I
doubt it will be much simpler.

> These simplifications will take longer to accomplish since my goal is to
> get the thing working first, but I think the long term prospects are very
> good.  Eventually we should be able to page out swap metadata associated
> with active processes (but that's a long ways off).  The raw swap
> allocation / deallocation code (the rlist stuff) will also eventually be
> rewritten to remove the memory blocking constraints that rlist_free
> currently has and to make it possible to remove swap.

It is possible to remove swap with the current framework.  No one has
bothered to write the code to do it, however.  It seems to me that it will
be much more difficult to remove swap in the future if you put pager-related
storage data in each struct vm_page.
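For concreteness, the scheme being debated above might look roughly like
this.  This is a sketch only, not actual FreeBSD code: the field layout,
SWAPBLK_NONE, and the helper swp_meta_record() are illustrative assumptions
for the idea of keeping a swap block in the page while it is resident and
only filling in the object's swap metadata at page-free time.

/*
 * Sketch only -- hypothetical names, not the actual FreeBSD VM code.
 * A resident swap-backed page keeps its swap block in a proposed
 * vm_page field; the per-object swap metadata is only filled in when
 * the page is freed, when free memory is again available.
 */
typedef long daddr_t;                   /* disk block address */
#define SWAPBLK_NONE    ((daddr_t)-1)   /* "no swap block assigned" */

struct vm_object;

struct vm_page {
        struct vm_object *object;       /* owning VM object */
        unsigned long     pindex;       /* page's offset in that object */
        daddr_t           swapblk;      /* proposed: swap block while resident */
        /* ... remaining fields elided ... */
};

/*
 * Assumed helper: record (pindex -> swapblk) in the object's swap
 * metadata, allocating metadata as needed.  Stubbed for illustration.
 */
static void
swp_meta_record(struct vm_object *obj, unsigned long pindex, daddr_t blk)
{
        (void)obj; (void)pindex; (void)blk;
}

/*
 * Freeing a page under the proposal: the swap block migrates from the
 * page into the object metadata only now, not in the pageout path.
 */
static void
vm_page_free_sketch(struct vm_page *m)
{
        if (m->swapblk != SWAPBLK_NONE) {
                swp_meta_record(m->object, m->pindex, m->swapblk);
                m->swapblk = SWAPBLK_NONE;
        }
        /* ... then return the page to the free queue ... */
}

The objection above is precisely that this puts pager-specific state (the
daddr_t) into every vm_page, whether or not the page is swap-backed.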
> I'll start work on the second part after I finish the first part.  Fixing
> VOP_STRATEGY basically involves giving each device or filesystem its own
> guaranteed pool of N private pages (e.g. like 5 or so per active device
> or mount).

Yuck.  One of the benefits of 4.4BSD (and further work by us) was getting
rid of private pools of memory.  In some cases we reverted for performance
reasons, but private pools almost always get in the way of dynamically
scaled systems.

> Fixing VOP_STRATEGY() and the swapper will together allow reliable
> paging to files and remove memory deadlock issues related to VFS
> layering (e.g. like mounting a vn partition on top of NFS and then
> mounting a filesystem through that) - though even so there are still a
> number of deadlock issues remaining in the VFS layering department.

I think the deadlock issues are a bit overrated.  The main problem that I
know about has to do with allocating really large swap block arrays for
large objects.  There are ways of solving this at the swap pager level
without moving it into the struct vm_page.

-DG

David Greenman
Co-founder/Principal Architect, The FreeBSD Project

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message