From owner-freebsd-hackers Wed May 15 07:46:03 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id HAA23298 for hackers-outgoing; Wed, 15 May 1996 07:46:03 -0700 (PDT)
Received: from brasil.moneng.mei.com (brasil.moneng.mei.com [151.186.109.160]) by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id HAA23292; Wed, 15 May 1996 07:45:58 -0700 (PDT)
Received: (from jgreco@localhost) by brasil.moneng.mei.com (8.7.Beta.1/8.7.Beta.1) id JAA12093; Wed, 15 May 1996 09:45:08 -0500
From: Joe Greco
Message-Id: <199605151445.JAA12093@brasil.moneng.mei.com>
Subject: Re: A question for the VM gurus..!
To: toor@dyson.iquest.net (John S. Dyson)
Date: Wed, 15 May 1996 09:45:08 -0500 (CDT)
Cc: dyson@freebsd.org, hackers@freebsd.org
In-Reply-To: <199605150415.XAA19042@dyson.iquest.net> from "John S. Dyson" at May 14, 96 11:15:29 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> > I think I'm going to have to reveal my stupidity here. :-(
>
> YOU ARE NOT STUPID :-).

Sure I am. :-)  I just fool a lot of people.

> > I do not understand your statement. I would assume that "pages [that]
> > are mapped in a process... [and] ...don't have to be faulted" would
> > necessarily have to be resident.
>
> What you say is right, but there are pages that are resident that are NOT
> mapped into the process address space. The VM system doesn't modify the
> pte's until the process faults them. (That is not strictly true, but is
> true in the case that I think you are talking about.) So what I was
> trying to say is that mincore would miss some of those pages that are
> really in memory.

Hmm, okay, so it seems to me that there is an "unresolved / the VM system
doesn't yet know where the page is, so just fault on access" state too; that
clarifies it somewhat.
:-)

> > Now given 200 readers and 2,000,000 articles, the likelihood that any
> > particular reader will access an article that someone else has recently
> > read is fairly low. This data is ripe for being discarded ASAP, and it
> > would be handy to tag these pages as "noncacheable" or "cacheable but
> > VERY discardable". True, file accesses currently go through the
> > open()/close() interface, but it is easy to mmap() the articles instead.
> > Further, I can even mark the pages as MADV_SEQUENTIAL (only useful on
> > large articles, I would think), although I don't know how useful this
> > hint would be to the VM system.
>
> How about an ioctl or some such as a hint to the filesystem, so that when
> a file is closed, its pages (or object) are marked somehow for quick
> reuse (freeing?). You could then keep the read/write code, but an ioctl
> (or fcntl) could be issued to change the behavior. (Note that I still
> plan to do the madvise thing though :-)).

Not portable. Solaris, at least, implements the madvise() stuff. What it
seems to do:

If you set MADV_SEQUENTIAL on a region, then whenever it has to fault a
page "n" in, it looks like it discards all pages from 0 to n-1 in that
region. Notably, it does NOT seem to do anything if it doesn't have to
fault a page in.

MADV_RANDOM seems to cause a lot more faults if you are doing sequential
accesses; as far as I can tell it just tells Solaris not to read ahead.

MADV_DONTNEED appears to junk pages (asynchronously, from what I can tell).

MADV_WILLNEED appears to fault pages in (again, asynchronously, from what I
can tell).

> > My other "pet project" requires a functional mincore() - the history
> > file on a large news server may be 150-200MB, and I would like to
> > create a daemon to handle history lookup requests.
> > The file can be mapped and marked with MADV_RANDOM, and when a request
> > comes in requiring a particular bit of data, pages that are found to be
> > !mincore() can be marked with MADV_WILLNEED to ask the VM system to
> > bring them in, while the code goes on to service other requests. I am
> > trying to allow the process to spin through its connection list as
> > rapidly as possible, and with 64MB or 128MB RAM, you can hopefully see
> > how this would be very efficient (from an overall viewpoint).
>
> MADV_RANDOM would probably be implemented by bringing in only one page at
> a time, instead of a cluster. MADV_WILLNEED is problematical (a bit more
> difficult) since we don't currently have a way to asynchronously read
> pages in -- but it wouldn't be very hard. I have been looking into the
> possibility of adding kernel threads -- that could help the async VM
> read problem.

I guess in the _short_ term I am most interested in support for
MADV_DONTNEED; it is the most generally useful change for my application.
However, it would be nice to have a full suite of this stuff in the future.
I haven't seen too many Solaris applications that make use of these
functions (although cat, mv, some printing stuff, and some audio tools seem
to use it).

Thanks,

... Joe

-------------------------------------------------------------------------------
Joe Greco - Systems Administrator                            jgreco@ns.sol.net
Solaria Public Access UNIX - Milwaukee, WI                        414/546-7968