Date:      Wed, 15 May 1996 09:45:08 -0500 (CDT)
From:      Joe Greco <jgreco@brasil.moneng.mei.com>
To:        toor@dyson.iquest.net (John S. Dyson)
Cc:        dyson@freebsd.org, hackers@freebsd.org
Subject:   Re: A question for the VM gurus..!
Message-ID:  <199605151445.JAA12093@brasil.moneng.mei.com>
In-Reply-To: <199605150415.XAA19042@dyson.iquest.net> from "John S. Dyson" at May 14, 96 11:15:29 pm

> > I think I'm going to have to reveal my stupidity here.  :-(
> 
> YOU ARE NOT STUPID :-).

Sure I am.  :-)  I just fool a lot of people.

> > I do not understand your statement.  I would assume that "pages [that] are
> > mapped in a process... [and] ...don't have to be faulted" would necessarily
> > have to be resident.
> 
> What you say is right, but there are pages that are resident that are NOT
> mapped into the process address space.  The VM system doesn't modify the
> pte's until the process faults them.  (That is not strictly true, but is
> true in the case that I think that you are talking about.)  So what I was
> trying to say is that mincore would miss some of those pages that are really
> in memory.

Hmm, okay, so it seems to me that there is an "unresolved/the-vm-system-
doesn't-yet-know-where-the-page-is-so-just-fault-on-access" state too.  That
clarifies it somewhat.  :-)

> > Now given 200 readers and 2,000,000 articles, the likelihood that any
> > particular reader will access an article that someone else has recently read
> > is fairly low.  This data is ripe for being discarded ASAP, and it would be
> > handy to tag these pages as "noncacheable" or "cacheable but VERY
> > discardable".  True, currently file accesses are via the open()/close()
> > interface, but it is easy to mmap() the articles instead.  Further, I can
> > even mark the pages as MADV_SEQUENTIAL (only useful on large articles I
> > would think), although I don't know how useful this hint would be to the VM
> > system.
> 
> How's about an ioctl or somesuch as a hint to the filesystem so that when
> a file is closed, its pages (or object) are marked somehow for quick
> reuse (freeing?)  You then could keep the read/write code, but an ioctl
> (or fcntl) could be issued to change the behavior.  (Note that I still
> plan to do the madvise thing though :-)).

Not portable.  Solaris, at least, implements the madvise() stuff..

What it seems to do:

If you set MADV_SEQUENTIAL on a region and it has to fault a page "n" in, it
looks like it discards all pages from 0 to n-1 in that region..  notably it
does NOT seem to do anything if it doesn't have to fault a page in.

MADV_RANDOM seems to cause a lot more faults if you are doing sequential
accesses; as far as I can tell it just tells Solaris not to read ahead.

MADV_DONTNEED appears to junk pages (asynchronously, from what I can tell)

MADV_WILLNEED appears to fault pages in (again, asynchronously, from what I
can tell).
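
To make the article-reading case concrete, here is a minimal sketch in C of
how a reader might map one article, hint sequential access, and then hand the
pages back.  The function name sum_article() and the byte-sum "work" are
purely illustrative assumptions, not anything from a real news server.  The
madvise() calls are only hints, so the sketch ignores their return values and
stays correct whether or not the kernel acts on them.

```c
#define _DEFAULT_SOURCE   /* exposes madvise() under strict -std= on glibc */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sum the bytes of one article through mmap().
 * Returns the sum, or -1 on error. */
long sum_article(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects zero-length maps */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping survives the close */
    if (p == MAP_FAILED)
        return -1;

    /* Hint: we will walk the region front to back. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    /* Hint: nobody is likely to want these pages again soon. */
    (void)madvise(p, len, MADV_DONTNEED);
    munmap(p, len);
    return sum;
}
```

A caller would simply do sum_article("/some/article") once per article; what
each hint actually does to the page cache is up to the particular kernel.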

> > My other "pet project" requires a functional mincore() - the history file on
> > a large news server may be 150-200MB, and I would like to create a daemon to
> > handle history lookup requests.  The file can be mapped and marked with
> > MADV_RANDOM, and when a request comes in requiring a particular bit of data,
> > pages that are found to be !mincore() can be marked with MADV_WILLNEED to
> > ask the VM system to bring them in, while the code goes on to service other
> > requests.  I am trying to allow the process to spin through its connection
> > list as rapidly as possible, and with 64MB or 128MB RAM, you can hopefully
> > see how this would be very efficient (from an overall viewpoint).
> 
> MADV_RANDOM would probably be implemented by bringing in only one page at a time,
> instead of a cluster.  MADV_WILLNEED is problematical (a bit more difficult)
> since we don't currently have a way to asynchronously read pages in -- but
> it wouldn't be very hard.  I have been looking into the possibility of
> adding kernel threads -- that could help the async VM read problem.

I guess in the _short_ term I am most interested in support for
MADV_DONTNEED, it is the most generally useful change for my application.
However, it would be nice to have a full suite of this stuff in the future.
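
For the history daemon idea quoted above, the per-lookup check might look
roughly like the following C sketch.  The name history_page_ready() and its
return convention are assumptions made up for illustration.  The vec argument
of mincore() is declared as unsigned char * on some systems and char * on
others, so the sketch passes it through a void * cast.  Per John's caveat, a
"not resident" answer may be conservative on a system where mincore() only
sees pages already mapped into the process.

```c
#define _DEFAULT_SOURCE   /* exposes mincore()/madvise() on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper for a history daemon.
 * Returns 1 if the page holding base[offset] is reported resident,
 * 0 if it was not and a read-ahead hint was issued, -1 on error. */
int history_page_ready(void *base, size_t offset)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0)
        return -1;

    /* mincore() wants a page-aligned start address. */
    uintptr_t addr = (uintptr_t)base + offset;
    void *page = (void *)(addr - addr % (uintptr_t)pagesz);

    unsigned char vec = 0;
    if (mincore(page, (size_t)pagesz, (void *)&vec) < 0)
        return -1;
    if (vec & 1)                    /* low bit set: page is in core */
        return 1;

    /* Ask the VM system to start bringing the page in, then move on. */
    (void)madvise(page, (size_t)pagesz, MADV_WILLNEED);
    return 0;
}
```

The daemon would call this for each pending request while spinning through
its connection list, servicing the ones that return 1 and revisiting the
ones that return 0 on a later pass.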

I haven't seen too many Solaris applications that make use of these
functions (although cat, mv, some printing stuff, and some audio tools
seem to use them).

Thanks,

... Joe

-------------------------------------------------------------------------------
Joe Greco - Systems Administrator			      jgreco@ns.sol.net
Solaria Public Access UNIX - Milwaukee, WI			   414/546-7968


