From owner-freebsd-hackers  Sat Dec 16 15:56:55 1995
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id PAA11748
          for hackers-outgoing; Sat, 16 Dec 1995 15:56:55 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id PAA11739
          for <hackers@FreeBSD.ORG>; Sat, 16 Dec 1995 15:56:52 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id QAA08488; Sat, 16 Dec 1995 16:54:10 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199512162354.QAA08488@phaeton.artisoft.com>
Subject: Re: mmap and memory utilization
To: daveho@infocom.com (David Hovemeyer)
Date: Sat, 16 Dec 1995 16:54:10 -0700 (MST)
Cc: hackers@FreeBSD.ORG, daveho@infocom.com
In-Reply-To: <Pine.BSF.3.91.951216123927.19249G-100000@infocom.com> from "David Hovemeyer" at Dec 16, 95 01:16:39 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.ORG
Precedence: bulk

> I am writing a program that needs to do a lot of character at
> a time reading through regular files, both forwards and backwards.
> I decided to use mmap(2) rather than read(2) to implement this.
> My reasoning is that it is easier to do random access in memory
> than in a file.  Also, I imagine that calling read(2) for reading
> single characters is inefficient in terms of system call overhead.
> I could use iostreams (it's a C++ program) to get buffering of
> reads, but iostreams are big and hairy, and still more awkward
> to use than memory for what I want to do.

Buffered I/O really would be the best approach if you can do it.
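
A minimal sketch of what I mean, assuming POSIX pread(2) is available
on your system.  The class name WindowReader and the 64K default window
are made up for illustration.  It keeps one buffer centered on the last
miss, so stepping a character at a time in either direction only costs
a system call when you leave the window:

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: character-at-a-time access through one sliding buffer.
class WindowReader {
public:
    explicit WindowReader(int fd, std::size_t window = 64 * 1024)
        : fd_(fd), buf_(window), start_(0), len_(0) {
        assert(window > 0);
    }

    // Return the byte at absolute offset pos, or -1 at EOF or on error.
    int at(off_t pos) {
        if (pos < 0) return -1;
        if (pos < start_ || pos >= start_ + static_cast<off_t>(len_)) {
            if (!refill(pos)) return -1;
        }
        return static_cast<unsigned char>(buf_[pos - start_]);
    }

private:
    // Center the window on pos so that short moves forwards or
    // backwards stay inside the buffer.
    bool refill(off_t pos) {
        off_t half = static_cast<off_t>(buf_.size() / 2);
        off_t start = pos > half ? pos - half : 0;
        ssize_t n = pread(fd_, buf_.data(), buf_.size(), start);
        if (n <= 0) return false;  // EOF or error: keep the old window
        start_ = start;
        len_ = static_cast<std::size_t>(n);
        return pos < start_ + static_cast<off_t>(len_);
    }

    int fd_;
    std::vector<char> buf_;
    off_t start_;      // file offset of buf_[0]
    std::size_t len_;  // number of valid bytes in buf_
};
```

Usage would be something like "WindowReader r(fd); int c = r.at(pos);"
with pos incremented or decremented as you scan.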

> What I am wondering is
> 
> 	What is the impact of mmap'ing a large file and then
> 	scanning linearly through it?

You will cause the pages to be faulted in and hooked onto the LRU
list in inverse fault order (i.e., the last page accessed will be at
the top of the LRU).

If you do this fast enough, I expect that the cache will be thrashed
for other processes.  There was a change in 1.1.5.1 (I believe) that
implemented working set quotas that would alleviate this problem.
It's a simple issue of list insertion order in the case of 'n'
buffers being on the list for a particular vp (or process, if you
want to hit it that way).

> Currently I am thinking of mmap'ing the entire file; if the file is
> approximately as large as physical memory, will this cause excessive
> paging?  (I am assuming an infinite amount of virtual memory,
> but a limited amount of physical memory.)  Will the program degrade
> the performance of other programs which are running?  Would the mmap
> be likely to fail?

In order:

o	Yes, if you hit all of the pages.  Is this excessive? 
	Depends on your application.

o	Yes, potentially (see above).

o	Not if you have set your soft and hard limits appropriately.
	See "man limit" for some details.

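If you would rather check from inside the program, here is a rough
sketch using getrlimit(2).  Which resource actually governs a mapping
depends on the system; RLIMIT_AS is an assumption here, and the
function name mapping_fits_limit is hypothetical.  It compares against
the soft limit only and ignores what the process already has mapped,
so treat a "true" as necessary rather than sufficient:

```cpp
#include <sys/types.h>
#include <sys/resource.h>
#include <cassert>

// Hypothetical helper: true if `size` bytes is no larger than the soft
// limit for `resource`.  Does not account for memory already in use.
bool mapping_fits_limit(int resource, off_t size) {
    assert(size >= 0);  // precondition: a file size is never negative
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0) return false;
    if (rl.rlim_cur == RLIM_INFINITY) return true;
    return static_cast<rlim_t>(size) <= rl.rlim_cur;
}
```

You would call it as mapping_fits_limit(RLIMIT_AS, st.st_size) after an
fstat(2) and before the mmap(2).
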

> Hmm, I just noticed the madvise(2) man page: is this what I need
> to use?  Maybe I could say MADV_DONTNEED to pages scanned past,
> and MADV_WILLNEED to pages about to be accessed?

I'm not sure that madvise isn't a no-op.  It really depends on your
exact version.  In any case, I don't think you would need to set it
for a linear scan.
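
If you want to try it regardless, a sketch of a linear scan over a
read-only mapping follows.  The function name count_byte is made up
for illustration, and I am assuming MADV_SEQUENTIAL is defined on your
system.  The madvise result is deliberately ignored, since the call is
only a hint and may do nothing at all:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: count occurrences of `c` in a file by scanning
// a read-only mapping front to back.  Returns -1 on failure.
long count_byte(const char *path, char c) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  // nothing to map

    std::size_t len = static_cast<std::size_t>(st.st_size);
    void *p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return -1;

    // Advisory only: tell the VM system the access pattern is linear.
    (void)madvise(p, len, MADV_SEQUENTIAL);

    const char *s = static_cast<const char *>(p);
    long n = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (s[i] == c) ++n;
    }
    munmap(p, len);
    return n;
}
```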


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.