Date:      Fri, 16 May 1997 12:06:22 +0100
From:      James Mansion <james@westongold.com>
To:        freebsd-hackers@freebsd.org
Subject:   Re: mmap()
Message-ID:  <337C3FAE.4295@westongold.com>
References:  <199705152313.QAA16073@phaeton.artisoft.com>

Terry Lambert wrote:
> 
> > > Maybe you can convince John Dyson that coding this would be fun (it
> > > might even actually *be* fun 8-)), and then checking the degradation
> > > this causes in the general case to see if it's unacceptably high for
> > > your special case.
> >
> > I can't see that this would be a high cost.  You'd only tell the real
> > benefit on a loaded system anyway.
> 
> It would have a cost higher than not doing it (ie: non-zero).  Using
> mmap() and then doing sequential I/O is probably a very limited market,

I disagree that there is a limited market.  If you have BLOB data stored
in an RDBMS then I'd think it quite likely that you'll access sequential
pages, and the same goes for any sort of table or index scan.  I can't
see that it makes much sense to COPY data into memory that has been
allocated swap space if you don't need to.  Of course, in such cases you
might not have given the sequential hint, and a heuristic such as the
one you describe is more valuable then.

I'd also expect it to be a worthwhile technique for web servers and FTP
servers.
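
As a concrete illustration of the access pattern under discussion, here
is a minimal user-space sketch in C: map a file read-only and shared,
pass the sequential hint, and scan it front to back.  The helper name
sum_file_mapped is made up for the example, and whether the hint changes
anything is up to the kernel, since madvise() is advisory.

```c
#define _DEFAULT_SOURCE         /* glibc: expose madvise() in strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sum every byte of a file through a read-only
 * shared mapping.  Returns -1 on error. */
long long sum_file_mapped(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping survives the close */
    if (p == MAP_FAILED)
        return -1;

    /* Advisory only: the kernel is free to ignore the hint. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    long long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];            /* pages are faulted in on first touch */

    munmap(p, len);
    return sum;
}
```

Nothing is copied into private, swap-backed memory here: the pages
touched by the loop are the file's own pages, shared with any other
process that maps the same file.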

> so John would have to make a decision to accept the non-0 degradation
> on that basis.

I take issue with moans about 'it costs more than zero' and with being
so defensive about a few cycles.  It's not the first time this attitude
has been apparent on this list.

Consider this case:
 - you have a store on every page-in
 - you have a decision on every page-in about whether to start another
   read (maybe several)

These decisions will take a few branches only.  How long - a
microsecond?  10 microseconds?  This is for an operation which is going
to a hardware device FFS!
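
To make the size of that per-page-in work concrete, here is a rough
sketch in C of the store and the decision.  It is a user-space
illustration only, not the FreeBSD VM code; struct seq_state,
pages_to_read and the cluster parameter are names invented for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-mapping state for detecting sequential faults. */
struct seq_state {
    bool   have_last;   /* false until the first fault is seen */
    size_t last_page;   /* last page brought in by the previous fault */
};

/* Decide how many pages to read for a fault on `page`: `cluster` pages
 * if the fault directly follows the previously read range, else 1. */
size_t pages_to_read(struct seq_state *s, size_t page, size_t cluster)
{
    assert(s != NULL && cluster >= 1);

    /* The decision: one compare against the saved value. */
    bool sequential = s->have_last && page == s->last_page + 1;
    size_t n = sequential ? cluster : 1;

    /* The store: remember where this read ends. */
    s->have_last = true;
    s->last_page = page + n - 1;
    return n;
}
```

That is one compare, one branch and two stores per fault, set against
the cost of a disk access.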

I know that VM pagein code is used a lot, but how much of the total of
(say) ftp.cdrom.com is in this code?  Say 5%?  Heck, you could add 20%
to the code path before you get a 'measurable' difference of 1%.  (And
do I believe it's 5%?  No.  It'll be less than this).

The same argument applies to the single SMP binary case - if the kernel
is 10% of the load profile and you degrade the whole lot by 10% (which
is a LOT for this sort of thing) you'll get a massive hit of a whole 1%.
Big deal.
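
The arithmetic behind both estimates is just a product: the share of
total time spent in the affected code multiplied by how much longer that
code becomes.  A small sketch in C, where overall_hit is a name chosen
for the example and the inputs are the guessed figures from the text
rather than measurements:

```c
#include <assert.h>

/* Whole-workload slowdown when code that accounts for `fraction` of
 * total time becomes `growth` longer.  Both arguments are fractions,
 * e.g. 0.05 for 5%. */
double overall_hit(double fraction, double growth)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(growth >= 0.0);
    return fraction * growth;
}
```

With the pagein guess, overall_hit(0.05, 0.20) comes to about 0.01, and
the SMP kernel case, overall_hit(0.10, 0.10), also comes to about 0.01:
a 1% hit in each case.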

We *know* that the Linux kernel is relatively unaffected by compiler
optimisation levels provided that the obvious frame-pointer expense
isn't incurred, and there are suggestions that changes to e.g. register
calling will still have negligible effect, so it is somewhat unlikely
that minor algorithms of this sort are going to hurt.  It's much more
likely in my view that the kind of graph closure coding that you propose
for a fine-grained SMP system will have a measurable impact.

Please, people, be reasonable in your opposition to extending code
paths.  It's not bad if the cost is low and you get material benefits,
and it's quite easy to estimate upper bounds for the user-level hit that
you'd see for most things.

> 
> I'm not saying that it's unlikely, only that there *is* a trade involved,
> and so a decision to be made.

Sure, but let's keep things in perspective.  The knee-jerk reaction
seems to be 'there's a cost!  There's a cost!'.  Clearly there is, and
clearly there's no gain without some (small) pain.

> 
> > Could you even use the simpler and hugely kludgy approach of 'if the
> > faulting process' descriptor has sequential access, then fault the next
> > <n> pages' where <n> is some tunable value?
> 
> This would lead to bursty operation without high and low watermarks...
> and you are back to storing the last page for that compare.

Well, not necessarily, if the 'follow-on' pages are faulted using a
low-level facility that DOESN'T make this decision.  You'll only
start another multi-page read when you miss the current set.

Conceivably you could fault them all in in a synchronous operation.

Hopefully this would fit well with the layout of files on disk, and
you could limit it to a read of data from the same track or cylinder
so that the extra cost in terms of IO will be low.  It's only an
advisory optimisation anyway.
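
A toy model of that scheme, written as a user-space simulation in C
rather than as kernel code.  The names struct mapping, fault, pagein and
scan, and the 64-page object size, are assumptions made for the sketch;
the track or cylinder limit is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES 64               /* hypothetical size of the mapped object */

struct mapping {
    bool   sequential;          /* stands in for the MADV_SEQUENTIAL hint */
    size_t cluster;             /* the tunable <n>: pages per fault */
    bool   resident[NPAGES];
    size_t faults;              /* times the decision-making path ran */
    size_t pageins;             /* pages read by the low-level facility */
};

/* Low-level page-in: makes no read-ahead decision of its own. */
static void pagein(struct mapping *m, size_t page)
{
    if (!m->resident[page]) {
        m->resident[page] = true;
        m->pageins++;
    }
}

/* Fault handler: the only place where the hint is checked. */
static void fault(struct mapping *m, size_t page)
{
    size_t n = (m->sequential && m->cluster > 0) ? m->cluster : 1;

    m->faults++;
    for (size_t i = 0; i < n && page + i < NPAGES; i++)
        pagein(m, page + i);
}

/* Touch every page in order; return how many faults were taken. */
size_t scan(bool sequential, size_t cluster)
{
    struct mapping m;

    memset(&m, 0, sizeof m);
    m.sequential = sequential;
    m.cluster = cluster;
    for (size_t page = 0; page < NPAGES; page++)
        if (!m.resident[page])  /* only a miss reaches the handler */
            fault(&m, page);
    assert(m.pageins == NPAGES);    /* every page is read exactly once */
    return m.faults;
}
```

In this model a hinted scan with a cluster of 8 takes 8 faults over the
64 pages, against 64 faults without the hint, and no last-page value is
saved anywhere: a new multi-page read starts only when the scan runs off
the end of the current set.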

> 
> > Factor 2 degradation seems a big hit, particularly when using a
> > technique that 'should' be efficient, but maybe this would be cheap
> > and remove much of the performance hit?
> 
> Well, it's a "not doing an optimization" hit, not a "doing something
> wrong" hit.  Frankly, sequentially accessing mmap()'ed data is... weird.

It's not wrong by any means, and is quite common if you have to work on
large files (say, larger than core) that you will normally scan but may
also seek in.  It's dumb to make the swapper work hard if you are just
doing read-only access.

It's also the case that you might have these large data stores shared
between multiple processes, and reading chunks into private swappable
memory seems doubly wasteful in this case.

> 
> 8-).
> 
> Like I said, best bet is to talk to John (or do it yourself).
> 
> > The technique you mention is more along the lines of one that one might
> > use for cases where the MADV_SEQUENTIAL 'hint' has NOT been given and
> > you still want a heuristic to identify when sequential access is
> > nevertheless occurring locally, as might be the case with (say) a DBMS.
> 
> Yes.
> 
> > In this case, the hint is plainly given and you might act on it more
> > directly.
> 
> Yes again; but you then add a compare for the hint, and still need to
> save the last value, and compare it if the hint is set.  I guess it
> saves you the save in the general case.

Well, clearly you have to check to see if the flag is set, but then
you've got a whole bunch of flags to check anyway.

Not sure why you need to save the last value - why not just page in
multiple pages right away?  If the latency for getting a subsequent
page is low, this will probably be cheaper than trying to set up an
async request.

> 
> Really, I think the hit is negligible; however, it's not my decision to
> make, no matter what.  I'm just suggesting possible approaches to the
> problem.
> 
> > What happens for executable image pages?  Do those get any readahead
> > now?
> 
> No.  You'd think data pages would, though.  Data pages might benefit from
> this optimization as well, so you may be able to sell it that way.  It's
> just that your case is odd, so it's not likely to attract a lot of effort
> to optimize.  I'd suggest you make the changes yourself, if you can, and
> then see what it does as far as performance.

(Can't, I'm afraid, owing to hardware failure at the moment.  Not to
mention imminent arrival of triplets.)

I would have thought that code pages would benefit too, especially if
you have an opportunity to perform reordering of the functions so that
there is good locality of reference.  Admittedly, ld doesn't do this
now.

I think this is relevant, though Win32 targeted:
http://www.cs.washington.edu/homes/bershad/etch/index.html

I'm sure it's not the only such system.

> 
>                                         Regards,
>                                         Terry Lambert
>                                         terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

-- 
Westongold Ltd                 C++/Java
 Multithread development and libraries
+44 1920 444284     info@westongold.com


