Date:      Mon, 30 Jan 2012 15:12:30 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        src-committers@freebsd.org
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org
Subject:   Re: svn commit: r230782 - head/sys/kern
Message-ID:  <201201301512.30116.jhb@freebsd.org>
In-Reply-To: <201201301935.q0UJZGW7099426@svn.freebsd.org>
References:  <201201301935.q0UJZGW7099426@svn.freebsd.org>

On Monday, January 30, 2012 2:35:16 pm John Baldwin wrote:
> Author: jhb
> Date: Mon Jan 30 19:35:15 2012
> New Revision: 230782
> URL: http://svn.freebsd.org/changeset/base/230782
> 
> Log:
>   Refine the implementation of POSIX_FADV_NOREUSE for the read(2) case such
>   that instead of using direct I/O it allows read-ahead similar to
>   POSIX_FADV_NORMAL, but invokes VOP_ADVISE(POSIX_FADV_DONTNEED) after the
>   read(2) has completed to purge just-read data.  The write(2) path continues
>   to use direct I/O for POSIX_FADV_NOREUSE for now.  Note that NOREUSE works
>   optimally if an application reads and writes full fs blocks.

Oops, forgot:

Tested by:	jilles

The NOREUSE bits may still need further refinement.  For example, if we allow
something along the lines of 'POSIX_FADV_NOREUSE | POSIX_FADV_SEQUENTIAL',
then we could change the VOP_ADVISE() here to use 0 as the starting offset
which should do a better job of not leaving data in RAM due to reading partial
blocks.  Also, sequentially reading a file on unaligned block offsets with
NOREUSE can result in extraneous reads currently, and we could possibly alleviate
those by changing DONTNEED to flush only wholly-contained blocks rather than
wholly-contained pages from the backing VM object.  However, without the
previous change I suggested, that will exacerbate the problem of NOREUSE not
actually purging any data from RAM.  The problem with the | approach, though, is
that it is not portable, so it is not likely that portable programs like vlc
will use it.  HP-UX had an extended variant of fadvise() that allowed multiple
policies to be set on a range, apparently to handle exactly this case
(sequential and noreuse).  The problem seems to be that noreuse is really
orthogonal to the other access-pattern hints (normal vs random vs sequential).
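
For reference, here is a minimal userland sketch of the usage pattern that
NOREUSE handles best today: a single pass over the file in whole-block reads
starting at offset 0.  The helper name read_noreuse() is made up for this
example, and using st_blksize as the fs block size is an assumption (it is
only the preferred I/O size, which may differ).  The advice is treated as a
hint, so a failure to set it is reported but not fatal.

```c
#define _POSIX_C_SOURCE 200809L	/* expose posix_fadvise() under strict modes */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a whole file once under POSIX_FADV_NOREUSE.
 * Returns the number of bytes read, or -1 on error.
 */
long
read_noreuse(const char *path)
{
	struct stat sb;
	char *buf;
	size_t bufsize;
	ssize_t n;
	long total = 0;
	int error, fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	if (fstat(fd, &sb) != 0) {
		close(fd);
		return (-1);
	}

	/* Assumption: st_blksize approximates the fs block size. */
	bufsize = sb.st_blksize > 0 ? (size_t)sb.st_blksize : 4096;
	assert(bufsize > 0);
	buf = malloc(bufsize);
	if (buf == NULL) {
		close(fd);
		return (-1);
	}

	/* Offset 0 with length 0 applies the advice to the whole file. */
	error = posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
	if (error != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(error));

	/* Full-sized requests from offset 0 keep each read block-aligned. */
	for (;;) {
		n = read(fd, buf, bufsize);
		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0)
			break;		/* EOF */
		if (errno == EINTR)
			continue;	/* interrupted, retry */
		total = -1;
		break;
	}

	free(buf);
	close(fd);
	return (total);
}
```

An application that reads with a buffer that is not a multiple of the block
size, or that starts at an unaligned offset, would hit the partial-block
cases described above.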

Finally, I've wondered if POSIX_FADV_SEQUENTIAL shouldn't just mandate the
maximum read-ahead and write-clustering rather than using the heuristics.
If we did that, it's not completely clear what the "right" thing to do would
be if an application does posix_fadvise(POSIX_FADV_SEQUENTIAL) followed by
fcntl(F_READAHEAD) with a different size, especially given that posix_fadvise()
can theoretically apply to only a range of the file, whereas
F_READAHEAD applies globally to the file descriptor.
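
To make the ambiguous sequence concrete, here is a rough sketch of what such
an application would do.  The helper name hint_sequential() and its
parameter are made up for this example.  F_READAHEAD is FreeBSD-specific, so
that call is compiled only where the macro exists.  The sketch does not say
which hint should win; that is exactly the open question.

```c
#define _DEFAULT_SOURCE 1	/* glibc: expose posix_fadvise(); ignored on FreeBSD */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: request sequential access for the whole file, then
 * set an explicit read-ahead size on the descriptor.
 * Returns 0 on success or an errno value on failure.
 */
int
hint_sequential(int fd, int readahead_bytes)
{
	int error;

	/* Range-based hint: offset 0 with length 0 covers the whole file. */
	error = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
	if (error != 0)
		return (error);	/* posix_fadvise() returns the error directly */

#ifdef F_READAHEAD
	/* Descriptor-wide setting; it cannot be limited to a range. */
	if (fcntl(fd, F_READAHEAD, readahead_bytes) == -1)
		return (errno);
#else
	(void)readahead_bytes;	/* no F_READAHEAD on this platform */
#endif
	return (0);
}
```

With a call such as hint_sequential(fd, 128 * 1024), the kernel sees a
per-range SEQUENTIAL hint followed by a per-descriptor read-ahead size, and
has to decide whether the second overrides the first.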

-- 
John Baldwin


