From: John Baldwin
To: Ulrich Spörlein
Cc: Tijl Coosemans, freebsd-current@freebsd.org
Date: Tue, 31 Jan 2012 13:21:47 -0500
Subject: Re: posix_fadvise noreuse disables file caching
Message-Id: <201201311321.47714.jhb@freebsd.org>
In-Reply-To: <20120131172107.GP3489@acme.spoerlein.net>
References: <201201191739.48327.tijl@coosemans.org> <201201300936.45290.jhb@freebsd.org> <20120131172107.GP3489@acme.spoerlein.net>
List-Id: Discussions about the use of FreeBSD-current

On Tuesday, January 31, 2012 12:21:07 pm Ulrich Spörlein wrote:
> On Mon, 2012-01-30 at 09:36:45 -0500, John Baldwin wrote:
> > On Sunday, January 29, 2012 10:08:10 am Tijl Coosemans wrote:
> > > On Wednesday 25 January 2012 17:29:22 John Baldwin wrote:
> > > > On Friday, January 20, 2012 2:12:13 pm John Baldwin wrote:
> > > >> On Thursday, January 19, 2012 11:39:42 am Tijl Coosemans wrote:
> > > >>> I recently noticed that multimedia/vlc generates a lot of disk IO when
> > > >>> playing media files. For instance, when playing a 320kbps mp3 gstat
> > > >>> reports about 1250kBps (=10000kbps). That's quite a lot of overhead.
> > > >>>
> > > >>> It turns out that vlc sets POSIX_FADV_NOREUSE on the entire file and
> > > >>> reads in chunks of 1028 bytes. FreeBSD implements NOREUSE as if
> > > >>> O_DIRECT was specified during open(2), i.e. it disables all caching.
> > > >>> That means every 1028 byte read turns into a 32KiB read (new default
> > > >>> block size in 9.0) which explains the above numbers.
> > > >>>
> > > >>> I've copied the relevant vlc code below (modules/access/file.c:Open()).
> > > >>> It's interesting to see that on OSX it sets F_NOCACHE which disables
> > > >>> caching too, but combined with F_RDAHEAD there's still read-ahead
> > > >>> caching.
> > > >>>
> > > >>> I don't think POSIX intended for NOREUSE to mean O_DIRECT. It should
> > > >>> still cache data (and even do read-ahead if F_RDAHEAD is specified),
> > > >>> and once data is fetched from the cache, it can be marked DONTNEED.
> > > >>
> > > >> POSIX doesn't specify O_DIRECT, so it's not clear what it asks for.
> > > >>
> > > >>> Is it possible to implement it this way, or if not to just ignore
> > > >>> the NOREUSE hint for now?
> > > >>
> > > >> I think it would be good to improve NOREUSE, though I had sort of
> > > >> assumed that applications using NOREUSE would do their own buffering
> > > >> and read full blocks. We could perhaps reimplement NOREUSE by doing
> > > >> the equivalent of POSIX_FADV_DONTNEED after each read to free buffers
> > > >> and pages after the data is copied out to userland. I also have an
> > > >> XXX about whether or not NOREUSE should still allow read-ahead as it
> > > >> isn't very clear what the right thing to do there is. HP-UX (IIRC)
> > > >> has an fadvise() that lets you specify multiple policies, so you
> > > >> could specify both NOREUSE and SEQUENTIAL for a single region to
> > > >> get read-ahead but still release memory once the data is read once.
> > > >
> > > > So I've come up with this untested patch. It uses
> > > > VOP_ADVISE(FADV_DONTNEED) after read(2) calls to a NOREUSE region, and
> > > > leaves read-ahead caching enabled for NOREUSE. FADV_DONTNEED doesn't
> > > > do any good really for writes (it only flushes clean buffers), so I've
> > > > left write(2) operations as using IO_DIRECT still. Does this sound
> > > > reasonable? I've not yet tested this at all:
> > >
> > > The patch drastically improves vlc, but there's still a tiny overhead.
> > > Without NOREUSE the disk is read in chunks of 128KiB (F_RDAHEAD buffer
> > > size). With NOREUSE there's an extra transfer of 32KiB (block size).
> >
> > This is probably because vlc is not reading on block boundaries, so the
> > noreuse is throwing away partial blocks at the end of a read that then
> > have to be re-read. We could maybe fix this by making FADV_DONTNEED only
> > throw away completely-contained blocks rather than completely-contained
> > pages. However, this will probably result in NOREUSE not actually
> > throwing away anything at all if an app always reads sub-blocksize chunks.
> >
> > We could maybe make the case of vlc work ok in this case though by allowing
> > an extension where you can do 'posix_fadvise(SEQUENTIAL | NOREUSE)', and
> > in this case we could make the VOP_ADVISE(DONTNEED) in read() use an offset
> > of 0 rather than the start of the read request.
> >
> > However, posix_fadvise() really is going to work best if the userland
> > application reads aligned FS blocks.
>
> I find it questionable in general that an application can tell the
> system what to do wrt. caching. Perhaps I'm running 100s of VLC players
> all on the same file and actually *do* want reads to be cached?
>
> What happens if I seek back in the file? It has to do a potentially
> high-latency read again. The system has a better overview of blocks that
> are frequently being requested than any individual application.
>
> I fully understand the intention, and in 99.99% of the cases, this data
> *is* just being read once so there's no need to cache any reads for
> actually requested data. But as the example shows, requested data is not
> necessarily the data that lower layers have to fetch from the disk.
>
> Perhaps talking to the VLC people about why they think this is useful and
> where it actually, measurably helped them would be interesting.
>
> Sorry if this is all perfectly obvious

There are certainly cases where the user can choose to run specific apps in
such a way that this makes sense, so the OS needs this functionality. Whether
specific apps should use these APIs, or should make their use of these APIs
configurable, is a question for each app (e.g. vlc). However, the OS should
provide the tools.

-- 
John Baldwin
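
For illustration only, here is a rough userland sketch of the access pattern
discussed above: read whole, block-aligned chunks sequentially and issue
POSIX_FADV_DONTNEED for each range once it has been consumed. This is not the
kernel patch from the thread (which calls VOP_ADVISE inside read(2)); the
names read_once_aligned() and count_bytes() are hypothetical, and how much
cache the hints actually release depends on the operating system.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Example consumer (hypothetical): just counts the bytes handed to it. */
void
count_bytes(const char *buf, size_t len, void *arg)
{
	(void)buf;
	*(size_t *)arg += len;
}

/*
 * Hypothetical helper, not taken from the thread.  Reads fd from offset 0
 * to EOF in block_size chunks that start on multiples of block_size, and
 * after each chunk hints that the range just read will not be needed again.
 * Returns the total number of bytes read, or -1 on error.
 */
off_t
read_once_aligned(int fd, size_t block_size,
    void (*consume)(const char *, size_t, void *), void *arg)
{
	char *buf;
	off_t off = 0;

	assert(block_size > 0);
	buf = malloc(block_size);
	if (buf == NULL)
		return (-1);

	/* Ask for read-ahead.  Advice is only a hint, so failure is ignored. */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

	for (;;) {
		size_t filled = 0;

		/* Fill one whole block (or stop at EOF) so offsets stay aligned. */
		while (filled < block_size) {
			ssize_t n = pread(fd, buf + filled, block_size - filled,
			    off + (off_t)filled);
			if (n < 0) {
				if (errno == EINTR)
					continue;
				free(buf);
				return (-1);
			}
			if (n == 0)
				break;
			filled += (size_t)n;
		}
		if (filled == 0)
			break;

		if (consume != NULL)
			consume(buf, filled, arg);

		/* The data has been copied out; hint that this range can be dropped. */
		(void)posix_fadvise(fd, off, (off_t)filled, POSIX_FADV_DONTNEED);
		off += (off_t)filled;

		if (filled < block_size)
			break;		/* short block means EOF was reached */
	}

	free(buf);
	return (off);
}
```

A caller would open the file, pick a block size (for example st_blksize from
fstat(2), or the 32KiB file system block size mentioned above; either choice
is an assumption here), and call read_once_aligned(fd, blksize, count_bytes,
&total). Because every chunk starts on a block boundary, a DONTNEED hint
never discards a partially consumed block that a later read would have to
fetch again.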