Date: Wed, 16 Dec 2015 15:42:20 +0200 From: Konstantin Belousov <kostikbel@gmail.com> To: Gleb Smirnoff <glebius@FreeBSD.org> Cc: jeff@FreeBSD.org, alc@FreeBSD.org, scottl@FreeBSD.org, pho@FreeBSD.org, arch@FreeBSD.org Subject: Re: new vm_pager_get_pages() KPI, round 3 Message-ID: <20151216134220.GW3625@kib.kiev.ua> In-Reply-To: <20151214175046.GR78497@FreeBSD.org> References: <20151205052940.GJ42565@FreeBSD.org> <20151214111335.GB82577@kib.kiev.ua> <20151214175046.GR78497@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, Dec 14, 2015 at 08:50:46PM +0300, Gleb Smirnoff wrote: > On Mon, Dec 14, 2015 at 01:13:35PM +0200, Konstantin Belousov wrote: > K> I fail to understand how the case of count > 1 and non-contiguous blocks > K> in the non-readahead case is handled by new vnode_pager_generic_getpages(). > K> > K> I do not understand how a hole somewhere in the requested range is handled. > K> Code has a comment that a hole must not appear in the range. > K> > K> Both issues mean that vm_pager_has_page() still must be called before > K> pagein, for count > 1 use. E.g. the exec_map_first_page() uses *after > K> value returned from has_pages() to calculate count, which is an advisory > K> and not the contract. Same issue prevents converting GEM and TTM (and > K> probably md) to use the count > 1 KPI. > > The *after and *before are now not advisory, but a contract. Those consumers, > who want to utilize count > 1, must preceed the call to vm_pager_get_pages() > with call to vm_pager_haspage(). Only region approved by vm_pager_haspage() > or smaller will succeed. Let me repeat what I discussed: int vm_pager_get_pages(vm_object_t obj, vm_page_t *mrun, int count, int before, int after); mrun contains exactly count contiguous busy pages owned by obj. All mrun pages must be returned busy. All mrun pages must be returned valid, or pager must return an error. Advisory, up to 'before' pages befire mrun[0] and up to 'after' pages after mrun[count - 1] might be read by pager, if it succeeds in finding the pages on queues, or allocates them, and no pages outside the mrun are busy etc. Pager is allowed to ignore hints for any reason, typical cause for the VOP_BMAP() backed pager would be the end of contig run on disk. The additional pages are not owned by the called thread after vm_pager_get_pages() return. > > K> Same is true for swap pager, and this prevents the removal of the loop > K> in vm_thread_swaping(). > K> > K> Code assumes that the partially valid page may only appear in the last > K> position of the page run for the local pager, which again requires > K> pre-validation of the vm_pager_get_pages() on the caller side. > > Yes, asking for page in into a valid page is a risk of data corruption. This is an internal pager detail. In the interface I discussed, the pages returned should be valid. If they are already valid at the call time, it is the pager job to avoid corruption by not doing i/o. > > K> Overall this is not an KPI that was discussed. It seemingly does not > K> change semantic for count == 1 case, but is not what it should be for > K> count > 1. As discussed, new vm_pager_get_pages() was support to just > K> work for any count, doing the loop over the non-contig ranges or short > K> reads, and guaranteeing that all existing (or hole-filled) pages are > K> read until EOF is met. This KPI was supposed to: > K> - fix my compaints about short reads > > I will not take your complaints about short reads. The get pages KPI > is not a complement to VOP_READ(), neither of a read(2) syscall. If > a underlying filesystem has problems in it, it must deal with these > problems on its own, doing multiple I/Os per VOP_GETPAGES(). > > K> - avoid excessive VOP_BMAP() call from has_pages before get_pages() > > It is now avoided for count == 1. > > K> - allowed to remove the loops from all current get_pages() consumers, > K> it vm_thread_swapin(), GEM/TTM, image activator > > This wasn't discussed at all. I like this idea, that can be done later. > > -- > Totus tuus, Glebius.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20151216134220.GW3625>