Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 04 May 2015 17:50:39 +0800
From:      Julian Elischer <julian@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>, Gleb Smirnoff <glebius@FreeBSD.org>
Cc:        alc@FreeBSD.org, arch@FreeBSD.org
Subject:   Re: more strict KPI for vm_pager_get_pages()
Message-ID:  <554740EF.7030808@freebsd.org>
In-Reply-To: <20150504082426.GC2390@kib.kiev.ua>
References:  <20150430142408.GS546@nginx.com> <20150504082426.GC2390@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On 5/4/15 4:24 PM, Konstantin Belousov wrote:
> On Thu, Apr 30, 2015 at 05:24:08PM +0300, Gleb Smirnoff wrote:
>>    Hi!
>>
>>    The reason to write down this patch emerges from the
>> projects/sendfile branch, where vm_pager_get_pages() is
>> used in the sendfile(2) system call. Although the new
>> sendfile works flawlessly, it makes some assumptions
>> about vnode_pager that theoretically may not be valid,
>> however always hold in our current code.
>>
>> Going deeper in the problem I have found more important
>> points, which yielded in the suggested patch. To start,
>> let me display the current KPI assumptions:
>>
>> 1) vm_pager_get_pages() works on an array of consequtive
>>     array of pages. Pindex of (n+1)-th pages must be pindex
>>     of n-th + 1. One page is special, it is called reqpage.
>> 2) vm_pager_get_pages() guarantees to swapin only the reqpage,
>>     and may skip or fail other pages for different reasons, that
>>     may vary from pager to pager.
>> 3) There also is function vm_pager_has_page(), which reports
>>     availability of a page at given index in the pager, and also
>>     provides hints on how many consequtive pages before this one
>>     and after this one can be swapped in in single pager request.
>>     Most pagers return zeros in these hints. The vnode pager for
>>     UFS returns a strong promise, that one can later utilize in
>>     vm_pager_get_pages().
>> 4) All pages must be busied on enter. On exit only reqpage
>>     will be left busied. The KPI doesn't guarantee that rest
>>     of the pages is still in place. The pager usually calls
>>     vm_page_readahead_finish() on them, which can either free,
>>     or put the page on active/inactive queue, using quite
>>     a strange approach to choose a queue.
>> 5) The pages must not be wired, since vm_page_free() may be
>>     called on them. However, this is violated by several
>>     consumers of KPI, relying on lack of errors in the pager.
>>     Moreover, the swap pager has a special function to skip
>>     wired pages, while doing the sweep, to avoid this problem.
>>     So, passing wired pages to swapper is OK, while to the
>>     reset is not.
>> 6) Pagers may replace a page in the object with a new one.
>>     The sg_pager actually does that. To protect from this
>>     event, consumers of vm_pager_get_pages() always run
>>     vm_page_lookup() over the array of pages to relookup the pages.
>>     However, not all consumers do this.
>>
>> Speaking of pagers and their consumers:
>> - 11 consumers request array of size 1, a single page
>> - 3 consumers actually request array
>>
>> My suggestion is to change the KPI assumptions to the following:
>>
>> 1) There is no reqpage. All pages are entered busied, all pages
>>     are returned busied and validated. If pager fails to validate
>>     all pages it must return error.
>> 2) The consumer (not the pager!) is to decide what to do with the
>>     pages: vm_page_active, vm_page_deactivate, vm_page_flash or just
>>     vm_page_free them. The consumer also unbusies pages, if it
>>     wants to. The consumer is free to wire pages before the call.
>> 3) Consumers must first query the pager via vm_pager_has_page(),
>>     and use the after/before hints to limit the size of the
>>     requested pages array.
>> 4) In case if pager replaces pages, it must also update the array,
>>     so that consumer doesn't need to do relookup.
>>
>> Doing this sweep, I also noticed that all pagers have a copy-pasted
>> code of zeroing invalid regions of partially valid pages. Also,
>> many pagers got a set of assertions copy and pasted from each
>> other. So, I decided to un-inline the vm_pager_get_pages(), bring
>> it to the vm_pager.c file and gather all these copy-pastes
>> into one place.
>>
>> The suggested patch is attached. As expected, it simplifies and
>> removes quite a lot of code.
>>
>> Right now it is tested on UFS only, testing NFS and ZFS is on my list.
>> There is one panic known, but it seems unrelated, and Peter pho@ says
>> that once it has been seen before.
> Below is the summary of my part of the internal discussion about the changes.
>
> Traditionally, Unix allows the filesystems to perform the short reads.
> Most fundamental change in the patch removes this freedom from the
> filesystem implementation, and I think that only local filesystems could
> be compliant with the proposed strictness.
>
> IMO, the response from vm_pager_haspages() is only advisory, since
> filesystem might not control the external entities which are the source
> of the required data.

Also since the backing object is not locked, a truncate() may be performed
between the operations making the prior return information invalid.
Certainly in remote filesystems, possibly on local ones too.
>
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?554740EF.7030808>