Date: Mon, 04 May 2015 17:50:39 +0800 From: Julian Elischer <julian@freebsd.org> To: Konstantin Belousov <kostikbel@gmail.com>, Gleb Smirnoff <glebius@FreeBSD.org> Cc: alc@FreeBSD.org, arch@FreeBSD.org Subject: Re: more strict KPI for vm_pager_get_pages() Message-ID: <554740EF.7030808@freebsd.org> In-Reply-To: <20150504082426.GC2390@kib.kiev.ua> References: <20150430142408.GS546@nginx.com> <20150504082426.GC2390@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On 5/4/15 4:24 PM, Konstantin Belousov wrote: > On Thu, Apr 30, 2015 at 05:24:08PM +0300, Gleb Smirnoff wrote: >> Hi! >> >> The reason to write down this patch emerges from the >> projects/sendfile branch, where vm_pager_get_pages() is >> used in the sendfile(2) system call. Although the new >> sendfile works flawlessly, it makes some assumptions >> about vnode_pager that theoretically may not be valid, >> however always hold in our current code. >> >> Going deeper in the problem I have found more important >> points, which yielded in the suggested patch. To start, >> let me display the current KPI assumptions: >> >> 1) vm_pager_get_pages() works on an array of consequtive >> array of pages. Pindex of (n+1)-th pages must be pindex >> of n-th + 1. One page is special, it is called reqpage. >> 2) vm_pager_get_pages() guarantees to swapin only the reqpage, >> and may skip or fail other pages for different reasons, that >> may vary from pager to pager. >> 3) There also is function vm_pager_has_page(), which reports >> availability of a page at given index in the pager, and also >> provides hints on how many consequtive pages before this one >> and after this one can be swapped in in single pager request. >> Most pagers return zeros in these hints. The vnode pager for >> UFS returns a strong promise, that one can later utilize in >> vm_pager_get_pages(). >> 4) All pages must be busied on enter. On exit only reqpage >> will be left busied. The KPI doesn't guarantee that rest >> of the pages is still in place. The pager usually calls >> vm_page_readahead_finish() on them, which can either free, >> or put the page on active/inactive queue, using quite >> a strange approach to choose a queue. >> 5) The pages must not be wired, since vm_page_free() may be >> called on them. However, this is violated by several >> consumers of KPI, relying on lack of errors in the pager. >> Moreover, the swap pager has a special function to skip >> wired pages, while doing the sweep, to avoid this problem. >> So, passing wired pages to swapper is OK, while to the >> reset is not. >> 6) Pagers may replace a page in the object with a new one. >> The sg_pager actually does that. To protect from this >> event, consumers of vm_pager_get_pages() always run >> vm_page_lookup() over the array of pages to relookup the pages. >> However, not all consumers do this. >> >> Speaking of pagers and their consumers: >> - 11 consumers request array of size 1, a single page >> - 3 consumers actually request array >> >> My suggestion is to change the KPI assumptions to the following: >> >> 1) There is no reqpage. All pages are entered busied, all pages >> are returned busied and validated. If pager fails to validate >> all pages it must return error. >> 2) The consumer (not the pager!) is to decide what to do with the >> pages: vm_page_active, vm_page_deactivate, vm_page_flash or just >> vm_page_free them. The consumer also unbusies pages, if it >> wants to. The consumer is free to wire pages before the call. >> 3) Consumers must first query the pager via vm_pager_has_page(), >> and use the after/before hints to limit the size of the >> requested pages array. >> 4) In case if pager replaces pages, it must also update the array, >> so that consumer doesn't need to do relookup. >> >> Doing this sweep, I also noticed that all pagers have a copy-pasted >> code of zeroing invalid regions of partially valid pages. Also, >> many pagers got a set of assertions copy and pasted from each >> other. So, I decided to un-inline the vm_pager_get_pages(), bring >> it to the vm_pager.c file and gather all these copy-pastes >> into one place. >> >> The suggested patch is attached. As expected, it simplifies and >> removes quite a lot of code. >> >> Right now it is tested on UFS only, testing NFS and ZFS is on my list. >> There is one panic known, but it seems unrelated, and Peter pho@ says >> that once it has been seen before. > Below is the summary of my part of the internal discussion about the changes. > > Traditionally, Unix allows the filesystems to perform the short reads. > Most fundamental change in the patch removes this freedom from the > filesystem implementation, and I think that only local filesystems could > be compliant with the proposed strictness. > > IMO, the response from vm_pager_haspages() is only advisory, since > filesystem might not control the external entities which are the source > of the required data. Also since the backing object is not locked, a truncate() may be performed between the operations making the prior return information invalid. Certainly in remote filesystems, possibly on local ones too. > > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?554740EF.7030808>