Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 10 Aug 2015 17:30:31 +0300
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        arch@FreeBSD.org, alc@freebsd.org
Subject:   Re: more strict KPI for vm_pager_get_pages()
Message-ID:  <20150810143031.GF75813@glebius.int.ru>
In-Reply-To: <20150808084121.GX2072@kib.kiev.ua>
References:  <20150430142408.GS546@nginx.com> <20150807133844.GS889@FreeBSD.org> <20150808084121.GX2072@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
  Konstantin,

On Sat, Aug 08, 2015 at 11:41:21AM +0300, Konstantin Belousov wrote:
K> > 1) There could be a user of KPI who would be fine with partial success.
K> > 
K> > My answer: right now there is none, and if one emerges, the code can
K> > be easily adopted to return VM_PAGER_ERROR, but still mark validated
K> > pages as valid. The user of KPI then can scan the array and take valid
K> > pages. So, the patch doesn't put any obstacles on appearance of such
K> > user.
K> The vm_fault.c is already fine with the partial success, it only cares
K> that the requested page was validated and no error from pager is
K> returned.

vm_fault.c can be easily adopted to partial success, but I don't see
a good reason to do this, since right now no pager does return partial
success.

K> > 2) Filesystems can do short reads by design, and thus fail to validate
K> >    the entire array.
K> > 
K> > My answer: yes, that's true. By design NFS, SMBFS and FUSE should be
K> > able to return short reads. However, the VOP_GETPAGES methods for all
K> > three FSes right now do not have any code that would support that. So,
K> > it looks like there is an open issue with these filesystems, not related
K> > to my patch. When this issue is addressed in any of aforementioned FSes,
K> > the VOP_GETPAGES should be fixed to do several I/Os in case of short
K> > reads.
K> 
K> And this is a bug in the networking filesystems (most likely).  Rick was
K> asked about NFS, but he did not responded.  You are proposing to make the
K> bug a part of the interface.

Yes, this is likely a bug in current filesystems. Note that the bug is
already here, and it is not a part of my patch. Also note that the bug
has no evidence, we only modeled it in a thought experiment.

And I don't propose to make a bug part of interface. I propose that if FS
maintainers find time to analyze this issue, then they should fix it,
hiding internal peculiarities of their FS into its implementation, instead
of escalating it up to the pager interface.

K> I object against this change. It is wrong philosophically, and it
K> encodes the incomplete or accidental behaviour of several filesystems at
K> the interface level.

I object to your objection. The current KPI is wrong philosophically.
It is wrong that caller passes an array of locked entities (busied pages)
and gets partially unlocked array on return. This doesn't allow to write
robust code safe from race conditions without introducting workarounds.

K> In fact, you are taking some rather secondary feature of the current
K> interface, that the non-requested pages are made busy for the duration
K> of the paging request, to the level of the fundamental property (would
K> the non-mreq pages be not busied, proposed change immediately causes
K> applications segfault on parallel file truncation, or makes user data
K> corrupted, for example).

I really don't get this paragraph. "would the non-mreq pages be not busied" --
but this isn't the case. They are busied and this is intentional.

K> All this rototilling is because you do not want to code the proper FSA
K> in your reworked sendfile patch.

You see, a proper FSA can be built around even more brain damaged
interface. Alternatively, I can just say that if the array KPI is
so fragile, I will do pages one by one in a busy loop. So, yes, the
"problem" is solvable without touching KPI, but the solution isn't right
one.

-- 
Totus tuus, Glebius.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150810143031.GF75813>