Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 05 Dec 2021 13:38:52 -0600
From:      Mike Karels <mike@karels.net>
To:        Jessica Clarke <jrtc27@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject:   Re: git: a4e4132fa3bf - main - swapoff(2): replace special device name argument with a structure
Message-ID:  <9FA0F081-7F17-479B-96D6-F0754A19029B@karels.net>
In-Reply-To: <6AA150D7-483E-4F11-B35A-23D6F28ECABB@freebsd.org>
References:  <202112042221.1B4ML7Ov002151@gitrepo.freebsd.org> <EE06FFF1-7587-4F6E-8649-63454155F2C8@freebsd.org> <Yay8/x8lTm59vTlo@kib.kiev.ua> <CCBD810D-80DB-43ED-9957-4F9A9CB950E5@freebsd.org> <Ya0KTUavns2cN/0z@kib.kiev.ua> <6AA150D7-483E-4F11-B35A-23D6F28ECABB@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 5 Dec 2021, at 12:56, Jessica Clarke wrote:

> On 5 Dec 2021, at 18:51, Konstantin Belousov <kostikbel@gmail.com> wrot=
e:
>> On Sun, Dec 05, 2021 at 05:14:54PM +0000, Jessica Clarke wrote:
>>> On 5 Dec 2021, at 13:22, Konstantin Belousov <kostikbel@gmail.com> wr=
ote:
>>>>
>>>> On Sun, Dec 05, 2021 at 03:03:26AM +0000, Jessica Clarke wrote:
>>>>> On 4 Dec 2021, at 22:21, Konstantin Belousov <kib@FreeBSD.org> wrot=
e:
>>>>>>
>>>>>> The branch main has been updated by kib:
>>>>>>
>>>>>> URL: https://cgit.FreeBSD.org/src/commit/?id=3Da4e4132fa3bfadb6047=
fc0fa5f399f4640460300
>>>>>>
>>>>>> commit a4e4132fa3bfadb6047fc0fa5f399f4640460300
>>>>>> Author:     Konstantin Belousov <kib@FreeBSD.org>
>>>>>> AuthorDate: 2021-11-29 16:26:31 +0000
>>>>>> Commit:     Konstantin Belousov <kib@FreeBSD.org>
>>>>>> CommitDate: 2021-12-04 22:20:58 +0000
>>>>>>
>>>>>>  swapoff(2): replace special device name argument with a structure=

>>>>>>
>>>>>>  For compatibility, add a placeholder pointer to the start of the
>>>>>>  added struct swapoff_new_args, and use it to distinguish old vs. =
new
>>>>>>  style of syscall invocation.
>>>>>>
>>>>>>  Reviewed by:    markj
>>>>>>  Discussed with: alc
>>>>>>  Sponsored by:   The FreeBSD Foundation
>>>>>>  MFC after:      1 week
>>>>>>  Differential revision:  https://reviews.freebsd.org/D33165
>>>>>> ---
>>>>>> sys/vm/swap_pager.c | 27 +++++++++++++++++++++++++--
>>>>>> sys/vm/swap_pager.h |  8 ++++++++
>>>>>> 2 files changed, 33 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
>>>>>> index 165373d1b527..dc1df79f4fcd 100644
>>>>>> --- a/sys/vm/swap_pager.c
>>>>>> +++ b/sys/vm/swap_pager.c
>>>>>> @@ -2491,15 +2491,38 @@ sys_swapoff(struct thread *td, struct swap=
off_args *uap)
>>>>>> 	struct vnode *vp;
>>>>>> 	struct nameidata nd;
>>>>>> 	struct swdevt *sp;
>>>>>> -	int error;
>>>>>> +	struct swapoff_new_args sa;
>>>>>> +	int error, probe_byte;
>>>>>>
>>>>>> 	error =3D priv_check(td, PRIV_SWAPOFF);
>>>>>> 	if (error)
>>>>>> 		return (error);
>>>>>>
>>>>>> +	/*
>>>>>> +	 * Detect old vs. new-style swapoff(2) syscall.  The first
>>>>>> +	 * pointer in the memory pointed to by uap->name is NULL for
>>>>>> +	 * the new variant.
>>>>>> +	 */
>>>>>> +	probe_byte =3D fubyte(uap->name);
>>>>>> +	switch (probe_byte) {
>>>>>> +	case -1:
>>>>>> +		return (EFAULT);
>>>>>> +	case 0:
>>>>>> +		error =3D copyin(uap->name, &sa, sizeof(sa));
>>>>>> +		if (error !=3D 0)
>>>>>> +			return (error);
>>>>>> +		if (sa.flags !=3D 0)
>>>>>> +			return (EINVAL);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		bzero(&sa, sizeof(sa));
>>>>>> +		sa.name =3D uap->name;
>>>>>> +		break;
>>>>>> +	}
>>>>>
>>>>> Doesn=E2=80=99t this change the semantics of swapoff("")?
>>>>>
>>>>> Previously it would fail deterministically, presumably with ENOENT =
or
>>>>> something, but now it reinterprets whatever follows that string in
>>>>> memory as the new argument structure. It probably doesn=E2=80=99t m=
atter, but
>>>>> this approach is ugly. Can we not just define a new syscall rather =
than
>>>>> this kind of bodge?
>>>>
>>>> Having two swapoff() syscalls is worse, and having them only differ =
in
>>>> semantic by single flag is kind of crime.
>>>>
>>>> I do not see swapoff("") as problematic, we are changing a minor sem=
antic of
>>>> the management syscall.  I only wanted to avoid flag day for swapoff=
 binaries.
>>>>
>>>> BTW, I considered requiring proper alignment for uap->name, and then=
 checking
>>>> the whole uap->name_old_syscall for NULL, but then decided that this=
 is
>>>> overkill.  If you think that swapoff("") that important, I can add t=
hat
>>>> additional verification.
>>>
>>> Why=E2=80=99s it worse? It=E2=80=99s just a syscall number, you depre=
cate the old one
>>> and move on, we do that for things relatively regularly. This is real=
ly
>>> not a good solution; harder to use as a caller since the prototype is=

>>> wrong, impossible to ensure you preserve the semantics for the existi=
ng
>>> interface in all cases, and ugly to implement. You don=E2=80=99t need=
 a flag
>>> day for a new syscall, either, you can continue to only use the new
>>> method for -f for a release and then switch over to the new syscall
>>> entirely. Or switch over to the new syscall entirely now and fall bac=
k
>>> on the old syscall if -f isn=E2=80=99t passed. Defining a new syscall=
 also lets
>>> you not need the name_old_syscall member in the struct, and gives you=
 a
>>> clean, fully-extensible syscall to which future features can be added=

>>> in a backwards-compatible way, rather than forever keeping around thi=
s
>>> legacy mess.
>>
>> I disagree, it is not just a syscall number, it is whole user/kernel
>> interface that bloats, which means cognitive efforts from anybody usin=
g
>> this interfaces, and for which we must maintain ABI compatibility.
>
> Which is just as true of this approach; you have the same two
> interfaces here, just smashed together into a single harder-to-use
> syscall rather than two separate syscalls. Having a separate syscall at=

> least allows the old one to return ENOSYS in the future, whereas if you=

> ever want to deprecate the old interface with this method then you=E2=80=
=99ll
> need some other weird error response that=E2=80=99s harder to interpret=
 as
> meaning =E2=80=9Cthat variant of this syscall doesn=E2=80=99t exist any=
 more=E2=80=9D.
>
>> New syscall allocation should be done only as a last resort, when exis=
ting
>> interfaces cannot be adopted for new functionality.
>
> Which this can=E2=80=99t without breaking the existing well-defined sem=
antics,
> as I=E2=80=99ve stated.
>
>> Good (or rather, bad) example of the uglyness that is backed by the at=
titude
>> that syscalls are free, is whole *at() mess, or specific stat*() mess =
(old,
>> other bsds, pre-ino64, ino64, at, then stat vs fstat, then Linux statx=
 which
>> probably fixes the interface ultimately).
>
> It=E2=80=99s better than this approach.

I have less resistance to adding new syscalls, and I agree with Jess that=
 it
is the right thing to do in this case.  Adding a syscall means the kernel=

supports either old or new interface, so there is no flag day.  And it is=

easier to clean up; maintaining two syscalls should only be needed for a
while on -current, and not in any release.

		Mike

> Jess



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?9FA0F081-7F17-479B-96D6-F0754A19029B>