Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 31 Dec 2020 05:16:27 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Alan Somers <asomers@freebsd.org>, Kirk McKusick <mckusick@mckusick.com>, Mark Johnston <markj@freebsd.org>
Subject:   Re: r367672 broke the NFS server
Message-ID:  <YQXPR0101MB0968B36B267F4217C74CC9E9DDD60@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <X%2By4bK8/1akNNh4Z@kib.kiev.ua>
References:  <YQXPR0101MB0968DC349AFD081480768028DDD70@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X%2Bx4NSeWI%2Bz5QkP3@kib.kiev.ua> <YQXPR0101MB09687E622BDBBE59C964FAC4DDD70@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>, <X%2By4bK8/1akNNh4Z@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
Rick Macklem wrote:=0A=
>Kostik wrote:=0A=
> >=0A=
> >Idea of the change is to restart the syscall at top level.  So for NFS=
=0A=
> >server the right approach is to not send a response and also to not=0A=
> >free the request mbuf chain, but to restart processing.=0A=
> Yes. I took a look and I think restarting the operation by rolling the=0A=
> working position in the mbuf lists back and redoing the operation=0A=
> is feasible and easier than fixing the individual operations.=0A=
>=0A=
> For NFSv4, you cannot redo the entire compound, since non-idempotent=0A=
> operations like exclusive open may have already been completed.=0A=
> However, rolling back to the beginning of the operation should be=0A=
> doable.=0A=
Turned out to be quite easy. I'll stick a patch up on phabricator=0A=
tomorrow, after I do a little more testing.=0A=
NFSv4.0 is still broken, because it screws up the seqid, but I can=0A=
fix that separately.=0A=
=0A=
I do see the code looping about 2-3 times before it gets a successful=0A=
ufs_create(). Does that sound reasonable?=0A=
Here's some debug printfs for the test run of 4 concurrent compiles.=0A=
(proc=3D8 is create and proc=3D12 is remove. Each line is a ERELOOKUP=0A=
 retry. This is for the 4 threads, but I had the thread tid in another prin=
tf=0A=
 and it showed 2-3 attempts for the same thread. They should be serialized=
=0A=
 by the exclusive lock on the directory vnode.)=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D12=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D12=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D12=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D12=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D12=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D12=0A=
tryag3 stat=3D0 proc=3D8=0A=
tryag3 stat=3D0 proc=3D8=0A=
=0A=
Thanks for the suggestion, Kostik.=0A=
=0A=
rick=0A=
=0A=
> --> It will serve as a good test, in that it may expose bugs in the=0A=
>       RPC/operation code where failure (ERELOOKUP) doesn't clean=0A=
>       things up correctly.=0A=
>       --> In NFSv4, there is the open/lock state that cannot be updated=
=0A=
>             for this error case. (The seqid stuff in NFSv4.0 Open can be =
fun.=0A=
>             Its used to serialize the operations and the number must be=
=0A=
>             incremented for some errors, but not for others. The 10026=0A=
>             error occurs when you don't get this right.)=0A=
Note that ERELOOKUP error can only show up from the VOPs that modify the vo=
lume.=0A=
Otherwise we simply do not call into SU.  In particular, I believe that ope=
ns=0A=
in the sense of NFS are safe.=0A=
=0A=
Regardless of it, there should be either a catch-all check for ERELOOKUP,=
=0A=
or assert that ERELOOKUP did not leaked, as it is done for syscalls=0A=
=0A=
>=0A=
> I'll start working on this to-day, but I have no idea how long it might=
=0A=
> take?=0A=
>=0A=
> >I am sorry I forgot about NFS server when designing this fix, the only=
=0A=
> >mild excuse I can provide is that the change was quite complicated as is=
.=0A=
> >I will start looking at the fix.=0A=
> No problem. Sometimes I'd like to forget about NFS too;-).=0A=
>=0A=
> For the rollback/redo the RPC/operation case, it's probably easier for me=
=0A=
> to do it. As above, I'll start on it, but...=0A=
>=0A=
> My main concern is how long it will take, given the FreeBSD13 release=0A=
> starts soon.=0A=
For sure I will help you if needed, and I believe that we could ask for=0A=
testing from Peter.=0A=
_______________________________________________=0A=
freebsd-current@freebsd.org mailing list=0A=
https://lists.freebsd.org/mailman/listinfo/freebsd-current=0A=
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"=
=0A=
=0A=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR0101MB0968B36B267F4217C74CC9E9DDD60>