Date: Thu, 31 Dec 2020 05:16:27 +0000 From: Rick Macklem <rmacklem@uoguelph.ca> To: Konstantin Belousov <kostikbel@gmail.com> Cc: "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Alan Somers <asomers@freebsd.org>, Kirk McKusick <mckusick@mckusick.com>, Mark Johnston <markj@freebsd.org> Subject: Re: r367672 broke the NFS server Message-ID: <YQXPR0101MB0968B36B267F4217C74CC9E9DDD60@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> In-Reply-To: <X%2By4bK8/1akNNh4Z@kib.kiev.ua> References: <YQXPR0101MB0968DC349AFD081480768028DDD70@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X%2Bx4NSeWI%2Bz5QkP3@kib.kiev.ua> <YQXPR0101MB09687E622BDBBE59C964FAC4DDD70@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>, <X%2By4bK8/1akNNh4Z@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
Rick Macklem wrote:=0A= >Kostik wrote:=0A= > >=0A= > >Idea of the change is to restart the syscall at top level. So for NFS= =0A= > >server the right approach is to not send a response and also to not=0A= > >free the request mbuf chain, but to restart processing.=0A= > Yes. I took a look and I think restarting the operation by rolling the=0A= > working position in the mbuf lists back and redoing the operation=0A= > is feasible and easier than fixing the individual operations.=0A= >=0A= > For NFSv4, you cannot redo the entire compound, since non-idempotent=0A= > operations like exclusive open may have already been completed.=0A= > However, rolling back to the beginning of the operation should be=0A= > doable.=0A= Turned out to be quite easy. I'll stick a patch up on phabricator=0A= tomorrow, after I do a little more testing.=0A= NFSv4.0 is still broken, because it screws up the seqid, but I can=0A= fix that separately.=0A= =0A= I do see the code looping about 2-3 times before it gets a successful=0A= ufs_create(). Does that sound reasonable?=0A= Here's some debug printfs for the test run of 4 concurrent compiles.=0A= (proc=3D8 is create and proc=3D12 is remove. Each line is a ERELOOKUP=0A= retry. This is for the 4 threads, but I had the thread tid in another prin= tf=0A= and it showed 2-3 attempts for the same thread. They should be serialized= =0A= by the exclusive lock on the directory vnode.)=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D12=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D12=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D12=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D12=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D12=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D12=0A= tryag3 stat=3D0 proc=3D8=0A= tryag3 stat=3D0 proc=3D8=0A= =0A= Thanks for the suggestion, Kostik.=0A= =0A= rick=0A= =0A= > --> It will serve as a good test, in that it may expose bugs in the=0A= > RPC/operation code where failure (ERELOOKUP) doesn't clean=0A= > things up correctly.=0A= > --> In NFSv4, there is the open/lock state that cannot be updated= =0A= > for this error case. (The seqid stuff in NFSv4.0 Open can be = fun.=0A= > Its used to serialize the operations and the number must be= =0A= > incremented for some errors, but not for others. The 10026=0A= > error occurs when you don't get this right.)=0A= Note that ERELOOKUP error can only show up from the VOPs that modify the vo= lume.=0A= Otherwise we simply do not call into SU. In particular, I believe that ope= ns=0A= in the sense of NFS are safe.=0A= =0A= Regardless of it, there should be either a catch-all check for ERELOOKUP,= =0A= or assert that ERELOOKUP did not leaked, as it is done for syscalls=0A= =0A= >=0A= > I'll start working on this to-day, but I have no idea how long it might= =0A= > take?=0A= >=0A= > >I am sorry I forgot about NFS server when designing this fix, the only= =0A= > >mild excuse I can provide is that the change was quite complicated as is= .=0A= > >I will start looking at the fix.=0A= > No problem. Sometimes I'd like to forget about NFS too;-).=0A= >=0A= > For the rollback/redo the RPC/operation case, it's probably easier for me= =0A= > to do it. As above, I'll start on it, but...=0A= >=0A= > My main concern is how long it will take, given the FreeBSD13 release=0A= > starts soon.=0A= For sure I will help you if needed, and I believe that we could ask for=0A= testing from Peter.=0A= _______________________________________________=0A= freebsd-current@freebsd.org mailing list=0A= https://lists.freebsd.org/mailman/listinfo/freebsd-current=0A= To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"= =0A= =0A=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR0101MB0968B36B267F4217C74CC9E9DDD60>