From: Peter Holm
To: Konstantin Belousov
Cc: Rick Macklem, "freebsd-current@freebsd.org", Alan Somers, Kirk McKusick, Mark Johnston
Date: Wed, 30 Dec 2020 19:23:13 +0100
Subject: Re: r367672 broke the NFS server
Message-ID: <20201230182313.GA22299@x8.osted.lan>
List-Id: Discussions about the use of FreeBSD-current

On Wed, Dec 30, 2020 at 07:27:08PM +0200, Konstantin Belousov wrote:
> On Wed, Dec 30, 2020 at 04:48:27PM +0000, Rick Macklem wrote:
> > Kostik wrote:
> > >On Wed, Dec 30, 2020 at 02:02:48AM +0000, Rick Macklem wrote:
> > >> Hi,
> > >>
> > >> Post r367671...
> > >> When multiple files are being created by an NFS client in the same
> > >> directory, the VOP_CREATE()/ufs_create() can fail with ERELOOKUP.
> > >> This results in an EIO return to the NFS client.
> > >> --> This causes "nfsv4 client/server protocol prob err=10026"
> > >>       on the client for NFSv4.0 mounts.
> > >> --> This explains why this error has been reported by
> > >>       several people lately, although it should "never happen".
> > >>
> > >> Unfortunately, for the NFS server, the Lookup call is done separately
> > >> and it will not be easy to redo it, given the current NFS code structure.
> > >>
> > >> Is there another way to deal with the problem r367672 was fixing that
> > >> avoids ufs_create() returning ERELOOKUP?
> > >
> > >Idea of the change is to restart the syscall at top level. So for NFS
> > >server the right approach is to not send a response and also to not
> > >free the request mbuf chain, but to restart processing.
> > Yes. I took a look and I think restarting the operation by rolling the
> > working position in the mbuf lists back and redoing the operation
> > is feasible and easier than fixing the individual operations.
> >
> > For NFSv4, you cannot redo the entire compound, since non-idempotent
> > operations like exclusive open may have already been completed.
> > However, rolling back to the beginning of the operation should be
> > doable.
> > --> It will serve as a good test, in that it may expose bugs in the
> >       RPC/operation code where failure (ERELOOKUP) doesn't clean
> >       things up correctly.
> > --> In NFSv4, there is the open/lock state that cannot be updated
> >       for this error case.
> >       (The seqid stuff in NFSv4.0 Open can be fun.
> >       It's used to serialize the operations and the number must be
> >       incremented for some errors, but not for others. The 10026
> >       error occurs when you don't get this right.)
> Note that ERELOOKUP error can only show up from the VOPs that modify the volume.
> Otherwise we simply do not call into SU. In particular, I believe that opens
> in the sense of NFS are safe.
>
> Regardless of it, there should be either a catch-all check for ERELOOKUP,
> or assert that ERELOOKUP did not leak, as it is done for syscalls.
>
> >
> > I'll start working on this to-day, but I have no idea how long it might
> > take?
> >
> > >I am sorry I forgot about NFS server when designing this fix, the only
> > >mild excuse I can provide is that the change was quite complicated as is.
> > >I will start looking at the fix.
> > No problem. Sometimes I'd like to forget about NFS too;-).
> >
> > For the rollback/redo the RPC/operation case, it's probably easier for me
> > to do it. As above, I'll start on it, but...
> >
> > My main concern is how long it will take, given the FreeBSD13 release
> > starts soon.
> For sure I will help you if needed, and I believe that we could ask for
> testing from Peter.

Absolutely. Not sure how I missed running NFS test the first time around.

- Peter
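
For anyone following along, here is a rough userland sketch of the
rollback/redo idea from the quoted discussion: remember the working
position at the start of one operation, run it, and if it fails with
ERELOOKUP, roll the position back and redo just that operation. This is
not the NFS server code. Everything named here (struct req_cursor,
run_op_with_redo(), fake_create_op(), SKETCH_ERELOOKUP, the retry bound,
and the mapping of a leftover ERELOOKUP to EIO as a catch-all check) is
an assumption made up for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel-internal ERELOOKUP pseudo-error (illustrative value). */
#define SKETCH_ERELOOKUP	(-5)
/* Hypothetical bound so a persistent failure cannot spin forever. */
#define SKETCH_MAX_RETRIES	8

/* Hypothetical cursor standing in for the working position in the request mbuf chain. */
struct req_cursor {
	const unsigned char *buf;
	size_t len;
	size_t pos;
};

typedef int (*op_fn)(struct req_cursor *cur, void *arg);

/*
 * Run a single operation. On SKETCH_ERELOOKUP, restore the cursor to the
 * start of this operation (not the start of the whole compound) and redo it.
 */
static int
run_op_with_redo(struct req_cursor *cur, op_fn op, void *arg, int *attempts)
{
	size_t saved = cur->pos;	/* beginning of this operation */
	int error, tries = 0;

	do {
		cur->pos = saved;	/* roll back before every (re)try */
		error = op(cur, arg);
		tries++;
	} while (error == SKETCH_ERELOOKUP && tries < SKETCH_MAX_RETRIES);

	if (attempts != NULL)
		*attempts = tries;
	/* Catch-all: the pseudo-error must never leak into a reply. */
	if (error == SKETCH_ERELOOKUP)
		error = EIO;
	return (error);
}

/* Mock operation: consumes 4 argument bytes, fails fail_times times first. */
struct fake_create {
	int fail_times;
	int calls;
};

static int
fake_create_op(struct req_cursor *cur, void *arg)
{
	struct fake_create *fc = arg;

	if (cur->len - cur->pos < 4)
		return (EINVAL);
	cur->pos += 4;			/* parse the arguments */
	fc->calls++;
	if (fc->fail_times > 0) {
		fc->fail_times--;
		return (SKETCH_ERELOOKUP);
	}
	return (0);
}
```

Restoring the position at the top of the loop keeps the redo scoped to
one operation, which matches the point made above that the entire NFSv4
compound cannot be replayed. Any state the operation changed before
failing would still need its own cleanup; the sketch does not model that.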