Date: Thu, 31 Dec 2020 13:40:03 +0200
From: Konstantin Belousov
To: Rick Macklem
Cc: "freebsd-current@freebsd.org", Alan Somers, Kirk McKusick, Mark Johnston
Subject: Re: r367672 broke the NFS server

On Thu, Dec 31, 2020 at 05:16:27AM +0000, Rick Macklem wrote:
> Rick Macklem wrote:
> >Kostik wrote:
> > >
> > >Idea of the change is to restart the syscall at top level. So for NFS
> > >server the right approach is to not send a response and also to not
> > >free the request mbuf chain, but to restart processing.
> > Yes. I took a look and I think restarting the operation by rolling the
> > working position in the mbuf lists back and redoing the operation
> > is feasible and easier than fixing the individual operations.
> >
> > For NFSv4, you cannot redo the entire compound, since non-idempotent
> > operations like exclusive open may have already been completed.
> > However, rolling back to the beginning of the operation should be
> > doable.
> Turned out to be quite easy. I'll stick a patch up on phabricator
> tomorrow, after I do a little more testing.
> NFSv4.0 is still broken, because it screws up the seqid, but I can
> fix that separately.
>
> I do see the code looping about 2-3 times before it gets a successful
> ufs_create(). Does that sound reasonable?
In the simple case, it can be described as follows: ERELOOKUP is returned
if the parent directory cannot be locked without sleeping, and we have to
drop the lock on the opened vnode to sleep on it. A more elaborate (but
still not precise) description is that the parent directory might also
need to be synced, in which case its parent might need to be locked, and
so on recursively.

Slightly reformulating, I expect ERELOOKUPs to come out when several
threads create files in the same directory.

> Here's some debug printfs for the test run of 4 concurrent compiles.
> (proc=8 is create and proc=12 is remove. Each line is a ERELOOKUP
> retry.
> This is for the 4 threads, but I had the thread tid in another printf
> and it showed 2-3 attempts for the same thread. They should be serialized
> by the exclusive lock on the directory vnode.)
I cannot draw any conclusion from the output and its description.
Are there opens that do not result in ERELOOKUP, i.e. does the op
eventually succeed?  What is the ratio of ERELOOKUP vs. success?

Also note that any VOP that modifies the volume's metadata might result
in ERELOOKUP.

> tryag3 stat=0 proc=8
> tryag3 stat=0 proc=8
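
To make the roll-back-and-redo idea discussed above concrete, here is a
rough userland sketch. It is not the nfsd code and not the patch mentioned
in the thread. Every name in it (SKETCH_ERELOOKUP, struct req_cursor,
struct fake_op, run_op, do_op_with_retry) is hypothetical, the cursor is
only a stand-in for the working position in the request mbuf chain, and
the fake operation simply asks for a fixed number of retries. It also
counts retries, which is one way to obtain the ERELOOKUP vs. success ratio.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel-internal ERELOOKUP code. */
#define SKETCH_ERELOOKUP (-5)

/* Hypothetical cursor: models the working position in the request. */
struct req_cursor {
	size_t pos;
};

/* Hypothetical operation that asks for a fixed number of relookups. */
struct fake_op {
	int relookups_left;
};

/* Consume the operation's arguments, then either succeed or ask for a retry. */
static int
run_op(struct fake_op *op, struct req_cursor *cur)
{
	cur->pos += 16;		/* pretend to parse 16 bytes of arguments */
	if (op->relookups_left > 0) {
		op->relookups_left--;
		return (SKETCH_ERELOOKUP);
	}
	return (0);
}

/*
 * Redo a single operation until it stops asking for a relookup.
 * Only this operation is rolled back, never the whole compound,
 * because earlier operations may not be idempotent.
 */
static int
do_op_with_retry(struct fake_op *op, struct req_cursor *cur, int *retries)
{
	struct req_cursor saved;
	int error;

	assert(retries != NULL);
	saved = *cur;		/* remember where this operation started */
	*retries = 0;
	for (;;) {
		error = run_op(op, cur);
		if (error != SKETCH_ERELOOKUP)
			return (error);
		*cur = saved;	/* roll back to the start of the operation */
		(*retries)++;
	}
}
```

With a cursor starting at offset 32 and a fake operation that asks for two
relookups, do_op_with_retry() returns 0, reports two retries, and leaves
the cursor at 48. A real server would presumably also need to discard any
partially built reply before redoing the operation; that part is left out
here.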