Date:      Thu, 31 Dec 2020 13:40:03 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Alan Somers <asomers@freebsd.org>, Kirk McKusick <mckusick@mckusick.com>, Mark Johnston <markj@freebsd.org>
Subject:   Re: r367672 broke the NFS server
Message-ID:  <X%2B24k2sOYQ5bPfOR@kib.kiev.ua>
In-Reply-To: <YQXPR0101MB0968B36B267F4217C74CC9E9DDD60@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References:  <YQXPR0101MB0968DC349AFD081480768028DDD70@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X%2Bx4NSeWI%2Bz5QkP3@kib.kiev.ua> <YQXPR0101MB09687E622BDBBE59C964FAC4DDD70@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X%2By4bK8/1akNNh4Z@kib.kiev.ua> <YQXPR0101MB0968B36B267F4217C74CC9E9DDD60@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>

On Thu, Dec 31, 2020 at 05:16:27AM +0000, Rick Macklem wrote:
> Rick Macklem wrote:
> >Kostik wrote:
> > >
> > >Idea of the change is to restart the syscall at top level.  So for NFS
> > >server the right approach is to not send a response and also to not
> > >free the request mbuf chain, but to restart processing.
> > Yes. I took a look and I think restarting the operation by rolling the
> > working position in the mbuf lists back and redoing the operation
> > is feasible and easier than fixing the individual operations.
> >
> > For NFSv4, you cannot redo the entire compound, since non-idempotent
> > operations like exclusive open may have already been completed.
> > However, rolling back to the beginning of the operation should be
> > doable.
> Turned out to be quite easy. I'll stick a patch up on phabricator
> tomorrow, after I do a little more testing.
> NFSv4.0 is still broken, because it screws up the seqid, but I can
> fix that separately.
> 
> I do see the code looping about 2-3 times before it gets a successful
> ufs_create(). Does that sound reasonable?
In the simple case, it can be described as follows: ERELOOKUP is returned
if the parent directory cannot be locked without sleeping, so we have to
drop the lock on the opened vnode in order to sleep on the parent's lock.
A more elaborate (but still not precise) description is that the parent
directory might also need to be synced, in which case its parent might
need to be locked, and so on recursively.

Put slightly differently, I expect ERELOOKUP to show up when several
threads create files in the same directory.
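
The retry approach discussed above (do not send a reply, keep the request,
roll the working position back to the start of the operation and redo it)
can be sketched roughly as follows. This is only an illustration, not the
actual NFS server code: struct sketch_req, sketch_do_op() and
sketch_run_op() are hypothetical names, and SKETCH_ERELOOKUP is a local
stand-in for the kernel-internal ERELOOKUP value.

```c
/* Stand-in for the kernel-internal ERELOOKUP; an assumption for this sketch. */
#define SKETCH_ERELOOKUP (-5)

/* Hypothetical request state: a parse position plus simulated contention. */
struct sketch_req {
    int pos;        /* current working position in the request data */
    int contended;  /* how many more times the fake operation will fail */
};

/*
 * Hypothetical operation: consumes some arguments from the request, then
 * fails with SKETCH_ERELOOKUP while the simulated contention lasts.
 */
static int
sketch_do_op(struct sketch_req *rq)
{
    rq->pos += 3;   /* pretend three argument words were parsed */
    if (rq->contended > 0) {
        rq->contended--;
        return (SKETCH_ERELOOKUP);
    }
    return (0);
}

/*
 * Run one operation. On SKETCH_ERELOOKUP, send no reply and keep the
 * request; restore the saved position and redo the same operation.
 * Only this operation is redone, never anything that ran before it.
 */
static int
sketch_run_op(struct sketch_req *rq, int *attempts)
{
    int saved_pos = rq->pos;
    int error;

    *attempts = 0;
    for (;;) {
        (*attempts)++;
        error = sketch_do_op(rq);
        if (error != SKETCH_ERELOOKUP)
            return (error);
        rq->pos = saved_pos;    /* roll back to the start of the op */
    }
}
```

With contended set to 2, sketch_run_op() makes three attempts before it
succeeds, which is the same shape as the 2-3 loops reported above. Saving
the position per operation rather than per request reflects the NFSv4
point that earlier non-idempotent operations in a compound must not be
redone.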

> Here's some debug printfs for the test run of 4 concurrent compiles.
> (proc=8 is create and proc=12 is remove. Each line is a ERELOOKUP
>  retry. This is for the 4 threads, but I had the thread tid in another printf
>  and it showed 2-3 attempts for the same thread. They should be serialized
>  by the exclusive lock on the directory vnode.)
I cannot draw any conclusion from the output and its description.
Are there opens that do not result in ERELOOKUP, i.e. does the op
eventually succeed?  What is the ratio of ERELOOKUP to success?

Also note that any VOP that modifies the volume's metadata might result
in ERELOOKUP.

> tryag3 stat=0 proc=8
> tryag3 stat=0 proc=8


