Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 5 Sep 2020 02:58:51 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Alan Somers <asomers@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: panic!("docallb") in nfsrv_docallback
Message-ID:  <YTBPR01MB396648BCDB31564AE81B59B4DD2A0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CAOtMX2h3ZTVaZB8bdMg3QeGXCLtJLAuOCEQQm-WNTVaJz1HEDA@mail.gmail.com>
References:  <CAOtMX2h3ZTVaZB8bdMg3QeGXCLtJLAuOCEQQm-WNTVaJz1HEDA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Alan Somers wrote:=0A=
>I just saw this panic on a 12-stable machine.  Unfortunately, I don't have=
=0A=
>a core dump, just a stack trace.  It was serving NFS v4.0, with delegation=
s=0A=
>enabled.  The clients were all Debian, with Linux 3.16.0.=0A=
I will generically note that I believe the Linux NFS client developers most=
ly=0A=
test NFSv4.1, 4.2, so if the clients support NFSv4.1, it might be worth upg=
rading?=0A=
=0A=
Also, delegations aren't enabled by default for a couple of reasons.=0A=
1 - For a long time, Linux only knew how to use read delegations and I felt=
=0A=
     (and still feel) they are pretty useless.=0A=
2 - They are complex to get right.=0A=
3 - Although they should reduce the number of Open operations against the=
=0A=
     server, I haven't observed dramatic performance improvements because=
=0A=
     of them.=0A=
=0A=
>The proximal cause of the panic seems to be that the file had a write=0A=
>delegation issued to an unconfirmed client.  Root cause is harder to=0A=
>determine.  Did the kernel previously issue a delegation to an unconfirmed=
=0A=
>client?  Or did the client somehow change to an unconfirmed state after th=
e=0A=
>delegation was issued, perhaps due to a race?=0A=
I think the first case is more likely. Since client confirmation happens im=
mediately=0A=
for NFSv4.1 (the ExchangeID and Createsession must occur before anything=0A=
else can happen), I wouldn;t be surprised if the Linux client tries to do a=
n Open=0A=
before the SetClientIDConfirm has completed for NFSv4.0.=0A=
=0A=
>It's hard to tell, but I don't see any checks for lc_flags &=0A=
>LCL_NEEDSCONFIRM in nfsrv_openctrl (which issues the delegations), so I'm=
=0A=
>guessing that that's the problem.=0A=
The server should definitely check for a confirmed ClientID during Open and=
=0A=
fail any Open attempt where that is not the case.=0A=
--> I'll take a look at the code. I wrote it about 20years ago, but I can p=
robably=0A=
       figure out how it works.;-)=0A=
=0A=
>  If so, then the event trace would look=0A=
>like this:=0A=
>=0A=
>1) Client Alice sends SETCLIENTID.  The server creates a client state=0A=
>structure=0A=
>   for her.=0A=
>_) Client Alice should've sent SETCLIENTID_CONFIRM, but doesn't.  Bad Alic=
e!=0A=
>2) Client Alice sends OPEN for some file, and is issued a write delegation=
.=0A=
>   The server shouldn't have issued it, because Alice's client ID is=0A=
>   unconfirmed.  Bad server!=0A=
>3) Client Bob tries to do a GETATTR on that same file.=0A=
>4) In nfsrv_checkgetattr, the kernel finds a write delegation for that fil=
e,=0A=
>   owned by client Alice.=0A=
>5) The kernel tries to send a NFSV4OP_CBGETATTR callback to Alice, to see=
=0A=
>if the=0A=
>   file's attributes have changed.=0A=
>6) But Alice's client ID is unconfirmed.  Oh no!  Panic!=0A=
>=0A=
>Does this sound plausible?  Should there be a check for LCL_NEEDSCONFIRM=
=0A=
>somewhere around line 3166 in nfs_nfsdstate.c?  Grateful for any help.=0A=
Yes, it does. I would have thought that I'd have checked for the unconfirme=
d=0A=
ClientID, but maybe not.=0A=
=0A=
It is also possible that the client somehow did a SetClientID after the Ope=
n=0A=
that issued the delegation, putting it back in "unconfirmed" state.=0A=
It that was the case, maybe the panic(), intended to catch corrupted data=
=0A=
structures, was overkill.=0A=
=0A=
>-Alan=0A=
>=0A=
>P.S.: stack trace=0A=
>=0A=
>kdb_backtrace=0A=
>vpanic=0A=
>panic=0A=
>nfsrv_docallback=0A=
>nfsrv_checkgetattr=0A=
nfsrv_checkgetattr() should probably check for the case of an unconfirmed=
=0A=
   clientid and then return ignoring any delegations hanging off it instead=
=0A=
   of attempting a callback.=0A=
--> This would handle the case where the client did a SetClientID after the=
=0A=
      Open that acquired the delegation, leaving the ClientID unconfirmed.=
=0A=
     - The two RPCs doing SetClientID and SetClientIDConfirm are normally=
=0A=
        done only upon mounting or when the client thinks it has lost the=
=0A=
        ClientID due to a lease expiry, but there is also the case where it=
 is=0A=
        changing the callback address. (This could explain the SetClientID=
=0A=
        happening after the Open that acquired the delegation.)=0A=
--> Hint. Can you now see why NFSv4.1 chose to do things differently?=0A=
=0A=
nfsrvd_getattr=0A=
nfsrvd_dorpc=0A=
nfssvc_program=0A=
svc_run_internal=0A=
svc_thread_start=0A=
fork_exit=0A=
fork_trampoline=0A=
=0A=
Thanks for reporting it. I'll take a look, rick=0A=
=0A=
_______________________________________________=0A=
freebsd-hackers@freebsd.org mailing list=0A=
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers=0A=
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"=
=0A=
=0A=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YTBPR01MB396648BCDB31564AE81B59B4DD2A0>