Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 5 Sep 2020 04:14:19 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Alan Somers <asomers@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: panic!("docallb") in nfsrv_docallback
Message-ID:  <YTBPR01MB3966E40F861048C9240A3B89DD2A0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CAOtMX2h3ZTVaZB8bdMg3QeGXCLtJLAuOCEQQm-WNTVaJz1HEDA@mail.gmail.com>
References:  <CAOtMX2h3ZTVaZB8bdMg3QeGXCLtJLAuOCEQQm-WNTVaJz1HEDA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Alan Somers wrote:=0A=
>I just saw this panic on a 12-stable machine.  Unfortunately, I don't have=
=0A=
>a core dump, just a stack trace.  It was serving NFS v4.0, with delegation=
s=0A=
>enabled.  The clients were all Debian, with Linux 3.16.0.=0A=
>=0A=
>The proximal cause of the panic seems to be that the file had a write=0A=
>delegation issued to an unconfirmed client.  Root cause is harder to=0A=
>determine.  Did the kernel previously issue a delegation to an unconfirmed=
=0A=
>client?  Or did the client somehow change to an unconfirmed state after th=
e=0A=
>delegation was issued, perhaps due to a race?=0A=
>=0A=
>It's hard to tell, but I don't see any checks for lc_flags &=0A=
>LCL_NEEDSCONFIRM in nfsrv_openctrl (which issues the delegations), so I'm=
=0A=
>guessing that that's the problem.=0A=
I guess I should have looked at the code before doing the last post.=0A=
The check is in nfsrv_getclient(), that is called by nfsrv_opencheck().=0A=
nfsrv_opencheck() - Checks to see if an Open is allowed.=0A=
nfsrv_openctrl() - Does the Open, assuming nfsrv_opencheck() determined it=
=0A=
  was allowed.=0A=
=0A=
>  If so, then the event trace would look=0A=
>like this:=0A=
>=0A=
>1) Client Alice sends SETCLIENTID.  The server creates a client state=0A=
>structure=0A=
>   for her.=0A=
>_) Client Alice should've sent SETCLIENTID_CONFIRM, but doesn't.  Bad Alic=
e!=0A=
>2) Client Alice sends OPEN for some file, and is issued a write delegation=
.=0A=
>   The server shouldn't have issued it, because Alice's client ID is=0A=
>   unconfirmed.  Bad server!=0A=
I don't think this can happen. From looking at the code, an NFSERR_EXPIRED=
=0A=
reply to the Open should have happened.=0A=
=0A=
>3) Client Bob tries to do a GETATTR on that same file.=0A=
>4) In nfsrv_checkgetattr, the kernel finds a write delegation for that fil=
e,=0A=
>   owned by client Alice.=0A=
I think the server needs to check for LCL_NEEDSCONFIRM in here.=0A=
It gets the "clp" from the FH, but it "assumes" a confirmed ClientID.=0A=
=0A=
I'll code up a patch to add this check to nfsrv_checkgetattr().=0A=
=0A=
>5) The kernel tries to send a NFSV4OP_CBGETATTR callback to Alice, to see=
=0A=
>if the=0A=
>   file's attributes have changed.=0A=
>6) But Alice's client ID is unconfirmed.  Oh no!  Panic!=0A=
>=0A=
>Does this sound plausible?  Should there be a check for LCL_NEEDSCONFIRM=
=0A=
>somewhere around line 3166 in nfs_nfsdstate.c?  Grateful for any help.=0A=
Well, it doesn't appear that the Open could occur when the ClientID was=0A=
not confirmed.=0A=
--> The obvious case you listed above is caught by nfsrv_opencheck().=0A=
      Now, could a SetClientID happen between nfsrv_opencheck() and=0A=
      nfsrv_openctrl()?=0A=
      --> I don't think so. If you look at nfsrv_setclient() which does Set=
ClientID,=0A=
             it grabs the nfsv4rootfs_lock, locking out all other nfsd thre=
ads.=0A=
             It can't acquire the lock while a "shared lock" (I called a re=
fcnt) is=0A=
             held by any other nfsd thread and the thread doing an Open wil=
l=0A=
             hold the shared lock (refcnt).=0A=
=0A=
So, I think the Open with delegation would have been issued when the=0A=
ClientID was confirmed.=0A=
--> Then I suspect the client did another SetClientID that put the ClientID=
=0A=
       back to unconfirmed (the obvious one is a client reboot).=0A=
       --> One quirk of SetClientID/SetClientIDConfirm is that old Open/Loc=
k=0A=
             state cannot be discarded until it is confirmed, so the old De=
legations=0A=
             would remain on the ClientID.=0A=
--> Then a client did a Getattr on the file that had the old Delegation sti=
ll=0A=
       allocated to it.=0A=
=0A=
I'd definitely say that nfsrv_checkgetattr() needs to check for an unconfir=
med=0A=
ClientID and return without attempting a callback.=0A=
--> The exclusive lock on nfsv4rootfs_lock acquired when doing SetClientID=
=0A=
       should also guarantee that, if the ClientID is confirmed when=0A=
       nfsrv_checkgetattr() tests for it, it will remain that way until aft=
er the=0A=
       callback is completed.=0A=
=0A=
I'll come up with a patch and stick it on phabricator, rick=0A=
=0A=
=0A=
=0A=
-Alan=0A=
=0A=
P.S.: stack trace=0A=
=0A=
kdb_backtrace=0A=
vpanic=0A=
panic=0A=
nfsrv_docallback=0A=
nfsrv_checkgetattr=0A=
nfsrvd_getattr=0A=
nfsrvd_dorpc=0A=
nfssvc_program=0A=
svc_run_internal=0A=
svc_thread_start=0A=
fork_exit=0A=
fork_trampoline=0A=
_______________________________________________=0A=
freebsd-hackers@freebsd.org mailing list=0A=
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers=0A=
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"=
=0A=
=0A=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YTBPR01MB3966E40F861048C9240A3B89DD2A0>