Date:      Sat, 29 Nov 2025 13:10:15 -0800
From:      Rick Macklem <rick.macklem@gmail.com>
To:        J David <j.david.lists@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: NFSv4.2 hangs on 14.3
Message-ID:  <CAM5tNy65A7QzAS7Ww-dk9Eqx0_xvJAQDPnqEA4D8fWAyB%2BMU2Q@mail.gmail.com>
In-Reply-To: <CABXB=RRDABxmgZMadGManyEO3ecy2x-myBZ8bbyjx7UePn%2BcLw@mail.gmail.com>
References:  <CABXB=RQL0tqnE34G6PGLn6AmcwSpapm0-forQZ5vLBQBwcA12Q@mail.gmail.com> <CAM5tNy7eHH7qmTXLRQ9enDAwUzjUXtjugi093eUoRkDbGDCYVQ@mail.gmail.com> <CABXB=RQ6qSNp==Qa_m-=S8cKzxJU2pbuEDjeGfdr7L8Z0=dmGA@mail.gmail.com> <CABXB=RRHz20XwLDCz7qss1=0hXZK-SXz8X7pm4w8o8r2byxH2A@mail.gmail.com> <CAM5tNy6kQMtxe1Sdt_3yQv00ud-xMUsW1m52V2Gn6zy4tnka6Q@mail.gmail.com> <CABXB=RRDABxmgZMadGManyEO3ecy2x-myBZ8bbyjx7UePn%2BcLw@mail.gmail.com>


[-- Attachment #1 --]
On Sat, Nov 29, 2025 at 11:37 AM J David <j.david.lists@gmail.com> wrote:
>
> On Fri, Nov 28, 2025 at 10:57 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > And I need you to check to see if the "Initiating..." happens before
> > the "Wrong session.." or after it.
Well, if it happens before, that at least helps explain how it happens.
The code should update nd_slotid when a retry on a new session is done.
(The code is around line #1290 in sys/fs/nfs/nfs_commonkrpc.c.)
If there is some case where nd_slotid doesn't get updated (I have no
idea where that would be at this time), then this would explain it.
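
To illustrate the invariant described above, here is a minimal, hypothetical C model. It is not the FreeBSD code: struct session, struct request, slot_acquire() and retry_on_new_session() are invented names, and the 4-entry slot table is an arbitrary assumption. The point it shows is that NFSv4.1 slots belong to a particular session, so a request retried on a new session must carry a slot obtained from the new session's table (the analogue of updating nd_slotid). If only the session is switched and the old slot id is kept, the request names a slot the new session never handed out.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 4	/* hypothetical slot table size */

/* Hypothetical, simplified session: an id plus its own slot table. */
struct session {
	unsigned id;
	bool busy[NSLOTS];
};

/* Hypothetical, simplified request: the session and slot it uses. */
struct request {
	unsigned sessid;
	int slotid;
};

/* Acquire the lowest free slot in sess, or return -1 if all are busy. */
int
slot_acquire(struct session *sess)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (!sess->busy[i]) {
			sess->busy[i] = true;
			return (i);
		}
	}
	return (-1);
}

/*
 * Retry req on newsess.  The slot id from the old session means
 * nothing in the new one, so a fresh slot is acquired and stored
 * in req->slotid along with the new session id.
 */
bool
retry_on_new_session(struct request *req, struct session *newsess)
{
	int slot = slot_acquire(newsess);

	if (slot < 0)
		return (false);
	req->sessid = newsess->id;
	req->slotid = slot;
	return (true);
}

/* True only if req names sess and holds a slot that sess handed out. */
bool
request_consistent(const struct request *req, const struct session *sess)
{
	assert(req->slotid >= -1 && req->slotid < NSLOTS);
	return (req->sessid == sess->id && req->slotid >= 0 &&
	    sess->busy[req->slotid]);
}
```

A request whose session id was switched but whose slot id was left alone fails request_consistent(), while one that went through retry_on_new_session() passes.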

You can try the attached patch, which disables the changes that are
done when "Wrong session.." gets printed. (It will still print the
message, but won't do anything about it.)
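
The attached patch works by wrapping the corrective code in "#ifdef notnow". Since nothing defines the macro notnow, the preprocessor drops that block and only the printf() remains. The sketch below shows the same technique in isolation; handle_wrong_session(), slots_marked_bad and cur_slot are hypothetical names, and the message format only loosely mirrors the one in the patch context.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical state that the "recovery" action would modify. */
int slots_marked_bad;
int cur_slot = -1;

/*
 * Print the diagnostic, then (only if notnow were defined) act on it.
 * Because notnow is never defined, the block below is compiled out.
 */
void
handle_wrong_session(int srvslot, int ndslot)
{
	printf("Wrong session srvslot=%d slot=%d\n", srvslot, ndslot);
#ifdef notnow
	slots_marked_bad += 2;	/* mark both slots as bad */
	cur_slot = ndslot;
#endif
}
```

Compiling with -Dnotnow would bring the disabled block back, which is why "#ifdef notnow" is a common, easily reversible way to switch code off for testing.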

Also, although it would be nice to get to the bottom of this, I'll
note that, for a "ro,nolockd" mount, there is no advantage to
using NFSv4. (An NFSv3 mount will not have any session or
state that needs to be recovered, etc., and will probably work fine.)

Finally, I'll note that "Initiating.." should only occur when a server
reboots. (It might happen after a long enough network partition or
a realllyyy slllooowwww server response, but that would have to
be over a minute to have any chance of causing this. If you are
getting server reboots, network partitioning or a server so slow it
doesn't reply to an RPC within 1 minute, then that would explain
why you see this stuff and no one else does.)

rick

>
> The next time there is a hang I will check that before rebooting.
> However, if it happens, it will be before. After the "Wrong session,"
> the system is hung and so it's pretty much the last thing in the
> dmesg.
>
> I rechecked and still have three systems with "Initiate recovery"
> messages in dmesg, albeit not all the same ones due to ring buffer
> wrapping around. But there are no systems with "Wrong session"
> currently in the dmesg buffers. That would tend to support, albeit
> *very* weakly, the assertion that if they are correlated, then
> Initiate recovery happens first. Certainly not good enough to draw
> conclusions from, unfortunately.
>
> Maybe I can set up a remote syslog server or something for kern
> messages. That would let me give you exact timestamps so we know not
> just whether it happened before but how long before. And it would also
> let me keep digging after rebooting if we need additional information.
>
> Thanks!
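
Once kern messages are collected with timestamps on a remote syslog server, the "before or after, and by how much" question can be answered mechanically. The following is a minimal, hypothetical C sketch: struct kmsg and recovery_lead_time() are invented names, the messages are assumed to be already parsed into epoch timestamps and sorted by time, and the matched substrings "Initiate recovery" and "Wrong session" are taken from this thread rather than from the kernel source, so they may need adjusting to the exact dmesg text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* One kernel message as it might be recorded by a remote syslog server. */
struct kmsg {
	time_t when;		/* seconds since the epoch */
	const char *text;
};

/*
 * Scan msgs (assumed sorted by time) for the first "Wrong session"
 * message and return how many seconds earlier the most recent
 * "Initiate recovery" message was logged.  Return -1 if there is
 * no "Wrong session" message or no recovery message precedes it.
 */
double
recovery_lead_time(const struct kmsg *msgs, size_t n)
{
	const struct kmsg *last_recovery = NULL;

	assert(msgs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (strstr(msgs[i].text, "Initiate recovery") != NULL)
			last_recovery = &msgs[i];
		else if (strstr(msgs[i].text, "Wrong session") != NULL)
			return (last_recovery != NULL ?
			    difftime(msgs[i].when, last_recovery->when) : -1.0);
	}
	return (-1.0);
}
```

Returning the gap, rather than just a before/after flag, matches the goal stated above of knowing "not just whether it happened before but how long before".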

[-- Attachment #2 --]
--- sys/fs/nfs/nfs_commonkrpc.c.wrongsess	2025-11-29 12:50:16.695528000 -0800
+++ sys/fs/nfs/nfs_commonkrpc.c	2025-11-29 12:51:28.595227000 -0800
@@ -1169,6 +1169,7 @@ tryagain:
 							"srvslot=%d "
 							"slot=%d\n", slot,
 							nd->nd_slotid);
+#ifdef notnow
 						    if (i == NFSV4OP_SEQUENCE) {
 							/*
 							 * Mark both slots as
@@ -1185,6 +1186,7 @@ tryagain:
 							     nd->nd_slotid);
 						    }
 						    slot = nd->nd_slotid;
+#endif
 						}
 						freeslot = slot;
 					} else if (slot != 0) {
