Date:      Sat, 02 Oct 2021 01:39:48 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 178231] [nfs] 8.3 nfsv4 client reports "nfsv4 client/server protocol prob err=10026"
Message-ID:  <bug-178231-227-QnIzgyzX6i@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-178231-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-178231-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=178231

Chris Stephan <chris.stephan@live.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chris.stephan@live.com

--- Comment #5 from Chris Stephan <chris.stephan@live.com> ---
Seeing same logs as OP: nfsv4 client/server protocol prob err=10026

In an effort to prepare our next production deployment to leverage NFSv4 over
TLS, we have built a development environment on 13.0-RELEASE. All hosts were
built from scratch using 13.0-RELEASE txz's, expanded into custom ZFS datasets
on Supermicro X9 series Intel hardware. NFS clients are diskless Dell
workstations, booting PXE from /srv/tftpboot, with root on MFS, mounting NFS
shares from the server on /net. The MFS includes most of base.txz; minor bits
have been removed, mainly from /usr/share and /boot, that are better suited
for tmpfs or NFS. /usr/bin and /usr/sbin are untouched. The method for this
build is identical to what was done for our 12.2 deployment, apart from the
source now being the 13.0-RELEASE tarball and the use of NFSv4 instead of the
previous NFSv3.

## /etc/fstab entries tried:
192.168.10.101:/   /net    nfs    nfsv4,rw,hard,tcp             0 0
192.168.10.101:/   /net    nfs    nfsv4,rw,soft,retrycnt=0,tcp  0 0
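
For quick iteration outside of fstab, the same mounts can be exercised by hand
from a root shell. This is just a sketch mirroring the entries above (it
assumes the server at 192.168.10.101 exports / as shown, and requires root on
the client):

```sh
# Mount the NFSv4 export by hand; options mirror the first fstab entry.
mount -t nfs -o nfsv4,rw,hard,tcp 192.168.10.101:/ /net

# Confirm the negotiated NFS version and effective mount options.
nfsstat -m
```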

All works fine until one of the testers starts Chromium, which, after even
minor browsing, causes the entire window system to freeze. Originally we
thought this was a bug in Chrome or the X server. After trying to isolate the
problem with DTrace for the last three days, we found we can trigger the event
reliably: wait for the window system to freeze, attach DTrace to the window
manager (fluxbox), and right-click on the desktop to open a menu, which
triggers the open() call that pulls the menu file from the user's home
directory in:

`/net/home/<user>/.fluxbox/menu`
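
A sketch of the kind of DTrace one-liner used to catch the stalled open();
this is a reconstruction, not the exact script we ran (on 13.0 the libc
open() usually lands on the openat syscall, hence both probes; requires
root):

```sh
# Print the path of every open()/openat() issued by fluxbox.
# For openat, the path pointer is arg1 rather than arg0.
dtrace -n '
  syscall::open:entry   /execname == "fluxbox"/ { printf("%s", copyinstr(arg0)); }
  syscall::openat:entry /execname == "fluxbox"/ { printf("%s", copyinstr(arg1)); }'
```

When the hang is in effect, the probe fires for .fluxbox/menu but the call
never returns, which is what pointed us at the NFS client rather than X.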

NFS never fulfills this request. It locks up the window manager, and I can
switch to another VT and troubleshoot from there. On the NFS server,
/usr/sbin/nfsdumpstate shows there are locks for each of the clients running
Chrome. On the clients, /usr/bin/nfsstat shows thousands of timeouts and
retries, but they have stopped incrementing by the time everything has locked
up. When checking stats even after hours of running (so long as Chrome is not
started), the stats in question stay at 0. It seems apparent that the NFS
client has seized at this point and cannot recover. Rebooting a client does
not clear the locks on the server. clear_locks does not appear to resolve the
server side either (but I'm not sure clear_locks works with NFSv4).
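
For reference, the server-side inspection looked like the following sketch.
As far as I understand, clear_locks(8) only talks to the NLM (rpc.lockd),
which NFSv4 does not use; nfsrevoke(8) is the NFSv4-side tool, taking the
client identifier shown by nfsdumpstate (the `<client-id>` placeholder below
is hypothetical):

```sh
# List NFSv4 opens, locks, and delegations currently held by clients.
nfsdumpstate

# clear_locks only clears NLM (NFSv3) locks; for stale NFSv4 state,
# revoke the wedged client by the identifier nfsdumpstate reported.
nfsrevoke <client-id>
```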

Any application, CLI or GUI, that accesses the NFS filesystem locks up and
never returns. If I try to umount [-f] /net, the command locks the VT. If I
try to read the subtree of /net, the command likewise locks the VT.

Our previous setup uses NFSv3+KRB5i/p on 12.2-RELEASE-p10 and works
flawlessly.

We also have tried connecting a lab client from our 12.2 cluster as an NFSv4
client to the 13.0 server, and the same thing happens. I am not willing to
attempt to connect a client via NFSv4 to our 12.2 cluster because I really
don't want to cause some further issue in the event we lock up the server
with active testing going on.

When we change the client mounts to NFSv3, all is well, so this definitely
feels like a bug in the NFSv4 client or server.

Anyway, I'm at the point where I feel there are smarter folk than I who might
be interested in looking at this. I have a relatively idle cluster ready for
all the testing anyone wants to throw at it.

--
You are receiving this mail because:
You are the assignee for the bug.
