Date: Sat, 02 Oct 2021 01:39:48 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 178231] [nfs] 8.3 nfsv4 client reports "nfsv4 client/server protocol prob err=10026"
Message-ID: <bug-178231-227-QnIzgyzX6i@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-178231-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-178231-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=178231

Chris Stephan <chris.stephan@live.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chris.stephan@live.com

--- Comment #5 from Chris Stephan <chris.stephan@live.com> ---
Seeing same logs as OP:

nfsv4 client/server protocol prob err=10026

To prepare for migrating our next deployment of production servers to NFSv4
over TLS, we have built a development environment on 13.0-RELEASE. All hosts
were built from scratch using 13.0-RELEASE txz's, expanded into custom ZFS
datasets on Supermicro X9 series Intel. NFS clients are diskless Dell
workstations, booting PXE from /srv/tftpboot, root on MFS, mounting NFS
shares from the server on /net. MFS includes most of base.txz. Minor bits
have been removed, mainly from /usr/share/ and /boot, that are better suited
for tmpfs or NFS. /usr/bin and /usr/sbin are untouched. The method for this
build is identical to what was done for our 12.2 deployment, short of the
source now being the 13-RELEASE tarball and the use of NFSv4 vs NFSv3
previously.

## /etc/fstab entries tried:
192.168.10.101:/ /net nfs nfsv4,rw,hard,tcp 0 0
192.168.10.101:/ /net nfs nfsv4,rw,soft,retrycnt=0,tcp 0 0

All works fine until one of the testers starts Chromium, which, after even
minor browsing, causes the whole window system to freeze. Originally we
thought this was a bug in Chrome or the X server. After trying to isolate it
with DTrace for the last three days, we found we can trigger the event by
waiting for the window system to freeze up, setting up a dtrace on the window
manager (fluxbox), and right-clicking on the desktop to open a menu, which
triggers the open() call to pull the menu file from the user's home directory
in:

`/net/home/<user>/.fluxbox/menu`

NFS never fulfills this request. It locks up the window manager, and I can
switch to another VT and troubleshoot from there.
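
For anyone reading along, here is a minimal sketch for decoding the number in
that log line. It assumes the err= value is the raw NFSv4 status code as
listed in RFC 7530, in which case 10026 would be NFS4ERR_BAD_SEQID. The table
is only a small subset of the state-related codes, and the function name
decode_log_line is made up for illustration.

```python
import re

# Subset of NFSv4 status codes from RFC 7530 (state-related ones only).
# Assumption: the "err=" value in the client log is this raw status code.
NFS4_STATUS = {
    10008: "NFS4ERR_DELAY",
    10011: "NFS4ERR_EXPIRED",
    10013: "NFS4ERR_GRACE",
    10022: "NFS4ERR_STALE_CLIENTID",
    10023: "NFS4ERR_STALE_STATEID",
    10024: "NFS4ERR_OLD_STATEID",
    10025: "NFS4ERR_BAD_STATEID",
    10026: "NFS4ERR_BAD_SEQID",
}

# Matches the message text quoted in this bug report.
LOG_RE = re.compile(r"nfsv4 client/server protocol prob err=(\d+)")


def decode_log_line(line):
    """Return (code, name) for a matching log line, or None if no match.

    Hypothetical helper; name is "UNKNOWN" for codes outside the subset.
    """
    match = LOG_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, NFS4_STATUS.get(code, "UNKNOWN")


if __name__ == "__main__":
    print(decode_log_line("nfsv4 client/server protocol prob err=10026"))
```

If that reading is right, the client and server disagree about the sequence
number on an open or lock owner, which would at least fit the lock state seen
on the server.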

On the NFS server, /usr/sbin/nfsdumpstate shows there are locks for each of
the clients running Chrome. On the clients, /usr/bin/nfsstat shows thousands
of timeouts and retries, but they have stopped incrementing by the time
everything has locked up. When checking stats even after hours of running (so
long as Chrome is not started), the stats in question stay at 0. It seems
apparent that the NFS client has seized at this point and cannot recover.
Rebooting a client does not clear the locks on the server. clear_locks does
not appear to resolve the server side either (but I'm not sure clear_locks
works with NFSv4).

Any application, CLI or GUI, which accesses the NFS file system locks up and
never returns. If I try to umount [-f] /net, the command locks the VT. If I
try to read the subtree of /net, issuing the command locks the VT.

Our previous setup uses NFSv3+KRB5i/p on 12.2-RELEASE-p10 and works
flawlessly.

We also have tried connecting a lab client from our 12.2 cluster as an NFSv4
client to the 13.0 server, and the same thing happens. I am not willing to
attempt to connect a client via NFSv4 to our 12.2 cluster because I really
don't want to cause some further issue in the event we lock up the server
with active testing going on.

When we change the client mounts to NFSv3, all is well. So this definitely
feels like a bug in the NFSv4 client or server.

Anyways, I'm at the point where I feel like there are smarter folk than I
who might be interested in looking at this. I have a relatively idle cluster
ready for all the testing anyone wants to throw at it.

-- 
You are receiving this mail because:
You are the assignee for the bug.