From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 178231] [nfs] 8.3 nfsv4 client reports "nfsv4 client/server protocol prob err=10026"
Date: Sat, 02 Oct 2021 01:39:48 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: chris.stephan@live.com
X-Bugzilla-Status: Open

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=178231

Chris Stephan changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chris.stephan@live.com

--- Comment #5 from Chris Stephan ---
Seeing same logs as OP:

nfsv4 client/server protocol prob err=10026

In preparation for migrating our next deployment of production servers to NFSv4 over TLS, we have built a development environment on 13.0-RELEASE. All hosts were built from scratch using the 13.0-RELEASE txz's, expanded into custom ZFS datasets on Supermicro X9 series Intel hardware. The NFS clients are diskless Dell workstations that PXE boot from /srv/tftpboot, run root on MFS, and mount the server's NFS shares on /net. The MFS includes most of base.txz. Minor bits that are better suited for tmpfs or NFS have been removed, mainly from /usr/share/ and /boot. /usr/bin and /usr/sbin are untouched. The method for this build is identical to what is done for our 12.2 deployment, apart from the source now being the 13.0-RELEASE tarball and the use of NFSv4 instead of NFSv3.

## /etc/fstab entries tried:
192.168.10.101:/ /net nfs nfsv4,rw,hard,tcp 0 0
192.168.10.101:/ /net nfs nfsv4,rw,soft,retrycnt=0,tcp 0 0

All works fine until one of the testers starts Chromium, which, after even minor browsing, causes the whole window system to freeze. Originally we thought this was a bug in Chrome or the X server. After three days of trying to isolate it with DTrace, we found we can reproduce the hang by waiting for the window system to freeze up, setting up a dtrace on the window manager (fluxbox), and right-clicking on the desktop to open a menu. That triggers the open() call to pull the menu file from the user's home directory in:

`/net/home//.fluxbox/menu`

NFS never fulfills this request. It locks up the window manager, and I can switch to another VT and troubleshoot from there. On the NFS server, /usr/sbin/nfsdumpstate shows there are locks for each of the clients running Chrome. On the clients, /usr/bin/nfsstat shows thousands of timeouts and retries, but they have stopped incrementing by the time everything has locked up. As long as Chrome is not started, the stats in question stay at 0 even after hours of running. It seems apparent that the NFS client has seized up at this point and cannot recover. Rebooting a client does not clear the locks on the server. clear_locks does not appear to resolve the server side either (but I'm not sure clear_locks works with NFSv4).

Any application, CLI or GUI, which accesses the NFS file system locks up and never returns. If I try to umount [-f] /net, the command locks the VT. If I try to read the subtree of /net, the command locks the VT.

Our previous setup uses NFSv3+KRB5i/p on 12.2-RELEASE-p10 and works flawlessly.

We also have tried connecting a lab client from our 12.2 cluster as an NFSv4 client to the 13.0 server, and the same thing happens.
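
For anyone who wants to check from another VT whether a path under /net is hung without locking up that VT as well, something along these lines might help. It is only a rough sketch: the helper name `probe_open` and the 5-second default timeout are assumptions for illustration, not part of the setup described above. The open() is done in a child process, and on timeout the parent deliberately does not wait for the child, since a process blocked in the kernel on a dead NFS mount may not respond to signals.

```python
import subprocess
import sys


def probe_open(path, timeout=5.0):
    """Open `path` and read one byte in a child process.

    Returns "ok" if the read succeeded, "error" if the child failed
    (for example, the file does not exist), or "hung" if the child did
    not finish within `timeout` seconds.
    """
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; open(sys.argv[1], 'rb').read(1)", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck on NFS may ignore the signal,
        # so we do not call wait() again here and risk blocking too.
        child.kill()
        return "hung"
    return "ok" if rc == 0 else "error"
```

Calling `probe_open()` on a file under /net should come back with "hung" after the timeout while the mount is in the locked-up state, and "ok" while the mount is healthy.
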
I am not willing to attempt to connect a client via NFSv4 to our 12.2 cluster, because I really don't want to cause some further issue in the event we lock up the server with active testing going on.

When we change the client mounts to NFSv3, all is well. So this definitely feels like a bug in the NFSv4 client or server.

Anyway, I'm at the point where I feel like there are folks smarter than I who might be interested in looking at this. I have a relatively idle cluster ready for all the testing anyone wants to throw at it.