Date:      Thu, 24 Sep 2020 09:39:08 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 249567] NFSv4 server sometimes responds with NFSERR_INVAL to LOCK from Linux clients
Message-ID:  <bug-249567-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249567

            Bug ID: 249567
           Summary: NFSv4 server sometimes responds with NFSERR_INVAL to
                    LOCK from Linux clients
           Product: Base System
           Version: 11.4-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: bf@cebitec.uni-bielefeld.de
 Attachment #218237 text/plain
         mime type:

Created attachment 218237
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=218237&action=edit
fcntl F_SETLK test case

We run an NFSv4 server based on 11.4-RELEASE-p3 with ZFS as the backing store.
Clients are mostly Solaris and Linux. Sometimes "svn checkout" or "svn status"
fails with a disk I/O error on NFS volumes on Linux clients. We tracked the
problem down to sqlite, which uses fcntl locking quite intensively. From this,
we created a small test case which simply does

  res = fcntl(fd, F_SETLK, &lock);
  [...]
  res = fcntl(fd, F_SETLK, &unlock);

in a loop (see attachment). This reliably triggers the problem:

  [me@linuxhost:~]$ fcntl_setlk
  F_SETLK F_RDLCK res = -1 (Invalid argument)
  Successful cycles: 431

So here fcntl fails after 431 successful lock-unlock cycles. On the wire, the
LOCK request looks like this:

  Opcode: LOCK (12)
      locktype: READ_LT (1)
      reclaim?: No
      offset: 0
      length: 1
      new lock owner?: Yes
      seqid: 0x00000000
      StateID
          [StateID Hash: 0xdf2d]
          StateID seqid: 1
          StateID Other: d92b565f140000008c0c0000
          [StateID Other hash: 0x15f7]
      lock_seqid: 0x00000000
      Owner
          clientid: 0xd92b565f14000000
          owner: <DATA>
              length: 20
              contents: <DATA>

Lock requests that succeed look exactly the same. In the failing case, the
FreeBSD NFS server replies:

  Opcode: LOCK (12)
      Status: NFS4ERR_INVAL (22)

Using DTrace, we found the source of the NFS4ERR_INVAL in nfsrv_lockctrl() at
nfs_nfsdstate.c:1810:

    if (!error)
        nfsrv_getowner(&stp->ls_open, new_stp, &lckstp);
    if (lckstp)
        /*
         * I believe this should be an error, but it
         * isn't obvious what NFSERR_xxx would be
         * appropriate, so I'll use NFSERR_INVAL for now.
         */
        error = NFSERR_INVAL;
    else
        lckstp = new_stp;

As a workaround we tried to simply comment out the setting of "error". With
this change, the test case no longer triggers the problem:

--- nfs_nfsdstate.c     2020/09/23 12:58:37     1.1
+++ nfs_nfsdstate.c     2020/09/23 14:16:19
@@ -1802,12 +1802,17 @@
                        if (!error)
                           nfsrv_getowner(&stp->ls_open, new_stp, &lckstp);
                        if (lckstp)
+#ifdef DIAGNOSTIC
+                          printf("nfs_nfsdstate.c:1805: I believe this should be an error\n");
+#else
+                          ;
+#endif
                           /*
                            * I believe this should be an error, but it
                            * isn't obvious what NFSERR_xxx would be
                            * appropriate, so I'll use NFSERR_INVAL for now.
-                           */
                           error = NFSERR_INVAL;
+                           */
                        else
                           lckstp = new_stp;
                   } else if (new_stp->ls_flags&(NFSLCK_LOCK|NFSLCK_UNLOCK)) {

While this seems to work, I have a gut feeling that lckstp should be new_stp
(unconditionally) instead of what nfsrv_getowner returns. Someone with a deeper
understanding of the NFS specification should look into this.

-- 
You are receiving this mail because:
You are the assignee for the bug.


