Date: Thu, 24 Sep 2020 09:39:08 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 249567] NFSv4 server sometimes responds with NFSERR_INVAL to LOCK from Linux clients
Message-ID: <bug-249567-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249567

            Bug ID: 249567
           Summary: NFSv4 server sometimes responds with NFSERR_INVAL to
                    LOCK from Linux clients
           Product: Base System
           Version: 11.4-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: bf@cebitec.uni-bielefeld.de
 Attachment #218237 text/plain
         mime type:

Created attachment 218237
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=218237&action=edit
fcntl F_SETLK test case

We run an NFSv4 server with ZFS as backing store, based on 11.4-RELEASE-p3.
Clients are mostly Solaris and Linux. Sometimes "svn checkout" or "svn status"
fails with a disk I/O error on NFS volumes on Linux clients. We tracked the
problem down to sqlite, which uses fcntl locking quite intensively. From this,
we created a small test case which simply does

    res = fcntl(fd, F_SETLK, &lock);
    [...]
    res = fcntl(fd, F_SETLK, &unlock);

in a loop (see attachment). This reliably triggers the problem:

    [me@linuxhost:~]$ fcntl_setlk
    F_SETLK F_RDLCK res = -1 (Invalid argument)
    Successful cycles: 431

So here fcntl fails after 431 successful lock-unlock cycles. On the wire, we
can see the LOCK request as:

    Opcode: LOCK (12)
        locktype: READ_LT (1)
        reclaim?: No
        offset: 0
        length: 1
        new lock owner?: Yes
        seqid: 0x00000000
        StateID
            [StateID Hash: 0xdf2d]
            StateID seqid: 1
            StateID Other: d92b565f140000008c0c0000
            [StateID Other hash: 0x15f7]
        lock_seqid: 0x00000000
        Owner
            clientid: 0xd92b565f14000000
            owner: <DATA>
                length: 20
                contents: <DATA>

Lock requests that succeed look exactly the same.
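
For readers without access to the attachment, the lock/unlock loop described
above can be sketched roughly as follows. This is not the attached test case:
the function name run_lock_cycles, its parameters, and the error messages are
made up for illustration. The one-byte read lock at offset 0 is an assumption
taken from the LOCK request seen on the wire.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the attached test case: repeatedly take and
 * release a one-byte read lock at offset 0 using fcntl(F_SETLK).
 * Returns the number of successful lock/unlock cycles, or -1 if the
 * file cannot be opened. Stops at the first failing fcntl() call.
 */
int
run_lock_cycles(const char *path, int max_cycles)
{
	struct flock lock, unlock;
	int fd, cycles;

	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1) {
		fprintf(stderr, "open: %s\n", strerror(errno));
		return (-1);
	}

	/* Read lock covering byte 0 only (offset 0, length 1). */
	memset(&lock, 0, sizeof(lock));
	lock.l_type = F_RDLCK;
	lock.l_whence = SEEK_SET;
	lock.l_start = 0;
	lock.l_len = 1;

	/* Same byte range, but releasing the lock. */
	unlock = lock;
	unlock.l_type = F_UNLCK;

	for (cycles = 0; cycles < max_cycles; cycles++) {
		if (fcntl(fd, F_SETLK, &lock) == -1) {
			fprintf(stderr, "F_SETLK F_RDLCK res = -1 (%s)\n",
			    strerror(errno));
			break;
		}
		if (fcntl(fd, F_SETLK, &unlock) == -1) {
			fprintf(stderr, "F_SETLK F_UNLCK res = -1 (%s)\n",
			    strerror(errno));
			break;
		}
	}

	close(fd);
	return (cycles);
}
```

Calling this with a path on the affected NFSv4 mount and a large max_cycles
should, if the sketch matches the attachment closely enough, return early with
"Invalid argument" on stderr; on a local file system it is expected to complete
all cycles.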

In the failing case the FreeBSD NFS server replies:

    Opcode: LOCK (12)
        Status: NFS4ERR_INVAL (22)

Using DTrace, we found the source of the NFS4ERR_INVAL in nfsrv_lockctrl() at
nfs_nfsdstate.c:1810:

                        if (!error)
                           nfsrv_getowner(&stp->ls_open, new_stp, &lckstp);
                        if (lckstp)
                             /*
                              * I believe this should be an error, but it
                              * isn't obvious what NFSERR_xxx would be
                              * appropriate, so I'll use NFSERR_INVAL for now.
                              */
                             error = NFSERR_INVAL;
                        else
                             lckstp = new_stp;

As a workaround we tried simply commenting out the assignment to "error". With
this change, the test case no longer triggers the problem:

--- nfs_nfsdstate.c     2020/09/23 12:58:37     1.1
+++ nfs_nfsdstate.c     2020/09/23 14:16:19
@@ -1802,12 +1802,17 @@
                        if (!error)
                           nfsrv_getowner(&stp->ls_open, new_stp, &lckstp);
                        if (lckstp)
+#ifdef DIAGNOSTIC
+                            printf("nfs_nfsdstate.c:1805: I believe this should be an error\n");
+#else
+                            ;
+#endif
                             /*
                              * I believe this should be an error, but it
                              * isn't obvious what NFSERR_xxx would be
                              * appropriate, so I'll use NFSERR_INVAL for now.
-                             */
                             error = NFSERR_INVAL;
+                             */
                        else
                             lckstp = new_stp;
                } else if (new_stp->ls_flags&(NFSLCK_LOCK|NFSLCK_UNLOCK)) {

While this seems to work, I have a gut feeling that lckstp should be set to
new_stp unconditionally instead of to what nfsrv_getowner returns. Someone with
a deeper understanding of the NFS specification should look into this.

-- 
You are receiving this mail because:
You are the assignee for the bug.