Date: Sat, 23 May 2020 17:22:44 +0100
From: "Norman Gray" <norman.gray@glasgow.ac.uk>
To: Doug McIntyre <merlyn@geeks.org>, Remy Zandwijk <remy@luckyhands.nl>
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: Documentation and debugging for NFSv4
Message-ID: <057FFF2E-CB29-4A74-8F27-8ADF0AE4C202@glasgow.ac.uk>
In-Reply-To: <20200522182635.GA4515@geeks.org>
References: <D3388CA5-84AA-48F4-8B47-8B94EFA4305A@glasgow.ac.uk> <20200522182635.GA4515@geeks.org>

Doug and Remy, hello.

Thanks for your additional observations.

On 22 May 2020, at 19:26, Doug McIntyre wrote:

> On Fri, May 22, 2020 at 03:15:01PM +0100, Norman Gray wrote:
>> I'm having difficulty finding consistent documentation and debugging
>> tools for NFSv4. Is there some handbook-like source that I'm missing?
>> Or some layer of documentation for configuration or debugging that
>> I've failed to find?
>
> I think in general, that NFSv4 is not widely deployed outside of
> hetrogenous linux environments. Given the state of things, I'd imagine
> it is downgraded to NFSv3 more often than not in other use cases of
> mixed OSes.

That doesn't seem to be the case with me. Even an Ubuntu12 client seems
happy to make an NFSv4 mount from this server (though I think it's 4.0),
and I have a CentOS 7.8 machine similarly happy with 4.1. But it's
another CentOS 7.8 client which refuses to make the connection.
(there's an aha below...)

It also turns out that a FreeBSD 11.3 client can't mount this unless
-overs=4 is explicitly provided on the mount command (this is actually
explained in the mount_nfs(8) manpage (!), which says that the default
strategy is to try 3 then 2).

Succeeding:

    ubuntu12# mount -tnfs server:/astro/home /mnt
    centos78@a# mount -tnfs server:/astro/home /mnt
    freebsd113# mount -tnfs -overs=4 server:/astro/norman /mnt

Aha...

Failing: centos78@b and Ubuntu14@b.... aha.

I have FINALLY found some consistency to the machines which fail:
they're all in a different DNS (sub)domain to the server, though in the
same netblocks as the machines which succeed.

Specifically, they fail during the ls: the client sends a PUTFH and a
READDIR opcode, and the PUTFH succeeds but the READDIR fails, with an
NFS4ERR_NOFILEHANDLE, which seems to suggest that the FH that the PUTFH
sent wasn't saved (if I'm understanding RFC 3530 for NFSv4.0 and RFC
5661 for v4.1 correctly).
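
To make my reading of the RFCs concrete, here is a toy Python model of how I understand a COMPOUND request to be evaluated: operations run in order, PUTFH sets the "current filehandle", and READDIR needs one to be set. It is purely illustrative. The function name, the tuple encoding of operations and the `putfh_saves_fh` switch are all made up, and this is not how any real server is implemented. Only the two status values are taken from the RFCs.

```python
# Toy model of NFSv4 COMPOUND evaluation (illustrative only, not real server code).
NFS4_OK = 0
NFS4ERR_NOFILEHANDLE = 10020  # status for "no current filehandle" in RFC 3530/5661


def run_compound(ops, listings, putfh_saves_fh=True):
    """Evaluate operations in order, stopping at the first failure.

    ops            -- list of (opname, argument) tuples (hypothetical encoding)
    listings       -- dict mapping a filehandle to its directory entries
    putfh_saves_fh -- set False to mimic a server that accepts PUTFH but
                      does not retain the filehandle (my guess at the symptom)
    Returns a list of (opname, status, result) tuples.
    """
    current_fh = None
    results = []
    for opname, arg in ops:
        status, result = NFS4_OK, None
        if opname == "PUTFH":
            if putfh_saves_fh:
                current_fh = arg
        elif opname == "READDIR":
            if current_fh is None:
                status = NFS4ERR_NOFILEHANDLE
            else:
                result = listings.get(current_fh, [])
        else:
            raise ValueError(f"operation not modelled: {opname}")
        results.append((opname, status, result))
        if status != NFS4_OK:
            break  # a COMPOUND stops at the first operation that fails
    return results


if __name__ == "__main__":
    ops = [("PUTFH", b"fh-home"), ("READDIR", None)]
    listings = {b"fh-home": ["norman", "other"]}
    print(run_compound(ops, listings))                        # both operations succeed
    print(run_compound(ops, listings, putfh_saves_fh=False))  # READDIR fails
```

The second call reproduces the pattern in the failing traces: PUTFH reports success, then READDIR fails with NFS4ERR_NOFILEHANDLE.
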

In the case of the machines which succeed, in subdomain @a, the
corresponding NFS request looks pretty much identical, but the response
is successful. (An odd thing is that in the _successful_ cases,
Wireshark shows a lot of 'TCP ACKed unseen segment' warnings, but I
can't see how this might be relevant.)

This is the only consistency I can see, but I can't see how this is
relevant to anything.

  * NFS works at the TCP layer, after resolution of DNS names
  * There are no domain names that I can see in the tcpdump traffic
  * The only mentions of domains in RFC 5661 are irrelevant to this.
    The domain names in the context of owners and groups are not, I
    think, relevant.

>> Normally some combination of netstat and tcpdump would make some
>> headway, but SunRPC is blacker magic than that.
>
> NFSv4 is a big change, most implementations I've seen operate over TCP
> instead of UDP whereas TCP was optional in v2 and v3.

As I read RFC 3530, NFSv4 is TCP-only (well, TCP and SCTP), and doesn't
use UDP at all.

> NFSv4 doesn't need rpc portmapper, nor other helper daemons. The
> IDmapper is a big change as well, no more UID passed through, but all
> UIDs have to be mapped back and forth on both sides.

Also from Remy:

> Is nfsuserd running? According to the man page, it is needed for NFSv4
> to work properly.

I think the string-based user and group information is strictly
optional, though usual/recommended. nfsv4(4) indicates that the wire
protocol can have strings or strings-containing-numbers (and cf RFC 5661
Sect.5.9). At any rate, I see the _same_ behaviour both with and without
nfsuserd running on the server. When it's running, I see domain names in
the responses to GETATTR requests, but nowhere in the requests that are
failing.

This is however the best clue so far, since I can see that `nfsidmap -d`
produces different results on the two sets of client machines. I wonder
if this worked by accident -- due to a default configuration -- without
nfsuserd before.
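
Since `nfsidmap -d` differs between the two sets of clients, here is a small sketch of the guess I'm now working with: that, absent an explicit setting, each machine derives its NFSv4 id-mapping domain from the suffix of its own hostname, so clients in another DNS subdomain end up with a different domain from the server. The hostnames below are invented for illustration, and the derivation rule is my assumption about the default behaviour, not something I've confirmed in the code.

```python
def default_idmap_domain(fqdn):
    """Assumed default: the id-mapping domain is the hostname minus its first label."""
    _host, _sep, domain = fqdn.strip().rstrip(".").partition(".")
    return domain.lower()


def mismatched_clients(server, clients):
    """Return the clients whose derived domain differs from the server's."""
    server_domain = default_idmap_domain(server)
    return [c for c in clients if default_idmap_domain(c) != server_domain]


if __name__ == "__main__":
    # Hypothetical names standing in for the "@a" and "@b" subdomains.
    server = "server.a.example.org"
    clients = [
        "centos78.a.example.org",
        "centos78.b.example.org",
        "ubuntu14.b.example.org",
    ]
    for client in mismatched_clients(server, clients):
        print(client, "->", default_idmap_domain(client),
              "differs from", default_idmap_domain(server))
```

If the guess is right, the obvious next experiment is to set the domain explicitly on both sides so that they agree, and see whether the failing clients start to work.
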
It appears that I'll need to learn more about what role that has in the
protocol. I wonder if the relevant id is somehow encoded into the (I
thought opaque) filehandles that are passed back and forth.

> Make sure you use V4 definitions in /etc/exports. From what I
> remember even connecting as a client needed 'V4: /' in there to
> connect right to a linux NFSv4 server, but I could be misremembering.

That's right. The presence of the 'V4' in the /etc/exports appears to be
what enables NFSv4 service (exports(5) seems to vaguely suggest this
without being explicit).

Thanks for your thoughts; any others most welcome.

Best wishes,

Norman

-- 
Norman Gray  :  http://www.astro.gla.ac.uk/users/norman/it/
Research IT Coordinator  :  School of Physics and Astronomy
// My current template week for IT tasks is: Monday, Tuesday, and Friday