Date: Tue, 16 Dec 2014 09:29:48 +0100
From: Gerrit Kühn <gerrit.kuehn@aei.mpg.de>
To: freebsd-net@freebsd.org
Subject: Re: compiling on nfs directories
Message-ID: <20141216092948.605dc8e2e0fec3fa4a5f8ec1@aei.mpg.de>
In-Reply-To: <2048229686.13136235.1418677169130.JavaMail.root@uoguelph.ca>
References: <CAOgwaMs%2BYLUoLSHDsu6BOYgwr_oi09xNk9yOnSNYjjXqaiDCQQ@mail.gmail.com> <2048229686.13136235.1418677169130.JavaMail.root@uoguelph.ca>

On Mon, 15 Dec 2014 15:59:29 -0500 (EST) Rick Macklem
<rmacklem@uoguelph.ca> wrote about Re: compiling on nfs directories:

RM> Also, note that he didn't see the problem with FreeBSD8.3, which would
RM> have been following the same rules on the server as 10.1.
RM>
RM> What I suspect might cause this is one of two things:
RM> 1 - The modify time of the file is now changing at a time the Linux
RM>     client doesn't expect, due to changes in ZFS or maybe TOD clock
RM>     resolution. (At one time, the TOD clock was only at a resolution
RM>     of 1sec, so the client wouldn't see the modify time change often.
RM>     I think it is now at a much higher resolution, but would have to
RM>     look at the code/test to be sure.)
RM> 2 - I think you mention this one later in your message, in that the
RM>     build might be depending on file locking. If this is the case,
RM>     trying NFSv4, which does better file locking, might fix the
RM>     problem.

Meanwhile I have googled around a bit more, and one of the few reasons
other people see the error messages I see appears to be a broken clock
that makes "make" recompile stuff on the installation stage. As I was
already wondering why compilation took longer than I had actually
expected, I may be seeing something similar (still need to look into
that), although my clock is fine (but time stamps on the NFS might be
messed up somehow like you mention above under "1").

RM> Gerrit, I would suggest that you do "nfsstat -m" on the Linux client,
RM> to see what the mount options are. The Linux client might be using
RM> NFSv4 already.

This is what it says about my nfs-root:

---
pt-nds ~ # nfsstat -m
/ from 192.168.32.253:/tank/diskless/nds
 Flags: rw,relatime,vers=3,rsize=4096,wsize=4096,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.32.253,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.32.253
---

This is what I set up for pxe-booting:

---
label gentoo-cs2
  menu label linux-3.8.13-gentoo-2
  kernel bzImage-3.8.13-gentoo-2
  append ip=dhcp root=/dev/nfs rw nfsroot=192.168.32.253:/tank/diskless/nds,nolock,tcp,v3 rootdelay=15
---

So I definitely run "nfsv3" and "nolock". I remember trying to use nfsv4
on the diskless machines some years ago, but back then it was not ready
for prime time.

RM> Also, avoid "soft, intr" especially if you are using NFSv4, since these
RM> can cause slow server response to result in a failure of a read/write
RM> when it shouldn't fail, due to timeout or interruption by a signal.

There is "hard" in there as a default option. However, I might try
turning on locking (I regarded it as superfluous up to now as I have only
one client using the filesystem).

RM> If you could find out more about what causes the specific build failure
RM> on the Linux side, that might help.

As I said above, I have some hints that indicate something might be wrong
with timestamps, but I still need to dig deeper into that.

RM> If you can reproduce a build failure quickly/easily, you can capture
RM> packets via "tcpdump -s 0 -w <file> host <client-hostname>" on the
RM> server and then look at it in wireshark to see what the server is
RM> replying when the build failure occurs. (I don't mind looking at a
RM> packet trace if it is relatively small, if you email it to me as an
RM> attachment.)

I can reproduce it 100%, but it only happens on the installation stage,
after having compiled the whole stuff. So I don't know if I will be able
to produce a dump of reasonable size that contains the issue, but I'll
try.

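For the timestamp theory, something like the following could be run on
the client before digging into packet traces. This is only a rough
sketch, assuming Python is available on the diskless machine; the
function names mtime_skew and future_files are made up for illustration
and do not come from any existing tool. The first helper writes a scratch
file on the mount and compares its mtime (normally assigned by the server
for written data) with the local clock; the second lists files whose
mtime lies ahead of the local clock, which is the kind of thing that can
make "make" rebuild targets unexpectedly.

```python
import os
import tempfile
import time


def mtime_skew(directory):
    """Return (mtime of a freshly written file) - (local clock), in seconds.

    Assumption: on an NFS mount the mtime of written data comes from the
    server, so a large absolute value hints at server/client clock
    disagreement. On a local filesystem the result should be near zero.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        before = time.time()
        os.write(fd, b"x")
        os.fsync(fd)  # force the write out so the attributes get refreshed
        after = time.time()
        mtime = os.fstat(fd).st_mtime
    finally:
        os.close(fd)
        os.unlink(path)
    # Compare against the midpoint of the local time window around the write.
    return mtime - (before + after) / 2.0


def future_files(root, tolerance=1.0):
    """List (path, seconds_ahead) for files whose mtime is ahead of the local clock."""
    now = time.time()
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                ahead = os.lstat(path).st_mtime - now
            except OSError:
                continue  # file vanished during the walk (e.g. build still running)
            if ahead > tolerance:
                hits.append((path, ahead))
    return hits
```

Calling mtime_skew() on a directory inside the NFS root and
future_files() on the build tree right after compiling should show
whether the time stamps really disagree with the client clock. The
tolerance of one second is an arbitrary choice.
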
RM> ps: I am not familiar with the Linux mount options, but if it has
RM> stuff like "nocto", you could try those.

The manpage has the following:

---
cto / nocto    Selects whether to use close-to-open cache coherence
               semantics. If neither option is specified (or if cto
               is specified), the client uses close-to-open cache
               coherence semantics. If the nocto option is specified,
               the client uses a non-standard heuristic to determine
               when files on the server have changed.

               Using the nocto option may improve performance for
               read-only mounts, but should be used only if the data
               on the server changes only occasionally. The DATA AND
               METADATA COHERENCE section discusses the behavior of
               this option in more detail.
---

So "cto" appears to be the default and is probably what is used right
now. I'll put "nocto" on my list of things to try (although the
description is not really that encouraging... :-).


cu
  Gerrit