Subject: Re: FreeBSD 9.1 NFSv4 client attribute cache not caching?
From: Paul van der Zwan
Date: Sun, 14 Apr 2013 13:08:15 +0200
To: Rick Macklem
Cc: freebsd-fs@freebsd.org

On 14 Apr 2013, at 5:00, Rick Macklem wrote:

Thanks for taking the effort to send such an extensive reply.

> Paul van der Zwan wrote:
>> On 12 Apr 2013, at 16:28, Paul van der Zwan wrote:
>>
>>>
>>> I am running a few VirtualBox VMs with 9.1 on my OpenIndiana server
>>> and I noticed that make buildworld seems to take much longer
>>> when the clients mount /usr/src and /usr/obj over NFS V4 than when
>>> they use V3.
>>> Unfortunately I have to use V4 as a buildworld on V3 hangs the
>>> server completely...
>>> I noticed the number of PUTFH/GETATTR/GETFH calls is in the order of
>>> a few thousand per second
>>> and if I snoop the traffic I see the same filenames appear over and
>>> over again.
>>> It looks like the client is not caching anything at all and using a
>>> server request every time.
>>> I use the default mount options:
>>> 192.168.178.24:/data/ports on /usr/ports (nfs, nfsv4acls)
>>> 192.168.178.24:/data/src on /usr/src (nfs, nfsv4acls)
>>> 192.168.178.24:/data/obj on /usr/obj (nfs, nfsv4acls)
>>>
>>>
>>
>> I had a look with dtrace
>> $ sudo dtrace -n '::getattr:start { @[stack()]=count();}'
>> and it seems the vast majority of the calls to getattr are from open()
>> and close() system calls:
>>     kernel`newnfs_request+0x631
>>     kernel`nfscl_request+0x75
>>     kernel`nfsrpc_getattr+0xbe
>>     kernel`nfs_getattr+0x280
>>     kernel`VOP_GETATTR_APV+0x74
>>     kernel`nfs_lookup+0x3cc
>>     kernel`VOP_LOOKUP_APV+0x74
>>     kernel`lookup+0x69e
>>     kernel`namei+0x6df
>>     kernel`kern_execve+0x47a
>>     kernel`sys_execve+0x43
>>     kernel`amd64_syscall+0x3bf
>>     kernel`0xffffffff80784947
>>     26
>>
>>     kernel`newnfs_request+0x631
>>     kernel`nfscl_request+0x75
>>     kernel`nfsrpc_getattr+0xbe
>>     kernel`nfs_close+0x3e9
>>     kernel`VOP_CLOSE_APV+0x74
>>     kernel`kern_execve+0x15c5
>>     kernel`sys_execve+0x43
>>     kernel`amd64_syscall+0x3bf
>>     kernel`0xffffffff80784947
>>     26
>>
>>     kernel`newnfs_request+0x631
>>     kernel`nfscl_request+0x75
>>     kernel`nfsrpc_getattr+0xbe
>>     kernel`nfs_getattr+0x280
>>     kernel`VOP_GETATTR_APV+0x74
>>     kernel`nfs_lookup+0x3cc
>>     kernel`VOP_LOOKUP_APV+0x74
>>     kernel`lookup+0x69e
>>     kernel`namei+0x6df
>>     kernel`vn_open_cred+0x330
>>     kernel`vn_open+0x1c
>>     kernel`kern_openat+0x207
>>     kernel`kern_open+0x19
>>     kernel`sys_open+0x18
>>     kernel`amd64_syscall+0x3bf
>>     kernel`0xffffffff80784947
>>     2512
>>
>>     kernel`newnfs_request+0x631
>>     kernel`nfscl_request+0x75
>>     kernel`nfsrpc_getattr+0xbe
>>     kernel`nfs_close+0x3e9
>>     kernel`VOP_CLOSE_APV+0x74
>>     kernel`vn_close+0xee
>>     kernel`vn_closefile+0xff
>>     kernel`_fdrop+0x3a
>>     kernel`closef+0x332
>>     kernel`kern_close+0x183
>>     kernel`sys_close+0xb
>>     kernel`amd64_syscall+0x3bf
>>     kernel`0xffffffff80784947
>>     2530
>>
>> I had a look at the source of nfs_close and could not find a call to
>> nfsrpc_getattr, and I am
>> wondering why close would be calling getattr
>> anyway.
>> If the file is closed what do we care about its attributes....
>>
> Here are some random statements w.r.t. NFSv3 vs NFSv4 that might help
> with an understanding of what is going on. I do address the specific
> case of nfs_close() towards the end. (It is kinda long winded, but I
> threw out everything I could think of..)
>
> NFSv3 doesn't have any open/close RPC, but NFSv4 does have Open and
> Close operations.
>
> In NFSv3, each RPC is defined and usually includes attributes for files
> before and after the operation (implicit getattrs not counted in the RPC
> counts reported by nfsstat).
>
> For NFSv4, every RPC is a compound built up of a list of Operations like
> Getattr. Since the NFSv4 server doesn't know what the compound is doing,
> nfsstat reports the counts of Operations for the NFSv4 server, so the
> counts will be much higher than with NFSv3, but do not reflect the
> number of RPCs being done.
> To get NFSv4 nfsstat output that can be compared to NFSv3, you need to
> do the command on the client(s) and it still is only roughly the same.
> (I just realized this should be documented in man nfsstat.)
>

I ran nfsstat -s -v 4 on the server and saw the number of requests being done.
They were in the order of a few thousand per second for a single FreeBSD 9.1
client doing a make buildworld.

> For the FreeBSD NFSv4 client, the compounds include Getattr operations
> similar to what NFSv3 does. It doesn't do a Getattr on the directory
> for Lookup, because that would have made the compound much more complex.
> I don't think this will have a significant performance impact, but will
> result in some additional Getattr RPCs.
>

I ran snoop on port 2049 on the server and I saw a large number of lookups.
A lot of them seem to be for directories which are part of the filenames of
the compiler and include files which are on the NFS-mounted /usr/obj.
The same names keep reappearing, so it looks like there is no caching being
done on the client.

> I suspect the slowness is caused by the extra overhead of doing the
> Open/Close operations against the server. The only way to avoid doing
> these against the server for NFSv4 is to enable delegations in both
> client and server. How to do this is documented in "man nfsv4". Basically
> starting up the nfscbd in the client and setting:
> vfs.nfsd.issue_delegations=1
> in the server.
>
> Specifically for nfs_close(), the attributes (modify time)
> are used for what is called "close to open consistency". This can be
> disabled by the "nocto" mount option, if you don't need it for your
> build environment. (You only need it if one client is writing a file
> and then another client is reading the same file.)
>

I tried the nocto option in /etc/fstab but it does not show when mount shows
the mounted filesystems, so I am not sure if it is being used.
On the server netstat shows an active connection to port 7745 on the client,
but snoop shows no data flowing on that session.

> Both the attribute caching and close to open consistency algorithms
> in the client are essentially the same for NFSv3 vs NFSv4.
>
> The NFSv4 Close operation(s) are actually done when the v_usecount for
> the vnode goes to 0, since mmap'd files can do I/O on pages after
> the close syscall. As such, they are only loosely related to the close
> syscall. They are actually closing Windows style Openlock(s).
>

I had a look at the code of the NFS v4 client of Illumos (which is basically
what my server is running) and as far as I understand it they do the getattr
only when the close was for a file that was opened for write and when there
was actually something written to the file.
The FreeBSD code seems to do the getattr for all close() calls.
For files that were never written, like executables or source files, that
seems to cause quite a lot of overhead.
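
To put a rough number on that overhead, the four stack counts from the dtrace
output quoted earlier in this message can be tallied by call site. This is only
a sketch in plain sh arithmetic: the variable names are made up for
illustration, and the counts come from that single trace, so they describe one
sampling window rather than the whole build.

```sh
#!/bin/sh
# Getattr RPC counts per call path, copied from the dtrace output
# quoted earlier in this message. Variable names are illustrative.
lookup_execve=26    # nfs_lookup -> nfs_getattr, reached via execve
close_execve=26     # nfs_close, reached via execve
lookup_open=2512    # nfs_lookup -> nfs_getattr, reached via open()
close_close=2530    # nfs_close, reached via close()

total=$((lookup_execve + close_execve + lookup_open + close_close))
from_close=$((close_execve + close_close))
from_lookup=$((lookup_execve + lookup_open))

# Integer percentages (rounded down), so they need not sum to 100.
close_pct=$((from_close * 100 / total))
lookup_pct=$((from_lookup * 100 / total))

echo "total getattr RPCs sampled: $total"
echo "issued from nfs_close:      $from_close (${close_pct}%)"
echo "issued from nfs_lookup:     $from_lookup (${lookup_pct}%)"
```

In that sample roughly half of the Getattr RPCs are issued from nfs_close, which
is the share that a "only after a write" check on close could affect.
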
> You mention that you see the same file over and over in a packet trace.
> You don't give specifics, but I'd suggest that you look at both NFSv3
> and NFSv4 for this (and file names are in lookups, not getattrs).
>
> I'd suggest you try enabling delegations in both client and server, plus
> trying the "nocto" mount option and see if that helps.
>

Tried it but it does not seem to make any noticeable difference.

I tried a make buildworld buildkernel with /usr/obj on a local FS in the VBox
VM; that completed in about 2 hours. With /usr/obj on an NFS v4 filesystem it
takes about a day. A twelvefold increase in elapsed time makes NFSv4 unusable
for this use case.
Too bad the server hangs when I use an nfsv3 mount for /usr/obj.
Having a shared /usr/obj makes it possible to run a make buildworld on a
single VM and just run make installworld on the others.

    Paul

> rick
>
>>
>> Paul
>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
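
For anyone retracing the steps tried in this thread, the two suggestions
(delegations plus "nocto") come down to roughly the settings below. This is a
sketch under assumptions, not a tested recipe: the fstab entry reuses the
/data/obj export from the mount output quoted earlier, the rc.conf variable is
assumed to be the one "man nfsv4" describes for starting nfscbd, and the sysctl
exists only on a FreeBSD server. With the OpenIndiana server used here,
delegations would have to be enabled through that system's own NFS
configuration instead.

```sh
# --- client: /etc/fstab ---------------------------------------------
# NFSv4 mount of /usr/obj with close-to-open consistency disabled.
# Per the reply above, cto is needed when one client writes a file and
# another client then reads it, which is what a shared /usr/obj does.
# 192.168.178.24:/data/obj  /usr/obj  nfs  rw,nfsv4,nocto  0  0

# --- client: /etc/rc.conf -------------------------------------------
# Start the NFSv4 callback daemon that delegations rely on
# (variable name assumed from "man nfsv4").
nfscbd_enable="YES"

# --- server: FreeBSD only -------------------------------------------
# Allow the server to hand out delegations.
sysctl vfs.nfsd.issue_delegations=1
```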