Date: Fri, 26 Feb 2016 23:00:47 -0500 (EST)
From: Rick Macklem
To: Bruce Evans
Cc: fs@freebsd.org
Message-ID: <1347742231.11086226.1456545647628.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <20160226164613.N2180@besplex.bde.org>
References: <20160226164613.N2180@besplex.bde.org>
Subject: Re: silly write caching in nfs3

Bruce Evans wrote:
> nfs3 is slower than in old versions of FreeBSD. I debugged one of the
> reasons today.
>
> Writes have apparently always done silly caching. Typical behaviour
> is for iozone writing a 512MB file where the file fits in the buffer
> cache/VMIO. The write is cached perfectly. But then when nfs_open()
> reopens the file, it calls vinvalbuf() to discard all of the cached
> data. Thus nfs write caching usually discards useful older data to
> make space for newer data that will never be used (unless the
> file is opened r/w and read using the same fd (and is not accessed
> for a setattr or advlock operation -- these call vinvalbuf() too, if
> NMODIFIED)). The discarding may be delayed for a long time. Then
> keeping the useless data causes even more older data to be discarded.
> Discarding it on close would at least prevent further loss. It would
> have to be committed on close before discarding it of course.
> Committing it on close does some good things even without discarding
> there, and in oldnfs it gives a bug that prevents discarding in open --
> see below.
>
> nfs_open() does the discarding for different reasons in the NMODIFIED
> and !NMODIFIED cases. In the NMODIFIED case, it discards unconditionally.
> This case can be avoided by fsync() before close or setting the sysctl
> to commit in close. iozone does the fsync(). This helps in oldnfs but
> not in newnfs. With it, iozone on newnfs now behaves like it did on oldnfs
> 10-20 years ago. Something (perhaps just the timestamp bugs discussed
> later) "fixed" the discarding on oldnfs 5-10 years ago.
>
> I think not committing in close is supposed to be an optimization, but
> it is actually a pessimization for my kernel build tests (with object
> files on nfs, which I normally avoid). Builds certainly have to reopen
> files after writing them, to link them and perhaps to install them.
> This causes the discarding. My kernel build tests also do a lot of
> utimes() calls which cause the discarding before commit-on-close can
> avoid the above cause for it by clearing NMODIFIED.
> Enabling commit-on-close gives a small optimisation with oldnfs by
> avoiding all of the discarding except for utimes(). It reduces read
> RPCs by about 25% without increasing write RPCs or real time. It
> decreases real time by a few percent.
>
> The other reason for discarding is because the timestamps changed -- you
> just wrote them, so the timestamps should have changed. Different bugs
> in comparing the timestamps gave different misbehaviours.
>
You could easily test to see if second-resolution timestamps make a
difference by redefining the NFS_TIMESPEC_COMPARE() macro
{ in sys/fs/nfsclient/nfsnode.h } so that it only compares the tv_sec
field and not the tv_nsec field.
--> Then the client would only think the mtime has changed when tv_sec
    changes.

rick

> In old versions of FreeBSD and/or nfs, the timestamps had seconds
> granularity, so many changes were missed. This explains mysterious
> behaviours by iozone 10-20 years ago: the write caching is seen to
> work perfectly for most small total sizes, since all the writes take
> less than 1 second so the timestamps usually don't change (but sometimes
> the writes lie across a seconds boundary so the timestamps do change).
>
> oldnfs was fixed many years ago to use timestamps with nanoseconds
> resolution, but it doesn't suffer from the discarding in nfs_open()
> in the !NMODIFIED case which is reached by either fsync() before close
> or commit on close. I think this is because it updates n_mtime to
> the server's new timestamp in nfs_writerpc(). This seems to be wrong,
> since the file might have been written to by other clients and then
> the change would not be noticed until much later if ever (setting the
> timestamp prevents seeing it change when it is checked later, but you
> might be able to see another metadata change).
>
> newnfs has quite different code for nfs_writerpc(). Most of it was
> moved to another function in another file.
> I understand this even less, but it doesn't seem to fetch the
> server's new timestamp or update n_mtime in the v3 case.
>
> There are many other reasons why nfs is slower than in old versions.
> One is that writes are more often done out of order. This tends to
> give a slowness factor of about 2 unless the server can fix up the
> order. I use an old server which can do the fixup for old clients but
> not for newer clients starting in about FreeBSD-9 (or 7?). I suspect
> that this is just because Giant locking in old clients gave accidental
> serialization. Multiple nfsiod's and/or nfsd's are clearly needed
> for performance if you have multiple NICs serving multiple mounts.
> Other cases are less clear. For the iozone benchmark, there is only
> 1 stream and multiple nfsiod's pessimize it into multiple streams that
> give buffers which arrive out of order on the server if the multiple
> nfsiod's are actually active. I use the following configuration to
> ameliorate this, but the slowness factor is still often about 2 for
> iozone:
> - limit nfsd's to 4
> - limit nfsiod's to 4
> - limit nfs i/o sizes to 8K. The server fs block size is 16K, and
>   using a smaller block size usually helps by giving some delayed
>   writes which can be clustered better. (The non-nfs parts of the
>   server could be smarter and do this intentionally. The out-of-order
>   buffers look like random writes to the server.) 16K i/o sizes
>   otherwise work OK, but 32K i/o sizes are much slower for unknown
>   reasons.
>
> Bruce
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
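
For reference, the seconds-only comparison suggested above could look
roughly like the following. This is only a user-space sketch to show the
difference between the two checks, not the actual kernel macro from
nfsnode.h. The function names mtime_changed_full(),
mtime_changed_sec_only() and mtime_compare_demo() are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: full-resolution check. Reports a change when
 * either the seconds or the nanoseconds field differs.
 */
static bool
mtime_changed_full(const struct timespec *a, const struct timespec *b)
{
	return (a->tv_sec != b->tv_sec || a->tv_nsec != b->tv_nsec);
}

/*
 * Hypothetical helper: seconds-only check. Ignores tv_nsec, so two
 * modifications within the same second look identical.
 */
static bool
mtime_changed_sec_only(const struct timespec *a, const struct timespec *b)
{
	return (a->tv_sec != b->tv_sec);
}

/* Small demonstration with made-up timestamp values. */
static void
mtime_compare_demo(void)
{
	struct timespec cached = { .tv_sec = 1000, .tv_nsec = 100 };
	struct timespec server = { .tv_sec = 1000, .tv_nsec = 900 };

	/* Same second, different nanoseconds. */
	assert(mtime_changed_full(&cached, &server));
	assert(!mtime_changed_sec_only(&cached, &server));

	/* Crossing a seconds boundary is seen by both checks. */
	server.tv_sec += 1;
	assert(mtime_changed_full(&cached, &server));
	assert(mtime_changed_sec_only(&cached, &server));
}
```

With the seconds-only form, a write followed by a reopen within the same
second would not look like an mtime change, which is the behaviour the
test is meant to expose.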