Date:      Sun, 12 May 2013 08:48:03 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "Marc G. Fournier" <scrappy@hub.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: NFS Performance issue against NetApp
Message-ID:  <1966772823.291493.1368362883964.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <518F4130.6080201@hub.org>

[-- Attachment #1 --]
Marc G. Fournier wrote:
> 'k, here is on Linux ... this is right after rebooting the server,
> doing a mount and running the startup once:
> 
> Client rpc stats:
> calls retrans authrefrsh
> 40602 0 40609
> 
> Client nfs v3:
> null         getattr      setattr      lookup       access       readlink
> 0         0% 13000    32% 5         0% 6140     15% 6741     16% 0         0%
> read         write        create       mkdir        symlink      mknod
> 3556      8% 6711     16% 3743      9% 307       0% 0         0% 0         0%
> remove       rmdir        rename       link         readdir      readdirplus
> 1         0% 0         0% 0         0% 0         0% 16        0% 380       0%
> fsstat       fsinfo       pathconf     commit
> 0         0% 2         0% 1         0% 0         0%
> 
> One thing to note is that both Linux/FreeBSD have
> "rsize=65536,wsize=65536" ... but there are 63x as many reads / 34x as
> many writes on FreeBSD as on Linux ... ?
> 
> Just noticed this on the FreeBSD stats:
> 
> Rpc Info:
> TimedOut Invalid X Replies Retries Requests
> 0 0 0 0 818479
> 
> 818k Retries? Is that normal ... ?
> 
> Also, the NetApp volumes being used here are not shared ... there are no
> other clients mounting these, and the Linux/FreeBSD volumes are separate
> ... same size, same jboss install, same configuration, same war file ...
> I could mount /vol/linux_jboss onto the FreeBSD, or /vol/freebsd_jboss
> onto the Linux, and they would load the same way ... in fact, the jboss
> install itself was done onto the FreeBSD and copied over to the Linux
> ... and both are using OpenJDK7 ... I tried to make it as identical as I
> could ...
> 
> 
> On 2013-05-11 7:27 PM, Marc G. Fournier wrote:
> >
> > With
> >
> > vfs.nfs.noconsist=3 ... 385595ms
> >
> > nfsstat -z before startup, nfsstat -c after:
> >
> > Client Info:
> > Rpc Counts:
> >   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
> >    332594         5     17238         0    224426    231137      3743         1
> >    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
> >         0         0         0       307         0        71         0      8447
> >     Mknod    Fsstat    Fsinfo  PathConf    Commit
> >         0       509         0         0         0
> > Rpc Info:
> >  TimedOut   Invalid X Replies   Retries  Requests
> >         0         0         0         0    818479
> > Cache Info:
> > Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
> >    608296    332596    526200     17245    -95425    224426     13178    231137
> > BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
> >         0         0      1050        55       502         7    543340      8448
> >
Ok, so disabling the mtime based cache consistency doesn't make
much difference. Forget about that one.
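
To put numbers on that, the figures quoted above can be compared
directly. A minimal sketch in C; the helper names rpc_ratio and
pct_change are made up for illustration and are not part of nfsstat
or of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* How many times more RPCs one client issued than another. */
double
rpc_ratio(uint64_t count, uint64_t baseline)
{
	assert(baseline != 0);
	return ((double)count / (double)baseline);
}

/* Percentage change of a startup time relative to a baseline time. */
double
pct_change(double measured_ms, double baseline_ms)
{
	assert(baseline_ms > 0.0);
	return ((measured_ms - baseline_ms) / baseline_ms * 100.0);
}
```

Fed the counts from this thread, rpc_ratio(224426, 3556) gives roughly
63 for Read and rpc_ratio(231137, 6711) roughly 34 for Write (FreeBSD
with noconsist=3 vs. Linux), while pct_change(385595, 391622) comes out
at about -1.5% for noconsist=3 vs. noconsist=0.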

I've attached another patch (which you probably shouldn't use for
a production system either) to be tried instead of the last one.
(This one is basically "work in progress" by Alexander Kabaev for
 better performance during file linking. I hope he doesn't mind
 me posting it.)
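
For anyone reading the nfs_clbio.c hunks in the attachment: they appear
to widen the "append/extend" case from writes that start exactly at EOF
to any write that lands in the file's last block and runs past EOF. A
standalone sketch of the two tests as the diff reads; the function names
and plain integer parameters are stand-ins for uio->uio_offset, n,
np->n_size and biosize, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old test: only a write starting exactly at the current EOF counts. */
bool
extends_old(int64_t offset, int64_t n, int64_t fsize)
{
	return (offset == fsize && n != 0);
}

/* Patched test: any write in the last block that ends beyond EOF. */
bool
extends_new(int64_t offset, int64_t n, int64_t fsize, int64_t biosize)
{
	int64_t lbn;

	assert(biosize > 0);
	lbn = offset / biosize;
	return (lbn == fsize / biosize && offset + n > fsize && n != 0);
}

/* Valid bytes in that block before the write (the patched bcount). */
int64_t
preappend_bcount(int64_t offset, int64_t fsize, int64_t biosize)
{
	int64_t lbn;

	assert(biosize > 0);
	lbn = offset / biosize;
	return (fsize - lbn * biosize);
}
```

With a biosize of 65536 and a 1000-byte file, a 1000-byte write at
offset 500 fails the old test but passes the patched one; the buffer is
first fetched with 1000 valid bytes and then grown to on + n = 1500.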

rick

> >
> > ============
> >
> > vfs.nfs.noconsist=2 ... 392201ms
> >
> > Client Info:
> > Rpc Counts:
> >   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
> >    332557         5     17228         0    224421    231131      3743         1
> >    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
> >         0         0         0       307         0        72         0      8430
> >     Mknod    Fsstat    Fsinfo  PathConf    Commit
> >         0       502         0         0         0
> > Rpc Info:
> >  TimedOut   Invalid X Replies   Retries  Requests
> >         0         0         0         0    818395
> > Cache Info:
> > Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
> >    607834    332557    525801     17231    -95401    224421     13178    231131
> > BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
> >         0         0      1028        56       502         0    542925      8431
> >
> >
> > ============
> > vfs.nfs.noconsist=0 ... 391622ms
> >
> >
> > Client Info:
> > Rpc Counts:
> >   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
> >    236122         5     17221         0    230575    230823      3743         1
> >    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
> >         0         0         0       307         0        71         0      8425
> >     Mknod    Fsstat    Fsinfo  PathConf    Commit
> >         0       516         0         0         0
> > Rpc Info:
> >  TimedOut   Invalid X Replies   Retries  Requests
> >         0         0         0         0    727799
> > Cache Info:
> > Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
> >    711860    236124    526549     17225   -101525    230490     13178    230823
> > BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
> >         0         0      1057        55       516         0    543709      8425
> >
> >
> > I checked a second time with noconsist=0, and the nfsstat -c values
> > seem to come out pretty much the same ...
> >
> > I'm going to head down to the office and try again with Solaris (I'd
> > have to re-install, since I used that system for the Solaris), and see
> > what nfsstat -c results I get out of that ... will post a followup on
> > this when completed ...
> >
> >
> >
> > On 2013-05-10 5:32 PM, Rick Macklem wrote:
> >> Marc G. Fournier wrote:
> >>> FYI … I just installed Solaris 11 onto the same hardware and ran the
> >>> same test … so far, I'm seeing:
> >>>
> >>> Linux @ ~30s
> >>> Solaris @ ~44s
> >>>
> >>> OpenBSD @ ~200s
> >>> FreeBSD @ ~240s
> >>>
> >>> I've even tried FreeBSD 8.3 just to see if maybe it's a 'newish' issue
> >>> … same as 9.x … I could see Linux 'cutting corners', but
> >>> Oracle/Solaris too … ?
> >>>
> >> The three client implementations (BSD, Linux, Solaris) were developed
> >> independently and, as such, will all implement somewhat different
> >> caching algorithms (the RFCs specify what goes on the wire, but say
> >> little w.r.t. client side caching).
> >>
> >> I have attached a patch that might be useful for determining if
> >> the client side buffer cache consistency algorithm in FreeBSD is
> >> causing the slow startup of jboss. Do not run this patch on a
> >> production system, since it pretty well disables all buffer cache
> >> coherency (i.e. if another client modifies a file, the patched client
> >> won't notice and will continue to cache stale file data).
> >>
> >> If the patch does speed up startup of jboss significantly, you can
> >> use the sysctl:
> >>   vfs.nfs.noconsist
> >> to check for which coherency check is involved by decreasing the
> >> value for the sysctl by 1 and then trying a startup again. (When
> >> vfs.nfs.noconsist=0, normal cache coherency will be applied.)
> >>
> >> I have no idea if buffer cache coherency is a factor, but trying
> >> the attached patch might determine if it is.
> >>
> >> Note that you have never posted updated "nfsstat -c" values.
> >> (Remember that what you posted indicated 88 RPCs, which seemed
> >>   bogus.) Finding out if FreeBSD does a lot more of certain RPCs
> >> than Linux/Solaris might help isolate what is going on.
> >>
> >> rick
> >>
> >>> On 2013-05-03, at 04:50 , Mark Felder <feld@feld.me> wrote:
> >>>
> >>>> On Thu, 02 May 2013 18:43:17 -0500, Marc G. Fournier
> >>>> <scrappy@hub.org> wrote:
> >>>>
> >>>>> Hadn't thought to do so with Linux, but …
> >>>>> Linux ……. 20732ms, 20117ms, 20935ms, 20130ms, 20560ms
> >>>>> FreeBSD .. 28996ms, 24794ms, 24702ms, 23311ms, 24153ms
> >>>> Please make sure both platforms are using similar atime settings. I
> >>>> think most distros use ext4 with diratime by default. I'd just do
> >>>> noatime on both platforms to be safe.
> >>>> _______________________________________________
> >>>> freebsd-fs@freebsd.org mailing list
> >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >>>> To unsubscribe, send any mail to
> >>>> "freebsd-fs-unsubscribe@freebsd.org"

[-- Attachment #2 --]
diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
index a0ec8ee..e6a7267 100644
--- a/sys/fs/nfsclient/nfs_clbio.c
+++ b/sys/fs/nfsclient/nfs_clbio.c
@@ -1031,13 +1031,16 @@ flush_and_restart:
 		lbn = uio->uio_offset / biosize;
 		on = uio->uio_offset - (lbn * biosize);
 		n = MIN((unsigned)(biosize - on), uio->uio_resid);
+#if 0
 again:
+#endif
 		/*
 		 * Handle direct append and file extension cases, calculate
 		 * unaligned buffer size.
 		 */
 		mtx_lock(&np->n_mtx);
-		if (uio->uio_offset == np->n_size && n) {
+		if (lbn == (np->n_size / biosize) &&
+		    uio->uio_offset + n > np->n_size && n) {
 			mtx_unlock(&np->n_mtx);
 			/*
 			 * Get the buffer (in its pre-append state to maintain
@@ -1045,7 +1048,7 @@ again:
 			 * nfsnode after we have locked the buffer to prevent
 			 * readers from reading garbage.
 			 */
-			bcount = on;
+			bcount = np->n_size - (lbn * biosize);
 			bp = nfs_getcacheblk(vp, lbn, bcount, td);
 
 			if (bp != NULL) {
@@ -1058,7 +1061,7 @@ again:
 				mtx_unlock(&np->n_mtx);
 
 				save = bp->b_flags & B_CACHE;
-				bcount += n;
+				bcount = on + n;
 				allocbuf(bp, bcount);
 				bp->b_flags |= save;
 			}
@@ -1154,6 +1157,7 @@ again:
 		if (bp->b_dirtyoff >= bp->b_dirtyend)
 			bp->b_dirtyoff = bp->b_dirtyend = 0;
 
+#if 0
 		/*
 		 * If the new write will leave a contiguous dirty
 		 * area, just update the b_dirtyoff and b_dirtyend,
@@ -1179,6 +1183,14 @@ again:
 			}
 			goto again;
 		}
+#else
+		/*
+		 * Relax coherency a bit for the sake of performance and
+		 * expand the current dirty region to contain the new
+		 * write even if it means we mark some non-dirty data as
+		 * dirty.  This should probably be configurable.
+		 */
+#endif
 
 		local_resid = uio->uio_resid;
 		error = vn_io_fault_uiomove((char *)bp->b_data + on, n, uio);
diff --git a/sys/nfsclient/nfs_bio.c b/sys/nfsclient/nfs_bio.c
index 630a7ff..9d8dc7c 100644
--- a/sys/nfsclient/nfs_bio.c
+++ b/sys/nfsclient/nfs_bio.c
@@ -1133,6 +1133,7 @@ again:
 		if (bp->b_dirtyoff >= bp->b_dirtyend)
 			bp->b_dirtyoff = bp->b_dirtyend = 0;
 
+#if 0
 		/*
 		 * If the new write will leave a contiguous dirty
 		 * area, just update the b_dirtyoff and b_dirtyend,
@@ -1158,6 +1159,14 @@ again:
 			}
 			goto again;
 		}
+#else
+		/*
+		 * Relax coherency a bit for the sake of performance and
+		 * expand the current dirty region to contain the new
+		 * write even if it means we mark some non-dirty data as
+		 * dirty.  This should probably be configurable.
+		 */
+#endif
 
 		error = uiomove((char *)bp->b_data + on, n, uio);
 
