Date:      Wed, 3 Oct 2012 17:36:59 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Garrett Wollman <wollman@freebsd.org>
Cc:        freebsd-fs@freebsd.org, rmacklem@freebsd.org, hackers@freebsd.org
Subject:   Re: NFS server bottlenecks
Message-ID:  <1666343702.1682678.1349300219198.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20588.42788.103863.179701@hergotha.csail.mit.edu>

Garrett Wollman wrote:
> <<On Wed, 3 Oct 2012 09:21:06 -0400 (EDT), Rick Macklem
> <rmacklem@uoguelph.ca> said:
> 
> >> Simple: just use a separate mutex for each list that a cache entry
> >> is on, rather than a global lock for everything. This would reduce
> >> the mutex contention, but I'm not sure how significantly since I
> >> don't have the means to measure it yet.
> >>
> > Well, since the cache trimming is removing entries from the lists, I
> > don't see how that can be done with a global lock for list updates?
> 
> Well, the global lock is what we have now, but the cache trimming
> process only looks at one list at a time, so not locking the list that
> isn't being iterated over probably wouldn't hurt, unless there's some
> mechanism (that I didn't see) for entries to move from one list to
> another. Note that I'm considering each hash bucket a separate
> "list". (One issue to worry about in that case would be cache-line
> contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE
> ought to be increased to reduce that.)
> 
Yea, a separate mutex for each hash list might help. There is also the
LRU list that all entries end up on, that gets used by the trimming code.
(I think? I wrote this stuff about 8 years ago, so I haven't looked at
 it in a while.)
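
A minimal sketch of the per-hash-list mutex idea, written as userland C with pthreads rather than kernel mutexes. All names here (DRC_HASHSIZE, drc_bucket, drc_insert, drc_lookup, drc_trim_bucket) are hypothetical and are not taken from the actual NFS server code. The sketch leaves out the shared LRU list, which would still need a lock of its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table size; stands in for NFSRVCACHE_HASHSIZE. */
#define DRC_HASHSIZE 64

struct drc_entry {
    uint32_t xid;               /* RPC transaction id used as the key */
    struct drc_entry *next;
};

/* Each hash list carries its own mutex instead of sharing a global one. */
struct drc_bucket {
    pthread_mutex_t lock;
    struct drc_entry *head;     /* newest entry first */
    size_t count;
};

static struct drc_bucket drc_table[DRC_HASHSIZE];

static void drc_init(void)
{
    for (size_t i = 0; i < DRC_HASHSIZE; i++) {
        pthread_mutex_init(&drc_table[i].lock, NULL);
        drc_table[i].head = NULL;
        drc_table[i].count = 0;
    }
}

static struct drc_bucket *drc_bucket_for(uint32_t xid)
{
    return &drc_table[xid % DRC_HASHSIZE];
}

/* Insert at the head of one list; only that list's mutex is taken. */
static int drc_insert(uint32_t xid)
{
    struct drc_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->xid = xid;
    struct drc_bucket *b = drc_bucket_for(xid);
    pthread_mutex_lock(&b->lock);
    e->next = b->head;
    b->head = e;
    b->count++;
    pthread_mutex_unlock(&b->lock);
    return 0;
}

/* Returns 1 if xid is cached, 0 otherwise. */
static int drc_lookup(uint32_t xid)
{
    struct drc_bucket *b = drc_bucket_for(xid);
    int found = 0;
    pthread_mutex_lock(&b->lock);
    for (struct drc_entry *e = b->head; e != NULL; e = e->next) {
        if (e->xid == xid) {
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&b->lock);
    return found;
}

/*
 * Keep the newest `keep` entries of one list and free the rest.
 * The victims are unlinked under the lock but freed after it is
 * dropped, so the lock is held only for the list walk.
 */
static size_t drc_trim_bucket(struct drc_bucket *b, size_t keep)
{
    assert(b != NULL);
    pthread_mutex_lock(&b->lock);
    struct drc_entry **pp = &b->head;
    size_t kept = 0;
    while (kept < keep && *pp != NULL) {
        pp = &(*pp)->next;
        kept++;
    }
    struct drc_entry *victims = *pp;
    *pp = NULL;
    size_t freed = b->count - kept;
    b->count = kept;
    pthread_mutex_unlock(&b->lock);

    while (victims != NULL) {
        struct drc_entry *next = victims->next;
        free(victims);
        victims = next;
    }
    return freed;
}
```

With this layout a trim of one list blocks only the threads that hash to the same list. Whether that helps in practice depends on how much of the contention comes from the hash lists as opposed to the LRU list.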

Also, increasing the hash table size is probably a good idea, especially
if you reduce how aggressively the cache is trimmed.

> > Only doing it once/sec would result in a very large cache when
> > bursts of traffic arrive.
> 
> My servers have 96 GB of memory so that's not a big deal for me.
> 
This code was originally "production tested" on a server with 1 Gbyte,
so times have changed a bit ;-)

> > I'm not sure I see why doing it as a separate thread will improve
> > things. There are N nfsd threads already (N can be bumped up to 256
> > if you wish) and having a bunch more "cache trimming threads" would
> > just increase contention, wouldn't it?
> 
> Only one cache-trimming thread. The cache trim holds the (global)
> mutex for much longer than any individual nfsd service thread has any
> need to, and having N threads doing that in parallel is why it's so
> heavily contended. If there's only one thread doing the trim, then
> the nfsd service threads aren't spending time contending on the
> mutex (it will be held less frequently and for shorter periods).
> 
I think the little drc2.patch, which will keep the nfsd threads from
acquiring the mutex and doing the trimming most of the time, might be
sufficient. I still don't see why a separate trimming thread will be
an advantage. I'd also be worried that the one cache trimming thread
won't get the job done soon enough.
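
One way to keep service threads from taking the mutex most of the time is to check the cache size without the lock and only acquire it when the size is over a threshold. The sketch below illustrates that general technique in userland C. It is an assumption about the approach and is not the contents of drc2.patch; the names DRC_HIGHWATER, drc_size, drc_note_insert and drc_maybe_trim are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical threshold; a real server would use a far larger value. */
#define DRC_HIGHWATER 4

static pthread_mutex_t drc_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_size_t drc_size;   /* entries currently cached */
static size_t drc_trim_calls;    /* trims performed; protected by drc_lock */

/* Called when an entry is added; no mutex is needed for the counter. */
static void drc_note_insert(void)
{
    atomic_fetch_add(&drc_size, 1);
}

/*
 * Called by a service thread after handling an RPC.
 * Returns the number of entries trimmed (0 in the common case).
 */
static size_t drc_maybe_trim(void)
{
    /* Cheap unlocked check: skip the mutex when under the threshold. */
    if (atomic_load_explicit(&drc_size, memory_order_relaxed) <= DRC_HIGHWATER)
        return 0;

    size_t trimmed = 0;
    pthread_mutex_lock(&drc_lock);
    /* Re-check under the lock; another thread may have trimmed already. */
    size_t n = atomic_load(&drc_size);
    if (n > DRC_HIGHWATER) {
        trimmed = n - DRC_HIGHWATER;
        /* Stand-in for unlinking and freeing `trimmed` entries. */
        atomic_fetch_sub(&drc_size, trimmed);
        drc_trim_calls++;
    }
    pthread_mutex_unlock(&drc_lock);
    return trimmed;
}
```

The unlocked read can be slightly stale, so the cache may overshoot the threshold briefly. The re-check under the lock keeps the trim itself correct.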

When I did production testing on a 1 Gbyte server that saw a peak
load of about 100 RPCs/sec, it was necessary to trim aggressively.
(Although I'd be tempted to say that a server with 1 Gbyte is no
 longer relevant, I recall someone recently trying to run FreeBSD
 on an i486, although I doubt they wanted to run the nfsd on it.)

> > The only negative effect I can think of w.r.t. having the nfsd
> > threads doing it would be a (I believe negligible) increase in RPC
> > response times (the time the nfsd thread spends trimming the cache).
> > As noted, I think this time would be negligible compared to disk I/O
> > and network transit times in the total RPC response time?
> 
> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G
> network connectivity, spinning on a contended mutex takes a
> significant amount of CPU time. (For the current design of the NFS
> server, it may actually be a win to turn off adaptive mutexes -- I
> should give that a try once I'm able to do more testing.)
> 
Have fun with it. Let me know when you have what you think is a good patch.

rick

> -GAWollman
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to
> "freebsd-hackers-unsubscribe@freebsd.org"


