From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 15 20:58:18 2012
Date: Mon, 15 Oct 2012 16:58:16 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ivan Voras
Cc: freebsd-hackers@freebsd.org
Message-ID: <1516511249.2287339.1350334696127.JavaMail.root@erie.cs.uoguelph.ca>
Subject: Re: NFS server bottlenecks
List-Id: Technical Discussions relating to FreeBSD

Ivan Voras wrote:
> On 13/10/2012 17:22, Nikolay Denev wrote:
>
> > drc3.patch applied and built cleanly and shows nice improvement!
> >
> > I've done a quick benchmark using iozone over the NFS mount from the
> > Linux host.
>
> Hi,
>
> If you are already testing, could you please also test this patch:
>
> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch

I don't think (it is hard to test this) your trim cache algorithm will
choose the correct entries to delete. The problem is that UDP entries
very seldom time out (unless the NFS server is seeing hardly any load)
and are mostly trimmed because the size exceeds the high water mark.

With your code, it will clear out all of the entries in the first hash
buckets that aren't currently busy, until the total count drops below
the high water mark. (If you monitor a busy server with "nfsstat -e -s",
you'll see the cache never goes below the high water mark, which is 500
by default.) This would delete entries for fairly recent requests.

If you are going to replace the global LRU list with one for each hash
bucket, then you'll have to compare the time stamps on the least
recently used entries of all the hash buckets and then delete those. If
you keep the timestamp of the least recent entry for each hash bucket in
the hash bucket head, you could at least use that to select which bucket
to delete from next, but you'll still need to:
- lock that hash bucket
- delete a few entries from that bucket's LRU list
- unlock the hash bucket
- repeat for various buckets until the count is below the high water mark

Or something like that. I think you'll find it a lot more work than one
LRU list and one mutex. Remember that the mutex isn't held for long.

Btw, the code looks very nice. (If I were being a style(9) zealot, I'd
remind you that it likes "return (X);" and not "return X;".)

rick

> It should apply to HEAD without Rick's patches.
>
> It's a bit different approach than Rick's, breaking down locks even
> more.