From owner-svn-src-all@FreeBSD.ORG  Wed Apr  8 14:41:11 2009
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A91FA106566C;
	Wed,  8 Apr 2009 14:41:11 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 7ECC28FC17;
	Wed,  8 Apr 2009 14:41:11 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [65.122.17.41])
	by cyrus.watson.org (Postfix) with ESMTPS id 363BA46B0D;
	Wed,  8 Apr 2009 10:41:11 -0400 (EDT)
Date: Wed, 8 Apr 2009 15:41:11 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Attilio Rao <attilio@freebsd.org>
In-Reply-To: <3bbf2fe10904080724i381c36fdpb1699def955fdb6d@mail.gmail.com>
Message-ID: <alpine.BSF.2.00.0904081527460.61921@fledge.watson.org>
References: <200904080430.n384UGWw043589@svn.freebsd.org>
	<alpine.BSF.2.00.0904081500220.61921@fledge.watson.org>
	<3bbf2fe10904080724i381c36fdpb1699def955fdb6d@mail.gmail.com>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: svn-src-stable@freebsd.org, svn-src-all@freebsd.org,
	src-committers@freebsd.org, svn-src-stable-7@freebsd.org,
	Stephen McKay <mckay@freebsd.org>
Subject: Re: svn commit: r190837 - in stable/7/sys: . contrib/pf
 dev/ath/ath_hal dev/cxgb kern
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
	user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
	<mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Apr 2009 14:41:13 -0000

On Wed, 8 Apr 2009, Attilio Rao wrote:

>> Finally, I think it would be a good idea to do a bit of real-world 
>> profiling on memory efficiency of the name cache: how much memory is wasted 
>> when assumptions about short/long are wrong, and could we retune lengths, 
>> limits, hash bucket counts, etc, to work better in practice?
>
> Am I wrong or you were working on adding DTrace tracing to it? Do you have 
> any interesting workload/numbers you can show?

While the probes I've added could be used to do this very easily, that's not 
the thrust of the work I'm currently doing.  A useful starting point for 
someone interested in this problem would be a dtrace script like the 
following:

/* Histogram of the lengths of names entered into the name cache. */
vfs:namecache:enter:done
{
	@distribution = quantize(strlen((string)arg1));
}

When run across a "du" of a portion of my local subversion tree, I get:

            value  ------------- Distribution ------------- count
                0 |                                         0
                1 |                                         2
                2 |@@                                       296
                4 |@@@@@@@@@@@                              1879
                8 |@@@@@@@@@@@@@@@@                         2719
               16 |@@@@@@@@@@@                              1974
               32 |                                         69
               64 |                                         2
              128 |                                         0
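
To put rough numbers on that histogram, here is a small Python sketch (not 
part of the DTrace work itself).  It assumes that quantize() buckets cover 
[value, 2 * value), and that a "small" entry reserves 32 bytes for the name; 
the real limit in the name cache may differ.  The names HISTOGRAM, 
SMALL_NAME_BYTES and summarize are made up for illustration.

```python
# Counts copied from the quantize() output above. Each key is the lower
# bound of a power-of-two bucket covering lengths [value, 2 * value).
HISTOGRAM = {1: 2, 2: 296, 4: 1879, 8: 2719, 16: 1974, 32: 69, 64: 2}

# Assumption: small entries reserve 32 bytes for the name.
SMALL_NAME_BYTES = 32


def summarize(histogram, small_limit=SMALL_NAME_BYTES):
    """Return counts and mean-length bounds for names below small_limit."""
    total = sum(histogram.values())
    # Keep only buckets whose whole range lies below small_limit.
    small = {v: c for v, c in histogram.items() if 2 * v <= small_limit}
    small_count = sum(small.values())
    # Exact lengths are unknown inside a bucket, so bound the mean by
    # assuming every name sits at the bottom or the top of its bucket.
    low = sum(v * c for v, c in small.items()) / small_count
    high = sum((2 * v - 1) * c for v, c in small.items()) / small_count
    return {
        "total": total,
        "small_count": small_count,
        "small_fraction": small_count / total,
        "mean_len_low": low,
        "mean_len_high": high,
    }


if __name__ == "__main__":
    stats = summarize(HISTOGRAM)
    print(f"{stats['small_count']} of {stats['total']} names "
          f"({stats['small_fraction']:.1%}) are below {SMALL_NAME_BYTES} bytes")
    print(f"mean small-name length is between {stats['mean_len_low']:.1f} "
          f"and {stats['mean_len_high']:.1f} bytes")
```

For the counts above this reports 6870 of 6941 names (about 99%) below 32 
bytes, with a mean length somewhere between roughly 8.9 and 16.9 bytes.  The 
bounds are loose because quantize() only records power-of-two buckets; an 
lquantize() aggregation would give a finer picture.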

The result is that for my admittedly toy workload, large entries are rarely 
needed, and small entries make poor use of the space we've allocated, 
because names rarely fill the full 32+ bytes we make available to small 
entries.  Other useful types of analysis might be:

- How effective is our LRU in the cache?
- What's the distribution of the time that entries spend in the cache?
- Is there a relationship between length and reuse of cache entries?

What's actually required is to do this analysis on real workloads, rather 
than on my running du over a directory tree.

Robert N M Watson
Computer Laboratory
University of Cambridge