Date: Wed, 11 Jun 2003 04:38:12 +0900 (JST)
Message-Id: <20030611.043812.74752191.nagao@iij.ad.jp>
To: das@freebsd.org, ishizuka@ish.org
From: Tadaaki Nagao <nagao@iij.ad.jp>
In-Reply-To: <20030609041942.GA4029@HAL9000.homeunix.com>
References: <200305280102.LAA00949@lightning.itga.com.au>
	<20030609.114033.74731601.ishizuka@ish.org>
	<20030609041942.GA4029@HAL9000.homeunix.com>
Cc: stable@freebsd.org
Subject: Re: system slowdown - vnode related

Hi there,

In "Re: system slowdown - vnode related", David Schultz wrote:
> On Mon, Jun 09, 2003, Masachika ISHIZUKA wrote:
> > Hi, David-san.
> > I have still vnodes problem in 4.8-stable with /sys/kern/vfs_subr.c
> > 1.249.2.30.
> >
> > 310.locate of weekly cron make slow down or panic. Values of sysctl
> > are shown as follows when they reached slow down.
> >
> > (1) #1 machine (Celeron 466 with 256 mega byte rams)
> >
> > % sysctl kern.maxvnodes
> > kern.maxvnodes: 17979
> > % sysctl vm.zone | grep VNODE
> > VNODE: 192, 0, 18004, 122, 18004
>
> This looks pretty normal to me for a quiescent system. Ordinarily
> I would actually suggest raising maxvnodes if you have lots of
> little files. Does the number of vnodes shoot up when 310.locate
> runs? Did you get a backtrace from the panics? Perhaps the VM
> page cache is still interfering...

Excuse me for jumping in, but I just happened to see hangs on machines
at a company I'm working for (though no panic in my case), and I have
something I'd like to ask Ishizuka-san to try.

Ishizuka-san, could you possibly run the following command repeatedly
while the slowdown is being observed?

  % vmstat -m | grep '^ *vfscache'

If the third number in its output is approaching or hitting the fourth,
the chances are your kernel is running out of memory for the namecache,
which was actually the case on my machines.

And if that turns out to be the case for you, try setting
debug.vfscache to 0 to disable the namecache entirely and see whether
your problem goes away completely (probably at the cost of some
performance, though).
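Just for convenience, something like this quick sh loop is what I have
in mind (a rough sketch only, typed from memory rather than copied from
my machines; adjust the interval to taste):

  #!/bin/sh
  # Watch the namecache (M_VFSCACHE) malloc-type usage every 10 seconds.
  while :; do
      date
      vmstat -m | grep '^ *vfscache'
      sleep 10
  done

and if the vfscache usage does turn out to be pinned at its limit,
disabling the namecache (as root) should be a matter of:

  sysctl -w debug.vfscache=0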
		-*-		-*-

In fact, I've been reading parts of the kernel source and tracking down
the cause of my case for the past two weeks, and I seem to have found
something.  I may well be wrong, but I thought describing it here was
better than nothing, so here is what I think I've found:

There seems to be a scenario in which the memory area for the namecache
(M_VFSCACHE) is exhausted while the number of vnodes stays too small to
hit the limit (maxvnodes).  More specifically, when a lot of hard links
are being accessed, some VDIR vnodes can end up with an extremely large
number of namecache entries pointing to only a few VREG vnodes of the
hard-linked files.  That can exhaust the memory area for M_VFSCACHE even
though the total number of vnodes in the kernel never grows enough to
hit maxvnodes.

And then, if you happen to have enough total memory in the kernel, the
exhaustion of the M_VFSCACHE area doesn't necessarily cause a panic;
instead, it causes the process traversing those links to block (in the
malloc(9) call in sys/kern/vfs_cache.c:cache_enter()).  After that, what
you'll observe is that almost everything you try in userland hangs, even
though the kernel still responds to ping(8), because any process that
attempts to create a namecache entry will block in the same way.

As I've said, not being a VFS expert, I may be wrong, but I'd be more
than happy if this helps someone who knows this part of the kernel well
track down the problem.

Thanks,

Tadaaki Nagao
  Applied Technology Division, Internet Initiative Japan Inc.
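P.S.  In case it helps anyone trying to confirm this, the kind of
workload I have in mind is a tree containing a large number of hard
links to only a few files, traversed by something like find(1) or
locate's updatedb.  A crude sh sketch (hypothetical paths and counts,
and not something I've run in exactly this form):

  #!/bin/sh
  # Create a few files, each with many hard links (staying under UFS's
  # per-file link limit), then walk the tree so every link name gets its
  # own namecache entry while the vnode count stays small.
  mkdir -p /tmp/linktest
  for f in a b c d; do
      touch /tmp/linktest/$f
      for i in `jot 20000`; do
          ln /tmp/linktest/$f /tmp/linktest/$f.$i
      done
  done
  find /tmp/linktest > /dev/null
  # Compare namecache memory use against the vnode count afterwards.
  vmstat -m | grep '^ *vfscache'
  sysctl vm.zone | grep VNODE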