From owner-freebsd-current@FreeBSD.ORG  Wed Jun 30 19:35:49 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8BF6F16A4CF
	for <current@freebsd.org>; Wed, 30 Jun 2004 19:35:49 +0000 (GMT)
Received: from smtp-gw-cl-c.dmv.com (smtp-gw-cl-c.dmv.com [216.240.97.41])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0F59643D39
	for <current@freebsd.org>; Wed, 30 Jun 2004 19:35:49 +0000 (GMT)
	(envelope-from sven@dmv.com)
Received: from lanshark.dmv.com (lanshark.dmv.com [216.240.97.46])
	by smtp-gw-cl-c.dmv.com (8.12.10/8.12.10) with ESMTP id i5UJZG9D029221
	for <current@freebsd.org>; Wed, 30 Jun 2004 15:35:16 -0400 (EDT)
	(envelope-from sven@dmv.com)
From: Sven Willenberger <sven@dmv.com>
To: current@freebsd.org
Content-Type: text/plain
Date: Wed, 30 Jun 2004 15:34:05 -0400
Message-Id: <1088624045.1179.25.camel@lanshark.dmv.com>
Mime-Version: 1.0
X-Mailer: Evolution 1.5.9 
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.39
Subject: Stack backtrace: how can I help?
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 30 Jun 2004 19:35:49 -0000

My ability to dig into kernel routines is very limited, so I am asking
how I can help those who may be able to fix this recurring problem.

Others and I have posted about this before, with no response other than
one reply saying "it must be a bug".

Under heavy load, on 5.2.1-P8 systems, I get a stack backtrace relating
to flushing dirty buffers (ffs_fsync).

The relevant code from ffs_softdep.c (src/sys/ufs/ffs/ffs_softdep.c,v
1.149 2003/10/23 21:14:08 jhb):

getdirtybuf(bpp, mtx, waitfor)
        struct buf **bpp;
        struct mtx *mtx;
        int waitfor;
{
        struct buf *bp;
        int error;

        /*
         * XXX This code and the code that calls it need to be reviewed to
         * verify its use of the vnode interlock.
         */

        for (;;) {
                if ((bp = *bpp) == NULL)
                        return (0);
                if (bp->b_vp == NULL)
                        backtrace();
.....
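
To make the condition concrete, here is a minimal userland sketch of
only the two checks visible in the excerpt above. The names
check_dirtybuf and orphan_reports and the trimmed-down struct
definitions are hypothetical stand-ins, not kernel code, and the sketch
prints a message where the kernel calls backtrace(). The real function
continues past the elided portion, which is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userland stand-ins; the real kernel structures are far larger. */
struct vnode { int v_id; };
struct buf { struct vnode *b_vp; };

/* Counts how often a buffer without an attached vnode was seen. */
static int orphan_reports = 0;

/*
 * Mimics only the first two checks of the getdirtybuf() excerpt:
 * returns 0 when there is no buffer, otherwise 1, and reports (instead
 * of calling the kernel's backtrace()) when the buffer has no vnode.
 */
static int
check_dirtybuf(struct buf **bpp)
{
        struct buf *bp;

        assert(bpp != NULL);
        if ((bp = *bpp) == NULL)
                return (0);
        if (bp->b_vp == NULL) {
                orphan_reports++;
                fprintf(stderr, "buffer %p has no vnode\n", (void *)bp);
        }
        return (1);
}
```

If this reading is right, the backtrace fires whenever a buffer reaches
this point with b_vp already cleared, so the output is a diagnostic
rather than a crash in its own right.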

It does seem related to the load created by perl: these machines run
spamassassin through either mimedefang or milter-spamc. They are now
running perl 5.8.4; the perl upgrade made no difference, and I am still
getting these backtraces. Each machine filters roughly 120K email
messages per day.

a) What additional information would be of help here?
b) What can I do to help troubleshoot this? For the most part the
machines recover after the backtrace, although they are inoperable
while the trace is generated, which creates further backlog for the
other machines in the cluster. Occasionally it causes a panic, and the
machine either reboots or hangs at sync.
c) Is it possible to cvsup the latest ffs files and make install those
without killing the machine?