From owner-freebsd-current@FreeBSD.ORG Wed Jun 16 15:54:19 2004
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by
	hub.freebsd.org (Postfix) with ESMTP id 60AA116A4CE for ;
	Wed, 16 Jun 2004 15:54:19 +0000 (GMT)
Received: from smtp-gw-cl-d.dmv.com (smtp-gw-cl-d.dmv.com [216.240.97.42]) by
	mx1.FreeBSD.org (Postfix) with ESMTP id 0CE6043D49 for ;
	Wed, 16 Jun 2004 15:54:19 +0000 (GMT) (envelope-from sven@dmv.com)
Received: from lanshark.dmv.com (lanshark.dmv.com [216.240.97.46])
	i5GFrSRv065112 for ; Wed, 16 Jun 2004 11:53:28 -0400 (EDT)
	(envelope-from sven@dmv.com)
From: Sven Willenberger
To: freebsd-current@freebsd.org
In-Reply-To: <1087305362.15171.8.camel@lanshark.dmv.com>
References: <1087234185.13429.19.camel@lanshark.dmv.com>
	<1087305362.15171.8.camel@lanshark.dmv.com>
Content-Type: text/plain
Date: Wed, 16 Jun 2004 11:52:30 -0400
Message-Id: <1087401150.1437.6.camel@lanshark.dmv.com>
Mime-Version: 1.0
X-Mailer: Evolution 1.5.9
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.39
Subject: Re: Softupdate/kernel panic ffs_fsync
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 16 Jun 2004 15:54:19 -0000

On Tue, 2004-06-15 at 09:16 -0400, Sven Willenberger wrote:
> On Mon, 2004-06-14 at 13:29 -0400, Sven Willenberger wrote:
> > Once upon a time I wrote:
> > >
> > > I have seen a few (unresolved) questions similar to this searching
> > > (google|archives).
> > > On a 5.2.1-Release-P2 system (actually a couple with
> > > essentially identical configs) I get the following stack backtrace
> > > messages:
> > >
> > > backtrace(c070cbf8,2,e5b3af60,0,22) at backtrace+0x17
> > > getdirtybuf(f7f99bbc,0,1,e5b3af60,1) at getdirtybuf+0x30
> > > flush_deplist(c724e64c,1,f7f99be4,f7f99be8,0) at flush_deplist+0x43
> > > flush_inode_deps(c6c35000,5c108,f7f99c10,c0510fe3,f7f99c40) at
> > > flush_inode_deps+0xa3
> > > softdep_sync_metadata(f7f99ca8,0,c06da90f,124,0) at
> > > softdep_sync_metadata+0x87
> > > ffs_fsync(f7f99ca8,0,c06d0c8b,beb,0) at ffs_fsync+0x3b9
> > > fsync(c7c224780,f7f99d14,c06e15c0,3ee,1) at fsync+0x151
> > > syscall(80e002f,bfbf002f,bfbf0028,0,80f57e0) at syscall+0x2a0
> > > Xint0x80_syscall() at Xint0x80_syscall+0x1d
> > > --- syscall (95), eip=0x282a89af, esp=0xbfbfa10c, ebp=0xbfbfba68 ---
> > >
> > > The systems in question are mail servers that act as gateways (no
> > > local delivery) running mimedefang (2.39 - 2.42) with spamassassin.
> > > The work directory is not swap/memory mounted but rather on
> > > /var/spool/MIMEDefang. The frequency of these messages increases when
> > > bayes filtering is added (as the bayes tokens db file also resides on
> > > the same filesystem/directory).
> > >
> > > I have read that it may be that getdirtybuf() was passed a corrupt
> > > buffer header; has anything further ever been made of this, and if
> > > not, where/how do I start to help contribute to finding a solution?
> >
> > I have yet to see a resolution to this issue. I am now running all the
> > boxen using 5.2.1-Release-P8 with perl 5.8.4 and all ports upgraded.
> >
> > I have created 256MB ramdisks on each machine that MIMEDefang now uses
> > for its temp files and bayesian database but, if anything, the
> > frequency of backtraces has actually increased rather than decreased.
> >
> > What do I need to do to further delineate this issue?
> > For me this is a
> > showstopper as it will occasionally cause a panic/reboot. I have these
> > machines clustered so as not to interrupt services, but it is slowly
> > becoming frustrating as the machines are bailing under heavy traffic.
> > Is there any output I can provide or diagnostics I can run to help find
> > a solution?
> >
> > Sven
>
> Would this have anything to do with background fscking? Or is the bgfsck
> only run once at bootup[+delay] if the system determines it is needed?
> I am trying to find some common factor here, and the only thing I can
> find is that during heavy incoming mail load (when many perl processes
> courtesy of MIMEDefang are running) the kernel creates the backtrace.
> This is still odd because all the temp files are on a RAMdisk
> (malloc-based) - is it possible that softupdates is trying to fsync
> either swap and/or other memory devices? The following is a typical
> layout of the boxes in question:
>
> /dev/da0s1a on / (ufs, local)
> devfs on /dev (devfs, local)
> /dev/da0s1e on /tmp (ufs, local, soft-updates)
> /dev/da0s1f on /usr (ufs, local, soft-updates)
> /dev/da0s1d on /var (ufs, local, soft-updates)
> /dev/md10 on /var/spool/MIMEDefang (ufs, local)
>
> where the ramdisk is configured with: mdconfig -a -t malloc -s 256m -u 10

Doing more research on this, I see that there were in fact issues with
ffs_softdep.c which were fixed by forcing a flush rather than panicking
the system if an assertion (?) or a call to getdirtybuf() failed. Is it
possible that a case was missed? The error refers to:

at getdirtybuf+0x30

How do I go about determining specifically what part of the code that
refers to? I am trying to debug this problem but need some help here in
terms of exactly *how* to do this. Anyone? ... Buehler?

Again, I suspect this has something to do with memory devices, .snap
directories, and/or swap-based filesystems.

Sven