From owner-freebsd-current@FreeBSD.ORG Wed Jun 16 15:54:19 2004
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by
	hub.freebsd.org (Postfix) with ESMTP id 60AA116A4CE for ;
	Wed, 16 Jun 2004 15:54:19 +0000 (GMT)
Received: from smtp-gw-cl-d.dmv.com (smtp-gw-cl-d.dmv.com [216.240.97.42]) by
	mx1.FreeBSD.org (Postfix) with ESMTP id 0CE6043D49 for ;
	Wed, 16 Jun 2004 15:54:19 +0000 (GMT) (envelope-from sven@dmv.com)
Received: from lanshark.dmv.com (lanshark.dmv.com [216.240.97.46])
	i5GFrSRv065112 for ; Wed, 16 Jun 2004 11:53:28 -0400 (EDT)
	(envelope-from sven@dmv.com)
From: Sven Willenberger
To: freebsd-current@freebsd.org
In-Reply-To: <1087305362.15171.8.camel@lanshark.dmv.com>
References: <1087234185.13429.19.camel@lanshark.dmv.com>
	<1087305362.15171.8.camel@lanshark.dmv.com>
Content-Type: text/plain
Date: Wed, 16 Jun 2004 11:52:30 -0400
Message-Id: <1087401150.1437.6.camel@lanshark.dmv.com>
Mime-Version: 1.0
X-Mailer: Evolution 1.5.9
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.39
Subject: Re: Softupdate/kernel panic ffs_fsync
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 16 Jun 2004 15:54:19 -0000

On Tue, 2004-06-15 at 09:16 -0400, Sven Willenberger wrote:
> On Mon, 2004-06-14 at 13:29 -0400, Sven Willenberger wrote:
> > Once upon a time I wrote:
> > >
> > > I have seen a few (unresolved) questions similar to this searching
> > > (google|archives).
> > > On a 5.2.1-Release-P2 system (actually a couple with
> > > essentially identical configs) I get the following stack backtrace
> > > messages:
> > >
> > > backtrace(c070cbf8,2,e5b3af60,0,22) at backtrace+0x17
> > > getdirtybuf(f7f99bbc,0,1,e5b3af60,1) at getdirtybuf+0x30
> > > flush_deplist(c724e64c,1,f7f99be4,f7f99be8,0) at flush_deplist+0x43
> > > flush_inode_deps(c6c35000,5c108,f7f99c10,c0510fe3,f7f99c40) at
> > > flush_inode_deps+0xa3
> > > softdep_sync_metadata(f7f99ca8,0,c06da90f,124,0) at
> > > softdep_sync_metadata+0x87
> > > ffs_fsync(f7f99ca8,0,c06d0c8b,beb,0) at ffs_fsync+0x3b9
> > > fsync(c7c224780,f7f99d14,c06e15c0,3ee,1) at fsync+0x151
> > > syscall(80e002f,bfbf002f,bfbf0028,0,80f57e0) at syscall+0x2a0
> > > Xint0x80_syscall() at Xint0x80_syscall+0x1d
> > > --- syscall (95), eip=0x282a89af, esp=0xbfbfa10c, ebp=0xbfbfba68 ---
> > >
> > > The systems in question are mail servers that act as gateways (no
> > > local delivery) running mimedefang (2.39 - 2.42) with spamassassin.
> > > The work directory is not swap/memory mounted but rather on
> > > /var/spool/MIMEDefang. The frequency of these messages increases when
> > > bayes filtering is added (as the bayes tokens db file also resides on
> > > the same filesystem/directory).
> > >
> > > I have read that it may be that getdirtybuf() was passed a corrupt
> > > buffer header; has anything further ever been made of this, and if
> > > not, where/how do I start to help contribute to finding a solution?
> >
> > I have yet to see a resolution to this issue. I am now running all the
> > boxen using 5.2.1-Release-P8 with perl 5.8.4 and all ports upgraded.
> >
> > I have created 256MB ramdisks on each machine that MIMEDefang now uses
> > for its temp files and bayesian database but, if anything, the
> > frequency of backtraces has actually increased rather than decreased.
> >
> > What do I need to do to further delineate this issue?
> > For me this is a
> > showstopper as it will occasionally cause a panic/reboot. I have these
> > machines clustered so as not to interrupt services, but it is slowly
> > becoming frustrating as the machines are bailing under heavy traffic.
> > Is there any output I can provide or diagnostics I can run to help find
> > a solution?
> >
> > Sven
>
> Would this have anything to do with background fscking? Or is the bgfsck
> only run once at bootup[+delay] if the system determines it is needed?
> I am trying to find some common factor here, and the only thing I can
> find is that during heavy incoming mail load (when many perl processes
> courtesy of MIMEDefang are running) the kernel creates the backtrace.
> This is still odd because all the temp files are on a RAMdisk
> (malloc-based) - is it possible that softupdates is trying to fsync
> either swap and/or other memory devices? The following is a typical
> layout of the boxes in question:
>
> /dev/da0s1a on / (ufs, local)
> devfs on /dev (devfs, local)
> /dev/da0s1e on /tmp (ufs, local, soft-updates)
> /dev/da0s1f on /usr (ufs, local, soft-updates)
> /dev/da0s1d on /var (ufs, local, soft-updates)
> /dev/md10 on /var/spool/MIMEDefang (ufs, local)
>
> where the ramdisk is configured with: mdconfig -a -t malloc -s 256m -u 10

Doing more research on this, I see that there were in fact issues with
ffs_softdep.c which were fixed by forcing a flush rather than panicking
the system if an assertion (?) or a call to getdirtybuf() failed. Is it
possible that a case was missed? The error refers to:

at getdirtybuf+0x30

How do I go about determining specifically what part of the code that
refers to? I am trying to debug this problem but need some help here in
terms of exactly *how* to do this. Anyone? ... Buehler?

Again, I suspect this has something to do with memory devices, .snap
directories, and/or swap-based filesystems.

Sven