From: Terry Lambert
Date: Wed, 30 Apr 2003 19:09:46 -0700
To: Lars Eggert
cc: current@freebsd.org
cc: Glenn Johnson
Subject: Re: kernel crashes and portupgrade
Message-ID: <3EB081EA.B32CDE2F@mindspring.com>

Lars Eggert wrote:
> On 4/30/2003 4:28 PM, Terry Lambert wrote:
> > If you are panic'ing, and it's repeatable, then you should
> > minimally post:
>
> Done already:
>
> Message-ID: <3EAC5950.7040306@isi.edu>
> Date: Sun, 27 Apr 2003 15:27:28 -0700
> From: Lars Eggert
> Subject: Re: Kernel panic during portupdrade [ffs_blkfree:
> freeing free block]
>
> (Panic message was in an earlier post to the same thread.)

FWIW, Message-ID does me no good; it's not a searchable field for
me.  If you are going to give me anything other than a URL for the
message in the mailing list archive, you probably want to give me
(in order of importance):

1)	The mailing list it was sent to
2)	The date
3)	The sender
4)	The subject

--

That's yours, not Kent's.

It's pretty obvious from looking at your message and the code what's
happening there: you are trying to free a frag of a block whose bit
is not set in the cylinder group bitmap.

To fix it, you have to ask yourself how it's even possible to get
into that situation in the first place.

Theoretically, this is not permitted to happen, because the CG
bitmap is supposed to be written out last.

Practically, there are several ways to cause this in -current; any
one of them could be your culprit (e.g. you are running with the
sched_sync() patches for fsync that were posted, or you crashed and
used a BG fsck instead of a full fsck, and trusted it to do the
right thing, etc.).

Let's assume that none of those are true at this point, and that
you can repeat the problem after doing a full fsck on the FS in
question from single user mode, and rebooting.

So...

The first question we need to answer is why sched_sync is your
callout in fork_exit(); seems pretty daft to me.  I would think
this was indicative of stack corruption... or, it's indicative of
something being allowed to run that shouldn't run while a cleanup
is in progress, but not yet committed to the soft updates list
(meaning the CG bit should have been set, but wasn't).

Permit me to suspect 1.193 and 1.192 of /sys/kern/kern_fork.c,
and 1.442 and 1.443 of /sys/kern/vfs_subr.c; particularly, the
conversion from tsleep() to msleep().
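For anyone following along, the "freeing free" check described
above can be sketched in userland.  This is only a toy: the names
(toy_cg, toy_alloc, toy_free, NFRAGS) are made up for illustration,
and the "bit set means in use" encoding is an assumption of the
sketch.  The real cylinder group map in FFS has its own layout and
encoding, which is not reproduced here, and the kernel panics where
this code merely returns an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFRAGS 64	/* hypothetical: fragments tracked by this toy map */

/* Toy allocation map; bit set = fragment in use (sketch assumption). */
struct toy_cg {
	uint8_t inuse[NFRAGS / 8];
};

/* Report whether the given fragment is marked in use. */
static bool
toy_isset(const struct toy_cg *cg, unsigned frag)
{
	assert(frag < NFRAGS);
	return (cg->inuse[frag / 8] & (1u << (frag % 8))) != 0;
}

/* Mark a fragment as in use. */
static void
toy_alloc(struct toy_cg *cg, unsigned frag)
{
	assert(frag < NFRAGS);
	cg->inuse[frag / 8] |= (uint8_t)(1u << (frag % 8));
}

/*
 * Release a fragment.  Returns 0 on success, or -1 if the map says
 * the fragment was never allocated (the "freeing free" condition).
 */
static int
toy_free(struct toy_cg *cg, unsigned frag)
{
	if (!toy_isset(cg, frag))
		return -1;
	cg->inuse[frag / 8] &= (uint8_t)~(1u << (frag % 8));
	return 0;
}
```

Freeing the same fragment twice, or freeing one whose bit never
reached the map, both end up on the -1 path; the second case is the
one the ordering argument above is about.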

A possible workaround might be to modify fork_exit(); there's code
in the function that reads:

	if (PCPU_GET(switchtime.sec) == 0)
		binuptime(PCPU_PTR(switchtime));
	PCPU_SET(switchticks, ticks);
	mtx_unlock_spin(&sched_lock);

	/*
	 * cpu_set_fork_handler intercepts this function call to
	 * have this call a non-return function to stay in kernel mode.
	 * initproc has its own fork handler, but it does return.
	 */
	KASSERT(callout != NULL, ("NULL callout in fork_exit"));
	callout(arg, frame);

Change it to read:

	if (PCPU_GET(switchtime.sec) == 0)
		binuptime(PCPU_PTR(switchtime));
	PCPU_SET(switchticks, ticks);

	/*
	 * cpu_set_fork_handler intercepts this function call to
	 * have this call a non-return function to stay in kernel mode.
	 * initproc has its own fork handler, but it does return.
	 */
	KASSERT(callout != NULL, ("NULL callout in fork_exit"));
	callout(arg, frame);
	mtx_unlock_spin(&sched_lock);

Instead.  Let me know what happens; it will probably complain about
an LOR or a lock being held that's "not supposed to be held, because
otherwise the kernel wouldn't panic" or whatever...

-- Terry
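
To make the difference between the two orderings concrete, here is
a small userland sketch.  Everything in it is hypothetical (toy_lock,
toy_exit_unlock_first, toy_exit_unlock_last, toy_probe): the "lock"
only records whether it is held, and each function acquires it
itself so the example is self-contained, whereas the kernel code
above only ever releases sched_lock.  It shows nothing about real
spin mutex behaviour; it only shows what the callout gets to observe
in each case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a lock; it only records whether it is held. */
struct toy_lock {
	bool held;
};

typedef void (*toy_callout_t)(void *arg);

static void
toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void
toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Existing ordering: drop the lock, then run the callout. */
static void
toy_exit_unlock_first(struct toy_lock *l, toy_callout_t callout, void *arg)
{
	toy_lock_acquire(l);
	toy_lock_release(l);
	assert(callout != NULL);
	callout(arg);
}

/* Suggested ordering: run the callout, then drop the lock. */
static void
toy_exit_unlock_last(struct toy_lock *l, toy_callout_t callout, void *arg)
{
	toy_lock_acquire(l);
	assert(callout != NULL);
	callout(arg);
	toy_lock_release(l);
}

/* Probe callout: records whether the lock was held when it ran. */
struct toy_probe {
	struct toy_lock *lock;
	bool saw_lock_held;
};

static void
toy_probe_callout(void *arg)
{
	struct toy_probe *p = arg;

	p->saw_lock_held = p->lock->held;
}
```

With toy_exit_unlock_first() the probe sees the lock released; with
toy_exit_unlock_last() it sees the lock held for the whole callout.
Note that the toy callout returns.  The comment in fork_exit() says
the callout may be a non-return function, in which case the release
in the second ordering would never be reached.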