Date: Mon, 19 Jul 2010 13:41:24 -0700
From: Jeremy Chadwick
To: "Mikhail T."
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: panic: handle_written_inodeblock: bad size

On Mon, Jul 19, 2010 at 11:55:59AM -0400, Mikhail T. wrote:
> 19.07.2010 07:31, Jeremy Chadwick wrote:
> >If you boot the machine in single-user, and run fsck manually, are there
> >any errors?
> Thanks, Jeremy...
> I wish there was a way to learn /which/ file-system is giving
> trouble... However, after sending the question out last night, I
> tried to pkg_delete a package on the machine, and was very lucky to
> see a file-system error (inode something or other) before the panic
> struck. That, at least, told me which file-system was in trouble
> (/var). I dump-ed it out, re-created it, and then restored it...
> Although dumping went smoothly, there were two errors at which
> restore offered to abort. I told it not to and got (most of the)
> file-system restored. (The dump is available to anyone wishing to
> investigate -- contact me privately. I'm not posting it publicly
> because of the passwd-file backup under /var).

I see where you're going with this -- the only way you knew it was /var
was based on the inode error you saw before the system crashed.

> So far seems quiet -- no panics for two more hours before I went to bed.

> >Only thing I can think of off the top of my head: there's a known
> >situation (also applies to RELENG_7) where a background fsck doesn't
> >correct all errors after a system crash/unclean shutdown. I mention
> >this because I see "softdep" in the above stack trace (usually refers to
> >softupdates). I don't know if this got fixed, but the workaround is to
> >use background_fsck="no" in rc.conf. Yes, after a crash this means you
> >have to wait for the entire fsck to run.
> When setting up my main machine 4 years ago, I turned off background
> fsck... But I thought things had improved sufficiently since
> then :-( Maybe background fsck should still be disabled by default?
>
> And, IMO, at the very least, *any panic related to a file-system
> must clearly identify the file-system in question*... What do you
> think?

I think that's a reasonable request and would be ideal for situations
like this. Whether it's possible is a completely different question.
I might be able to code something like this up (it would be my first
time messing around in the kernel in this regard -- warning, alert,
hazard, danger Will Robinson!), but it'd probably make more sense for
someone already familiar with that section of code to do it. I'm much
more familiar with userland stuff; the kernel is "black magic". ;-)

Assuming work tonight isn't that busy for me, I'll see if I can dedicate
some cycles to printing the file-system name in the error string you saw.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
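
For reference, the background fsck workaround quoted in this message comes
down to a single line in /etc/rc.conf. This is only a minimal sketch,
assuming a stock configuration where nothing else overrides the setting; the
value is the same yes/no switch written as "no" in the thread.

```shell
# /etc/rc.conf fragment (sketch): disable background fsck so that a full
# foreground fsck runs at boot after a crash or unclean shutdown.
# Trade-off noted in the thread: boot waits for the entire fsck to finish.
background_fsck="NO"
```

The setting takes effect at the next boot; it does not repair a file system
that is already damaged, so a manual fsck from single-user mode is still the
way to check the current state.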