Date:      Tue, 20 Jul 2010 06:49:31 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        "Mikhail T." <mi+thun@aldan.algebra.com>
Cc:        stable@freebsd.org, fs@freebsd.org
Subject:   Re: panic: handle_written_inodeblock: bad size
Message-ID:  <20100720134931.GA41352@icarus.home.lan>
In-Reply-To: <20100719204124.GA21573@icarus.home.lan>
References:  <4C43F35D.5020007@aldan.algebra.com> <20100719113147.GA4786@icarus.home.lan> <4C44758F.7080209@aldan.algebra.com> <20100719204124.GA21573@icarus.home.lan>

On Mon, Jul 19, 2010 at 01:41:24PM -0700, Jeremy Chadwick wrote:
> On Mon, Jul 19, 2010 at 11:55:59AM -0400, Mikhail T. wrote:
> > 19.07.2010 07:31, Jeremy Chadwick wrote:
> > >If you boot the machine in single-user, and run fsck manually, are there
> > >any errors?
> > Thanks, Jeremy... I wish, there was a way to learn, /which/
> > file-system is giving trouble... However, after sending the question
> > out last night, I tried to pkg_delete a package on the machine, and
> > was very lucky to see a file-system error (inode something or other)
> > before the panic struck. That, at least, told me, which file-system
> > was in trouble (/var).
> > [...]
> > And, IMO, at the very least, *any panic related to a file-system
> > must clearly identify the file-system in question*... What do you
> > think?
>
> [...] 
> Assuming work tonight isn't that busy for me, I'll see if I can dedicate
> some cycles to printing this information in the error string you saw.

I spent some time on this tonight.  It's not as simple as it sounds, for
me anyway.  Relevant source bits:

src/sys/ufs/ffs/ffs_softdep.c
src/sys/ufs/ffs/fs.h
src/sys/ufs/ffs/softdep.h

ffs_softdep.c, which is almost 6500 lines, contains a large number of
inode-related functions which can call panic().  Functions which have
easy access to the related inodedep struct are the ones which would be
able to print this information easily.  Sort of.

struct inodedep (see softdep.h) contains a member called id_fs, which is
a pointer to a struct fs (see fs.h).  struct fs contains a member called
fs_fsmnt (a char buffer), which holds the name of the mounted
filesystem.  fs_fsmnt[0] should be NUL ('\0') if the filesystem isn't
mounted.

So in the case of your panic within handle_written_inodeblock(), it
would be as simple as something like:

	u_char *mntpt = NULL;

	if (inodedep->id_fs->fs_fsmnt[0] != '\0')
		mntpt = inodedep->id_fs->fs_fsmnt;
	/* else: XXX do what here?  mntpt stays NULL */

Then, the panic() statements later have to do something like this (taken
from real code):

	if (dp1->di_db[adp->ad_lbn] != adp->ad_oldblkno)
		panic("%s: %s: %s #%jd mismatch %d != %jd",
			"handle_written_inodeblock",
			(mntpt != NULL ? (const char *)mntpt : "<unknown>"),
			"direct pointer",
			(intmax_t)adp->ad_lbn,
			dp1->di_db[adp->ad_lbn],
			(intmax_t)adp->ad_oldblkno);

The panic message would look like one of the following:

panic: handle_written_inodeblock: /mnt: direct pointer #nnn mismatch nnn != nnn
panic: handle_written_inodeblock: <unknown>: direct pointer #nnn mismatch nnn != nnn

The "<unknown>" string there is a Bad Idea(tm); see below.
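
A rough userland sketch of that format string, for checking what the
message would look like without building a kernel.  The function name
format_mismatch() and its parameters are made up for illustration and
are not part of ffs_softdep.c; snprintf() stands in for panic(), and
the mount point is passed in as a plain string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the proposed panic string into buf.
 * mntpt may be NULL, in which case "<unknown>" is printed instead.
 * Returns whatever snprintf() returns.
 */
static int
format_mismatch(char *buf, size_t len, const char *mntpt,
    int64_t lbn, int32_t found, int64_t expected)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "%s: %s: %s #%jd mismatch %d != %jd",
	    "handle_written_inodeblock",
	    mntpt != NULL ? mntpt : "<unknown>",
	    "direct pointer",
	    (intmax_t)lbn,
	    (int)found,
	    (intmax_t)expected));
}
```

Calling it once with "/var" and once with NULL gives the two message
shapes shown above, with the nnn placeholders filled in.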

Secondly, this brings up the question: what happens if someone is doing
something like "fsck /var", where /var uses soft updates?  /var isn't
mounted when this happens.  Can these inode-related functions get called
during that time?  If so, fs_fsmnt would (in theory -- I haven't tested
in practice) be empty.  So in that case, what should get printed as the
filesystem?  Well, this is where the "<unknown>" string comes into play.

My first answer was: "the name of the device/slice/etc. which the inode
is associated with".

The problem is that I couldn't find a way to get this information, as
it's not stored in struct fs anywhere.  One would have to pass it down
the stack, which changes the kernel ABI and is not something I'm willing
to do (plus there are performance implications, as you're passing
something else on the stack on every call).  Of course there may be a
way to get this easily, but I don't see it or know of it.

Thirdly, and this is equally important: given the repetitive nature
of this code (it would have to be repeated in numerous functions),
making a common function that populates a (global) variable with the
fsname it's working on would be ideal.  But I don't know the
implications of this, nor do I see many (I think two?) global variables
used within ffs_softdep.c.
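
A minimal sketch of what such a common function could look like.
Note that it returns a pointer instead of populating a global, which
is a variation on the idea above, not the same thing.  Everything
here is an assumption for illustration: the name softdep_fsname() is
hypothetical, the two structs are cut-down userland stand-ins for the
real ones in fs.h and softdep.h, and the MAXMNTLEN value is a
placeholder.

```c
#include <assert.h>
#include <stddef.h>

#define MAXMNTLEN 468	/* placeholder for this sketch; check fs.h */

/* Cut-down stand-ins for the kernel structs; only the fields used here. */
struct fs {
	unsigned char fs_fsmnt[MAXMNTLEN];	/* name mounted on */
};

struct inodedep {
	struct fs *id_fs;			/* associated filesystem */
};

/*
 * Hypothetical helper: return the mount point recorded for the
 * filesystem behind an inodedep, or a fallback string when fs_fsmnt
 * is empty.  The fallback has the same problem discussed above.
 */
static const char *
softdep_fsname(const struct inodedep *inodedep)
{
	assert(inodedep != NULL && inodedep->id_fs != NULL);
	if (inodedep->id_fs->fs_fsmnt[0] != '\0')
		return ((const char *)inodedep->id_fs->fs_fsmnt);
	return ("<unknown>");
}
```

Each panic() call would then pass softdep_fsname(inodedep) as an
extra "%s" argument, so the lookup lives in one place.  Whether that
is acceptable in this code path is exactly the open question.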

Extending one of the structs to get access to the necessary information
is not as simple as "just do it" -- there are implications when it comes
to memory usage and so on.  This is not a piece of code to bang on
lightly.

This should probably be discussed on freebsd-hackers, but cross-posting
across 3 separate mailing lists is rude.  If you want to drive this,
cool, but please start a new thread about the matter (wanting the
filesystem or device printed in panic() when things like filesystem
panics happen) on freebsd-hackers.  I'm not subscribed to that list, so
please CC me if you go this route.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



