Date:      Wed, 25 Mar 2015 00:25:19 -0400 (EDT)
From:      Benjamin Kaduk <kaduk@MIT.EDU>
To:        Da Rock <freebsd-fs@herveybayaustralia.com.au>
Cc:        freebsd-fs@freebsd.org, mckusick@freebsd.org
Subject:   Re: Delete a directory, crash the system
Message-ID:  <alpine.GSO.1.10.1503250018030.22210@multics.mit.edu>
In-Reply-To: <5511D807.3040606@herveybayaustralia.com.au>
References:  <CAHAXwYDPMrdY-TP-5T1_6M_ot4gY09jo2_Wi_REOmE=+u+_QuQ@mail.gmail.com> <CAGwOe2byRc4LVsyxvTJgxNGCbhvOEaeDXjmFJ7DoXThPQe1bcQ@mail.gmail.com> <CAHAXwYCj9AV8ZcDffNNGx-ivL=h_TK9zLQRTPknArX25HSfEag@mail.gmail.com> <CAGwOe2YCDRqHudovDB_Kz9WHppvB8v2L+0gkDnWgG88bgZTKSA@mail.gmail.com> <CAHAXwYCnRDQqgRcvaEE1BmSJYYOidoQzzUoHX_QWdyJzYO3kKw@mail.gmail.com> <551007DD.5020109@herveybayaustralia.com.au> <alpine.GSO.1.10.1503231049050.22210@multics.mit.edu> <5510B995.8060307@herveybayaustralia.com.au> <alpine.GSO.1.10.1503241014270.22210@multics.mit.edu> <5511D807.3040606@herveybayaustralia.com.au>

On Tue, 24 Mar 2015, Da Rock wrote:

> On 03/25/15 00:16, Benjamin Kaduk wrote:
> > On Mon, 23 Mar 2015, Da Rock wrote:
> >
> > > Unfortunately, fsck isn't helping - foreground or otherwise. All it
> > > shows on every single fs is inode 4 recovery which doesn't sound quite
> > > right. And
> > Have you posted the exact output in a previous message (could you send a
> > link)?
> Not precisely, but the message is just a flash and there is no copying of it.
> Anyway, inode 4 is the .sujournal file as expected; this means there is an
> issue with the softupdates. Could this be narrowing it down (the OP to this
> was also in this age of enlightenment, SU came in with 8.x didn't it?)?

Ah, SU+J could be quite relevant.  Soft-update journalling was enabled by
default for a period of time, but I believe it was disabled because there
were some scenarios where it was destabilizing.  CC-ing Kirk to improve on
my lousy memory.

Do you remember what version was used to install the system in question
(i.e., create the filesystem in question)?  Please show the output of
'tunefs -p <filesystem>'
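
If it helps when reading that output, here is a rough sketch of a check for
the journaling flag.  It assumes tunefs prints a line of the form
"tunefs: soft update journaling: (-j) enabled" (on stderr, as far as I
recall); the helper name suj_enabled is made up for illustration.

```shell
#!/bin/sh
# suj_enabled: hypothetical helper, not part of the base system.
# Reads 'tunefs -p' style output on stdin and succeeds (exit 0) only if
# the "soft update journaling" line ends in the word "enabled".
# Assumed line format:
#   tunefs: soft update journaling: (-j)        enabled
suj_enabled() {
    awk '/soft update journaling:/ { found = ($NF == "enabled") }
         END { exit !found }'
}

# Usage (the device name is a placeholder):
#   tunefs -p /dev/ada0p2 2>&1 | suj_enabled && echo "SU+J is on"
```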

> > > again, it is only showing during updates to ports being built. I'm
> > Er, what is only showing up?  The panics?
> > Surely you are not only running fsck while building ports...
> Yes, the panics.
>
> Sorry, I thought that was obvious seeing as the alternative is impossible :)
> >
> > > investigating further, but it may be just a corrupt file in pkg system.
> > >
> > > Incidentally, I'm not suggesting an absolute fix for the issue as such,
> > > but a better means of handling it rather than crashing the system. The
> > > posts on this
> > Understood.  But, there will always be some types of error which are truly
> > unrecoverable, and there is no real option other than to panic.  (Which is
> > not to say that your situation is necessarily one of them.)
> That I get, and given this may be an issue with SU it may well be warranted.
> What can we do to narrow this down, as obviously one cannot be sitting
> watching exactly what happens for the hours required while building ports.
> You're bound to look away for just a second and miss it even if you did try! :D
> >
> > > If I discover anything more I'll keep everyone posted :)
> So I did some fiddling with fsck, fsdb, find and stat; and got nowhere. I ran
> fsck again and it gave me not much again. It did hint at some files in the
> ports tree, so I cleaned up the ports tree to fresh install point, ran fsck
> again and rebooted. So far so good, but I'm keeping my fingers crossed still.

It is probably important to note that 'fsck -F' and saying 'no' to "USE
JOURNAL?" is the most relevant fsck invocation.

> This doesn't help the panics - they're still a pita when they happen. It does
> help me resolve the issue this time though. But initiating this error in
> testing is damn near impossible. What can we document here as a way to gather
> data to determine how to resolve this issue? Given my luck with this, it's
> bound to happen again at some point :)

I think actually diagnosing this is beyond my expertise/time commitment at
the moment.  I suspect that using tunefs to disable soft-update journalling
will be a workaround, if that is what you are really interested in.
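
For what it's worth, a minimal sketch of that workaround.  It only prints
the commands instead of running them; the function name suj_disable_plan
and the device argument are placeholders, and the filesystem has to be
unmounted or mounted read-only (e.g., from single-user mode) before tunefs
will change it.

```shell
#!/bin/sh
# suj_disable_plan: hypothetical helper that prints, but does not run,
# the commands for turning off soft-update journaling on one filesystem.
# $1 is the device holding the filesystem (placeholder, e.g. /dev/ada0p2).
suj_disable_plan() {
    dev="$1"
    # Turn off the journal; plain soft updates stay as they are.
    echo "tunefs -j disable $dev"
    # Print the settings again to confirm the change.
    echo "tunefs -p $dev"
}

# Usage:
#   suj_disable_plan /dev/ada0p2
```

Printing first makes it easy to double-check the device name before
running anything as root.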

I'll let Kirk decide if he wants to debug more, but the answer may well be
"no" if you're not running the latest ufs from -current.

-Ben


