Date:      Wed, 28 Dec 2011 00:57:55 -0700
From:      Scott Long <scottl@samsco.org>
To:        David Thiel <lx@redundancy.redundancy.org>
Cc:        freebsd-current@freebsd.org, d@delphij.net
Subject:   Re: SU+J systems do not fsck themselves
Message-ID:  <9DAD04BE-D330-4DC8-9307-597834EEA2CA@samsco.org>
In-Reply-To: <20111228073442.GM45484@redundancy.redundancy.org>
References:  <20111227215330.GI45484@redundancy.redundancy.org> <CAGMYy3t3Rv006qvBCHr4kdbM86andkr5mRkvaGYw5CETO1XHkg@mail.gmail.com> <20111227223638.GK45484@redundancy.redundancy.org> <4EFA4B4E.201@delphij.net> <20111228051404.GL45484@redundancy.redundancy.org> <6F3ACDEE-B656-46D0-AB11-FF1B23E70A27@samsco.org> <20111228073442.GM45484@redundancy.redundancy.org>

On Dec 28, 2011, at 12:34 AM, David Thiel wrote:

> On Tue, Dec 27, 2011 at 11:54:20PM -0700, Scott Long wrote:
>> The first run of fsck, using the journal, gives results that I would
>> expect.  The second run seems to imply that the fixes made on the
>> first run didn't actually get written to disk.  This is definitely an
>> oddity.  I see that you're using geli, maybe there's some strange
>> side-effect there.  No idea.  Report as a bug, this is definitely
>> undesired behavior.
>
> Not impossible, but I was seeing similar issues on two non-geli systems
> as well, i.e. tons of errors fixed when doing a single-user
> non-journalled fsck, but journalled fsck not fixing stuff. I'll try to
> replicate on a test machine, as I already lost data on the last
> (non-geli) machine this happened to.
>
>> For the love that is all good and holy, don't ever run fsck on a live
>> filesystem.  It's going to report these kinds of problems!  It's
>> normal; filesystem metadata updates stay cached in memory, and fsck
>> bypasses that cache.
>
> Ok. I expected fsck would be softupdate-aware in that way, but I
> understand it not doing so.
>
>>> - SU+J and fsck do not work correctly together to fix corruption on
>>> boot, i.e. bgfsck isn't getting run when it should
>>
>> The point of SUJ is to eliminate the need for bgfsck.  Effectively,
>> they are exclusive ideas.
>
> This is surprising to me. It is my impression that under Linux at least,
> ext3fs is checked against the journal, and gets a full e2fsck if it
> finds it's still dirty. Additionally, there's a periodic fsck after 180
> days continuous runtime or x number of mounts (see tune2fs -i and -c).
> Is SU+J somehow implemented in such a way that this is unnecessary? What
> does it do that the ext3fs people have missed?
>

SUJ isn't like ext3 journaling; it doesn't do 100% metadata logging.
Instead, it's an extension of softupdates.  Softupdates (SU) is still
responsible for ordering dependent writes to the disk to maintain
consistency.  What SU can't handle is the Unix/POSIX idiom of unlinking
a file from the namespace but keeping its inode active through
refcounts.  When you have an unclean shutdown, you wind up with stale
blocks allocated to orphaned inodes.  The point of bgfsck was to scan
the filesystem for these allocations and free them, just like fsck does,
but to do it in the background so that the boot could continue.  SUJ is
basically just an intent log for this case; it tells fsck where to find
these allocations so that fsck doesn't have to do the lengthy scan.
FWIW, this problem is present in most any journaling implementation and
is usually solved via the use of intent records in a journal, not unlike
SUJ.
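
For readers unfamiliar with the unlink-while-open idiom mentioned above,
here is a minimal sketch in Python, assuming a POSIX system.  The helper
name unlink_while_open is made up for illustration and has nothing to do
with the FreeBSD sources; it only shows that an inode and its blocks
stay allocated after the name is removed, which is the state an unclean
shutdown can leave behind.

```python
import os
import tempfile


def unlink_while_open(payload=b"still here"):
    """Create a file, unlink it, then read it back through the open fd.

    Hypothetical helper for illustration only; assumes POSIX semantics.
    Returns (name_exists_after_unlink, data_read_back).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                     # remove the name from the namespace
        name_exists = os.path.exists(path)  # no directory entry remains
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))    # inode and blocks are still live
        return name_exists, data
    finally:
        # Dropping the last reference is what lets the kernel free the
        # blocks.  A crash before this point leaves them allocated to an
        # inode that no directory entry points at.
        os.close(fd)


if __name__ == "__main__":
    print(unlink_while_open())
```

If the machine crashes between the unlink and the close, nothing in the
namespace refers to those blocks any more, so something has to find and
free them at the next boot.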

So, there's an assumption with SUJ+fsck that SU is keeping the
filesystem consistent.  Maybe that's a bad assumption, and I'm not
trying to discredit your report.  But the intention with SUJ is to
eliminate the need for anything more than a cursory check of the
superblocks and a processing of the SUJ intent log.  If either of these
fails then fsck reverts to a traditional scan.  In the same vein, ext3
and most other traditional journaling filesystems assume that the
journal is correct and is preserving consistency, and don't do anything
more than a cursory data structure scan and journal replay as well, but
then revert to a full scan if that fails (zfs seems to be an exception
here, with there being no actual fsck available for it).
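
The fallback behaviour described above can be sketched roughly as
follows.  This is only an illustration of the control flow, not how fsck
is actually written; fsck_suj and its three callable parameters are
hypothetical names invented for this example.

```python
def fsck_suj(check_superblocks, process_journal, full_scan):
    """Try the fast SUJ path; fall back to a traditional scan on failure.

    All three arguments are hypothetical callables standing in for the
    real steps.  The first two return True on success.
    """
    # Fast path: cursory superblock check, then process the intent log.
    if check_superblocks() and process_journal():
        return "journal"
    # Either step failed, so revert to the lengthy traditional scan.
    full_scan()
    return "full"


if __name__ == "__main__":
    # Simulate a journal that cannot be processed.
    print(fsck_suj(lambda: True, lambda: False, lambda: None))
```

Note that the sketch trusts the journal whenever both fast-path steps
succeed, which is exactly the assumption discussed above.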

As for the 180 day forced scan on ext3, I have no public comment.  SU
has matured nicely over the last 10+ years, and I'm happy with the
progress that SUJ has made in the last 2-3 years.  If there are bugs,
they need to be exposed and addressed ASAP.

Scott



