Date: Wed, 28 Dec 2011 00:57:55 -0700
From: Scott Long <scottl@samsco.org>
To: David Thiel <lx@redundancy.redundancy.org>
Cc: freebsd-current@freebsd.org, d@delphij.net
Subject: Re: SU+J systems do not fsck themselves
Message-ID: <9DAD04BE-D330-4DC8-9307-597834EEA2CA@samsco.org>
In-Reply-To: <20111228073442.GM45484@redundancy.redundancy.org>
References: <20111227215330.GI45484@redundancy.redundancy.org> <CAGMYy3t3Rv006qvBCHr4kdbM86andkr5mRkvaGYw5CETO1XHkg@mail.gmail.com> <20111227223638.GK45484@redundancy.redundancy.org> <4EFA4B4E.201@delphij.net> <20111228051404.GL45484@redundancy.redundancy.org> <6F3ACDEE-B656-46D0-AB11-FF1B23E70A27@samsco.org> <20111228073442.GM45484@redundancy.redundancy.org>

On Dec 28, 2011, at 12:34 AM, David Thiel wrote:

> On Tue, Dec 27, 2011 at 11:54:20PM -0700, Scott Long wrote:
>> The first run of fsck, using the journal, gives results that I would
>> expect. The second run seems to imply that the fixes made on the
>> first run didn't actually get written to disk. This is definitely an
>> oddity. I see that you're using geli, maybe there's some strange
>> side-effect there. No idea. Report as a bug, this is definitely
>> undesired behavior.
>
> Not impossible, but I was seeing similar issues on two non-geli systems
> as well, i.e. tons of errors fixed when doing a single-user
> non-journalled fsck, but journalled fsck not fixing stuff. I'll try to
> replicate on a test machine, as I already lost data on the last
> (non-geli) machine this happened to.
>
>> For the love that is all good and holy, don't ever run fsck on a live
>> filesystem. It's going to report these kinds of problems! It's
>> normal; filesystem metadata updates stay cached in memory, and fsck
>> bypasses that cache.
>
> Ok. I expected fsck would be softupdate-aware in that way, but I
> understand it not doing so.
>
>>> - SU+J and fsck do not work correctly together to fix corruption on
>>> boot, i.e. bgfsck isn't getting run when it should
>>
>> The point of SUJ is to eliminate the need for bgfsck. Effectively,
>> they are exclusive ideas.
>
> This is surprising to me. It is my impression that under Linux at least,
> ext3fs is checked against the journal, and gets a full e2fsck if it
> finds it's still dirty. Additionally, there's a periodic fsck after 180
> days continuous runtime or x number of mounts (see tune2fs -i and -c).
> Is SU+J somehow implemented in such a way that this is unnecessary? What
> does it do that the ext3fs people have missed?
>

SUJ isn't like ext3 journaling, it doesn't do 100% metadata logging.
Instead, it's an extension of softupdates. Softupdates (SU) is still
responsible for ordering dependent writes to the disk to maintain
consistency. What SU can't handle is the Unix/POSIX idiom of unlinking
a file from the namespace but keeping its inode active through
refcounts. When you have an unclean shutdown, you wind up with stale
blocks allocated to orphaned inodes. The point of bgfsck was to scan
the filesystem for these allocations and free them, just like fsck does,
but to do it in the background so that the boot could continue. SUJ is
basically just an intent log for this case; it tells fsck where to find
these allocations so that fsck doesn't have to do the lengthy scan.
FWIW, this problem is present in most any journaling implementation and
is usually solved via the use of intent records in a journal, not unlike
SUJ. So, there's an assumption with SUJ+fsck that SU is keeping the
filesystem consistent. Maybe that's a bad assumption, and I'm not
trying to discredit your report. But the intention with SUJ is to
eliminate the need for anything more than a cursory check of the
superblocks and a processing of the SUJ intent log. If either of these
fails then fsck reverts to a traditional scan. In the same vein, ext3
and most other traditional journaling filesystems assume that the
journal is correct and is preserving consistency, and don't do anything
more than a cursory data structure scan and journal replay as well, but
then revert to a full scan if that fails (zfs seems to be an exception
here, with there being no actual fsck available for it).

As for the 180 day forced scan on ext3, I have no public comment. SU
has matured nicely over the last 10+ years, and I'm happy with the
progress that SUJ has made in the last 2-3 years. If there are bugs,
they need to be exposed and addressed ASAP.

Scott
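
The unlink-while-open idiom mentioned above can be observed from userland with a few lines of Python. This is only a minimal sketch of the idiom itself, not of anything inside SU or SUJ. The function name `unlink_while_open` is made up for illustration, and the sketch assumes a local POSIX filesystem (network filesystems such as NFS may behave differently).

```python
import os
import tempfile


def unlink_while_open():
    """Create a file, unlink it while its descriptor is open, keep using it.

    Hypothetical helper for illustration only; assumes a local POSIX
    filesystem.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")

        # Remove the only name for the file. The inode stays allocated
        # because the open descriptor still holds a reference to it.
        os.unlink(path)

        name_exists = os.path.exists(path)
        # On a local Unix filesystem the link count is expected to be 0 here.
        link_count = os.fstat(fd).st_nlink

        # The data is still readable through the descriptor.
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)
    finally:
        # Dropping the last reference lets the kernel free the inode and
        # its blocks. If the machine crashes before this point, those
        # blocks are the stale allocations the message describes.
        os.close(fd)
    return name_exists, link_count, data


if __name__ == "__main__":
    print(unlink_while_open())
```

Between the `os.unlink` call and the `os.close` call the file has no name but still owns blocks on disk. A crash in that window leaves the orphaned allocations that bgfsck had to hunt for with a full scan and that the SUJ intent log points fsck to directly.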