Date: Tue, 27 Dec 2011 23:54:44 -0700
From: Scott Long <scottl@samsco.org>
To: David Thiel <lx@redundancy.redundancy.org>
Cc: freebsd-current@freebsd.org, d@delphij.net
Subject: Re: SU+J systems do not fsck themselves
Message-ID: <6F3ACDEE-B656-46D0-AB11-FF1B23E70A27@samsco.org>
In-Reply-To: <20111228051404.GL45484@redundancy.redundancy.org>
References: <20111227215330.GI45484@redundancy.redundancy.org> <CAGMYy3t3Rv006qvBCHr4kdbM86andkr5mRkvaGYw5CETO1XHkg@mail.gmail.com> <20111227223638.GK45484@redundancy.redundancy.org> <4EFA4B4E.201@delphij.net> <20111228051404.GL45484@redundancy.redundancy.org>

On Dec 27, 2011, at 10:14 PM, David Thiel wrote:

> On Tue, Dec 27, 2011 at 02:48:22PM -0800, Xin Li wrote:
>>>> - use journalled fsck;
>>>> - use normal fsck to check if the journalled fsck did the right thing.
>
> Ok, here is the log of fsck with and without journal.
>
> http://redundancy.redundancy.org/fscklog3
>

The first run of fsck, using the journal, gives results that I would expect. The second run seems to imply that the fixes made on the first run didn't actually get written to disk. This is definitely an oddity. I see that you're using geli, maybe there's some strange side-effect there. No idea. Report as a bug, this is definitely undesired behavior.

> That was done the very next boot, after a clean shutdown. The errors
> from the previous live fsck aren't there (oddly), but there are still
> apparently some corrections made. The next fsck still complains, but
> doesn't give any salvage prompts.
>
> Here is jsa@'s, done on a live FS with SU+J:
>
> http://redundancy.redundancy.org/fscklog4
>

For the love that is all good and holy, don't ever run fsck on a live filesystem. It's going to report these kinds of problems! It's normal; filesystem metadata updates stay cached in memory, and fsck bypasses that cache. Also, what you see in your log is a file that has been unlinked but held open. This is a common Unix idiom, and one that gets cleaned up by fsck on reboot, whether through the SUJ intent log processing or through a traditional fsck.

> I'm not actually looking to solve my particular problem per se. The
> issue is that almost everyone I've checked with that's running SU+J gets
> unref'd file and other errors when they check their filesystem (with the
> fs live). Unless I'm missing something, a running FS should never have
> those kinds of errors unless you deliberately disabled fsck.
>

Nope, you are completely incorrect here.
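
To make the "unlinked but held open" idiom mentioned above concrete, here is a minimal Python sketch. It is only an illustration, not anything taken from the logs in this thread: the function name unlinked_but_open_demo is hypothetical, and it assumes a POSIX system where a file's name can be removed while a descriptor is still open. While the descriptor stays open, the inode is allocated but no directory entry refers to it, which is the kind of state a checker reading a live disk would flag as an unreferenced file.

```python
import os
import tempfile


def unlinked_but_open_demo():
    """Create a file, remove its name, and keep using the open descriptor.

    Hypothetical helper for illustration; assumes POSIX unlink semantics.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")

        # Remove the only directory entry. The inode survives because
        # this process still holds an open descriptor on it.
        os.unlink(path)

        name_exists = os.path.exists(path)

        # Link count as seen through the descriptor; typically 0 on POSIX
        # filesystems once the last name is gone.
        link_count = os.fstat(fd).st_nlink

        # The data is still readable through the descriptor.
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)
    finally:
        # Closing the last descriptor lets the kernel release the inode.
        os.close(fd)
    return name_exists, link_count, data


if __name__ == "__main__":
    print(unlinked_but_open_demo())
```

If the machine crashes before the descriptor is closed, the inode is left allocated with no name, and that leftover is what the journal replay or a traditional fsck cleans up at the next boot.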

> This leaves only a couple options:
>
> - SU+J and fsck do not work correctly together to fix corruption on
> boot, i.e. bgfsck isn't getting run when it should

The point of SUJ is to eliminate the need for bgfsck. Effectively, they are exclusive ideas. It's possible that there are still problems with SUJ and how fsck processes and commits the journal entries. However, bgfsck has nothing to do with this, and I'd also like to know if your use of geli is complicating the problem.

> - Stuff is getting completely screwed up after boot

Possibly but unlikely

> - fsck is giving incorrect results

Very unlikely

> - I'm completely clueless about how SU+J is supposed to behave or be
> deployed

No comment =-)

>
> I'm pretty certain that the first is the issue here. It would be great
> if others could check their own SU+J filesystems so we could get a few
> more data points.
>

Indeed, more data is needed.

Scott