From: Scott Long
To: David Thiel
Cc: freebsd-current@freebsd.org, d@delphij.net
Date: Tue, 27 Dec 2011 23:54:44 -0700
Subject: Re: SU+J systems do not fsck themselves

On Dec 27, 2011, at 10:14 PM, David Thiel wrote:

> On Tue, Dec 27, 2011 at 02:48:22PM -0800, Xin Li wrote:
>>>> - use journalled fsck;
>>>> - use normal fsck to check if
>>>> the journalled fsck did the right thing.
>
> Ok, here is the log of fsck with and without journal.
>
> http://redundancy.redundancy.org/fscklog3
>

The first run of fsck, using the journal, gives results that I would
expect. The second run seems to imply that the fixes made on the first
run didn't actually get written to disk. This is definitely an oddity.
I see that you're using geli; maybe there's some strange side effect
there. No idea. Report it as a bug; this is definitely undesired
behavior.

> That was done the very next boot, after a clean shutdown. The errors
> from the previous live fsck aren't there (oddly), but there are still
> apparently some corrections made. The next fsck still complains, but
> doesn't give any salvage prompts.
>
> Here is jsa@'s, done on a live FS with SU+J:
>
> http://redundancy.redundancy.org/fscklog4
>

For the love of all that is good and holy, don't ever run fsck on a live
filesystem. It's going to report these kinds of problems! It's normal:
filesystem metadata updates stay cached in memory, and fsck bypasses
that cache. Also, what you see in your log is a file that has been
unlinked but held open. This is a common Unix idiom, and one that gets
cleaned up by fsck on reboot, whether through the SUJ intent log
processing or through a traditional fsck.

> I'm not actually looking to solve my particular problem per se. The
> issue is that almost everyone I've checked with that's running SU+J gets
> unref'd file and other errors when they check their filesystem (with the
> fs live). Unless I'm missing something, a running FS should never have
> those kinds of errors unless you deliberately disabled fsck.
>

Nope, you are completely incorrect here.

> This leaves only a couple options:
>
> - SU+J and fsck do not work correctly together to fix corruption on
> boot, i.e.
> bgfsck isn't getting run when it should

The point of SUJ is to eliminate the need for bgfsck. Effectively, they
are exclusive ideas. It's possible that there are still problems with
SUJ and how fsck processes and commits the journal entries. However,
bgfsck has nothing to do with this, and I'd also like to know if your
use of geli is complicating the problem.

> - Stuff is getting completely screwed up after boot

Possibly, but unlikely.

> - fsck is giving incorrect results

Very unlikely.

> - I'm completely clueless about how SU+J is supposed to behave or be
> deployed

No comment =-)

>
> I'm pretty certain that the first is the issue here. It would be great
> if others could check their own SU+J filesystems so we could get a few
> more data points.
>

Indeed, more data is needed.

Scott
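As a rough illustration (not part of the original thread), the "unlinked but held open" idiom mentioned in the reply can be sketched in C. The helper name unlinked_open_demo and the /tmp path template are assumptions chosen for illustration only. While such a descriptor stays open, the inode is still allocated but no directory entry points at it (link count 0); this is presumably what shows up as an unreferenced file when fsck is run against a live filesystem, and what SUJ journal processing or a traditional fsck reclaims on reboot if the system goes down before the file is closed.

```c
#include <assert.h>
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* strlen */
#include <sys/stat.h>   /* fstat, struct stat */
#include <unistd.h>     /* unlink, write, read, lseek, close */

/*
 * Hypothetical helper illustrating the "unlinked but held open" idiom.
 * It creates a temporary file, removes its only directory entry, and then
 * keeps using the file through the descriptor that is still open.
 * On success it stores the link count seen after unlink() in *nlink_out,
 * copies the data read back into buf (NUL-terminated), and returns 0.
 * Any failing system call makes it return -1.
 */
int unlinked_open_demo(char *buf, size_t buflen, long *nlink_out)
{
    char path[] = "/tmp/unlinked-demo-XXXXXX";  /* assumed writable dir */
    const char *msg = "still here";
    size_t len = strlen(msg);
    struct stat st;
    ssize_t n;
    int fd;

    assert(buflen > 0);

    fd = mkstemp(path);              /* create and open a unique file */
    if (fd == -1)
        return -1;

    if (unlink(path) == -1) {        /* drop the only name; the inode survives */
        close(fd);
        return -1;
    }

    /* No directory references the file now, but the descriptor still works. */
    if (write(fd, msg, len) != (ssize_t)len ||
        fstat(fd, &st) == -1 ||
        lseek(fd, 0, SEEK_SET) == -1) {
        close(fd);
        return -1;
    }
    *nlink_out = (long)st.st_nlink;  /* expected to be 0 after unlink() */

    n = read(fd, buf, buflen - 1);
    close(fd);                       /* last reference closed: inode can be freed */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}
```

Calling it with a small buffer, e.g. `char buf[32]; long nl; unlinked_open_demo(buf, sizeof buf, &nl);`, should leave nl at 0 and "still here" in buf on a typical POSIX system: the data remains reachable only through the open descriptor until close().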