From owner-freebsd-fs@FreeBSD.ORG Wed Mar 25 17:12:11 2015
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
    (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by hub.freebsd.org (Postfix) with ESMTPS id DEE68563
    for ; Wed, 25 Mar 2015 17:12:11 +0000 (UTC)
Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (Client did not present a certificate)
    by mx1.freebsd.org (Postfix) with ESMTPS id BB84A64B
    for ; Wed, 25 Mar 2015 17:12:11 +0000 (UTC)
Received: from chez.mckusick.com (localhost [127.0.0.1])
    by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id t2PHC1R8090290;
    Wed, 25 Mar 2015 10:12:01 -0700 (PDT)
    (envelope-from mckusick@chez.mckusick.com)
Message-Id: <201503251712.t2PHC1R8090290@chez.mckusick.com>
To: Benjamin Kaduk
Subject: Re: Delete a directory, crash the system
In-reply-to:
Date: Wed, 25 Mar 2015 10:12:01 -0700
From: Kirk McKusick
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 25 Mar 2015 17:12:12 -0000

> Date: Wed, 25 Mar 2015 00:25:19 -0400 (EDT)
> From: Benjamin Kaduk
> To: Da Rock
> Subject: Re: Delete a directory, crash the system
> Cc: freebsd-fs@freebsd.org, mckusick@freebsd.org
>
> On Tue, 24 Mar 2015, Da Rock wrote:
>
>> On 03/25/15 00:16, Benjamin Kaduk wrote:
>> Not precisely, but the message is just a flash and there is no copying of it.
>> Anyway, inode 4 is the .sujournal file as expected; this means there is an
>> issue with the softupdates. Could this be narrowing it down (the OP to this
>> was also in this age of enlightenment, SU came in with 8.x didn't it?)?
>
> Ah, SU+J could be quite relevant.
> Soft-update journalling was enabled by
> default for a period of time, but I believe it was disabled because there
> were some scenarios where it was destabilizing. CC-ing Kirk to improve on
> my lousy memory.

As far as I know, SU+J is still on by default.

> Do you remember what version was used to install the system in question
> (i.e., create the filesystem in question)? Please show the output of
> 'tunefs -p '
>
>> So I did some fiddling with fsck, fsdb, find and stat; and got nowhere. I ran
>> fsck again and it gave me not much again. It did hint at some files in the
>> ports tree, so I cleaned up the ports tree to fresh install point, ran fsck
>> again and rebooted. So far so good, but I'm keeping my fingers crossed still.
>
> It is probably important to note that 'fsck -F' and saying 'no' to "USE
> JOURNAL?" is the most relevant fsck invocation.
>
>> This doesn't help the panics - they're still a pita when they happen. It does
>> help me resolve the issue this time though. But initiating this error in
>> testing is damn near impossible. What can we document here as a way to gather
>> data to determine how to resolve this issue? Given my luck with this, its
>> bound to happen again at some point :)
>
> I think actual diagnostic is beyond my expertise/time committment at the
> moment. I suspect that using tunefs to disable softupdate journalling
> will be a workaround, if that is what you are really interested.
>
> I'll let Kirk decide if he wants to debug more, but the answer may well be
> "no" if you're not running the latest ufs from -current.
>
> -Ben

The suggestion to disable journalling is a good one. Journalling fixes
only consistency errors that it knows about and cannot handle media
errors. The sorts of panics you are getting are usually caused by media
errors. So disabling journalling and checking all metadata after crashes
(which is what fsck does) should minimize your problems.

    Kirk McKusick
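
For reference, the workaround discussed above can be sketched roughly as
follows. This is only a dry-run illustration, not a tested procedure: the
device name /dev/ada0p2 and the helper name suj_disable_plan are made up
for the example, and the function prints the commands instead of running
them. It assumes the file system is unmounted or mounted read-only when
the commands are actually run, since tunefs cannot change an active file
system, and it assumes 'fsck -f' (force a check even if the file system is
marked clean) is the full check you want once journalling is off.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for turning off soft updates
# journalling (SU+J) on a UFS file system, then forcing a full fsck.
# Nothing here touches a disk; copy the printed lines once reviewed.

suj_disable_plan() {
    dev="$1"    # hypothetical device node, e.g. /dev/ada0p2

    # Show the current settings, including whether SU+J is enabled.
    echo "tunefs -p $dev"

    # Turn off soft updates journalling; soft updates themselves stay on.
    echo "tunefs -j disable $dev"

    # Force a check of all metadata, even if marked clean (assumed flag use).
    echo "fsck -f $dev"
}

# Example invocation with a made-up device name.
suj_disable_plan /dev/ada0p2
```

With journalling disabled, fsck no longer has a journal to replay, so the
check after a crash walks all of the metadata, which is the behaviour
recommended above.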