From: Da Rock
Date: Sat, 28 Mar 2015 22:22:26 +1000
To: Kirk McKusick, Benjamin Kaduk
Cc: freebsd-fs@freebsd.org
Subject: Re: Delete a directory, crash the system

On 26/03/2015 03:12, Kirk McKusick wrote:
>> Date: Wed, 25 Mar 2015 00:25:19 -0400 (EDT)
>> From: Benjamin Kaduk
>> To: Da Rock
>> Subject: Re: Delete a directory, crash the system
>> Cc: freebsd-fs@freebsd.org, mckusick@freebsd.org
>>
>> On Tue, 24 Mar 2015, Da Rock wrote:
>>
>>> On 03/25/15 00:16, Benjamin Kaduk wrote:
>>> Not precisely, but the message is just a flash and there is no copying of it.
>>> Anyway, inode 4 is the .sujournal file as expected; this means there is an
>>> issue with the softupdates. Could this be narrowing it down (the OP to this
>>> was also in this age of enlightenment, SU came in with 8.x didn't it?)?
>> Ah, SU+J could be quite relevant. Soft-update journalling was enabled by
>> default for a period of time, but I believe it was disabled because there
>> were some scenarios where it was destabilizing. CC-ing Kirk to improve on
>> my lousy memory.
> As far as I know SU+J is still on by default.
>
>> Do you remember what version was used to install the system in question
>> (i.e., create the filesystem in question)? Please show the output of
>> 'tunefs -p '
>>
>>> So I did some fiddling with fsck, fsdb, find and stat; and got nowhere. I ran
>>> fsck again and it gave me not much again. It did hint at some files in the
>>> ports tree, so I cleaned up the ports tree to fresh install point, ran fsck
>>> again and rebooted. So far so good, but I'm keeping my fingers crossed still.
>> It is probably important to note that 'fsck -F' and saying 'no' to "USE
>> JOURNAL?" is the most relevant fsck invocation.
>>
>>> This doesn't help the panics - they're still a pita when they happen. It does
>>> help me resolve the issue this time though. But initiating this error in
>>> testing is damn near impossible. What can we document here as a way to gather
>>> data to determine how to resolve this issue? Given my luck with this, it's
>>> bound to happen again at some point :)
>> I think actual diagnostic is beyond my expertise/time commitment at the
>> moment. I suspect that using tunefs to disable softupdate journalling
>> will be a workaround, if that is what you are really interested.
>>
>> I'll let Kirk decide if he wants to debug more, but the answer may well be
>> "no" if you're not running the latest ufs from -current.
>>
>> -Ben
> The suggestion to disable journalling is a good one. Journalling fixes
> only consistency errors that it knows about and cannot handle media errors.
> The sorts of panics you are getting are usually caused by media errors.
> So disabling journalling and checking all metadata after crashes (which is
> what fsck does) should minimize your problems.

So my only option for a journal is gjournal (slow) or zfs (memory hog) to
maintain consistency; is that it?

Incidentally, why keep SU+J on as the default, then? Wouldn't this still be
considered a bug?
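
For reference, the workaround suggested in the quoted text (disable
soft-updates journalling, then let fsck check all metadata) might be sketched
roughly as below. This is only a dry run that prints the commands instead of
running them. The device name /dev/ada0p2 is a hypothetical placeholder, and
the sequence assumes the filesystem is unmounted or mounted read-only, which
tunefs requires.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for the SU+J workaround, runs nothing.
# The device passed in is a placeholder; substitute the real UFS device.

suj_workaround() {
    dev="$1"
    # Print current tunables, including the soft updates journalling flag.
    echo "tunefs -p $dev"
    # Disable soft updates journalling (filesystem must not be mounted rw).
    echo "tunefs -j disable $dev"
    # Ask fsck_ffs to check the filesystem even if it is marked clean.
    echo "fsck_ffs -f $dev"
}

# Hypothetical device name, used for illustration only.
suj_workaround "${1:-/dev/ada0p2}"
```

Once the printed commands look right for the system in question, they can be
run by hand, one at a time, from single-user mode.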