Message-ID: <55127EE6.2010506@herveybayaustralia.com.au>
Date: Wed, 25 Mar 2015 19:24:54 +1000
From: Da Rock
To: Benjamin Kaduk
Subject: Re: Delete a directory, crash the system
References: <551007DD.5020109@herveybayaustralia.com.au> <5510B995.8060307@herveybayaustralia.com.au> <5511D807.3040606@herveybayaustralia.com.au>
Cc: freebsd-fs@freebsd.org, mckusick@freebsd.org

On 03/25/15 14:25, Benjamin Kaduk wrote:
> On Tue, 24 Mar 2015, Da Rock wrote:
>
>> On 03/25/15 00:16, Benjamin Kaduk wrote:
>>> On Mon, 23 Mar 2015, Da Rock wrote:
>>>
>>>> Unfortunately, fsck isn't helping - foreground or otherwise.
>>>> All it shows on every single fs is inode 4 recovery which doesn't sound
>>>> quite right. And
>>> Have you posted the exact output in a previous message (could you send a
>>> link)?
>> Not precisely, but the message is just a flash and there is no copying of it.
>> Anyway, inode 4 is the .sujournal file as expected; this means there is an
>> issue with the softupdates. Could this be narrowing it down (the OP to this
>> was also in this age of enlightenment, SU came in with 8.x didn't it?)?
> Ah, SU+J could be quite relevant. Soft-update journalling was enabled by
> default for a period of time, but I believe it was disabled because there
> were some scenarios where it was destabilizing. CC-ing Kirk to improve on
> my lousy memory.
Hmmm... not sure about that. This was set by a fresh install at the time and
I haven't fiddled with that - I have set trim though (I think). To verify, I
just checked my fresh 10.1 and it has the same settings, so I don't think
they're disabled yet...
>
> Do you remember what version was used to install the system in question
> (i.e., create the filesystem in question)?
Version of what exactly? Do you mean the OS or the utilities for filesystem
ops? The filesystem was originally set up at install (I start with a clean
system when I install freebsd - exceptions happen of course, but that's the
rule. Makes it easier... they are just workstations after all) so I wouldn't
remember or discover exactly what utils were used. Install was using
bsdinstall as per FBSD10 disk.
> Please show the output of
> 'tunefs -p '
root:
tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         enabled
tunefs: maximum blocks per file in a cylinder group: (-e)  4096
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: space to hold for metadata blocks: (-k)            5240
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)
All the others are about the same - variations mainly in space variables due
to size.
>
>>>> again, it is only showing during updates to ports being built. I'm
>>> Er, what is only showing up? The panics?
>>> Surely you are not only running fsck while building ports...
>> Yes, the panics.
>>
>> Sorry, I thought that was obvious seeing as the alternative is impossible :)
>>>> investigating further, but it may be just a corrupt file in pkg system.
>>>>
>>>> Incidentally, I'm not suggesting an absolute fix for the issue as such,
>>>> but a better means of handling it rather than crashing the system. The
>>>> posts on this
>>> Understood. But, there will always be some types of error which are truly
>>> unrecoverable, and there is no real option other than to panic. (Which is
>>> not to say that your situation is necessarily one of them.)
>> That I get, and given this may be an issue with SU it may well be warranted.
>> What can we do to narrow this down, as obviously one cannot be sitting
>> watching exactly what happens for the hours required while building ports.
>> You're bound to look away for just a second and miss it even if you did
>> try! :D
>>>> If I discover anything more I'll keep everyone posted :)
>> So I did some fiddling with fsck, fsdb, find and stat; and got nowhere. I ran
>> fsck again and it gave me not much again.
>> It did hint at some files in the ports tree, so I cleaned up the ports tree
>> to fresh install point, ran fsck again and rebooted. So far so good, but
>> I'm keeping my fingers crossed still.
> It is probably important to note that 'fsck -F' and saying 'no' to "USE
> JOURNAL?" is the most relevant fsck invocation.
Ok. I only use fsck in single user mode, as it's only really of use to me
there and something is usually broken if I'm using it :) so -F is usually
implied there. No to use journal - good to know, I'll use that next time
when it happens.
>
>> This doesn't help the panics - they're still a pita when they happen. It does
>> help me resolve the issue this time though. But initiating this error in
>> testing is damn near impossible. What can we document here as a way to gather
>> data to determine how to resolve this issue? Given my luck with this, it's
>> bound to happen again at some point :)
> I think actual diagnostic is beyond my expertise/time commitment at the
> moment. I suspect that using tunefs to disable softupdate journalling
> will be a workaround, if that is what you are really interested in.
Don't know. Might be SU+J or maybe a pkgng fault in managing ports. Might
just wing it - might be helpful to the project after all :) (could irk some
of my users though :P)
>
> I'll let Kirk decide if he wants to debug more, but the answer may well be
> "no" if you're not running the latest ufs from -current.
>
> -Ben
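
For anyone wanting to check the same settings on several filesystems without
reading each dump by eye, here is a minimal Python sketch. It assumes every
line has the shape seen in the 'tunefs -p' output earlier in this message
("tunefs: <name>: (-<flag>) <value>"). The names parse_tunefs and uses_suj
are made up for illustration and are not part of any FreeBSD tool.

```python
import re

# Sample lines copied from the 'tunefs -p' output quoted in this message.
SAMPLE = """\
tunefs: soft updates: (-n) enabled
tunefs: soft update journaling: (-j) enabled
tunefs: gjournal: (-J) disabled
tunefs: trim: (-t) enabled
"""

# Assumed line shape: "tunefs: <name>: (-<flag>) <value>"; the value may be empty.
LINE_RE = re.compile(
    r"^tunefs:\s+(?P<name>.+?):\s+\(-(?P<flag>\w)\)\s*(?P<value>.*)$"
)


def parse_tunefs(text):
    """Return {flag: (name, value)} for every line matching the assumed shape."""
    settings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            settings[match.group("flag")] = (
                match.group("name"),
                match.group("value").strip(),
            )
    return settings


def uses_suj(settings):
    """True when soft updates (-n) and soft update journaling (-j) are both enabled."""
    missing = ("", "")
    return (
        settings.get("n", missing)[1] == "enabled"
        and settings.get("j", missing)[1] == "enabled"
    )


if __name__ == "__main__":
    parsed = parse_tunefs(SAMPLE)
    for flag, (name, value) in parsed.items():
        print(f"-{flag}  {name}: {value}")
    print("SU+J in use:", uses_suj(parsed))
```

Flags are keyed case-sensitively, so -j (soft update journaling) and -J
(gjournal) stay distinct. To run it on real output, pass in the text captured
from 'tunefs -p'; the "tunefs:" prefix suggests the lines may be written to
standard error, so it is probably worth capturing that stream as well.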