Date: Wed, 25 Mar 2015 00:25:19 -0400 (EDT)
From: Benjamin Kaduk
To: Da Rock
Subject: Re: Delete a directory, crash the system
In-Reply-To: <5511D807.3040606@herveybayaustralia.com.au>
References: <551007DD.5020109@herveybayaustralia.com.au> <5510B995.8060307@herveybayaustralia.com.au> <5511D807.3040606@herveybayaustralia.com.au>
Cc: freebsd-fs@freebsd.org, mckusick@freebsd.org

On Tue, 24 Mar 2015, Da Rock wrote:
> On 03/25/15 00:16, Benjamin Kaduk wrote:
> > On Mon, 23 Mar 2015, Da Rock wrote:
> >
> > > Unfortunately, fsck isn't helping - foreground or otherwise. All it shows
> > > on every single fs is inode 4 recovery which doesn't sound quite right. And
> > Have you posted the exact output in a previous message (could you send a
> > link)?
> Not precisely, but the message is just a flash and there is no copying of it.
> Anyway, inode 4 is the .sujournal file as expected; this means there is an
> issue with the softupdates. Could this be narrowing it down (the OP to this
> was also in this age of enlightenment, SU came in with 8.x didn't it?)?

Ah, SU+J could be quite relevant.
Soft-update journalling was enabled by default for a period of time, but I believe it was disabled because there were some scenarios where it was destabilizing. CC-ing Kirk to improve on my lousy memory. Do you remember what version was used to install the system in question (i.e., create the filesystem in question)? Please show the output of 'tunefs -p '

> > > again, it is only showing during updates to ports being built. I'm
> > Er, what is only showing up? The panics?
> > Surely you are not only running fsck while building ports...
> Yes, the panics.
>
> Sorry, I thought that was obvious seeing as the alternative is impossible :)
> >
> > > investigating further, but it may be just a corrupt file in pkg system.
> > >
> > > Incidentally, I'm not suggesting an absolute fix for the issue as such,
> > > but a better means of handling it rather than crashing the system. The
> > > posts on this
> > Understood. But, there will always be some types of error which are truly
> > unrecoverable, and there is no real option other than to panic. (Which is
> > not to say that your situation is necessarily one of them.)
> That I get, and given this may be an issue with SU it may well be warranted.
> What can we do to narrow this down, as obviously one cannot be sitting
> watching exactly what happens for the hours required while building ports.
> You're bound to look away for just a second and miss it even if you did try! :D
> >
> > > If I discover anything more I'll keep everyone posted :)
> So I did some fiddling with fsck, fsdb, find and stat; and got nowhere. I ran
> fsck again and it gave me not much again. It did hint at some files in the
> ports tree, so I cleaned up the ports tree to fresh install point, ran fsck
> again and rebooted. So far so good, but I'm keeping my fingers crossed still.

It is probably important to note that running 'fsck -F' and answering 'no' to the "USE JOURNAL?" prompt is the most relevant fsck invocation.
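When collecting the requested 'tunefs -p' output from several filesystems, it may help to reduce each dump to a single answer: is the filesystem plain soft updates (the -n flag) or soft updates plus journaling (the -j flag, i.e. the .sujournal at inode 4)? The following is a minimal Python sketch of that check. The SAMPLE text and the exact line layout it parses are assumptions modelled on FreeBSD tunefs(8) -p output, not output captured from the system in this thread, and the names parse_tunefs_p and suj_status are hypothetical.

```python
import re

# ASSUMPTION: illustrative sample only, modelled on FreeBSD tunefs(8) -p
# output; it is not output from the machine discussed in this thread.
SAMPLE = """\
tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         disabled
"""

# Matches lines of the assumed form "tunefs: <description>: (<flag>)  <value>".
LINE_RE = re.compile(
    r"^tunefs:\s+(?P<desc>.+?):\s+\((?P<flag>-\w)\)\s+(?P<value>\S+)\s*$"
)


def parse_tunefs_p(text: str) -> dict[str, str]:
    """Hypothetical helper: map each tunefs flag (e.g. '-j') to its value."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            settings[match.group("flag")] = match.group("value")
    return settings


def suj_status(settings: dict[str, str]) -> str:
    """Hypothetical helper: classify as 'SU+J', 'SU', or 'none'."""
    soft_updates = settings.get("-n") == "enabled"
    journaling = settings.get("-j") == "enabled"
    if soft_updates and journaling:
        return "SU+J"
    if soft_updates:
        return "SU"
    return "none"


if __name__ == "__main__":
    print(suj_status(parse_tunefs_p(SAMPLE)))
```

Lines that do not match the assumed layout are simply skipped, so a real dump with extra fields (block sizes, labels) still parses; if tunefs on a given release formats its output differently, LINE_RE is the only piece that should need adjusting.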
> This doesn't help the panics - they're still a pita when they happen. It does
> help me resolve the issue this time though. But initiating this error in
> testing is damn near impossible. What can we document here as a way to gather
> data to determine how to resolve this issue? Given my luck with this, it's
> bound to happen again at some point :)

I think an actual diagnosis is beyond my expertise/time commitment at the moment. I suspect that using tunefs to disable softupdate journalling will be a workaround, if that is what you are really interested in. I'll let Kirk decide if he wants to debug more, but the answer may well be "no" if you're not running the latest ufs from -current.

-Ben