Date: Wed, 25 Mar 2015 19:24:54 +1000
From: Da Rock <freebsd-fs@herveybayaustralia.com.au>
To: Benjamin Kaduk <kaduk@MIT.EDU>
Cc: freebsd-fs@freebsd.org, mckusick@freebsd.org
Subject: Re: Delete a directory, crash the system
Message-ID: <55127EE6.2010506@herveybayaustralia.com.au>
In-Reply-To: <alpine.GSO.1.10.1503250018030.22210@multics.mit.edu>
References: <CAHAXwYDPMrdY-TP-5T1_6M_ot4gY09jo2_Wi_REOmE=%2Bu%2B_QuQ@mail.gmail.com>
 <CAGwOe2byRc4LVsyxvTJgxNGCbhvOEaeDXjmFJ7DoXThPQe1bcQ@mail.gmail.com>
 <CAHAXwYCj9AV8ZcDffNNGx-ivL=h_TK9zLQRTPknArX25HSfEag@mail.gmail.com>
 <CAGwOe2YCDRqHudovDB_Kz9WHppvB8v2L%2B0gkDnWgG88bgZTKSA@mail.gmail.com>
 <CAHAXwYCnRDQqgRcvaEE1BmSJYYOidoQzzUoHX_QWdyJzYO3kKw@mail.gmail.com>
 <551007DD.5020109@herveybayaustralia.com.au>
 <alpine.GSO.1.10.1503231049050.22210@multics.mit.edu>
 <5510B995.8060307@herveybayaustralia.com.au>
 <alpine.GSO.1.10.1503241014270.22210@multics.mit.edu>
 <5511D807.3040606@herveybayaustralia.com.au>
 <alpine.GSO.1.10.1503250018030.22210@multics.mit.edu>

On 03/25/15 14:25, Benjamin Kaduk wrote:
> On Tue, 24 Mar 2015, Da Rock wrote:
>
>> On 03/25/15 00:16, Benjamin Kaduk wrote:
>>> On Mon, 23 Mar 2015, Da Rock wrote:
>>>
>>>> Unfortunately, fsck isn't helping - foreground or otherwise. All it shows
>>>> on every single fs is inode 4 recovery which doesn't sound quite right. And
>>> Have you posted the exact output in a previous message (could you send a
>>> link)?
>> Not precisely, but the message is just a flash and there is no copying of it.
>> Anyway, inode 4 is the .sujournal file as expected; this means there is an
>> issue with the softupdates. Could this be narrowing it down (the OP to this
>> was also in this age of enlightenment, SU came in with 8.x didn't it?)?
> Ah, SU+J could be quite relevant. Soft-update journalling was enabled by
> default for a period of time, but I believe it was disabled because there
> were some scenarios where it was destabilizing. CC-ing Kirk to improve on
> my lousy memory.

Hmmm... not sure about that. This was set by a fresh install at the time
and I haven't fiddled with that - I have set trim though (I think). To
verify, I just checked my fresh 10.1 and it has the same settings, so I
don't think they're disabled yet...

>
> Do you remember what version was used to install the system in question
> (i.e., create the filesystem in question)?

Version of what exactly? Do you mean the OS or the utilities for
filesystem ops? The filesystem was originally set up at install (I start
with a clean system when I install FreeBSD - exceptions happen of course,
but that's the rule. Makes it easier... they are just workstations after
all), so I wouldn't remember or be able to discover exactly what utils
were used. Install was using bsdinstall as per the FBSD10 disk.

> Please show the output of
> 'tunefs -p <filesystem>'

root:
tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         enabled
tunefs: maximum blocks per file in a cylinder group: (-e)  4096
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: space to hold for metadata blocks: (-k)            5240
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)

All the others are about the same - variations mainly in the space
variables due to size.

>
>>>> again, it is only showing during updates to ports being built. I'm
>>> Er, what is only showing up? The panics?
>>> Surely you are not only running fsck while building ports...
>> Yes, the panics.
>>
>> Sorry, I thought that was obvious seeing as the alternative is impossible :)
>>>> investigating further, but it may be just a corrupt file in pkg system.
>>>>
>>>> Incidentally, I'm not suggesting an absolute fix for the issue as such,
>>>> but a better means of handling it rather than crashing the system. The
>>>> posts on this
>>> Understood. But, there will always be some types of error which are truly
>>> unrecoverable, and there is no real option other than to panic. (Which is
>>> not to say that your situation is necessarily one of them.)
>> That I get, and given this may be an issue with SU it may well be warranted.
>> What can we do to narrow this down, as obviously one cannot be sitting
>> watching exactly what happens for the hours required while building ports.
>> You're bound to look away for just a second and miss it even if you did try! :D
>>>> If I discover anything more I'll keep everyone posted :)
>> So I did some fiddling with fsck, fsdb, find and stat; and got nowhere. I ran
>> fsck again and it gave me not much again. It did hint at some files in the
>> ports tree, so I cleaned up the ports tree to fresh install point, ran fsck
>> again and rebooted. So far so good, but I'm keeping my fingers crossed still.
> It is probably important to note that 'fsck -F' and saying 'no' to "USE
> JOURNAL?" is the most relevant fsck invocation.

Ok. I only use fsck in single user mode, as it's only really of use to me
there and something is usually broken if I'm using it :) so -F is usually
implied there. Saying no to "USE JOURNAL?" - good to know, I'll do that
next time it happens.

>
>> This doesn't help the panics - they're still a pita when they happen. It does
>> help me resolve the issue this time though. But initiating this error in
>> testing is damn near impossible. What can we document here as a way to gather
>> data to determine how to resolve this issue? Given my luck with this, it's
>> bound to happen again at some point :)
> I think actual diagnostic is beyond my expertise/time commitment at the
> moment. I suspect that using tunefs to disable softupdate journalling
> will be a workaround, if that is what you are really interested in.

Don't know. Might be SU+J or maybe a pkgng fault in managing ports. Might
just wing it - might be helpful to the project after all :) (could irk
some of my users though :P)

>
> I'll let Kirk decide if he wants to debug more, but the answer may well be
> "no" if you're not running the latest ufs from -current.
>
> -Ben
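
Since every filesystem here has to be checked for the same two settings, a small script can do the comparison. What follows is only a minimal sketch in Python: it parses text in the 'tunefs -p' format pasted above and reports whether soft updates and soft update journaling are both enabled. The names parse_tunefs and uses_suj are made up for illustration, and the sample is the root filesystem output from this message rather than a live capture.

```python
import re

# Sample copied from the 'tunefs -p' output for the root filesystem in this
# message. On a real system this text would have to be captured from tunefs
# itself; that step is deliberately not shown here.
SAMPLE = """\
tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         enabled
tunefs: maximum blocks per file in a cylinder group: (-e)  4096
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: space to hold for metadata blocks: (-k)            5240
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)
"""

# One setting per line: "tunefs: <description>: (<flag>) <value>".
# The value may be empty (e.g. an unset volume label).
LINE_RE = re.compile(r"^tunefs: (?P<name>.+?): \((?P<flag>-\w)\)\s*(?P<value>.*)$")


def parse_tunefs(text):
    """Map each flag (case-sensitive, e.g. '-j' vs '-J') to (description, value)."""
    settings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            settings[match.group("flag")] = (
                match.group("name"),
                match.group("value").strip(),
            )
    return settings


def uses_suj(settings):
    """True when both soft updates (-n) and SU journaling (-j) are enabled."""
    missing = ("", "")
    return (
        settings.get("-n", missing)[1] == "enabled"
        and settings.get("-j", missing)[1] == "enabled"
    )


if __name__ == "__main__":
    parsed = parse_tunefs(SAMPLE)
    print("SU+J in use:", uses_suj(parsed))
```

Keying on the flag rather than the description keeps '-j' (soft update journaling) and '-J' (gjournal) apart. If SU+J does turn out to be the culprit, the workaround Ben mentions would presumably be the -j option documented in tunefs(8); check the man page for the exact invocation and mount-state requirements before trying it.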