Date: Mon, 30 Apr 2012 06:36:08 -0500 (CDT)
From: Robert Bonomi <bonomi@mail.r-bonomi.com>
To: freebsd-questions@freebsd.org
Subject: Re: UFS Crash and directories now missing
Message-ID: <201204301136.q3UBa8fj083478@mail.r-bonomi.com>

Alejandro Imass <ait@p2ee.org> wrote:
> On Sun, Apr 29, 2012 at 11:49 PM, Erich Dollansky wrote:
> > On Monday 30 April 2012 02:02:41 jb wrote:
> >> Alejandro Imass <ait <at> p2ee.org> writes:
> >> > ...
>
> Back to theory on how the http-proxy jail 'swallowed' all the other
> jails including the basejail.

A "theory" that contains assumptions which are, unfortunately,
unsupported by any factual evidence. Just like _every_other_ "theory"
you have advanced to date.

FACT: It is a virtual certainty that something operating -outside- any
jail environment is what did the deed.

Available evidence to date is that you 'fixate' on a particular _remote_
possibility -- *without* knowledge of what it would take for that
scenario to come to pass -- making a sh*tload of 'assumptions' along the
way (many of which are contrary to reality), and offer that as 'the
explanation' for events.

> Given that EzJail uses a single basejail and links/mounts stuff in the
> child jails it would seem plausible (regression?) that somehow any
> jail could access other jails' files,

Demonstrating, yet again, that you do not understand how jails work. :((

> or that _maybe_ in an event of
> crash the nullsfs mounts confuse the system somehow when fsck restores
> or the journal is recovered.

Demonstrating, yet again, that you do not understand what nullfs is,
how it works, or that it is totally -irrelevant- to fsck and/or
journaling.

Hint: nullfs is merely a 'path translation' mechanism -- it affects
_only_ 'file open' syscalls. fsck doesn't _touch_ nullfs.

Hint: journaling is an add-on to the UFS filesystem. nullfs doesn't
know what journaling is. "Journal recovery" doesn't _touch_ a nullfs.

A competent, "not stupid", sysadmin would know these things. And not
'remove all doubt' (in the words of Abraham Lincoln), by raising such
nonsense questions.

> Whatever the cause, it actually happened and I have already ruled out
> just about anything. It doesn't seem to have been an attack, it surely
> wasn't me, and EzJail author agrees it was not the EzJail scripts. So
> maybe nullfs and journaling, or crash + nullfs + journaling, could
> cause something like this to happen?

Postulating the "right" combination of _unrelated_ failures, virtually
*anything* can happen. cf. "Nasal Monkeys". It has already been
demonstrated how the (im-)probability of such an event relates to the
age of the universe.

> Maybe journal has some confusion
> on restoring the nullfs view of the directories or something after bad
> crash like this one??

Short answer: "No chance."

Again, if you had any understanding of how UFS, and nullfs for that
matter, works -- not to mention how disk I/O works inside the kernel --
you wouldn't be embarrassing yourself by your _continued_ raising of
what are, to put it charitably, such 'patently ridiculous' questions.

You can engage in all the 'unfounded speculation' you want to, but you
are simply -not- going to determine "what happened".

IF there was a systemic fault, you have already destroyed the forensic
evidence trail that _might_ have allowed a qualified expert to run it
down, *if* you could afford to have such an analysis done. (Middle five
figures is a starting point for such an analysis.)

Absent _multiple_ reports of like events, *WITH* enough detail in the
reports to have a reasonable chance of identifying a 'pattern' of
events leading to the failure, *OR* the existence of a -reliable-,
=repeatable=, method of inducing the failure, this simply isn't going
to go anywhere.

Absent any of those things, it is a 'freak' event, *PROBABLY* (read
'virtually certain') caused by human error (despite your claim of the
'impossibility' of that factor) in some form.

If you insist on 'knowing' what happened in any future instance of
single putatively 'abnormal' events, you will need to change to a
MIL-SPEC 'B2' (or higher) rated O/S, with active mandatory access
controls, 'security labels' with multi-level, non-hierarchical,
security enabled, audit logging of -every- system call, etc. This also
requires a staff position of 'security officer', which is
_separate_and_distinct_ from 'system administrator'. I strongly suspect
that you cannot afford the required hardware and software for this type
of 'solution'.

The 'underlying cause' almost certainly falls into the class known as
PEBKAC. (The current admin has demonstrated an inability to accurately
report the state of his system -- that at least one thing he previously
asserted to be true was _not_, in fact, the case. It is
*HIGHLY*LIKELY* that _that_ 'exception' to the claimed state is =not=
the only such violation on that system.) That is, there was an action
where there was a difference between 'that which was intended', and
'what it really did'. Such things are almost -impossible- for the
perpetrator of the action to identify -- they 'know' what they did, and
"read" the act as 'doing what they meant it to do', even though it
actually did 'something else'. I cannot count the number of times _I_
have fallen into that particular trap.

You insist on speculating about 'failure modes' in the way that THINGS
YOU DO NOT UNDERSTAND THE FUNCTIONING OF work. You are wasting your
time, and that of those whom you inflict those 'nonsensical'
speculations on.

You are 'convinced' it could not have been human error, and have
concluded that it therefore *must* have been machine error. You are
looking for someone to 'validate' that conclusion. That simply *ISN'T*
going to happen -- not without a -lot- more evidence than any
individual can provide from a single =unrepeatable= incident.