From: Robert Bonomi
Date: Tue, 1 May 2012 00:58:10 -0500 (CDT)
To: freebsd-questions@freebsd.org
Subject: Re: UFS Crash and directories now missing
Message-Id: <201205010558.q415wAFu091478@mail.r-bonomi.com>

Eitan Adler wrote:
> On 30 April 2012 07:36, Robert Bonomi wrote:
> > A competennt, "not stupid", sysadmin would know these things. And not
> > 'remove all doubt' (in the words of Abraham Lincoln), by raising such
> > nonsense questions.
>
> A competent sysadmin would ask questions when they don't know the
> answer bringing up possibilities they thought about.
> A stupid sysadmin would yell at someone asking a question claiming
> they should have known the answer.

An informed critic would have recognized that the 'lack of knowledge' issue and the 'nonsense questions' were two -entirely- different matters.

One who lacks knowledge of system fundamentals and asks questions _about_the_fundamentals_ that they do not understand is not subject to criticism -- they are educable.

Those who make grossly false-to-fact assumptions about the behavior of those fundamentals, and extrapolate wildly from those erroneous assumptions, cannot be engaged in rational conversation -without- hauling them back to the initial erroneous assumptions and correcting those errors. And, when that is done, it invalidates everything extrapolated from the false premise. Those who continue to extrapolate wildly in such manner cannot be helped.

It was also established that the OP's descriptions were woefully incomplete and unreliable.

A second disk was involved. 'Dangerously dedicated' or otherwise? Partitioning? Slices? Label type? There is indirect indication that 'everything of interest' was on a single slice, but that is only an inference.

There's no indication of where _in_the_filesystem_ on the slice the jails' '/' directories were located, or by what names they were known to the system outside the jail. The 'pattern' of the names, and their placement in the hierarchy, _is_ likely of some significance. As are the (a) ownership, (b) permissions, and (c) 'flags' of (1) the original 'containing' directory, (2) the external view of the jail '/' directories in that directory, and (3) 'where they ended up'. It is likely that a record of that (pre-problem) 'external view' of the jail '/'s does not exist -- unless one has historical data from before the problem.

"Everything" was running in jails. Except for things that weren't.

For any constructive analysis of "what happened", one needed to capture *all* the bits in the directory (itself) where the jails ended up -- a directory 'listing', e.g. 'ls' (regardless of options), is not sufficient -- and the same for the directory where they 'should have been', plus a copy of the slice's complete inode table -- i.e., from _all_ the cylinder groups. Then one would examine the 'last modified' timestamp on the directory where the jails were found, and -then- the timestamps on the jail directories themselves.
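
As a rough illustration of that last step only, here is a minimal Python sketch that records the stat-level view of a path: timestamps, ownership, permissions, flags, link count, and whether the entry is a symlink or sits on a different device from its parent. It is -not- the raw capture of directory blocks and inode tables described above; it only shows what the kernel is willing to report. The function name snapshot() and the dictionary keys are made up for this example, and st_flags is assumed to be present only on BSD-derived systems. The sketch uses lstat() on explicitly named paths and does not list directory contents.

```python
import os
import stat
import sys
import time


def snapshot(path):
    """Return stat-level metadata for one path, without following symlinks.

    Hypothetical helper for illustration; not a forensic capture tool.
    """
    full = os.path.abspath(path)          # normalizes, but does not resolve symlinks
    st = os.lstat(full)                   # metadata of the entry itself
    parent = os.lstat(os.path.dirname(full))
    return {
        "path": full,
        "inode": st.st_ino,
        "device": st.st_dev,
        "links": st.st_nlink,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),
        # st_flags is only available on BSD-derived systems; None elsewhere
        "flags": getattr(st, "st_flags", None),
        "is_symlink": stat.S_ISLNK(st.st_mode),
        # a device number different from the parent's suggests a mount point
        "on_other_device": st.st_dev != parent.st_dev,
        "mtime": st.st_mtime,
        "ctime": st.st_ctime,
        "atime": st.st_atime,
    }


def fmt(ts):
    """Format a Unix timestamp as UTC text."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(ts))


if __name__ == "__main__":
    # Usage: python snapshot.py /path/to/containing/dir /path/to/jail/root ...
    for p in sys.argv[1:]:
        s = snapshot(p)
        print(s["path"], s["mode"], s["uid"], s["gid"], "flags=%s" % s["flags"])
        print("  symlink=%s other_device=%s links=%d inode=%d"
              % (s["is_symlink"], s["on_other_device"], s["links"], s["inode"]))
        print("  mtime=%s ctime=%s atime=%s"
              % (fmt(s["mtime"]), fmt(s["ctime"]), fmt(s["atime"])))
```

Run against the containing directory first, then against each jail directory. An entry that reports as a symlink, or on another device than its parent, is a hint (no more than that) that the directory was never 'really' where it appeared to be.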

Among other things, this data allows one to establish whether or not the jail directories were ever _really_ where one thought they were, or whether they just 'appeared' to be there, e.g. due to nullfs, or a 'link'. It also gives an 'initial estimate' of -when- it may have happened. (If 'malice' is involved, or certain kinds of backup/restore activities, the timestamps _may_ not be accurate, but they are a 'best available' guess.)

Capturing -all- the data from the 'where they were' directory allows one to examine the 'deleted' entries -- where one _should_ find entries for the jails, and 'last accessed' timestamps which put a lower bound on when the 'move' occurred.

When the 'apparently impossible' happens, it is *VERY*OFTEN* the case that 'reality' is *NOT* what someone 'knows' it is. No matter how 'obvious' it is, one has to =verify=.

It is also _FAR_ 'easier to believe' that (especially) a nullfs mount (or, less likely, a hard link) disappeared, than that directories actually got moved. The move may well have happened, but one must 'positively' eliminate the 'more plausible' alternatives first: things that would 'give the appearance' of what was reported, but from -very- different causes.

Of course, to capture this kind of information, one has to know "what's where" in the filesystem metadata, and have means to capture it _without_ changing any of that data. And _that_ means that you have to have a fair understanding of the mechanics of how the filesystem works. Which rapidly leads into gory details of how the O/S does disk I/O, and the various performance optimizations (and trade-offs) employed.

Reading _both_ of McKusick's "Design of .." books, and the 'Unix System Administration Handbook', by Nemeth, et al., is a good _start_. Having a bunch of the books from O'Reilly & Assoc., especially for 'standard' tools that you need to get the most out of, is also highly recommended.

Disclaimer: I know a lot of the authors of those books, personally.