From: Alejandro Imass
To: Robert Bonomi
Cc: wojtek@wojtek.tensor.gdynia.pl, freebsd-questions@freebsd.org
Date: Sat, 28 Apr 2012 12:36:24 -0400
Subject: Re: UFS Crash and directories now missing
In-Reply-To: <201204281539.q3SFdtir061045@mail.r-bonomi.com>
References: <201204281539.q3SFdtir061045@mail.r-bonomi.com>

On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi wrote:
>
>  Alejandro Imass wrote:
>> On Sat, Apr 28, 2012 at 3:22 AM, Wojciech Puchar
>> wrote:
>> >> I somewhat agree, but it wasn't a person. I am the only administrator,
>> >> the only one with root access. The jails were effectively moved to the
>> >> /usr/local/etc/apache22 of the single that survived at the top level.
>> >> I'm thinking something between mount, EzJail, the journal and the way
>> >> MySQL created a great deal of head contention, so something must have
>> >> gotten corrupted at the directory level like you state, but the
>> >> strange part is no _data_ corruption as such, because I was able to
>> >> physically archive the jails, move them to the correct directory and
>> >
>> >
>> > no matter what you do FreeBSD DOES NOT randomly move directories. if you are
>> > sure you didn't move it yourself then it must be machine hardware problem
>> > but still unlikely.
>>
>> After a little more research, ___it is NOT unlikely at all___ that
>> under high distress and a hard boot, UFS could have somehow corrupted
>> the directory structure, whilst maintaining the data intact.
>
> This is technically accurate, *BUT* the specifics of the quote "corruption"
> unquote in the case under discussion make it *EXTREMELY* unlikely that this
> is what happened.
>
> 99.99+++% of all UFS filesystem "corruption" issues are the result of a
> system crash _between_ the time cached 'meta-data' is updated in memory
> and that data is flushed to disk (a deferred write).
>
> The second most common (and vanishingly rare) failure mode is a powerfail
> _as_ a sector of disk is being written -- resulting in 'garbage data'
> being written to disk.
>
> The next possibility is 'cosmic rays'.
> If running on 'cheap' hardware (i.e., without 'ECC' memory), this can
> cause a *SINGLE-BIT* error in data being output.
>
> The fact that the 'corrupted' filesystem passed fsck -without- any reported
> errors shows that everything in the filesystem meta-data was consistent
>
> Given *that*, there are precisely *TWO* ways that the 'results' that have
> been reported could have happened.
>
>  1) "Something" did a mv(2) of the various jail directories 'from' their
>     original location to the 'apache' directory.  This involves simply
>     *copying* the directory entry from the jail's 'parent directory' to
>     the apache directory, and then marking the entry in the original
>     parent as 'unused'.  Nothing other than the directory where the jail
>     'used to live', and the directory 'where it was found' are touched.
>     This occurred _through_ the system 'mv' function, so all the normal
>     'housekeeping' was done properly.
>
>  2) it was -not- done through mv(2) -- but that requires that a whole
>     *series* of "corruptions" of the filesystem, _ALL_ of which had to
>     occur in 'exactly' the right way.  They are: [...]
> I think it is safe to conclude that the probabilities -greatly- favor
> alternative #1.
>

OK. So after your comments and further research I concur with you on
the mv, but if it wasn't a human, then this might be exposing a serious
security flaw in the jail system or the way EzJail implements it. The
whole point of using jails is to prevent things like this from
happening.

Given that the only jail that survived was the front-end Apache Web
server/reverse proxy, it is also safe to suspect that the apache (or
other) process running in it was able to perform a mv of the rest of
the jails to its own /usr/local/etc/apache22 directory.

Is there no possibility that, after the system crash, the journal
recovery process and/or fsck could have moved these directories?
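
For reference, the same-filesystem move described in alternative 1 above
can be illustrated with a minimal sketch. It assumes Python 3 and uses
throwaway directory names (jails, apache22, www) that are purely
hypothetical and are not the real layout of the affected machine. It only
shows that a rename within one filesystem relinks the directory under a
new parent while the inode and the file contents stay the same, which is
the kind of result a later fsck would see as consistent.

```python
import os
import tempfile


def move_and_compare(root):
    """Rename a directory under a new parent; report inode and contents."""
    jails = os.path.join(root, "jails")       # hypothetical original parent
    apache = os.path.join(root, "apache22")   # hypothetical destination parent
    os.makedirs(os.path.join(jails, "www"))
    os.makedirs(apache)

    # Put a file inside so we can check that the contents survive the move.
    with open(os.path.join(jails, "www", "data.txt"), "w") as f:
        f.write("payload")

    src = os.path.join(jails, "www")
    dst = os.path.join(apache, "www")
    inode_before = os.stat(src).st_ino

    # os.rename() relinks the directory entry under the new parent;
    # the file data itself is not copied.
    os.rename(src, dst)

    inode_after = os.stat(dst).st_ino
    with open(os.path.join(dst, "data.txt")) as f:
        content = f.read()
    return inode_before, inode_after, os.path.exists(src), content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        before, after, still_there, content = move_and_compare(root)
        print("same inode:", before == after)
        print("old path still exists:", still_there)
        print("contents:", content)
```

This says nothing about who or what issued the rename; it only shows why
a move of this kind leaves the data intact.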

Thanks,

--
Alejandro