From: Alejandro Imass
To: jb
Cc: freebsd-questions@freebsd.org
Date: Mon, 30 Apr 2012 14:39:22 -0400
Subject: Re: UFS Crash and directories now missing

On Mon, Apr 30, 2012 at 1:57 PM, jb wrote:
> Alejandro Imass writes:
>
>> ...
>> If you have really followed the thread, all I have done is try to find
>> some explanation for a strange behavior of the system under normal
>> use. It hung, and some directories were moved, period. I have posted
>> some ideas to share with other people expecting some insight and maybe
>> similar experience from other users, which there probably are many,
>> but many times afraid to speak up and avoid getting insulted.
>> ...
>
>
> I looked at problem reports for nullfs and there are quite few.
> Hierarchical Jails
> NOTES
>
> You said you have your jail env on a separate disk.
> Yes.
> I looked at problem reports for nullfs and there are quite few.
> http://www.freebsd.org/cgi/query-pr-summary.cgi?category=&severity=&priority=&class=&state=&sort=none&text=nullfs&responsible=&multitext=&originator=&release=
>
> As a matter of fact I just mounted a nullfs but was not able to unmount it
> (device busy) - a Google search shows it as a problem reported for many many
> years.
> Nullfs does not seem to be stable.
> Dirk Engling guessed that somehow nullfs was involved.
> Anyway, I found one PR
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/147420
>
> that is about troubles with jails, nullfs, UFS, and NFS.
> Synopsis: [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt inode)
>
> Take a look at this paragraphs:
> "...
> After two more failures, I now found the offending inode ..."
> "...
> As one point, I found the inode in a directory which usually is mounted for
> an (ez-) jail via nullfs."
>
> This proves that problems with jails, nullfs, and fs corruption are possible.
> So, they can not be excluded up front in your case too because nullfs is just
> a simple "path translation".
>

Up until yesterday (and Dirk's answer) I didn't look for specific
references to nullfs, and today I was busy getting vicious myself ;)

Thanks for pointing out a plausible cause. What I have done so far is
limit the offending jail to a specific cpuset, and I wanted to add
another disk to avoid contention with other jails. MySQL not only
consumes all the CPUs but also ties up the whole drive while it's
doing some crazy full-scan query on a very large database.

I don't have any control over the code or MySQL myself, and the client
said it's a known problem with VTiger CRM and the way it implements
some reports, which I guess were not designed for the amount of data
they are handling. I have already recommended they move to a dedicated
server altogether, because their system simply outgrew what we sold
them.

I really appreciate the time you dedicated to searching for a possible
explanation; at the very least it helps in taking some immediate steps
to keep it from happening again. Hopefully, someone with deep
knowledge will find the root cause and a long-term fix.

What is true is that if it happened to me, it can happen to anyone, so
maybe your findings will help someone pinpoint the problem and fix it.

Thanks,

-- 
Alejandro