From owner-freebsd-fs@FreeBSD.ORG  Fri Mar  1 18:00:54 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id C33ED3C1;
 Fri,  1 Mar 2013 18:00:54 +0000 (UTC)
 (envelope-from mckusick@mckusick.com)
Received: from chez.mckusick.com (chez.mckusick.com
 [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452])
 by mx1.freebsd.org (Postfix) with ESMTP id 9DEE5D75;
 Fri,  1 Mar 2013 18:00:54 +0000 (UTC)
Received: from chez.mckusick.com (localhost [127.0.0.1])
 by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id r21I0pBD034998;
 Fri, 1 Mar 2013 10:00:51 -0800 (PST)
 (envelope-from mckusick@chez.mckusick.com)
Message-Id: <201303011800.r21I0pBD034998@chez.mckusick.com>
To: lev@freebsd.org
Subject: Re: Panic in ffs_valloc (Was: Unexpected SU+J inconsistency AGAIN --
 please, don't shift topic to ZFS!) 
In-reply-to: <352538988.20130301102237@serebryakov.spb.ru> 
Date: Fri, 01 Mar 2013 10:00:51 -0800
From: Kirk McKusick <mckusick@mckusick.com>
X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY
 autolearn=failed version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com
Cc: freebsd-fs@freebsd.org, Don Lewis <truckman@freebsd.org>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Mar 2013 18:00:54 -0000

> Date: Fri, 1 Mar 2013 10:22:37 +0400
> From: Lev Serebryakov <lev@freebsd.org>
> To: Don Lewis <truckman@freebsd.org>
> Subject: Re: Panic in ffs_valloc (Was: Unexpected SU+J inconsistency AGAIN --
> Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
> 
> DL> The fact that the filesystem code called panic() indicates that the
> DL> filesystem was already corrupt by that point.  That's a likely reason
> DL> for fsck complaining about the unexpected SU+J inconsistency.
> 
> DL> Incorrect write ordering that allowed the filesystem to become
> DL> inconsistent because some pending writes were lost because of the panic
> DL> might not be necessary, but this might have allowed an earlier crash
> DL> where a full fsck was skipped to leave the filesystem in this state.
>   As far as I understand, if this theory is right (file system
>  corruption which was left unnoticed by "standard" fsck), it is a bug
>  in FFS SU+J too, as it should not be corrupted by reordered writes
>  (if writes are properly reported as completed even if they were
>  reordered).

If the bitmaps are left corrupted (in particular if blocks are marked
free that are actually in use), then that panic can occur. Such a state
should never be possible when running with SU even if you have crashed
multiple times and restarted without running fsck.

To reduce the number of possible points of failure, I suggest that
you try running with just SU (i.e., turn off the SU+J journalling).
You can do this with `tunefs -j disable /dev/fsdisk'. This will
turn off journalling, but not soft updates. You can verify this
by then running `tunefs -p /dev/fsdisk' to ensure that soft updates
are still enabled.
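
For anyone who wants to script that check, here is a minimal sketch.
It assumes that `tunefs -p' prints one line per setting in the form
"name: (-flag) value", possibly prefixed with "tunefs: ". The sample
text is only an illustration of that assumed layout, not output
captured from a real system, and the layout may differ between
FreeBSD versions. The names parse_tunefs and su_without_journal are
hypothetical.

```python
import re

# Illustrative sample only; the exact `tunefs -p` layout is an assumption.
SAMPLE = """\
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       disabled
"""

# Optional "tunefs: " prefix, then "name: (-flag) value".
LINE_RE = re.compile(
    r"^(?:tunefs:\s*)?(?P<name>.+?):\s*\((?P<flag>-\w)\)\s+(?P<value>.+?)\s*$"
)


def parse_tunefs(text):
    """Map each option flag (e.g. '-n') to the value reported for it."""
    settings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            settings[m.group("flag")] = m.group("value")
    return settings


def su_without_journal(text):
    """True if soft updates (-n) are on and SU journalling (-j) is off."""
    s = parse_tunefs(text)
    return s.get("-n") == "enabled" and s.get("-j") == "disabled"


print(su_without_journal(SAMPLE))
```

On a real system the text passed in would be the captured output of
`tunefs -p /dev/fsdisk'. Lines that do not match the assumed pattern
are ignored rather than treated as errors.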

As you have already stated, the filesystem is fine with reordered
writes provided that they are not completed (iodone) until they are
well and truly on the disk.
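
A toy model may help show why the completion report matters more
than the ordering. This is only a sketch under stated assumptions,
not how FFS, GEOM, or any drive actually works: ToyDisk, run, and
consistent are hypothetical names, and the rule "a directory entry
must not reach the disk before the inode it names" stands in for the
real soft updates dependencies.

```python
class ToyDisk:
    """Disk with a volatile write cache that may flush in any order."""

    def __init__(self, honest):
        self.honest = honest  # True: report completion only once on media
        self.cache = []       # writes accepted but not yet durable
        self.media = set()    # writes that survive a crash

    def write(self, block):
        """Accept a write and return when the disk reports completion."""
        self.cache.append(block)
        if self.honest:
            self.flush_one(block)  # completion implies durability

    def flush_one(self, block):
        """Move one cached write onto stable media."""
        self.cache.remove(block)
        self.media.add(block)

    def crash(self):
        """Lose everything still in the cache; return what is on media."""
        self.cache.clear()
        return set(self.media)


def run(honest):
    disk = ToyDisk(honest)
    # The file system waits for each completion before issuing the next
    # dependent write: inode first, then the directory entry.
    disk.write("inode")
    disk.write("dirent")
    if disk.cache:
        # A disk that reported completion early still holds both writes
        # in its cache and is free to flush the later one first.
        disk.flush_one(disk.cache[-1])
    return disk.crash()


def consistent(media):
    """A directory entry on media requires its inode on media too."""
    return "dirent" not in media or "inode" in media


print(consistent(run(True)), consistent(run(False)))
```

In this model the honest disk can reorder its internal work however
it likes, because the dependent write is never issued until the first
one is durable. The disk that reports completion early leaves a
directory entry on media with no inode behind it after the crash.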

> DL> This panic might also be a result of the bug fixed in 246877, but I have
> DL> my doubts about that.
>   It was not MFCed :(
> 
> --
> // Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>

I will MFC 246876 and 246877 once they have been in head long enough
to have confidence that they will not cause trouble. That means at
least a month (well more than the two weeks they have presently been
there).

Note that these changes only pass the barrier request down to the
GEOM layer. I don't know whether it actually makes it to the drive
layer and, if it does, whether the drive layer actually implements
it. My goal was to get the ball rolling.

	Kirk McKusick