From: Kevin Day <toasty@dragondata.com>
Subject: Does sync(8) really flush everything? Lost writes with journaled SU
 after sync+power cycle
Message-Id: <87CC14D8-7DC6-481A-8F85-46629F6D2249@dragondata.com>
Date: Wed, 10 Apr 2013 15:02:56 -0500
To: "freebsd-fs@FreeBSD.org Filesystems" <freebsd-fs@freebsd.org>


I'm working with an environment where a system (with journaled
soft updates) is notified that it is going to lose power shortly, and
needs to shut down daemons and flush everything to disk. It doesn't
actually shut down, though, because the "power down now" command may get
cancelled and we need to bring things back up. My understanding was that
we could call sync(8), then just wait for the power to drop.

The problem is that we were frequently losing the last 30-60 seconds'
worth of filesystem changes prior to the shutdown, i.e. newly created
directories would disappear, or fsck would reclaim them and throw them
into lost+found.

I confirmed that there is no caching disk controller, and write caching
is disabled on the drives themselves, and the problem continued.

On a whim, after running sync(8) once and waiting 10 seconds, I did
"mount -u -o ro -f /" to force the filesystem into read-only mode. It
took about 8 seconds to finish, gstat showed a lot of write activity,
and SIGINFO on the mount command showed:

load: 0.01  cmd: mount 15775 [biowr] 3.62r 0.00u 0.55s 5% 1644k
load: 0.03  cmd: mount 15775 [runnable] 4.41r 0.00u 0.65s 6% 1644k
load: 0.03  cmd: mount 15775 [biowr] 5.00r 0.00u 0.72s 6% 1644k
load: 0.03  cmd: mount 15775 [biowr] 5.70r 0.00u 0.80s 6% 1644k
load: 0.03  cmd: mount 15775 [biowr] 6.03r 0.00u 0.84s 6% 1644k
load: 0.03  cmd: mount 15775 [running] 6.27r 0.00u 0.87s 6% 1644k
load: 0.03  cmd: mount 15775 [biowr] 6.51r 0.00u 0.90s 7% 1644k
load: 0.03  cmd: mount 15775 [biowr] 6.69r 0.00u 0.92s 6% 1644k
load: 0.03  cmd: mount 15775 [biowr] 6.90r 0.00u 0.94s 6% 1644k
load: 0.03  cmd: mount 15775 [biowr] 7.04r 0.00u 0.96s 7% 1644k
load: 0.03  cmd: mount 15775 [biowr] 7.20r 0.00u 0.98s 7% 1644k

If sync's man page is accurate ("force completion of pending disk writes
(flush cache)"), and there is zero filesystem activity occurring,
shouldn't that be enough to ensure nothing is lost after a power cycle?
If sync really is flushing everything, what is all the write activity
that happens when downgrading from rw to ro?

Is there a better way to get things into a stable state on disk, yet not
fully shut down, so that we can recover if the shutdown order is
cancelled?


For me, this is easily reproducible with:

mkdir /root/test
sync
sleep 10
(hit reset button)

The problem doesn't happen with:

mkdir /root/test
mount -u -o ro -f /
(hit reset button)
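
As a rough sketch only, the second sequence could be wrapped in a small
script so that it can be undone if the shutdown order is cancelled. The
names run, quiesce, resume and DRY_RUN are hypothetical, and remounting
with "mount -u -o rw /" to resume is an assumption that has not been
tested here. By default it only prints the commands it would run.

```sh
#!/bin/sh
# Sketch: quiesce the root filesystem ahead of a possible power loss,
# and undo it if the shutdown order is cancelled.
# run, quiesce, resume and DRY_RUN are hypothetical names for illustration.

# Default to printing commands rather than executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

quiesce() {
    run sync
    # Force the downgrade to read-only, as in the sequence above.
    run mount -u -o ro -f /
}

resume() {
    # Assumption: an update mount back to read-write is enough to resume.
    run mount -u -o rw /
}
```

The idea would be to call quiesce when the power-loss notification
arrives and resume if it is cancelled, with DRY_RUN=0 set once the
printed commands look right.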


It's great that we're not ending up in an inconsistent state, but I was
expecting sync to prevent this.

-- Kevin