From: Kevin Day
Subject: Does sync(8) really flush everything? Lost writes with journaled SU after sync+power cycle
Message-Id: <87CC14D8-7DC6-481A-8F85-46629F6D2249@dragondata.com>
Date: Wed, 10 Apr 2013 15:02:56 -0500
To: "freebsd-fs@FreeBSD.org Filesystems"

I'm working with an environment where a system (with journaled soft-updates) is going to be notified that it will be losing power shortly, and needs to shut down daemons and flush everything to disk. It doesn't actually shut down, though, because the "power down now" command may get cancelled and we need to bring things back up. My understanding was that we could call sync(8), then just wait for the power to drop.

The problem is that we were frequently losing the last 30-60 seconds' worth of filesystem changes prior to the shutdown, i.e. newly created directories would disappear, or fsck would reclaim them and throw them into lost+found.

I confirmed that there is no caching disk controller and that write caching is disabled on the drives themselves, and the problem continued.

On a whim, after running sync(8) once and waiting 10 seconds, I did "mount -u -o ro -f /" to force the filesystem into read-only mode.
It took about 8 seconds to finish, gstat showed a lot of write activity, and SIGINFO on the mount command showed:

load: 0.01 cmd: mount 15775 [biowr] 3.62r 0.00u 0.55s 5% 1644k
load: 0.03 cmd: mount 15775 [runnable] 4.41r 0.00u 0.65s 6% 1644k
load: 0.03 cmd: mount 15775 [biowr] 5.00r 0.00u 0.72s 6% 1644k
load: 0.03 cmd: mount 15775 [biowr] 5.70r 0.00u 0.80s 6% 1644k
load: 0.03 cmd: mount 15775 [biowr] 6.03r 0.00u 0.84s 6% 1644k
load: 0.03 cmd: mount 15775 [running] 6.27r 0.00u 0.87s 6% 1644k
load: 0.03 cmd: mount 15775 [biowr] 6.51r 0.00u 0.90s 7% 1644k
load: 0.03 cmd: mount 15775 [biowr] 6.69r 0.00u 0.92s 6% 1644k
load: 0.03 cmd: mount 15775 [biowr] 6.90r 0.00u 0.94s 6% 1644k
load: 0.03 cmd: mount 15775 [biowr] 7.04r 0.00u 0.96s 7% 1644k
load: 0.03 cmd: mount 15775 [biowr] 7.20r 0.00u 0.98s 7% 1644k

If sync's man page is accurate ("force completion of pending disk writes (flush cache)"), and there is zero filesystem activity occurring, shouldn't that be enough to ensure no corruption after a power cycle? If sync really is flushing everything, what is all the write activity that happens when downgrading from rw to ro?

Is there a better way to get things into a stable state on disk, yet not fully shut down, so that we can recover if the shutdown order is cancelled?

For me, this is easily reproducible with:

mkdir /root/test
sync
sleep 10
(hit reset button)

The problem doesn't happen with:

mkdir /root/test
mount -u -o ro -f /
(hit reset button)

It's great that we're not ending up in an inconsistent state, but I was expecting sync to prevent this.

-- Kevin
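
For anyone who wants to try the same sequence, here is a minimal sketch of the workaround described above, wrapped so it can be rehearsed safely. The function names (run, quiesce, resume) and the DRY_RUN variable are made up for illustration. The "mount -u -o rw /" step for the cancelled-shutdown case is an assumption on my part and is not something tested in this report; only "sync" and "mount -u -o ro -f /" come from the observations above.

```shell
#!/bin/sh
# Dry-run by default: commands are printed instead of executed.
# Set DRY_RUN=0 (as root, on the target system) to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Before the expected power loss: sync, then force / read-only.
# The forced downgrade is the step that triggered the extra writes
# seen in gstat in the report above.
quiesce() {
    run sync
    run mount -u -o ro -f /
}

# ASSUMPTION (not from the report): if the power-down is cancelled,
# upgrade / back to read-write before restarting daemons.
resume() {
    run mount -u -o rw /
}
```

With the default DRY_RUN=1, calling quiesce only prints the two commands, which makes it easy to check the ordering before running it for real on a test box. Note that -f forces the downgrade even if files are still open for writing, so daemons should already be stopped at that point.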