Date: Thu, 11 Apr 2013 16:30:52 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Kevin Day <toasty@dragondata.com>
Cc: "freebsd-fs@FreeBSD.org Filesystems" <freebsd-fs@FreeBSD.org>
Subject: Re: Does sync(8) really flush everything? Lost writes with journaled SU after sync+power cycle
Message-ID: <20130411160253.V1041@besplex.bde.org>
In-Reply-To: <87CC14D8-7DC6-481A-8F85-46629F6D2249@dragondata.com>
References: <87CC14D8-7DC6-481A-8F85-46629F6D2249@dragondata.com>
On Wed, 10 Apr 2013, Kevin Day wrote:

> Working with an environment where a system (with journaled soft-updates) is going to be notified that it's going to be losing power shortly, and needs to shut down daemons and flush everything to disk. It doesn't actually shutdown though, because the "power down now" command may get cancelled and we need to bring things back up. My understanding was that we could call sync(8), then just wait for the power to drop.
>
> The problem is that we were frequently losing the last 30-60 seconds worth of filesystem changes prior to the shutdown. i.e. newly created directories would disappear or fsck would reclaim them and throw them into lost+found.
>
> I confirmed that there is no caching disk controller, and write caching is disabled on the drives themselves, and the problem continued.
>
> On a whim, after running sync(8) once and waiting 10 seconds, I did "mount -u -o ro -f /" to force the filesystem into read-only mode. It took about 8 seconds to finish, gstat showed a lot of write activity, and SIGINFO on the mount command showed:

sync(2) only schedules all writing of all modified buffers to disk. Its man page even says this. It doesn't wait for any of the writes to complete. Its man page says that this is a BUG, but it is intentional and sync() has always done this. There is no way for sync() to guarantee that all modified buffers have been written to disk when it returns, since even if it waited, buffers might be modified while it is returning. Perhaps even ones that would take 8 seconds to complete can be written in the few nanoseconds that it takes to return.

sync(8) is just a wrapper around sync(2). One that doesn't even check for errors. Not that it could handle sync() failure. Its man page bogusly first claims that it "forces completion". This is not completely wrong, since it doesn't claim that the completion occurs before sync(8) exits.
But then it claims that sync(8) is suitable "to ensure that all disk writes have been completed in a way not suitably done by reboot(8) or halt(8)". This wording is poor, unless it is intentionally weaselishly worded so that it doesn't actually claim full completion. It only claims more suitable completion than with reboot or halt. Actually, completion is not guaranteed, and what sync(8) provides is just less unsuitable than what reboot and halt provide.

To ensure completion, you have to freeze the file systems of interest before rebooting. I don't know of any ways to do this from userland except mount -u -o ro or unmount. There should be a syscall to cause syncing with waiting. The kernel has a wait option for syncing, but doesn't use it for sync(2). But using this would only reduce the races.

Bruce