From owner-freebsd-hackers  Mon Oct 19 22:26:45 1998
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id WAA07903
          for freebsd-hackers-outgoing; Mon, 19 Oct 1998 22:26:45 -0700 (PDT)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from knecht.Sendmail.ORG (knecht.sendmail.org [209.31.233.160])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id WAA07897
          for <hackers@freebsd.org>; Mon, 19 Oct 1998 22:26:43 -0700 (PDT)
          (envelope-from mckusick@flamingo.McKusick.COM)
Received: from flamingo.McKusick.COM (root@flamingo.mckusick.com [209.31.233.178])
	by knecht.Sendmail.ORG (8.9.1/8.9.1) with ESMTP id WAA17660;
	Mon, 19 Oct 1998 22:26:18 -0700 (PDT)
Received: from flamingo.McKusick.COM (mckusick@localhost [127.0.0.1])
	by flamingo.McKusick.COM (8.8.5/8.8.5) with ESMTP id UAA12850;
	Mon, 19 Oct 1998 20:42:35 -0700 (PDT)
Message-Id: <199810200342.UAA12850@flamingo.McKusick.COM>
To: Warner Losh <imp@village.org>
Subject: Re: softupdates and sync 
cc: Peter Jeremy <peter.jeremy@auss2.alcatel.com.au>, dg@root.com,
        hackers@FreeBSD.ORG
In-reply-to: Your message of "Sun, 18 Oct 1998 23:48:32 PDT."
             <199810190648.XAA11043@implode.root.com> 
Date: Mon, 19 Oct 1998 20:42:30 -0700
From: Kirk McKusick <mckusick@McKusick.COM>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

	In message <98Oct19.070445est.40346@border.alcanet.com.au>
		Peter Jeremy writes:
	: > Flush the dirty buffers to disk?
	: sync(2) requests that all dirty buffers get flushed, it just doesn't
	: wait for the flush to complete.

	No, it doesn't schedule the writes even.  I get no disk traffic after
	the sync happens.  The disk just sits there, but when I do an umount,
	lots and lots of traffic happens.  I've waited as long as 5 minutes for
	the sync to complete, but no disk traffic happens in this time, but
	when I umount the disk, I get 30+seconds of solid disk activity.

	Eg:
		rm -rf /fred/some-big-dir
		sync
	<wait 5 minutes, note very little or no disk activity>
		umount /fred
	<wait 30 seconds for the disk to stop updating>

	Shouldn't sync schedule those 30 seconds of writes to happen after I
	hit return, but before I get my prompt back?  I don't think that it
	is...

	Warner

The sync system call goes through all the mounted filesystems
calling VFS_SYNC. In the case of UFS, this gets us to ffs_sync
which walks the vnode list doing VOP_FSYNC with MNT_NOWAIT set.
VOP_FSYNC will walk the dirty list associated with the vnode doing
bawrite (or bdwrite/vfs_bio_awrite if B_CLUSTEROK is set). I
suspect that the problem has to do with the interaction with the
new VM system's desire to dissolve buffers, leaving the dirty page
identified only in the page cache. Thus it is not found by the
above sequence of events. It is not until the unmount occurs that
the VM system flushes out the dirty pages associated with the mount
point. If true, the fix is to augment VOP_FSYNC to also call the
VM system to flush out any dirty pages that it is holding for the
vnode. It should be doing this anyway since VOP_FSYNC is supposed 
to ensure that all the dirty pages are written to disk. My other
hypothesis on what is happening is that the bdwrite/vfs_bio_awrite
is somehow deciding not to write the dirty pages. I have not traced
down through the vfs_bio_awrite code to discern its decision making
algorithm on when to write and when not to write. It may be that
the fix is as simple as deleting the call to the immediately
preceding bdwrite (as is done in the MNT_WAIT case).

	Kirk McKusick
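
The first hypothesis above can be illustrated with a toy user-space
model. This is only a sketch and is not FreeBSD kernel code: the
toy_vnode structure and every toy_* function are hypothetical
stand-ins invented for illustration. It assumes that dirty data is
tracked either on a per-vnode buffer list or only in the page cache,
and it shows why an fsync that walks just the buffer list starts no
write for a page whose buffer has been dissolved, while an fsync that
also asks the VM side to flush does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSLOTS 4	/* slots per toy vnode (arbitrary) */

/* Hypothetical stand-in for a vnode and its two views of dirty data. */
struct toy_vnode {
	bool dirty_buf[TOY_NSLOTS];	/* models the vnode's dirty buffer list */
	bool dirty_page[TOY_NSLOTS];	/* models dirty pages known only to the VM */
};

/*
 * Models the VM system dissolving a buffer: the data is still dirty,
 * but afterwards only the page cache knows about it.
 */
void
toy_dissolve_buffer(struct toy_vnode *vp, size_t i)
{
	assert(vp != NULL && i < TOY_NSLOTS);
	if (vp->dirty_buf[i]) {
		vp->dirty_buf[i] = false;
		vp->dirty_page[i] = true;
	}
}

/*
 * Models an fsync that walks only the dirty buffer list.
 * Returns the number of writes it started.
 */
int
toy_fsync_buffers_only(struct toy_vnode *vp)
{
	int writes = 0;

	for (size_t i = 0; i < TOY_NSLOTS; i++) {
		if (vp->dirty_buf[i]) {
			vp->dirty_buf[i] = false;	/* write started */
			writes++;
		}
	}
	return (writes);
}

/*
 * Models the suggested augmentation: after walking the buffer list,
 * also flush any dirty pages the VM side holds for the vnode.
 */
int
toy_fsync_with_vm_flush(struct toy_vnode *vp)
{
	int writes = toy_fsync_buffers_only(vp);

	for (size_t i = 0; i < TOY_NSLOTS; i++) {
		if (vp->dirty_page[i]) {
			vp->dirty_page[i] = false;	/* write started */
			writes++;
		}
	}
	return (writes);
}

/* Counts dirty items that no flush has picked up yet. */
int
toy_dirty_remaining(const struct toy_vnode *vp)
{
	int n = 0;

	for (size_t i = 0; i < TOY_NSLOTS; i++)
		n += (vp->dirty_buf[i] ? 1 : 0) + (vp->dirty_page[i] ? 1 : 0);
	return (n);
}
```

In this model, if two slots are dirtied and one of them is dissolved
before the flush, toy_fsync_buffers_only() starts one write and leaves
one dirty page behind, which matches the symptom of no disk traffic
until unmount. toy_fsync_with_vm_flush() picks up the remaining page.
Whether the real kernel behaves this way is exactly the open question
in the message above.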
