Message-Id: <199810200342.UAA12850@flamingo.McKusick.COM>
To: Warner Losh
Subject: Re: softupdates and sync
cc: Peter Jeremy, dg@root.com, hackers@FreeBSD.ORG
In-reply-to: Your message of "Sun, 18 Oct 1998 23:48:32 PDT." <199810190648.XAA11043@implode.root.com>
Date: Mon, 19 Oct 1998 20:42:30 -0700
From: Kirk McKusick
Sender: owner-freebsd-hackers@FreeBSD.ORG

In message <98Oct19.070445est.40346@border.alcanet.com.au> Peter Jeremy writes:
: > Flush the dirty buffers to disk?
: sync(2) requests that all dirty buffers get flushed, it just doesn't
: wait for the flush to complete.

No, it doesn't even schedule the writes. I get no disk traffic after
the sync happens. The disk just sits there, but when I do an umount,
lots and lots of traffic happens. I've waited as long as 5 minutes for
the sync to complete, and no disk traffic happens in that time, but
when I umount the disk, I get 30+ seconds of solid disk activity. Eg:

	rm -rf /fred/some-big-dir
	sync
	umount /fred

Shouldn't sync schedule those 30 seconds of writes to happen after I
hit return, but before I get my prompt back? I don't think that it
is...

Warner

The sync system call goes through all the mounted filesystems calling
VFS_SYNC. In the case of UFS, this gets us to ffs_sync, which walks the
vnode list doing VOP_FSYNC with MNT_NOWAIT set. VOP_FSYNC will walk the
dirty list associated with the vnode doing bawrite (or
bdwrite/vfs_bio_awrite if B_CLUSTEROK is set).

I suspect that the problem has to do with the interaction with the new
VM system's desire to dissolve buffers, leaving the dirty page
identified only in the page cache. Thus it is not found by the above
sequence of events. It is not until the unmount occurs that the VM
system flushes out the dirty pages associated with the mount point. If
true, the fix is to augment VOP_FSYNC to also call the VM system to
flush out any dirty pages that it is holding for the vnode. It should
be doing this anyway, since VOP_FSYNC is supposed to ensure that all
the dirty pages are written to disk.

My other hypothesis on what is happening is that the
bdwrite/vfs_bio_awrite is somehow deciding not to write the dirty
pages. I have not traced down through the vfs_bio_awrite code to
discern its decision-making algorithm on when to write and when not to
write. It may be that the fix is as simple as deleting the call to the
immediately preceding bdwrite (as is done in the MNT_WAIT case).

	Kirk McKusick
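
The first hypothesis above can be illustrated with a toy user-space model. This is only a sketch and is not FreeBSD kernel code: every name in it (toy_mount, toy_vnode, toy_fsync, and so on) is hypothetical, the "disk" is just a counter, and the flush_vm flag stands in for the proposed change of having VOP_FSYNC also ask the VM system to flush its dirty pages. It assumes that a dissolved buffer leaves its dirty data visible only to the page cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical, none are kernel APIs. */
enum { MAX_VNODES = 4 };

struct toy_vnode {
    int dirty_bufs;   /* dirty blocks still on the vnode's dirty buffer list */
    int dirty_pages;  /* dirty blocks known only to the VM page cache */
};

struct toy_mount {
    struct toy_vnode vnodes[MAX_VNODES];
    size_t nvnodes;
    int blocks_written;  /* count of writes issued to the "disk" */
};

/* Models the VM system dissolving buffers: the data stays dirty,
 * but it is no longer reachable from the dirty buffer list. */
void toy_dissolve_buffers(struct toy_vnode *vp)
{
    vp->dirty_pages += vp->dirty_bufs;
    vp->dirty_bufs = 0;
}

/* Models VOP_FSYNC: always writes the dirty buffer list; writes the
 * VM-held dirty pages only when flush_vm is set (the proposed fix). */
void toy_fsync(struct toy_mount *mp, struct toy_vnode *vp, bool flush_vm)
{
    mp->blocks_written += vp->dirty_bufs;
    vp->dirty_bufs = 0;
    if (flush_vm) {
        mp->blocks_written += vp->dirty_pages;
        vp->dirty_pages = 0;
    }
}

/* Models ffs_sync: walk the mount's vnode list, fsync each vnode. */
void toy_sync(struct toy_mount *mp, bool flush_vm)
{
    assert(mp->nvnodes <= MAX_VNODES);
    for (size_t i = 0; i < mp->nvnodes; i++)
        toy_fsync(mp, &mp->vnodes[i], flush_vm);
}

/* Models unmount: everything dirty, including VM pages, is flushed. */
void toy_unmount(struct toy_mount *mp)
{
    toy_sync(mp, true);
}
```

In this model, if a vnode's buffers have been dissolved before toy_sync(mp, false) runs, the sync issues no writes for that vnode and the whole backlog shows up only at toy_unmount, which matches the reported behaviour under this hypothesis. Calling toy_sync(mp, true) instead moves those writes to sync time.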