From owner-freebsd-current Thu Feb 8 20:34:28 2001
Delivered-To: freebsd-current@freebsd.org
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9A80937B65D; Thu, 8 Feb 2001 20:34:05 -0800 (PST)
Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30])
	by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id XAA18587;
	Thu, 8 Feb 2001 23:33:32 -0500 (EST)
Received: (from gallatin@localhost)
	by grasshopper.cs.duke.edu (8.11.1/8.9.1) id f194X1u42681;
	Thu, 8 Feb 2001 23:33:01 -0500 (EST)
	(envelope-from gallatin@cs.duke.edu)
From: Andrew Gallatin
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <14979.29437.518299.842853@grasshopper.cs.duke.edu>
Date: Thu, 8 Feb 2001 23:33:01 -0500 (EST)
To: Dag-Erling Smorgrav
Cc: Julian Elischer, Josef Karthauser, Robert Watson, Brian Somers,
	Bruce Evans, freebsd-current@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
Subject: Re: What's changed recently with vmware/linuxemu/file I/O
In-Reply-To:
References: <20010208113519.A789@tao.org.uk> <3A828C2C.F7CDA809@elischer.org>
X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Dag-Erling Smorgrav writes:
> Julian Elischer writes:
> > I believe that vmware mmaps a region of memory and then somehow syncs
> > it to disk. (It is certainly doing something like it here).
>
> Theory: VMWare mmaps a region of memory corresponding to the virtual
> machine's "physical" RAM, then touches every page during startup.
> Unless some form of clustering is done, this causes 16384 write
> operations for a 64 MB virtual machine...
>

Pretty much. But the issue is that this should never hit the disk
unless we're under memory pressure because it is mapped MAP_NOSYNC
(actually the file is unlinked prior to the mmap() and a heuristic in
vm_mmap() detects this and sets MAP_NOSYNC).
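
For illustration, the access pattern under discussion can be sketched as
a small user-space C program. This is only a sketch under stated
assumptions, not vmware's code and not the test program referred to in
this thread: the function name touch_unlinked_mapping, the temporary
file name, and the read-before-write loop are all hypothetical. It
passes MAP_NOSYNC explicitly where the platform defines it (rather than
relying on the vm_mmap() heuristic) and treats the flag as a no-op
elsewhere.

```c
#define _DEFAULT_SOURCE		/* expose mkstemp()/ftruncate() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0		/* FreeBSD-specific flag; no-op elsewhere */
#endif

/*
 * Map `bytes` of an unlinked temporary file and touch every page.
 * If read_first is nonzero, each page is read before it is written.
 * Returns the number of pages touched, or -1 on error.
 */
long
touch_unlinked_mapping(size_t bytes, int read_first)
{
	char path[] = "/tmp/nosync-demo.XXXXXX";	/* hypothetical name */
	long pagesz = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	unsigned char seen = 0;
	long touched = 0;
	size_t off;
	void *base;
	int fd;

	assert(pagesz > 0);
	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	/* Unlink before mmap(), as the message describes. */
	if (unlink(path) == -1 || ftruncate(fd, (off_t)bytes) == -1) {
		close(fd);
		return (-1);
	}
	base = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_NOSYNC, fd, 0);
	close(fd);		/* the mapping keeps the file alive */
	if (base == MAP_FAILED)
		return (-1);
	p = base;
	for (off = 0; off < bytes; off += (size_t)pagesz) {
		if (read_first)
			seen |= p[off];		/* read access comes first */
		p[off] = (unsigned char)(seen + 1);	/* then dirty the page */
		touched++;
	}
	munmap(base, bytes);
	return (touched);
}
```

With 4 KB pages, touch_unlinked_mapping(64UL << 20, 1) touches 16384
pages, which matches the count in the quoted theory. The function does
not measure disk activity itself; that would have to be watched
separately while it runs.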

The real problem is that our MAP_NOSYNC doesn't fully work in at least
one major case. As I understand it, the technique we use is to set the
MAP_ENTRY_NOSYNC flag in the map entry at mmap time. On a write fault,
PG_NOSYNC is set in the page's flags. A lazy msync will skip PG_NOSYNC
pages.

The problem comes when a page is read before it is written. The page
gets mapped in read/write and we don't take a write fault, so the
PG_NOSYNC flag never gets set. (This accounts for the flurry of disk
i/o shortly after vmware starts). When the pages get synced to disk,
the vnode is locked and the application will freeze in a "vmpfw" state.

The following patch sets PG_NOSYNC on faults other than write faults.
This seems to work for my test program, and for vmware (I've only very
briefly tested it). Assuming that it is correct, the code around it
should be reorganized somewhat.

This is against -stable, as I don't have any -current i386 machines.

Index: vm_fault.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/vm_fault.c,v
retrieving revision 1.108.2.2
diff -u -r1.108.2.2 vm_fault.c
--- vm_fault.c	2000/08/04 22:31:11	1.108.2.2
+++ vm_fault.c	2001/02/08 23:04:02
@@ -804,6 +804,10 @@
 				}
 				vm_page_dirty(fs.m);
 				vm_pager_page_unswapped(fs.m);
+			} else {
+				if ((fs.entry->eflags & MAP_ENTRY_NOSYNC) &&
+				    (fs.m->dirty == 0))
+					vm_page_flag_set(fs.m, PG_NOSYNC);
 			}
 		}

Cheers,

Drew

------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590
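
The flag logic described in this message can be checked with a toy
user-space model. What follows is a minimal sketch, not kernel code:
struct model_page, model_fault, model_lazy_sync and model_run are
hypothetical names, the nosync field merely stands in for PG_NOSYNC, and
the mapping is assumed to carry MAP_ENTRY_NOSYNC throughout. The model
also assumes the write-fault path applies the same dirty == 0 guard that
the patch uses. It counts how many pages a lazy msync would write, with
and without the patched behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-ins only; these are not the kernel's definitions. */
enum fault_type { FAULT_READ, FAULT_WRITE };

struct model_page {
	int mapped;	/* page has been faulted in (mapped read/write) */
	int dirty;	/* page has been modified */
	int nosync;	/* stands in for PG_NOSYNC */
};

/* Fault on a page of a mapping assumed to carry MAP_ENTRY_NOSYNC. */
static void
model_fault(struct model_page *pg, enum fault_type type, int patched)
{
	assert(pg != NULL);
	if (type == FAULT_WRITE) {
		if (pg->dirty == 0)
			pg->nosync = 1;
		pg->dirty = 1;
	} else if (patched && pg->dirty == 0) {
		/* Patched behaviour: mark the page on non-write faults too. */
		pg->nosync = 1;
	}
	pg->mapped = 1;
}

/* A load or store faults only if the page is not mapped yet. */
static void
model_access(struct model_page *pg, enum fault_type type, int patched)
{
	if (!pg->mapped)
		model_fault(pg, type, patched);
	else if (type == FAULT_WRITE)
		pg->dirty = 1;	/* no fault taken, so flags are untouched */
}

/* Count the pages a lazy msync would write: dirty and not nosync. */
static size_t
model_lazy_sync(const struct model_page *pages, size_t n)
{
	size_t i, written = 0;

	for (i = 0; i < n; i++)
		if (pages[i].dirty && !pages[i].nosync)
			written++;
	return (written);
}

/*
 * Touch npages pages (optionally reading each one first) and return
 * the number a lazy msync would then write, or (size_t)-1 on error.
 */
size_t
model_run(size_t npages, int read_first, int patched)
{
	struct model_page *pages = calloc(npages, sizeof(*pages));
	size_t i, written;

	if (pages == NULL)
		return ((size_t)-1);
	for (i = 0; i < npages; i++) {
		if (read_first)
			model_access(&pages[i], FAULT_READ, patched);
		model_access(&pages[i], FAULT_WRITE, patched);
	}
	written = model_lazy_sync(pages, npages);
	free(pages);
	return (written);
}
```

In this model, model_run(16384, 1, 0) reports all 16384 pages as
candidates for the lazy sync, while model_run(16384, 1, 1) and
model_run(16384, 0, 0) both report none. That is the read-before-write
case described above, and it says nothing about how the real fault
path behaves beyond what the message states.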