To: hackers@freebsd.org
From: Ed L Cashin
Date: Wed, 06 Aug 2003 12:31:04 -0400
Message-ID: <87he4uzr1z.fsf@uga.edu>
Subject: COW and mprotect on non-shared memory

Hi.  I've noticed that in FreeBSD, the struct vm_map_entry has an
eflags member that can have the MAP_ENTRY_COW bit set.  In the
vm_map_protect function, which is used by mprotect, it looks like this
bit is used to determine whether or not to set the page table entries
for write access:

    if (current->protection != old_prot) {
    #define MASK(entry) (((entry)->eflags & MAP_ENTRY_COW) ? ~VM_PROT_WRITE : \
                                VM_PROT_ALL)
        pmap_protect(map->pmap, current->start,
            current->end,
            current->protection & MASK(current));
    #undef MASK
    }

... so if this vm_map_entry describes a VM region that is set for COW,
then the page table entries will not allow writes.  If it's not COW,
though, the page table entries will be set to allow writes (whenever
the new protection includes write access).  Is that correct so far?
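
To make the reading of that macro concrete, here is a minimal
user-space sketch of the masking step only.  The constants are
stand-in values for illustration (the real definitions live in the
FreeBSD kernel headers), and effective_prot is a hypothetical helper
name, not a kernel function:

```c
#include <assert.h>

/* Stand-in values for illustration only; the real definitions are in
 * the FreeBSD kernel headers and are not included here. */
#define VM_PROT_READ    0x01
#define VM_PROT_WRITE   0x02
#define VM_PROT_EXECUTE 0x04
#define VM_PROT_ALL     (VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE)
#define MAP_ENTRY_COW   0x0004

/*
 * Hypothetical helper: the protection value that the quoted code
 * would hand to pmap_protect for an entry with the given eflags and
 * requested protection.
 */
int
effective_prot(int eflags, int protection)
{
    int mask;

    /* Only the three basic protection bits are modeled here. */
    assert((protection & ~VM_PROT_ALL) == 0);

    /* Same expression as the MASK macro: a COW entry loses write. */
    mask = (eflags & MAP_ENTRY_COW) ? ~VM_PROT_WRITE : VM_PROT_ALL;
    return protection & mask;
}
```

With these stand-in values, a COW entry asking for read/write/execute
ends up with read/execute only, while a non-COW entry keeps whatever
protection was requested.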

The reason I'm interested in this is that I'm doing some VM work on
Linux, where they might have COW pages sprinkled throughout a VM
region.  The Linux VM region descriptor analogous to FreeBSD's
vm_map_entry is the vm_area_struct, and it doesn't have any special
bit for COW.

In Linux, COW is recognizable only by the situation where, for a given
page in a region of VM, the vm_area_struct has the VM_WRITE bit set
and the page table entry is write protected.

For that reason, when you mprotect an area of non-shared, anonymous
memory to no access and then back to writable, Linux has no way of
knowing that the memory wasn't set for COW before you made it
unwritable.  It goes ahead and makes all the pages in the area COW.

That means that if I do this:

    for (i = 0; i < n; ++i) {
        assert(!mprotect(p, pgsiz, PROT_NONE));
        assert(!mprotect(p, pgsiz, PROT_READ|PROT_WRITE|PROT_EXEC));
        p[i] = i & 0xff;
    }

... I get n minor page faults!  Pretty amazing, but I guess they
figured nobody does that.

More surprising is that the same test program has the same behavior on
FreeBSD.  (At least, the "/usr/bin/time -l ..." output shows the
number of page reclaims increasing at the same rate that I increase
the value of n in the loop.)

I thought that in FreeBSD any COW area would have its own vm_map_entry
with the MAP_ENTRY_COW bit set.  That way, you could run this test
without any minor faults at all.  Now I suspect I was incorrect.  Could
anyone help clarify the situation for me?

Thanks.

-- 
--Ed L Cashin     |   PGP public key:
  ecashin@uga.edu |   http://noserose.net/e/pgp/
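
For anyone who wants to reproduce the fault counts from the mprotect
loop in this message, the following is a rough, self-contained sketch
rather than the original test program.  The function name
count_mprotect_faults is made up for illustration.  It also differs
from the loop in the message in three ways: it reads ru_minflt via
getrusage instead of relying on /usr/bin/time, it leaves out PROT_EXEC
(some systems refuse writable and executable mappings), and it wraps
the byte index so that every write stays inside the single page:

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON under strict -std= on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: toggle one private anonymous page between
 * PROT_NONE and read/write n times, writing one byte after each
 * toggle.  Returns the number of minor faults the process took
 * during the loop, or -1 if a system call failed.
 */
long
count_mprotect_faults(size_t n)
{
    struct rusage before, after;
    long pgsiz = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p;
    void *mem;
    size_t i;

    assert(n > 0);
    if (pgsiz <= 0)
        return -1;

    mem = mmap(NULL, (size_t)pgsiz, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    p = mem;

    /* Touch the page first so the initial zero-fill fault is excluded. */
    p[0] = 0;

    if (getrusage(RUSAGE_SELF, &before) != 0)
        goto fail;

    for (i = 0; i < n; ++i) {
        if (mprotect(mem, (size_t)pgsiz, PROT_NONE) != 0 ||
            mprotect(mem, (size_t)pgsiz, PROT_READ | PROT_WRITE) != 0)
            goto fail;
        /* Wrap the index so the write never leaves the page. */
        p[i % (size_t)pgsiz] = (unsigned char)(i & 0xff);
    }

    if (getrusage(RUSAGE_SELF, &after) != 0)
        goto fail;

    munmap(mem, (size_t)pgsiz);
    return after.ru_minflt - before.ru_minflt;

fail:
    munmap(mem, (size_t)pgsiz);
    return -1;
}
```

Calling it with a few different values of n and printing the result
should show whether the count grows with n on a given system.  The
exact numbers depend on the kernel and on any unrelated faults the
process takes while the loop runs.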