From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 5 12:38:39 2014 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7E885591; Fri, 5 Sep 2014 12:38:39 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 079D61E83; Fri, 5 Sep 2014 12:38:38 +0000 (UTC) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id s85CcQBN086373 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 5 Sep 2014 15:38:27 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua s85CcQBN086373 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id s85CcQ8C086372; Fri, 5 Sep 2014 15:38:26 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 5 Sep 2014 15:38:26 +0300 From: Konstantin Belousov To: Pieter de Goeje Subject: Re: mmap MAP_NOSYNC regression in 10.x Message-ID: <20140905123826.GP2737@kib.kiev.ua> References: <540903FF.6010602@degoeje.nl> <20140905080633.GM2737@kib.kiev.ua> <5409A4F8.6020204@degoeje.nl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="W2+iMEDMST8vYcDN" Content-Disposition: inline In-Reply-To: <5409A4F8.6020204@degoeje.nl> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: alc@freebsd.org, hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Sep 2014 12:38:39 -0000 --W2+iMEDMST8vYcDN Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Sep 05, 2014 at 01:56:40PM +0200, Pieter de Goeje wrote: > Thanks, works for me! I realized that the patch contains yet another bug. The oflags page flags update is protected by the exclusive vm object lock, which is only held in shared mode on the fast path. Below is the fixed patch, where I take the page lock around setting VPO_NOSYNC (exclusive lock owners cannot race with fast path since we own shared lock, and two parallel fast path execution would be handled by the page lock). diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c index 30b0456..f327ab0 100644 --- a/sys/vm/vm_fault.c +++ b/sys/vm/vm_fault.c @@ -174,6 +174,54 @@ unlock_and_deallocate(struct faultstate *fs) } } =20 +static void +vm_fault_dirty(vm_map_entry_t entry, vm_page_t m, vm_prot_t prot, + vm_prot_t fault_type, int fault_flags, boolean_t set_wd) +{ + + if (((prot & VM_PROT_WRITE) !=3D 0 || + (fault_flags & VM_FAULT_DIRTY) !=3D 0) && + (m->oflags & VPO_UNMANAGED) =3D=3D 0) { + if (set_wd) + vm_object_set_writeable_dirty(m->object); + + /* + * If this is a NOSYNC mmap we do not want to set VPO_NOSYNC + * if the page is already dirty to prevent data written with + * the expectation of being synced from not being synced. + * Likewise if this entry does not request NOSYNC then make + * sure the page isn't marked NOSYNC. Applications sharing + * data should use the same flags to avoid ping ponging. + */ + if (entry->eflags & MAP_ENTRY_NOSYNC) { + if (m->dirty =3D=3D 0) { + if (!set_wd) + vm_page_lock(m); + m->oflags |=3D VPO_NOSYNC; + if (!set_wd) + vm_page_unlock(m); + } + } else { + m->oflags &=3D ~VPO_NOSYNC; + } + + /* + * If the fault is a write, we know that this page is being + * written NOW so dirty it explicitly to save on + * pmap_is_modified() calls later. + * + * Also tell the backing pager, if any, that it should remove + * any swap backing since the page is now dirty. + */ + if (((fault_type & VM_PROT_WRITE) !=3D 0 && + (fault_flags & VM_FAULT_CHANGE_WIRING) =3D=3D 0) || + (fault_flags & VM_FAULT_DIRTY) !=3D 0) { + vm_page_dirty(m); + vm_pager_page_unswapped(m); + } + } +} + /* * TRYPAGER - used by vm_fault to calculate whether the pager for the * current object *might* contain the page. @@ -321,11 +369,8 @@ RetryFault:; vm_page_hold(m); vm_page_unlock(m); } - if ((fault_type & VM_PROT_WRITE) !=3D 0 && - (m->oflags & VPO_UNMANAGED) =3D=3D 0) { - vm_page_dirty(m); - vm_pager_page_unswapped(m); - } + vm_fault_dirty(fs.entry, m, prot, fault_type, fault_flags, + FALSE); VM_OBJECT_RUNLOCK(fs.first_object); if (!wired) vm_fault_prefault(&fs, vaddr, 0, 0); @@ -898,42 +943,7 @@ vnode_locked: if (hardfault) fs.entry->next_read =3D fs.pindex + faultcount - reqpage; =20 - if (((prot & VM_PROT_WRITE) !=3D 0 || - (fault_flags & VM_FAULT_DIRTY) !=3D 0) && - (fs.m->oflags & VPO_UNMANAGED) =3D=3D 0) { - vm_object_set_writeable_dirty(fs.object); - - /* - * If this is a NOSYNC mmap we do not want to set VPO_NOSYNC - * if the page is already dirty to prevent data written with - * the expectation of being synced from not being synced. - * Likewise if this entry does not request NOSYNC then make - * sure the page isn't marked NOSYNC. Applications sharing - * data should use the same flags to avoid ping ponging. - */ - if (fs.entry->eflags & MAP_ENTRY_NOSYNC) { - if (fs.m->dirty =3D=3D 0) - fs.m->oflags |=3D VPO_NOSYNC; - } else { - fs.m->oflags &=3D ~VPO_NOSYNC; - } - - /* - * If the fault is a write, we know that this page is being - * written NOW so dirty it explicitly to save on=20 - * pmap_is_modified() calls later. - * - * Also tell the backing pager, if any, that it should remove - * any swap backing since the page is now dirty. - */ - if (((fault_type & VM_PROT_WRITE) !=3D 0 && - (fault_flags & VM_FAULT_CHANGE_WIRING) =3D=3D 0) || - (fault_flags & VM_FAULT_DIRTY) !=3D 0) { - vm_page_dirty(fs.m); - vm_pager_page_unswapped(fs.m); - } - } - + vm_fault_dirty(fs.entry, fs.m, prot, fault_type, fault_flags, TRUE); vm_page_assert_xbusied(fs.m); =20 /* diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h index f12b76c..a45648d 100644 --- a/sys/vm/vm_page.h +++ b/sys/vm/vm_page.h @@ -147,7 +147,7 @@ struct vm_page { uint16_t hold_count; /* page hold count (P) */ uint16_t flags; /* page PG_* flags (P) */ uint8_t aflags; /* access is atomic */ - uint8_t oflags; /* page VPO_* flags (O) */ + uint8_t oflags; /* page VPO_* flags (OM) */ uint8_t queue; /* page queue index (P,Q) */ int8_t psind; /* pagesizes[] index (O) */ int8_t segind; @@ -163,8 +163,9 @@ struct vm_page { /* * Page flags stored in oflags: * - * Access to these page flags is synchronized by the lock on the object - * containing the page (O). + * Access to these page flags is synchronized by the exclusive lock on + * the object containing the page, or combination of shared object + * lock and the page lock (OM). * * Note: VPO_UNMANAGED (used by OBJT_DEVICE, OBJT_PHYS and OBJT_SG) * indicates that the page is not under PV management but --W2+iMEDMST8vYcDN Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBAgAGBQJUCa7CAAoJEJDCuSvBvK1Bka4P/iPGCr9I5LVmIkugcj4QzELB UwzBrQlK42bNiW5sAiU1qRbkMrLVfZyjPTBHPsFjsiVmVb8b5Xi+KbWFZesyFnxj SMynOb6msXm0NJItlZNuWXFF6Uf8LAFdR7LJypNbgQb1Rgu7ztmzpugNs/8I2A8n J8XG/W4+Jcb6QEZ9yfhCGDWy4YnHqEOh9WzjNYPyE//GN+ommNaDOOcbsFBKBw6G 5vYfiP0tIXHOVRPAzvUl6jrkXYT4S9CqFvVDGdLQoTeAfm7jj49D67bf1CMy4uUN jVINMhtTPmzqkcLSHFnzRQmy7AtAQI+VBgoJxg3OCk6pVuUs4w5WB1by9ecA7ZcF 8q49QLcsxGjIVNrNB551djl2eu2hGl1rp0IgYZb0wO1zHuDYMlJZc5X8KEh1XQUy ckmutFCdEt8piqCzIK7cShisbMyMY+rmFiBuwWM9X/7NYjjXU9M43tfByBqGeh65 XSpz9mHvckEG0BCouhu6UIs9JdipRLKRLu+8w233gDf31FCPRTf38T6zAvVHdG4l 9BINyz4vt9hJl0q7M5S5KxUGVtW+0nDvApA2a52h3RfL5QtYO/JHC6dVIdzOnHHJ eb9gazv0gS10LZQ23igTfFpZ6zTixk3q73orvqG3rJN62uSz/iFIjHqjCHeH6z36 FV0TpFkI495w4C7YlTS7 =Z8iA -----END PGP SIGNATURE----- --W2+iMEDMST8vYcDN--