Date: Mon, 8 Sep 2014 11:41:26 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Alan Cox <alc@rice.edu> Cc: alc@freebsd.org, Pieter de Goeje <pieter@degoeje.nl>, hackers@freebsd.org Subject: Re: mmap MAP_NOSYNC regression in 10.x Message-ID: <20140908084126.GX2737@kib.kiev.ua> In-Reply-To: <540BA416.7010106@rice.edu> References: <540903FF.6010602@degoeje.nl> <CAJUyCcNiLwLuL9crpQBjSdg4ED5kR53fPjyJG3HNmP5Roor8RQ@mail.gmail.com> <20140905080633.GM2737@kib.kiev.ua> <5409A4F8.6020204@degoeje.nl> <20140905123826.GP2737@kib.kiev.ua> <540BA416.7010106@rice.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
--cZMJGra2+uNVq8Ts
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
On Sat, Sep 06, 2014 at 07:17:26PM -0500, Alan Cox wrote:
> On 09/05/2014 07:38, Konstantin Belousov wrote:
> > On Fri, Sep 05, 2014 at 01:56:40PM +0200, Pieter de Goeje wrote:
> >> Thanks, works for me!
> > I realized that the patch contains yet another bug. The oflags page
> > flags update is protected by the exclusive vm object lock, which is only
> > held in shared mode on the fast path. Below is the fixed patch, where
> > I take the page lock around setting VPO_NOSYNC (exclusive lock owners
> > cannot race with fast path since we own shared lock, and two parallel
> > fast path execution would be handled by the page lock).
>=20
>=20
> Suppose that the page is clean and two threads are executing this code
> concurrently.  One's map entry has MAP_NOSYNC set, and the other's
> doesn't.  Let's call these threads NOSYNC and SYNC, respectively.
>=20
> Suppose that the thread SYNC is slightly ahead.  It has already
> performed "m->oflags &=3D ~VPO_NOSYNC;" and now it's about to perform
> "vm_page_dirty(fs.m);".  However, just before the thread SYNC calls
> vm_page_dirty(), the thread NOSYNC evaluates "m->dirty =3D=3D 0", which is
> still true, and it performs "m->oflags |=3D VPO_NOSYNC; "
>=20
> This can't happen on the slow path.  That is, a fault by a thread
> without MAP_NOSYNC set on its map entry will reliably clear VPO_NOSYNC.
As I understand things, it is indeed not possible on the slow path, due
to PG_RW only set from pmap_enter(), am I right ?  I.e. this is another
place where the rule 'no PG_RW without PG_M' is important.
Let me formulate my question another way:  what are the guarantees we
provide to the applications when the same page is mapped with and
without MAP_NOSYNC simultaneously ?  Is it contractually guaranteed that
any write from !MAP_NOSYNC entry triggers write in the syncer activity
period ?
>=20
> The best course of action may be to fall back to the slow path if you
> actually need to change VPO_NOSYNC's state.  Usually, you won't need to.
>=20
Let me first try to improve the original patch to handle
MAP_ENTRY_NOSYNC on fast path as well. It seems to be one of the cases
when the parallel faults are actually useful.
One more note: the previous patch handled m->oflags inconsistency for
setting VPO_NOSYNC operation, but missed the clear one line later.
I think that increasing the page lock to cover also the vm_page_dirty()
would fix the race you described, and the second manipulation with
oflags.
diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index 30b0456..944b479 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -174,6 +174,70 @@ unlock_and_deallocate(struct faultstate *fs)
 	}
 }
=20
+static void
+vm_fault_dirty(vm_map_entry_t entry, vm_page_t m, vm_prot_t prot,
+    vm_prot_t fault_type, int fault_flags, boolean_t set_wd)
+{
+	boolean_t need_dirty;
+
+	if (((prot & VM_PROT_WRITE) =3D=3D 0 &&
+	    (fault_flags & VM_FAULT_DIRTY) =3D=3D 0) ||
+	    (m->oflags & VPO_UNMANAGED) !=3D 0)
+		return;
+
+	VM_OBJECT_ASSERT_LOCKED(m->object);
+
+	need_dirty =3D ((fault_type & VM_PROT_WRITE) !=3D 0 &&
+	    (fault_flags & VM_FAULT_CHANGE_WIRING) =3D=3D 0) ||
+	    (fault_flags & VM_FAULT_DIRTY) !=3D 0;
+
+	if (set_wd)
+		vm_object_set_writeable_dirty(m->object);
+	else
+		/*
+		 * If two callers of vm_fault_dirty() with set_wd =3D=3D
+		 * FALSE, one for the map entry with MAP_ENTRY_NOSYNC
+		 * flag set, other with flag clear, race, it is
+		 * possible for the no-NOSYNC thread to see m->dirty
+		 * !=3D 0 and not clear VPO_NOSYNC.  Take vm_page lock
+		 * around manipulation of VPO_NOSYNC and
+		 * vm_page_dirty() call, to avoid the race and keep
+		 * m->oflags consistent.
+		 */
+		vm_page_lock(m);
+
+	/*
+	 * If this is a NOSYNC mmap we do not want to set VPO_NOSYNC
+	 * if the page is already dirty to prevent data written with
+	 * the expectation of being synced from not being synced.
+	 * Likewise if this entry does not request NOSYNC then make
+	 * sure the page isn't marked NOSYNC.  Applications sharing
+	 * data should use the same flags to avoid ping ponging.
+	 */
+	if ((entry->eflags & MAP_ENTRY_NOSYNC) !=3D 0) {
+		if (m->dirty =3D=3D 0) {
+			m->oflags |=3D VPO_NOSYNC;
+		}
+	} else {
+		m->oflags &=3D ~VPO_NOSYNC;
+	}
+
+	/*
+	 * If the fault is a write, we know that this page is being
+	 * written NOW so dirty it explicitly to save on
+	 * pmap_is_modified() calls later.
+	 *
+	 * Also tell the backing pager, if any, that it should remove
+	 * any swap backing since the page is now dirty.
+	 */
+	if (need_dirty)
+		vm_page_dirty(m);
+	if (!set_wd)
+		vm_page_unlock(m);
+	if (need_dirty)
+		vm_pager_page_unswapped(m);
+}
+
 /*
  * TRYPAGER - used by vm_fault to calculate whether the pager for the
  *	      current object *might* contain the page.
@@ -321,11 +385,8 @@ RetryFault:;
 			vm_page_hold(m);
 			vm_page_unlock(m);
 		}
-		if ((fault_type & VM_PROT_WRITE) !=3D 0 &&
-		    (m->oflags & VPO_UNMANAGED) =3D=3D 0) {
-			vm_page_dirty(m);
-			vm_pager_page_unswapped(m);
-		}
+		vm_fault_dirty(fs.entry, m, prot, fault_type, fault_flags,
+		    FALSE);
 		VM_OBJECT_RUNLOCK(fs.first_object);
 		if (!wired)
 			vm_fault_prefault(&fs, vaddr, 0, 0);
@@ -898,42 +959,7 @@ vnode_locked:
 	if (hardfault)
 		fs.entry->next_read =3D fs.pindex + faultcount - reqpage;
=20
-	if (((prot & VM_PROT_WRITE) !=3D 0 ||
-	    (fault_flags & VM_FAULT_DIRTY) !=3D 0) &&
-	    (fs.m->oflags & VPO_UNMANAGED) =3D=3D 0) {
-		vm_object_set_writeable_dirty(fs.object);
-
-		/*
-		 * If this is a NOSYNC mmap we do not want to set VPO_NOSYNC
-		 * if the page is already dirty to prevent data written with
-		 * the expectation of being synced from not being synced.
-		 * Likewise if this entry does not request NOSYNC then make
-		 * sure the page isn't marked NOSYNC.  Applications sharing
-		 * data should use the same flags to avoid ping ponging.
-		 */
-		if (fs.entry->eflags & MAP_ENTRY_NOSYNC) {
-			if (fs.m->dirty =3D=3D 0)
-				fs.m->oflags |=3D VPO_NOSYNC;
-		} else {
-			fs.m->oflags &=3D ~VPO_NOSYNC;
-		}
-
-		/*
-		 * If the fault is a write, we know that this page is being
-		 * written NOW so dirty it explicitly to save on=20
-		 * pmap_is_modified() calls later.
-		 *
-		 * Also tell the backing pager, if any, that it should remove
-		 * any swap backing since the page is now dirty.
-		 */
-		if (((fault_type & VM_PROT_WRITE) !=3D 0 &&
-		    (fault_flags & VM_FAULT_CHANGE_WIRING) =3D=3D 0) ||
-		    (fault_flags & VM_FAULT_DIRTY) !=3D 0) {
-			vm_page_dirty(fs.m);
-			vm_pager_page_unswapped(fs.m);
-		}
-	}
-
+	vm_fault_dirty(fs.entry, fs.m, prot, fault_type, fault_flags, TRUE);
 	vm_page_assert_xbusied(fs.m);
=20
 	/*
diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h
index f12b76c..a45648d 100644
--- a/sys/vm/vm_page.h
+++ b/sys/vm/vm_page.h
@@ -147,7 +147,7 @@ struct vm_page {
 	uint16_t hold_count;		/* page hold count (P) */
 	uint16_t flags;			/* page PG_* flags (P) */
 	uint8_t aflags;			/* access is atomic */
-	uint8_t oflags;			/* page VPO_* flags (O) */
+	uint8_t oflags;			/* page VPO_* flags (OM) */
 	uint8_t	queue;			/* page queue index (P,Q) */
 	int8_t psind;			/* pagesizes[] index (O) */
 	int8_t segind;
@@ -163,8 +163,9 @@ struct vm_page {
 /*
  * Page flags stored in oflags:
  *
- * Access to these page flags is synchronized by the lock on the object
- * containing the page (O).
+ * Access to these page flags is synchronized by the exclusive lock on
+ * the object containing the page, or combination of shared object
+ * lock and the page lock (OM).
  *
  * Note: VPO_UNMANAGED (used by OBJT_DEVICE, OBJT_PHYS and OBJT_SG)
  * 	 indicates that the page is not under PV management but
--cZMJGra2+uNVq8Ts
Content-Type: application/pgp-signature
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAEBAgAGBQJUDWu2AAoJEJDCuSvBvK1B+bwP/2jAQVhay1dGGP8PjHH3zqFk
zkgPe1zKrKJvybXe76fCy9yPxxcIyp1lLONqYSjenkvPVigvFPwPdKo37H/H5Lyr
oK+6f/LGSOx0m2Vpd1h5HKtYfnuBEs3W/etFX9ZacZpR3EpTj7BTZaShzqFKNjKP
8nhT5XkJV1YItn8Rjft5f9MC8UhwG+fGkZ+UXjmQ1elSE4e6HhLFzfgQ8mfM8Zls
p0PnZ9qVp35M+tqOklkEMhzb+pIgORL7t2eqqWms5Y+VDfDdtFv956Pc8O/N5zB8
6zmaKfKaDNRnD1cS3HAFgHbozLz/KOZMphDO88yymkdIhakld5cusPYJeWz09MXm
rD9Bg9bfvpMjo6TOPZ7DV40CfO3u1SC4SLEt/Czimt5ivKdTKrIXVJCCqY+JOdN3
2b8WkvMcIgMTe6z7SDr4g+j7jReJtKv8eBpQMfA5uKpf/zxtyBEx4RiYkCjaIaXL
KBYYcc8eZjZ4C0JyVcnpsWSdvCADiwlCf9W2wwFRbMRr3Fh9cHLmpJLTGzJYtRF3
VU4kQjTMrQt91j/tfh84vDR/EvFtMLyxCLo1X/MrHCQG5/SrzlGtLwZFqDoJXV+E
3E6izBTdqeWz6Pel4XtNperaMGEIFyPRpCkhVvMH3FhGpBkh9ba+qFxlLiJfpsa6
tS/iofuRSvZd2wzLyDAC
=1VQ0
-----END PGP SIGNATURE-----
--cZMJGra2+uNVq8Ts--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140908084126.GX2737>
