Date: Fri, 2 Aug 2013 08:41:58 -0700 From: Alan Cox <alc@rice.edu> To: Bruce Evans <brde@optusnet.com.au> Cc: svn-src-projects@FreeBSD.org, src-committers@FreeBSD.org, Jung-uk Kim <jkim@FreeBSD.org> Subject: Re: svn commit: r253877 - in projects/atomic64/sys: amd64/include i386/include Message-ID: <ADCD3A87-EE8B-4B88-83EB-9CA3D00DA159@rice.edu> In-Reply-To: <20130802233616.D1711@besplex.bde.org> References: <201308020020.r720K5Gu099845@svn.freebsd.org> <20130802233616.D1711@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Aug 2, 2013, at 6:51 AM, Bruce Evans wrote: > On Fri, 2 Aug 2013, Jung-uk Kim wrote: >=20 >> Log: >> Reimplement atomic operations on PDEs and PTEs in pmap.h. This = change >> significantly reduces duplicate code. Also, it may improve and even = correct >> some questionable implementations. >=20 > Do they all (or any) need to be atomic with respect to multiple CPUs? > It's hard to see how concurrent accesses to page tables can work worh > without higher-level locking than is provided by atomic ops. >=20 Some do, so that we do not lose a PG_M ("dirty") bit being set = concurrently by another processor. However. none of these accesses need = to be labeled as acquires or releases. =20 >> Modified: projects/atomic64/sys/amd64/include/pmap.h >> = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D >> --- projects/atomic64/sys/amd64/include/pmap.h Fri Aug 2 = 00:08:00 2013 (r253876) >> +++ projects/atomic64/sys/amd64/include/pmap.h Fri Aug 2 = 00:20:04 2013 (r253877) >> @@ -185,41 +185,13 @@ extern u_int64_t KPML4phys; /* physical >> pt_entry_t *vtopte(vm_offset_t); >> #define vtophys(va) pmap_kextract(((vm_offset_t) (va))) >>=20 >> -static __inline pt_entry_t >> -pte_load(pt_entry_t *ptep) >> -{ >> - pt_entry_t r; >> - >> - r =3D *ptep; >> - return (r); >> -} >=20 > This function wasn't atomic with respect to multiple CPUs. Except on > i386 with PAE, but then it changes a 64-bit object on a 32-bit CPU, > so it needs some locking just to be atomic with respect to a single = CPU. >=20 >> -static __inline pt_entry_t >> -pte_load_store(pt_entry_t *ptep, pt_entry_t pte) >> -{ >> - pt_entry_t r; >> - >> - __asm __volatile( >> - "xchgq %0,%1" >> - : "=3Dm" (*ptep), >> - "=3Dr" (r) >> - : "1" (pte), >> - "m" (*ptep)); >> - return (r); >> -} >=20 > This was the main one that was atomic with respect to multiple CPUs on > both amd64 and i386. This seems to be accidental -- xchg to memory = gives > a lock prefix and slowness whether you want it or not. >=20 >> - >> -#define pte_load_clear(pte) atomic_readandclear_long(pte) >> - >> -static __inline void >> -pte_store(pt_entry_t *ptep, pt_entry_t pte) >> -{ >> +#define pte_load(ptep) = atomic_load_acq_long(ptep) >> +#define pte_load_store(ptep, pte) atomic_swap_long(ptep, = pte) >> +#define pte_load_clear(pte) atomic_swap_long(pte, 0) >> +#define pte_store(ptep, pte) = atomic_store_rel_long(ptep, pte) >> +#define pte_clear(ptep) = atomic_store_rel_long(ptep, 0) >>=20 >> - *ptep =3D pte; >> -} >=20 > pte_store() was also not atomic with respect to multiple CPUs. So = almost > everything was not atomic with respect to multiple CPUs, except for = PAE > on i386. >=20 > Bruce >=20
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ADCD3A87-EE8B-4B88-83EB-9CA3D00DA159>