Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 19 Jul 2012 15:31:38 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Mark Linimon <linimon@lonesome.com>, freebsd-amd64@freebsd.org
Subject:   Re: amd64/169927: siginfo, si_code for fpe errors when error occurs using the SSE math processor
Message-ID:  <20120719123138.GG2676@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120719111846.G780@besplex.bde.org>
References:  <201207172150.q6HLoFvB096742@freefall.freebsd.org> <20120718103210.E834@besplex.bde.org> <20120718035220.GO2676@deviant.kiev.zoral.com.ua> <20120718140523.X1862@besplex.bde.org> <20120718050958.GQ2676@deviant.kiev.zoral.com.ua> <20120718153554.O2195@besplex.bde.org> <20120718081003.GT2676@deviant.kiev.zoral.com.ua> <20120719111846.G780@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--ijywFOGgtBfiIjQx
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jul 19, 2012 at 01:10:29PM +1000, Bruce Evans wrote:
> On Wed, 18 Jul 2012, Konstantin Belousov wrote:
>=20
> >On Wed, Jul 18, 2012 at 04:59:38PM +1000, Bruce Evans wrote:
> >>On Wed, 18 Jul 2012, Konstantin Belousov wrote:
> >>Putting the definiton in machine/pcpu.h should avoid changing the
> >>prerequistes for machine/pcb.h.
> >>
> >>>#ifndef _KERNEL
> >>>/* stuff that *used* to be included by user.h, or is now needed */
> >>>
> >>>Please note the location in pcb.h an not in machine/pcpu.h, where it
> >>>cannot work for technical reasons (struct pcpu is not defined yet).
> >>
> >>Not applicable -- see above.
> >No, this cannot work. machine/pcpu.h defines PCPU_MD_FIELDS which is used
> >to provide md part of the struct pcpu.
>=20
> I see.  Oops.
>=20
> But this problem is easy to avoid.  There are at least the following
> alternatives:
> (1) hard-code the offset, as for curthread.  This is ugly, but not too
>     hard to maintain.  The offset has been 4 * sizeof(struct thread *),
>     for more than 10 years, and you can't change this without breaking
>     the ABI.
I did this, adding CTASSERTs. See the patch below.

> (2) #define the offset, as in raw asm code.  The offset is #define'd in
>     assym.s.  This would give too much namespace pollution and bloat
>     (unless a short assym.non-s is used).
> (3) use a macro instead of an inline function.  Inline functions tend to
>     require much more namespace pollution than macros, and this is an
>     example of that.  This is essentially the method that we already
>     use.  We define define the macro PCPU_GET().  We could define curpcb
>     as PCPU_GET(curpcb).  Instead, we write PCPU_GET(curpcb)
>     everywhere that we want the value.
>=20
>     I didn't like the roto-tilling of the source code for this and
>     other pcpu fields, and we didn't do it for curproc (we roto-tilled
>     curproc to curthread instead).  Both curthread and curpcb are not
>     as volatile as the other fields, so it is technically slightly
>     incorrect to use the general volatile PCPU_GET() for them.  We
>     already use a de-volatalized PCPU_GET() for curthread.  This was
>     originally just an optimization and a de-obfuscation.  PCPU_GET()
>     expands to really horrible, hard-to debug code.  It was so horrible
>     and bloated that it broke compiling, so peter changed curthread
>     to use an inline function in 2003.  This is only done for some
>     arches (just amd64 and i386?), and sys/pcpu.h still defines
>     curthread as PCPU_GET(curthread) if it is not already defined.
>     The inline function was originally equivalent to PCPU_GET().  It
>     used a volatile asm and wasn't pure2.  But since it was independent
>     of PCPU_GET(), it was easier to optimize.
>=20
>     To do the same thing for curpcb, provide a PCPU_NONVOLATILE_GET()
>     which is the same as PCPU_GET() without the volatile in the asm,
>     and #define curpcb as PCPU_NONVOLATILE_GET(curpcb).  Or maybe
>     start with the inline function used for curthread, and macroize it.
>=20
>     This could be used for curthread too, but I think we want to keep it
>     as an inline function since the macro expansions are still horrible.
I do not like this, prefer the inline functions approach.

>=20
> BTW, can you explain the locking for the pcpu access functions?  It seems
> to be quite broken, with the volatile in the asms neither necessary nor
> sufficient.  PCPU_PTR() seems to be almost unusable, but it is used.
> Here is the i386 version of __PCPU_GET():
>=20
> % #define	__PCPU_GET(name) __extension__ ({			 \
> % 	__pcpu_type(name) __res;					\
> % 	struct __s {							\
> % 		u_char	__b[MIN(sizeof(__res), 4)];			\
> % 	} __s;								\
> % 									\
> % 	if (sizeof(__res) =3D=3D 1 || sizeof(__res) =3D=3D 2 ||			\
> % 	    sizeof(__res) =3D=3D 4) {					\
> % 		__asm __volatile("mov %%fs:%1,%0"			\
> % 		    : "=3Dr" (__s)					\
> % 		    : "m" (*(struct __s *)(__pcpu_offset(name))));	\
> % 		*(struct __s *)(void *)&__res =3D __s;			\
>=20
> This part is more or less OK.  It is subtle.  The pcpu data starting at
> %fs:0 is volatile (or at least variable), and so is its address.  Its
> address is not a C address, but is the constant 0 in %fs-space.  %fs is
> equivalent to a volatile C pointer.  When %fs changes, it is normally
> the responsibility of the caller to lock things so that this is not a
> problem or so that %fs can't change.  (The %fs descriptor doesn't
> actually change, but the %fs data does). There are some important special
> cases where no locking is required:
> - for pc_curthread.  This cannot change (except in context-switching
>   code).  The CPU running the code can change unless the caller locks
>   things.  Then %fs changes, but pc_curthread is switched so that it
>   is the same for the new CPU as for the old CPU.
> - for pc_curpcb.  This is switched like pc_curthread.
> - I don't know of any other pcpu variables that are as easy to use or
>   as obviously correctly locked as the above 2.
> - counter variables.  Now you want the %fs pointer to change to track
>   the CPU.
They are 'locked' by the fact that the increment is atomic regarind
context switches and interrupts, since CPU guarantees that interrupts
only happen on exact instruction boundaries.

>=20
> % 	} else {							\
> % 		__res =3D *__PCPU_PTR(name);				\
>=20
> __PCPU_PTR() seems to be almost unusable, and this is not one of the
> few cases where it is usable.  Here it is used for wide or unusually-sized
> fields where atomic access is especially problematic, yet it does even
> less that the above to give atomicity.  First consider using it in all
> cases to replace the above:
> - using it for fields like pc_curthead would be very broken.  The pointer
>   doesn't change, so if there is a context switch between reading the
>   pointer and following it, the pointer reads the data for the wrong CPU.
>   For pc_cuthread, the new pointer will give the old thread which is
>   what curthread should return, but the old pointer will give some unrela=
ted
>   thread (whatever the old CPU is running now).
> - for counter variables, there is no problem.  We don't care whether the
>   pointer is for the new CPU or the old one, provided the access is atomi=
c,
>   and here I am only considering the case where accesses are natuarally
>   atomic.  (We actually use different access functions for counters so
>   that things like incrementing counters is atomic.)
> Now consider it for wide data.  The pointer doesn't change so the multiple
> accesses needed for the wide data are at least for the same CPU.  This gi=
ves
> the same atomicity problems as for C pointers.  The caller needs to lock.
> But PCPU_GET() is supposed to be a general interface.  Its atomicity
> shouldn't depend on the size of the data.
>=20
> % 	}								\
> % 	__res;								\
> % })
For x86-oids, if you use PCPU_GET for pointers, you euther need to brace
the block with something that disables preemption, e.g. bind the thread
to CPU, or use spinlock, or enter critical section, or be ready to handle
possible scheduling. Example of the later would be use of atomic operation
on the resulting pointer destination.

>=20
> What happens for arches where PCPU_GET() uses a C pointer?  I don't see
> how this can work for fields that are switched like pc_curthread.  Well,
> here is what happens in some cases:
> - on ia64, PCPU_GET() uses the C pointer pcpu which is a fixed register.
>   curthread works because it doesn't use PCPU_GET() (it uses a __pure2
>   non-volatile asm similarly to i386).  curpcb works beause it is not
>   used.
> - sparc64 is similar to ia64 except it gives even more examples of the
>   special switched fields: pcpu is C pointer in a fixed register;
>   curpcb is a C pointer in another fixed register; curthread uses a
>   __pure2 non-volatile asm similarly to ia64.  Why is asm used for
>   curthread on ia64 and sparc64?  The asm seems to have no magic, but
>   just follows the C pointer.  Ah, this seems to be just bogus.  Old
>   versions did have magic -- they used a volatile asm and didn't use
>   __pure2, but they were optimized like the x86 versions 2 years ago.
>=20
> So the problem seems to be well understood for the switched fields, and
> to not exist for counters.  Other cases are hopefully handled by locking.
> Here are some that aren't:
> - PCPU_GET/SET(switchtime) in kern_resource.c.  The accesses are only
>   proc locked AFAIK.  switchtime is uint64_t, so accesses to it are
>   non-atomic on 32-bit arches.
Note that access to switchtime is covered by the process spinlock.
Spinlocks disable preemption, so this is fine.

> - PCPU_GET/SET(switchtime) in kern_synch.c and sched_*.c.  Now the
>   accesses are thread locked or sched_locked, which had better be enough
>   (it is all that locks switching pc_curthread too?).
Ditto.

> - in clock code, there is a lot of thread locking, but this is localized
>   and not all of the pcpu accesses are covered by it.  Apparently, clock
>   code depends on being called in fast interrupt handle context to give
>   locking for pcpu.
>=20
> PCPU accesses are surprisingly rare except for counters, so there don't
> seem to be many problems.
>=20
> Most uses of PCP_PTR() are reasonable, since they are in places like
> subr_witness.c that should know to lock.  I would remove the one in
> PCPU_GET().  This would break mainly the switchtime accesses on i386
> and force callers to use PCPU_GET() and know that they need locking.

Below is the updated patch, with added assert that offset of pc_curthread
is 0.

diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c
index f88aba7..fe4bcf1 100644
--- a/sys/amd64/amd64/fpu.c
+++ b/sys/amd64/amd64/fpu.c
@@ -327,7 +327,7 @@ fpuexit(struct thread *td)
 	critical_enter();
 	if (curthread =3D=3D PCPU_GET(fpcurthread)) {
 		stop_emulating();
-		fpusave(PCPU_GET(curpcb)->pcb_save);
+		fpusave(curpcb->pcb_save);
 		start_emulating();
 		PCPU_SET(fpcurthread, 0);
 	}
@@ -547,7 +547,7 @@ fputrap_x87(void)
 	 * wherever they are.
 	 */
 	if (PCPU_GET(fpcurthread) !=3D curthread) {
-		pcb_save =3D PCPU_GET(curpcb)->pcb_save;
+		pcb_save =3D curpcb->pcb_save;
 		control =3D pcb_save->sv_env.en_cw;
 		status =3D pcb_save->sv_env.en_sw;
 	} else {
@@ -567,7 +567,7 @@ fputrap_sse(void)
=20
 	critical_enter();
 	if (PCPU_GET(fpcurthread) !=3D curthread)
-		mxcsr =3D PCPU_GET(curpcb)->pcb_save->sv_env.en_mxcsr;
+		mxcsr =3D curpcb->pcb_save->sv_env.en_mxcsr;
 	else
 		stmxcsr(&mxcsr);
 	critical_exit();
@@ -609,7 +609,7 @@ fpudna(void)
 	 * Record new context early in case frstor causes a trap.
 	 */
 	PCPU_SET(fpcurthread, curthread);
-	pcb =3D PCPU_GET(curpcb);
+	pcb =3D curpcb;
=20
 	fpu_clean_state();
=20
@@ -970,7 +970,7 @@ fpu_kern_thread(u_int flags)
 {
 	struct pcb *pcb;
=20
-	pcb =3D PCPU_GET(curpcb);
+	pcb =3D curpcb;
 	KASSERT((curthread->td_pflags & TDP_KTHREAD) !=3D 0,
 	    ("Only kthread may use fpu_kern_thread"));
 	KASSERT(pcb->pcb_save =3D=3D get_pcb_user_save_pcb(pcb),
@@ -987,5 +987,5 @@ is_fpu_kern_thread(u_int flags)
=20
 	if ((curthread->td_pflags & TDP_KTHREAD) =3D=3D 0)
 		return (0);
-	return ((PCPU_GET(curpcb)->pcb_flags & PCB_KERNFPU) !=3D 0);
+	return ((curpcb->pcb_flags & PCB_KERNFPU) !=3D 0);
 }
diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
index 8044fe5..cd3ebc5 100644
--- a/sys/amd64/amd64/machdep.c
+++ b/sys/amd64/amd64/machdep.c
@@ -996,7 +996,7 @@ exec_setregs(struct thread *td, struct image_params *im=
gp, u_long stack)
 		pcb->pcb_dr3 =3D 0;
 		pcb->pcb_dr6 =3D 0;
 		pcb->pcb_dr7 =3D 0;
-		if (pcb =3D=3D PCPU_GET(curpcb)) {
+		if (pcb =3D=3D curpcb) {
 			/*
 			 * Clear the debug registers on the running
 			 * CPU, otherwise they will end up affecting
diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 57d1cc2..d035a7e 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -520,9 +520,8 @@ trap(struct trapframe *frame)
 				frame->tf_rip =3D (long)fsbase_load_fault;
 				goto out;
 			}
-			if (PCPU_GET(curpcb)->pcb_onfault !=3D NULL) {
-				frame->tf_rip =3D
-				    (long)PCPU_GET(curpcb)->pcb_onfault;
+			if (curpcb->pcb_onfault !=3D NULL) {
+				frame->tf_rip =3D (long)curpcb->pcb_onfault;
 				goto out;
 			}
 			break;
@@ -708,7 +707,7 @@ trap_pfault(frame, usermode)
 		 * it normally, and panic immediately.
 		 */
 		if (!usermode && (td->td_intr_nesting_level !=3D 0 ||
-		    PCPU_GET(curpcb)->pcb_onfault =3D=3D NULL)) {
+		    curpcb->pcb_onfault =3D=3D NULL)) {
 			trap_fatal(frame, eva);
 			return (-1);
 		}
@@ -764,8 +763,8 @@ trap_pfault(frame, usermode)
 nogo:
 	if (!usermode) {
 		if (td->td_intr_nesting_level =3D=3D 0 &&
-		    PCPU_GET(curpcb)->pcb_onfault !=3D NULL) {
-			frame->tf_rip =3D (long)PCPU_GET(curpcb)->pcb_onfault;
+		    curpcb->pcb_onfault !=3D NULL) {
+			frame->tf_rip =3D (long)curpcb->pcb_onfault;
 			return (0);
 		}
 		trap_fatal(frame, eva);
diff --git a/sys/amd64/amd64/vm_machdep.c b/sys/amd64/amd64/vm_machdep.c
index 103fa0d..070d8c9 100644
--- a/sys/amd64/amd64/vm_machdep.c
+++ b/sys/amd64/amd64/vm_machdep.c
@@ -90,6 +90,10 @@ static u_int	cpu_reset_proxyid;
 static volatile u_int	cpu_reset_proxy_active;
 #endif
=20
+CTASSERT((struct thread **)OFFSETOF_CURTHREAD =3D=3D
+    &((struct pcpu *)NULL)->pc_curthread);
+CTASSERT((struct pcb **)OFFSETOF_CURPCB =3D=3D &((struct pcpu *)NULL)->pc_=
curpcb);
+
 struct savefpu *
 get_pcb_user_save_td(struct thread *td)
 {
diff --git a/sys/amd64/include/pcpu.h b/sys/amd64/include/pcpu.h
index d07dbac..5d1fd4d 100644
--- a/sys/amd64/include/pcpu.h
+++ b/sys/amd64/include/pcpu.h
@@ -216,16 +216,29 @@ extern struct pcpu *pcpup;
 #define	PCPU_PTR(member)	__PCPU_PTR(pc_ ## member)
 #define	PCPU_SET(member, val)	__PCPU_SET(pc_ ## member, val)
=20
+#define	OFFSETOF_CURTHREAD	0
 static __inline __pure2 struct thread *
 __curthread(void)
 {
 	struct thread *td;
=20
-	__asm("movq %%gs:0,%0" : "=3Dr" (td));
+	__asm("movq %%gs:%1,%0" : "=3Dr" (td)
+	    : "m" (*(char *)OFFSETOF_CURTHREAD));
 	return (td);
 }
 #define	curthread		(__curthread())
=20
+#define	OFFSETOF_CURPCB		32
+static __inline __pure2 struct pcb *
+__curpcb(void)
+{
+	struct pcb *pcb;
+
+	__asm("movq %%gs:%1,%0" : "=3Dr" (pcb) : "m" (*(char *)OFFSETOF_CURPCB));
+	return (pcb);
+}
+#define	curpcb		(__curpcb())
+
 #define	IS_BSP()	(PCPU_GET(cpuid) =3D=3D 0)
=20
 #else /* !lint || defined(__GNUCLIKE_ASM) && defined(__GNUCLIKE___TYPEOF) =
*/

--ijywFOGgtBfiIjQx
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlAH/ioACgkQC3+MBN1Mb4g47QCdExFYfgmsb7VN8pWA4w+IeCLe
oaYAnixdFtzkLHW0+Fc63F/OLjr+13e2
=LheE
-----END PGP SIGNATURE-----

--ijywFOGgtBfiIjQx--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120719123138.GG2676>