From owner-freebsd-smp  Wed Mar 26 10:51:53 1997
Return-Path: <owner-smp>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id KAA08427
          for smp-outgoing; Wed, 26 Mar 1997 10:51:53 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.50])
          by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id KAA08418
          for <smp@FreeBSD.ORG>; Wed, 26 Mar 1997 10:51:47 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id LAA28377; Wed, 26 Mar 1997 11:37:52 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199703261837.LAA28377@phaeton.artisoft.com>
Subject: Re: APIC_IO and the fpu
To: cr@jcmax.com (Cyrus Rahman)
Date: Wed, 26 Mar 1997 11:37:52 -0700 (MST)
Cc: smp@FreeBSD.ORG
In-Reply-To: <9703261059.AA21640@corona.jcmax.com> from "Cyrus Rahman" at Mar 26, 97 05:59:53 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-smp@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> >> >It looks to me that the problem with the fpu stuff might be more a 
> >> >function of FPU error handling (irq13 vs. IDT trap #16) rather than just 
> >> >plain floating point operations...
> >> 
> >> I agree, and think that someone could find it by examining what goes on
> >> in i386/isa/npx.c:npxintr().
> >
> >A further thought: it's possible that it could be APIC_IO related.  Could
> >someone run this program on both an APIC_IO and a non-APIC_IO kernel
> >to see if it locks up both (I don't have an SMP machine anymore)?

[ ... code ... ]

> The code runs fine with APIC_IO off, but locks the machine up quite nicely
> with it on.  I'm not sure what causes the problem yet...

The FPU stuff is broken.

Since we do not use the TSS for task switching, the hardware never sets
CR0 bit 3 (TS, the "task switched" flag) for us.  We should probably
set it *explicitly* on a task switch, don't you think?

Then when a WAIT or ESC instruction is hit, an exception 7 will be
raised for:

	"The floating point unit is about to execute an instruction
	 associated with another task and a task switch has occurred."

Since only as many processes can have live FPU state as there are
processors (two on a two-CPU box), lazy caching with a lot of processes
can leave a process's FPU state cached on one processor while the
process itself is resumed on the other.  The exception then goes to the
wrong process.  Worse, an error condition can remain cached but
unflagged for a process that was switched out and is now running on
another processor, with no knowledge of the flagging still pending on
the first one.

Probably we should explicitly flush FPU state unless we establish a
CPU affinity for the process we aren't flushing?
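
A rough sketch of that policy, again as a user-space model rather than
kernel code.  Everything here (smp_cpu, smp_proc, the affinity field,
the single int standing in for the FPU registers) is an assumption made
for illustration.  The rule is: on switch-out, save the FPU state to
the process unless the process is pinned to this CPU, in which case the
state can stay cached lazily.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPU 2

/* Hypothetical process model. */
struct smp_proc {
    int pid;
    int affinity;       /* CPU the process is pinned to, or -1 for none */
    int saved_fpu;      /* FPU state saved with the process */
};

/* Hypothetical per-CPU model; one int stands in for the FPU registers. */
struct smp_cpu {
    struct smp_proc *fpu_owner;   /* whose state is live here, or NULL */
    int              fpu_regs;
};

static struct smp_cpu cpus[NCPU];

/* Switch p out of cpu.  Returns true if its FPU state was flushed. */
static bool smp_switch_out(int cpu, struct smp_proc *p)
{
    struct smp_cpu *c = &cpus[cpu];

    if (c->fpu_owner != p)
        return false;             /* nothing of p's is live on this CPU */
    if (p->affinity == cpu)
        return false;             /* pinned here: safe to stay lazy */
    p->saved_fpu = c->fpu_regs;   /* may resume elsewhere: flush now */
    c->fpu_owner = NULL;
    return true;
}

/* First FPU use by p after being switched in on cpu (exception 7 path). */
static void smp_fpu_first_use(int cpu, struct smp_proc *p)
{
    struct smp_cpu *c = &cpus[cpu];

    assert(p->affinity == -1 || p->affinity == cpu);
    if (c->fpu_owner == p)
        return;                   /* lazily cached state is still ours */
    if (c->fpu_owner != NULL)     /* evict the pinned lazy owner */
        c->fpu_owner->saved_fpu = c->fpu_regs;
    c->fpu_regs = p->saved_fpu;
    c->fpu_owner = p;
}
```

With this rule an unpinned process always carries its state with it, so
resuming it on the other CPU finds the right registers, while a pinned
process still gets the benefit of lazy switching.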


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.