From owner-freebsd-smp  Wed Mar 26 10:51:53 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id KAA08427 for smp-outgoing; Wed, 26 Mar 1997 10:51:53 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.50]) by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id KAA08418 for ; Wed, 26 Mar 1997 10:51:47 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id LAA28377; Wed, 26 Mar 1997 11:37:52 -0700
From: Terry Lambert
Message-Id: <199703261837.LAA28377@phaeton.artisoft.com>
Subject: Re: APIC_IO and the fpu
To: cr@jcmax.com (Cyrus Rahman)
Date: Wed, 26 Mar 1997 11:37:52 -0700 (MST)
Cc: smp@FreeBSD.ORG
In-Reply-To: <9703261059.AA21640@corona.jcmax.com> from "Cyrus Rahman" at Mar 26, 97 05:59:53 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-smp@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> >> >It looks to me that the problem with the fpu stuff might be more a
> >> >function of FPU error handling (irq13 vs. IDT trap #16) rather than just
> >> >plain floating point operations...
> >>
> >> I agree, and think that someone could find it by examining what goes on
> >> in i386/isa/npx.c:npxintr().
> >
> >A further thought, it's possible that it could be APIC_IO related. Could
> >someone run this program on both an APIC_IO and a non-APIC_IO kernel
> >to see if it locks both (I don't have an SMP machine anymore)?

[ ... code ... ]

> The code runs fine with APIC_IO off, but locks the machine up quite nicely
> with it on.  I'm not sure what causes the problem yet...

The FPU stuff is broken.

Since we do not switch tasks through the TSS (a hardware task switch would
set the bit for us), we should probably *explicitly* set CR0 bit 3 (TS,
"task switched") on a task switch, don't you think?  Then when a WAIT or
ESC instruction is hit, an exception 7 will be raised for:

	"The floating point unit is about to execute an instruction
	associated with another task and a task switch has occurred."
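A rough user-space model of the lazy switching described above may make the
mechanism easier to follow.  It is only a sketch under stated assumptions:
a single double stands in for the whole FPU context, the "CPU" and the
exception 7 trap are simulated, and every name (struct cpu, cpu_switch(),
dna_trap(), fpu_add(), fpu_flush()) is hypothetical, not taken from npx.c.
The switch only sets TS; the first FPU instruction afterwards traps, and the
handler saves and reloads state only if the FPU owner actually changed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of lazy FPU switching driven by CR0.TS.
 * Nothing here touches real hardware: the CPU, the tasks, and the
 * exception 7 (device-not-available) trap are all simulated.
 */
#define CR0_TS (1u << 3)            /* CR0 bit 3: Task Switched */

struct task {
    double fpu_save;                /* stand-in for a saved FPU context */
};

struct cpu {
    unsigned cr0;
    struct task *curtask;           /* task running on this CPU */
    struct task *fpu_owner;         /* task whose state is live in this FPU */
    double fpu_regs;                /* stand-in for the live FPU registers */
    int dna_traps;                  /* number of simulated exception 7s */
};

/* Context switch: leave the FPU alone and just set TS, so the next
 * FPU instruction on this CPU traps. */
static void cpu_switch(struct cpu *c, struct task *next)
{
    c->curtask = next;
    c->cr0 |= CR0_TS;
}

/* Simulated exception 7 handler: swap FPU contexts only if the owner
 * changed, then clear TS (as CLTS would). */
static void dna_trap(struct cpu *c)
{
    assert(c->curtask != NULL);
    c->dna_traps++;
    if (c->fpu_owner != c->curtask) {
        if (c->fpu_owner != NULL)
            c->fpu_owner->fpu_save = c->fpu_regs;   /* save old owner */
        c->fpu_regs = c->curtask->fpu_save;         /* load new owner */
        c->fpu_owner = c->curtask;
    }
    c->cr0 &= ~CR0_TS;
}

/* Any ESC (x87) instruction: trap first if TS is set. */
static void fpu_add(struct cpu *c, double x)
{
    if (c->cr0 & CR0_TS)
        dna_trap(c);
    c->fpu_regs += x;
}

/* Eager flush: write the live state back to its owner and forget it,
 * so that task can be resumed on a different CPU.  TS is set so the
 * next FPU user on this CPU reloads its own state. */
static void fpu_flush(struct cpu *c)
{
    if (c->fpu_owner != NULL)
        c->fpu_owner->fpu_save = c->fpu_regs;
    c->fpu_owner = NULL;
    c->cr0 |= CR0_TS;
}
```

Driving two tasks on one zero-initialized struct cpu shows that a task's
newest FPU value can live only in that CPU's registers, not in its save
area, until another task takes the FPU or fpu_flush() is called.  Resuming
such a task on a second struct cpu without the flush would load the stale
saved value, which is the kind of cross-processor mixup the rest of this
message worries about.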
Since we may have only two processors running the FPU, with a lot of
processes it is possible that lazy caching will leave a task's FPU state
cached on one processor while the task is resumed on the other, so that
the exception goes to the wrong process (or worse, an error condition
remains cached but unflagged for the process that was switched out, which
is now running on another processor with no knowledge of the error it
expected to have flagged).

Perhaps we should explicitly flush the FPU state on a switch unless we
establish CPU affinity for the process whose state we leave cached?


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.