From: "Peter Edwards"
To: Bruce Evans
Cc: current@freebsd.org
Subject: Re: Floating point problems
Date: Thu, 24 Oct 2002 15:57:06 +0100
Message-Id: <20021024145830.0D2D543E75@mx1.FreeBSD.org>

Well, that's certainly fixed the problems my test app had.

As for X: I was regularly able to hurt X by clicking randomly on the
"transfers" window in Opera and switching between it and other internal
frames; symptoms included SEGVs, minute-long hangs, etc. With the patch
applied, the same rain-dances failed to provoke any of these symptoms after
about 5 minutes, which is 3-4 times longer than it's ever stayed up before
under the same stress.

I'll report back in about 24 hours either way, but I think that's cured it.
--
Peter.

Bruce Evans wrote:
>
> On Thu, 24 Oct 2002, Peter Edwards wrote:
>
> > There was some discussion about issues with interactions between the floating
> > point context and signal handling in a thread a week or so ago, and a suggestion
> > that someone try and get a simple test that would fail. I was surprised how
> > easy it was: The following program just spins calculating the value of 6.0 /
> > 3.0, and traps SIGINT.
> >
> > If you run it on -current (as of a few hours ago), 99% of the time, hitting
> > ctl-C will cause the program to exit with an error. A 4.5 kernel never causes
> > any problems.
> >
> > I'm pretty sure this is what's causing the stalls and crashes in X. I've taken
> > stack traces of crashes, and from "spinning" processes, and I can spot NaNs on
> > the stack that shouldn't be there, etc.
>
> Thanks. This makes the main bug clear. The PCB_NPXINITDONE bit in the
> state was not being restored. This was confusing to debug because gdb
> doesn't understand this bug so it shows the state that should have been
> restored until npxdna() unrestores it consistently. Try this fix.
>
> %%%
> Index: npx.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/i386/isa/npx.c,v
> retrieving revision 1.133
> diff -u -2 -r1.133 npx.c
> --- npx.c	20 Oct 2002 17:30:30 -0000	1.133
> +++ npx.c	24 Oct 2002 14:20:33 -0000
> @@ -1004,4 +1007,5 @@
>  		bcopy(addr, &td->td_pcb->pcb_save, sizeof(*addr));
>  	}
> +	curthread->td_pcb->pcb_flags |= PCB_NPXINITDONE;
>  }
>  
> %%%
>
> Bruce
>

--
Peter Edwards.
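
The test program mentioned in the quoted text is not reproduced in this message. As a rough illustration only, a check of the same kind (divide 6.0 by 3.0 in a loop while SIGINT is being handled, and count any result that is not exactly 2.0) might look like the sketch below. It is not the original program. The names `fp_mismatches_under_signals`, `on_sigint` and `signals_seen` are made up for this example, and the signal is self-delivered with `raise()` so the check can run unattended, whereas the original was driven by hitting ctl-C at the terminal. The handler also does a little floating-point work of its own, which is an assumption about what makes the test interesting rather than something stated in the thread.

```c
#include <signal.h>

/* Number of SIGINTs handled so far; sig_atomic_t is safe to update in a handler. */
static volatile sig_atomic_t signals_seen = 0;

/* Hypothetical handler: touches the FPU, counts the signal, re-arms itself. */
static void on_sigint(int signo)
{
    volatile double scratch = 1.0;

    scratch = scratch / 3.0;      /* deliberately use floating point in the handler */
    (void)scratch;
    signals_seen++;
    signal(signo, on_sigint);     /* re-install in case the handler was reset to SIG_DFL */
}

/*
 * Compute 6.0 / 3.0 `iterations` times, raising SIGINT on every 100th pass.
 * Returns how many results differed from 2.0 (0 means the floating-point
 * state survived every signal delivery), or -1 if the handler could not
 * be installed.
 */
static long fp_mismatches_under_signals(long iterations)
{
    /* volatile operands force a real division on every pass */
    volatile double num = 6.0, den = 3.0;
    long bad = 0;

    if (signal(SIGINT, on_sigint) == SIG_ERR)
        return -1;

    for (long i = 0; i < iterations; i++) {
        if (i % 100 == 0)
            raise(SIGINT);        /* handler runs before raise() returns */
        if (num / den != 2.0)
            bad++;
    }
    return bad;
}
```

Calling `fp_mismatches_under_signals(1000)` from a small `main()` and printing the return value is enough to try it. A result of 0 means every division came out as 2.0, and anything else would suggest the floating-point state was disturbed across a signal delivery. Whether self-delivered signals exercise the same kernel path as a ctl-C from the terminal is not something this sketch establishes.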