From owner-freebsd-current@FreeBSD.ORG  Thu Jun 17 13:25:11 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3E15A16A4CE
	for <current@freebsd.org>; Thu, 17 Jun 2004 13:25:11 +0000 (GMT)
Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A804043D49
	for <current@freebsd.org>; Thu, 17 Jun 2004 13:25:10 +0000 (GMT)
	(envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au
	[61.8.0.87])i5HCRL5v021305;	Thu, 17 Jun 2004 22:27:21 +1000
Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246])
	i5HCRHaU015992;	Thu, 17 Jun 2004 22:27:19 +1000
Date: Thu, 17 Jun 2004 22:27:16 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Simon Barner <barner@in.tum.de>
In-Reply-To: <20040617134101.V1345@gamplex.bde.org>
Message-ID: <20040617215851.V1012@gamplex.bde.org>
References: <20040616105706.GC1140@zi025.glhnet.mhn.de>
	<20040617134101.V1345@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: current@freebsd.org
Subject: Re: Bogus signal handler causes kernel panic (5.2.1-p8/i386)

On Thu, 17 Jun 2004, Bruce Evans wrote:

> On Wed, 16 Jun 2004, Simon Barner wrote:
>
> > I tried the local denial of service attack described in [1], which was
> > reported for Linux 2.4 and 2.6 some days ago (see [2] for the original
> > thread in linux.kernel), on my FreeBSD 5.2.1-p8 system.
> >
> > The result is a kernel panic (back trace attached).
> >
> > Since des@ told me in a private mail that he could not reproduce the
> > panic on -CURRENT, I'd like to ask how to proceed from here.
>
> I couldn't reproduce it either, but I think this is just accidental.
> It takes a particular combination of FPU exceptions and masks to cause
> the panic.

I can now reproduce it.  The case involving signal handlers requires
all of the following:
(a) doing FP calculations in a signal handler.  This is not normally
    useful, and is not supported in RELENG_4.
(b) generating, in (a), an unmasked exception that is pending but has
    not yet been trapped on.
(c) using an old CPU, or using CPU_DISABLE_SSE.  The bug doesn't affect
    the SSE (FXSR) case, contrary to what I first thought.
(d) running 5.x.  The bug is not in RELENG_4, contrary to what I first
    thought.
(e) running 5.x unmodified.  I use 4.x signal handlers in my 5.x userland
    for backwards compatibility.  This drops support for doing FP in
    signal handlers as in (a) and happens to avoid going near the bug.
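
To make condition (b) concrete, here is a minimal sketch in C of what
"unmasked and pending" means in terms of the x87 status and control
words.  It assumes the standard x87 layout (exception flags in the low
six bits of the status word, the matching mask bits in the control word,
1 = masked).  The names X87_EXC_BITS, x87_unmasked_pending() and
x87_example() are made up for illustration; they are not kernel
identifiers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Low six bits of the x87 status word: exception flags
 * (IE, DE, ZE, OE, UE, PE).  The same bit positions in the control
 * word are the corresponding mask bits (1 = masked).
 */
#define X87_EXC_BITS 0x3Fu

/* Hypothetical helper: flags that are set in `status` and not masked. */
static inline uint16_t
x87_unmasked_pending(uint16_t status, uint16_t control)
{
    return (uint16_t)(status & ~control & X87_EXC_BITS);
}

/* Usage: ZE (bit 2) is flagged in the status word. */
void
x87_example(void)
{
    /* 0x037F masks everything, so nothing would trap. */
    assert(x87_unmasked_pending(0x0004, 0x037F) == 0);
    /* 0x037B has ZM cleared, so ZE is pending and unmasked. */
    assert(x87_unmasked_pending(0x0004, 0x037B) == 0x0004);
}
```

A nonzero result is the situation in (b): the flag is already set, but no
trap has been taken for it yet.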

> Try the following quick fix.  It is for -current but should work for
> RELENG_5 and RELENG_4 too.  (Note that it changes fpurstor(), not
> fpusave().  The patch context is not large enough to be unambiguous.)
> It might be incomplete.

Try the following not so quick fix.  It is for -current but should work
for RELENG_5 too.

%%%
Index: npx.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/isa/npx.c,v
retrieving revision 1.149
diff -u -2 -r1.149 npx.c
--- npx.c	6 Jun 2004 15:17:44 -0000	1.149
+++ npx.c	17 Jun 2004 11:28:13 -0000
@@ -873,4 +924,13 @@
 	struct thread *td;

+	/*
+	 * Discard pending exceptions in the !cpu_fxsr case so that unmasked
+	 * ones don't cause a panic on the next frstor.
+	 */
+#ifdef CPU_ENABLE_SSE
+	if (!cpu_fxsr)
+#endif
+		fnclex();
+
 	td = PCPU_GET(fpcurthread);
 	PCPU_SET(fpcurthread, NULL);
%%%
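
The effect of the patch can be sketched with a toy userland model.  This
is not kernel code and does not touch the real FPU: struct toy_npx and
all toy_*() functions are hypothetical, and the rule that frstor traps
when the state being replaced still has a pending unmasked exception is
taken from the description in this thread, not from a full model of the
hardware.  Only the six exception flags and their mask bits are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EXC_BITS 0x3Fu    /* exception flags / mask bits */
#define TOY_INIT_CW  0x037Fu  /* all exceptions masked, as after fninit */

struct toy_npx {
    uint16_t cw;    /* control word */
    uint16_t sw;    /* status word */
};

static bool
toy_pending_unmasked(const struct toy_npx *hw)
{
    return (hw->sw & ~hw->cw & TOY_EXC_BITS) != 0;
}

/* fnsave: store the state, then leave the hardware in a clean state. */
void
toy_fnsave(struct toy_npx *hw, struct toy_npx *save)
{
    *save = *hw;
    hw->cw = TOY_INIT_CW;
    hw->sw = 0;
}

/* fnclex: clear the exception flags without trapping. */
void
toy_fnclex(struct toy_npx *hw)
{
    hw->sw &= (uint16_t)~TOY_EXC_BITS;
}

/*
 * Dropping the state is done in software, so nothing is reinitialized.
 * `clear` selects the patched behaviour (fnclex first).
 */
void
toy_npxdrop(struct toy_npx *hw, bool clear)
{
    if (clear)
        toy_fnclex(hw);
}

/* frstor: returns false to model a trap, true if the state was loaded. */
bool
toy_frstor(struct toy_npx *hw, const struct toy_npx *save)
{
    assert(hw != NULL && save != NULL);
    if (toy_pending_unmasked(hw))
        return false;
    *hw = *save;
    return true;
}
```

In this model, dropping a state that has ZE set and ZM clear and then
calling toy_frstor() reports a trap unless the drop cleared the flags
first.  The toy_fnsave() path never traps, because it always leaves a
clean state behind.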

> I think RELENG_4 has the problem too in the (CPU_ENABLE_SSE && cpu_fxsr)
> case.  Then fpusave() doesn't reset the npx state, so there may be a
> pending exception from the previous process.  fpurstor() then traps if
> it happens to restore a state that has the exception unmasked.  There
> used to be no problem because the previous state was always put away
> using the fnsave instruction, and fnsave has the side effect of
> initializing a clean state, in particular a state that doesn't have
> any pending exceptions.  This has been broken in 2 ways:
> - in RELENG_4 and -current, fxsave is used instead of fnsave in the
>   (CPU_ENABLE_SSE && cpu_fxsr) case.  fxsave doesn't have the side
>   effect.
> - in -current, the previous state is sometimes dropped instead of
>   saved.  This is entirely in software, so it doesn't have the side
>   effect.

Actually, the only problem comes from dropping the state, and only in
the !fxsr case.  There is no problem using fxsave+fxrstor because fxrstor
works right.

> > [1] http://linuxreviews.org/news/2004-06-11_kernel_crash/#toc1
> > [2] http://groups.google.de/groups?hl=de&lr=&ie=UTF-8&frame=right&th=f7580d647408b95b&seekm=26hGq-Zr-31%40gated-at.bofh.it#link1
>
> The bug is a little different in Linux.  Linux uses more synchronization
> instructions, perhaps unnecessarily.  Its version of dropping the state
> isn't entirely in software.  It has an fwait that was not preceded by an
> fnclex, so it panicked if there were any pending exceptions in the state
> being dropped.

The above fix makes the difference smaller.  It puts the fnclex in
npxdrop() instead of in fpurstor() because the latter is called more
often.

Bruce