Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 17 Jun 2004 22:27:16 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Simon Barner <barner@in.tum.de>
Cc:        current@freebsd.org
Subject:   Re: Bogus signal handler causes kernel panic (5.2.1-p8/i386)
Message-ID:  <20040617215851.V1012@gamplex.bde.org>
In-Reply-To: <20040617134101.V1345@gamplex.bde.org>
References:  <20040616105706.GC1140@zi025.glhnet.mhn.de> <20040617134101.V1345@gamplex.bde.org>

index | next in thread | previous in thread | raw e-mail

On Thu, 17 Jun 2004, Bruce Evans wrote:

> On Wed, 16 Jun 2004, Simon Barner wrote:
>
> > I tried the local denial of service attack described in [1], that was
> > reported for Linux 2.4 and 2.6 some days ago (see [2] for the original
> > thread in linux.kernel)  on my FreeBSD 5.2.1-p8 system.
> >
> > The result is a kernel panic (back trace attached).
> >
> > Since des@ told me in a private mail, that he could not reprocduce the
> > panic on -CURRENT, I'd like to ask how to proceed from here.
>
> I couldn't reproduce it either, but I think this is just accidental.
> It takes a particular combination of FPU exceptions and masks to cause
> the panic.

I can now reproduce it.  The case of it involving signal handlers takes
all of the following:
(a) doing FP calculations in a signal handler.  This is not normally
    useful, and is not supported in RELENG_4.
(b) generating an unmasked pending but not trapped on exception in (a).
(c) using an old CPU, or using CPU_DISABLE_SSE.  The bug doesn't affect
    the SSE (FXSR) case like I first thought.
(d) running 5.x.  The bug is not in RELENG_4 like I first thought.
(e) running 5.x unmodified.  I use 4.x signal handlers in my 5.x userland
    for backwards compatibility.  This unsupports doing FP in signal
    handlers as in (a) and happens to avoid going near the bug.

> Try the following quick fix.  It is for -current but should work for
> RELENG_5 and RELENG_4 too.  (Note that it changes fpurstor(), not
> fpusave().  The patch context is not large enough to be unambigous.)
> It might be incomplete.

Try the following not so quick fix.  It is for -current but should work
for RELENG_5 too.

%%%
Index: npx.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/isa/npx.c,v
retrieving revision 1.149
diff -u -2 -r1.149 npx.c
--- npx.c	6 Jun 2004 15:17:44 -0000	1.149
+++ npx.c	17 Jun 2004 11:28:13 -0000
@@ -873,4 +924,13 @@
 	struct thread *td;

+	/*
+	 * Discard pending exceptions in the !cpu_fxsr case so that unmasked
+	 * ones don't cause a panic on the next frstor.
+	 */
+#ifdef CPU_ENABLE_SSE
+	if (!cpu_fxsr)
+#endif
+		fnclex();
+
 	td = PCPU_GET(fpcurthread);
 	PCPU_SET(fpcurthread, NULL);
%%%

> I think RELENG_4 has the problem too in the (CPU_ENABLE_SSE && cpu_fxsr)
> case.  Then fpusave() doesn't reset the npx state, so there may be a
> pending exception from the previous process.  fpurstor() then traps if
> it happens to restore a state that has the exception unmasked.  There
> used to be no problem because the previous state was always put away
> using the fnsave instruction, and fnsave has the side effect of
> initializing a clean state, in particular a state that doesn't have
> any pending exceptions.  This has been broken in 2 ways:
> - in RELENG_4 and -current, fxsave is used instead of fnsave in the
>   (CPU_ENABLE_SSE && cpu_fxsr) case).  fxsave doesn't have the side
>   effect.
> - in -current, the previous state is sometimes dropped instead of
>   saved.  This is entirely in software, so it doesn't have the side
>   effect.

Actually, there is only a problem from dropping the state, only in the
!fxsr case.  There is no problem using fxsave+fxrstor because fxrstor
works right.

> > [1] http://linuxreviews.org/news/2004-06-11_kernel_crash/#toc1
> > [2] http://groups.google.de/groups?hl=de&lr=&ie=UTF-8&frame=right&th=f7580d647408b95b&seekm=26hGq-Zr-31%40gated-at.bofh.it#link1
>
> The bug is a little different in Linux.  Linux uses more synchronization
> instructions, perhaps unnecessarily.  Its version of dropping the state
> isn't entirely in software.  It has an fwait that was not preceded by an
> fnclex, so it paniced if there were any pending exceptions in the state
> being dropped.

The above fix makes the difference littler.  It puts the fnclex in
npxdrop() instead of in fpurstor() because the later is called more
often.

Bruce


home | help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040617215851.V1012>