From owner-freebsd-current@FreeBSD.ORG Thu Jun 17 13:25:11 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3E15A16A4CE for ; Thu, 17 Jun 2004 13:25:11 +0000 (GMT) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85]) by mx1.FreeBSD.org (Postfix) with ESMTP id A804043D49 for ; Thu, 17 Jun 2004 13:25:10 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.0.87])i5HCRL5v021305; Thu, 17 Jun 2004 22:27:21 +1000 Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) i5HCRHaU015992; Thu, 17 Jun 2004 22:27:19 +1000 Date: Thu, 17 Jun 2004 22:27:16 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Simon Barner In-Reply-To: <20040617134101.V1345@gamplex.bde.org> Message-ID: <20040617215851.V1012@gamplex.bde.org> References: <20040616105706.GC1140@zi025.glhnet.mhn.de> <20040617134101.V1345@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: current@freebsd.org Subject: Re: Bogus signal handler causes kernel panic (5.2.1-p8/i386) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jun 2004 13:25:11 -0000 On Thu, 17 Jun 2004, Bruce Evans wrote: > On Wed, 16 Jun 2004, Simon Barner wrote: > > > I tried the local denial of service attack described in [1], that was > > reported for Linux 2.4 and 2.6 some days ago (see [2] for the original > > thread in linux.kernel) on my FreeBSD 5.2.1-p8 system. > > > > The result is a kernel panic (back trace attached). > > > > Since des@ told me in a private mail, that he could not reprocduce the > > panic on -CURRENT, I'd like to ask how to proceed from here. > > I couldn't reproduce it either, but I think this is just accidental. > It takes a particular combination of FPU exceptions and masks to cause > the panic. I can now reproduce it. The case of it involving signal handlers takes all of the following: (a) doing FP calculations in a signal handler. This is not normally useful, and is not supported in RELENG_4. (b) generating an unmasked pending but not trapped on exception in (a). (c) using an old CPU, or using CPU_DISABLE_SSE. The bug doesn't affect the SSE (FXSR) case like I first thought. (d) running 5.x. The bug is not in RELENG_4 like I first thought. (e) running 5.x unmodified. I use 4.x signal handlers in my 5.x userland for backwards compatibility. This unsupports doing FP in signal handlers as in (a) and happens to avoid going near the bug. > Try the following quick fix. It is for -current but should work for > RELENG_5 and RELENG_4 too. (Note that it changes fpurstor(), not > fpusave(). The patch context is not large enough to be unambigous.) > It might be incomplete. Try the following not so quick fix. It is for -current but should work for RELENG_5 too. %%% Index: npx.c =================================================================== RCS file: /home/ncvs/src/sys/i386/isa/npx.c,v retrieving revision 1.149 diff -u -2 -r1.149 npx.c --- npx.c 6 Jun 2004 15:17:44 -0000 1.149 +++ npx.c 17 Jun 2004 11:28:13 -0000 @@ -873,4 +924,13 @@ struct thread *td; + /* + * Discard pending exceptions in the !cpu_fxsr case so that unmasked + * ones don't cause a panic on the next frstor. + */ +#ifdef CPU_ENABLE_SSE + if (!cpu_fxsr) +#endif + fnclex(); + td = PCPU_GET(fpcurthread); PCPU_SET(fpcurthread, NULL); %%% > I think RELENG_4 has the problem too in the (CPU_ENABLE_SSE && cpu_fxsr) > case. Then fpusave() doesn't reset the npx state, so there may be a > pending exception from the previous process. fpurstor() then traps if > it happens to restore a state that has the exception unmasked. There > used to be no problem because the previous state was always put away > using the fnsave instruction, and fnsave has the side effect of > initializing a clean state, in particular a state that doesn't have > any pending exceptions. This has been broken in 2 ways: > - in RELENG_4 and -current, fxsave is used instead of fnsave in the > (CPU_ENABLE_SSE && cpu_fxsr) case). fxsave doesn't have the side > effect. > - in -current, the previous state is sometimes dropped instead of > saved. This is entirely in software, so it doesn't have the side > effect. Actually, there is only a problem from dropping the state, only in the !fxsr case. There is no problem using fxsave+fxrstor because fxrstor works right. > > [1] http://linuxreviews.org/news/2004-06-11_kernel_crash/#toc1 > > [2] http://groups.google.de/groups?hl=de&lr=&ie=UTF-8&frame=right&th=f7580d647408b95b&seekm=26hGq-Zr-31%40gated-at.bofh.it#link1 > > The bug is a little different in Linux. Linux uses more synchronization > instructions, perhaps unnecessarily. Its version of dropping the state > isn't entirely in software. It has an fwait that was not preceded by an > fnclex, so it paniced if there were any pending exceptions in the state > being dropped. The above fix makes the difference littler. It puts the fnclex in npxdrop() instead of in fpurstor() because the later is called more often. Bruce