Date: Fri, 16 Nov 2001 02:10:01 EET
From: Maxim Sobolev <sobomax@FreeBSD.org>
To: eischen@pcnet1.pcnet.com
Cc: marcus@marcuscom.com, freebsd-ports@FreeBSD.org, hackers@FreeBSD.org
Subject: Re: Using bit 21 of EFLAGS in user-mode [was: Re: sigreturn: eflags creash (fixed!)]
Message-ID: <200111160010.CAA15164@ipcard.iptcom.net>
In-Reply-To: <Pine.SUN.3.91.1011115173611.10851A-100000@pcnet1.pcnet.com>

On Thu, 15 Nov 2001 17:41:32 -0500 (EST), Daniel Eischen wrote:
> On Thu, 15 Nov 2001, Maxim Sobolev wrote:
> > On Thu, 15 Nov 2001 14:56:31 -0500 (EST), Joe Clarke wrote:
> > >
> > > I learned about this by reading through some of the -hackers archives.
> > > One person complained of similar errors trying to get xine to work on
> > > FreeBSD. Removing the MMX detection code fixed it. I remembered libpng
> > > also used MMX, so I removed the pnggccrd.c source, and voila!
> > >
> > > Based on core dumps, strace output, and a lot of code surfing, this makes
> > > sense to me. Basically, any png-dependent app's thread that runs longer
> > > than what ITIMER_PROF is set to gets hit with a SIGPROF. When that
> > > happens, things context switch. eflags must have been corrupted by the
> > > MMX code, thus sigreturn() bombs out, and causes uthread_kern to die as
> > > well. Here's what strace looks like when balsa tries to read a 33 MB
> > > mailbox:
> > >
> > > 74202 sigreturn(0x81f2c64
> > >
> > > When this happens, strace politely dies with a bus error.
> > >
> > > Thanks for testing this, Maxim. Hopefully someone can find the problem
> > > and fix it for good.
> >
> > That explains... After a quick glance at png code I found that
> > the only place where EFLAGS is altered is CPUID code, where
> > the library flips bit 21 of EFLAGS in order to ensure that the
> > CPUID instruction is supported (otherwise it will get SIGILL
> > on older processors). Unfortunately, for some reason FreeBSD
>
> Does it need to keep bit 21 of EFLAGS flipped, or can libpng
> set it back and keep knowledge that CPUID is supported? Or
> does that bit need to remain set for CPUID to work?

No, it doesn't need to be in any specific state. The only knowledge
a program gains from bit 21 is that its state can be changed, which
means that the CPUID instruction is supported. Unfortunately, the
original libpng doesn't bother to set the bit back to its original
state, which exposed this problem.
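
To illustrate the point about bit 21: what matters is whether the bit
can be toggled, not the value it ends up holding. Below is a minimal
sketch in portable C of just that comparison. It takes two EFLAGS
snapshots as plain integers and does not touch the real register; the
names EFLAGS_ID and id_bit_toggled are made up for this example and do
not come from libpng.

```c
#include <stdint.h>

/* Bit 21 of EFLAGS; the macro name is an assumption for this sketch. */
#define EFLAGS_ID (UINT32_C(1) << 21)

/*
 * Hypothetical helper: "before" is EFLAGS as read prior to the probe,
 * "after" is EFLAGS as read back after trying to flip bit 21.
 * Returns 1 if bit 21 differs between the two snapshots (CPUID is
 * available), 0 otherwise. Once this answer is known, the original
 * value can be loaded back into EFLAGS without losing anything.
 */
int
id_bit_toggled(uint32_t before, uint32_t after)
{
	return ((before ^ after) & EFLAGS_ID) != 0;
}
```

The direction of the flip does not matter, so a probe that starts with
the bit set and clears it gives the same answer as one that starts with
it clear and sets it.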

> If at all possible, a fix should be committed that wouldn't
> necessitate a new kernel be built for -stable.

Yes, I was also thinking about that. I've committed a patch that
restores the state of bit 21 as soon as possible. There is still a
chance that the program will get a signal during that window, but
this chance is rather slim. The "unsafe" piece of code in question
looks like:

	popfl			<- load eflags with bit 21 flipped
	pushfl			<- save resulting eflags
+	popl %%eax		<- load resulting eflags into eax
+	pushl %%ecx		<- save original eflags
	popfl			<- restore original eflags

Of course, it is possible either to mask all signals during the
detection period, or to rip out the detection code based around
eflags and replace it with a SIGILL handler, but this would eat into
the speed improvement from the MMX optimisations because of the
additional overhead of the syscall needed to set up the signal
handler or signal mask.

In any case, tomorrow I will test this workaround extensively, and
if it turns out not to be sufficient to prevent `sigreturn: eflags...'
errors, then I'll just disable the MMX code in libpng.

-Maxim
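
For reference, the "mask all signals during the detection period"
alternative could look roughly like the sketch below. This is not what
was committed. The names run_with_signals_blocked and
sigprof_is_blocked are hypothetical, and the probe shown is only a
stand-in for the real EFLAGS test: it reports whether SIGPROF is
blocked at the moment it runs. A threaded program would probably want
pthread_sigmask() in place of sigprocmask().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical wrapper: run probe() with every blockable signal masked,
 * then restore the caller's signal mask. Returns probe()'s result, or
 * -1 if the mask could not be changed or restored.
 */
int
run_with_signals_blocked(int (*probe)(void))
{
	sigset_t all, saved;
	int result;

	sigfillset(&all);
	if (sigprocmask(SIG_BLOCK, &all, &saved) != 0)
		return -1;

	result = probe();	/* the CPUID detection would go here */

	if (sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
		return -1;
	return result;
}

/*
 * Stand-in probe for demonstration only: returns 1 if SIGPROF is
 * currently blocked in the calling thread, 0 if not, -1 on error.
 */
int
sigprof_is_blocked(void)
{
	sigset_t cur;

	/* A NULL "set" argument only queries the current mask. */
	if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
		return -1;
	return sigismember(&cur, SIGPROF);
}
```

Calling run_with_signals_blocked(sigprof_is_blocked) shows the cost
mentioned above: two sigprocmask() syscalls wrapped around a probe that
itself takes only a handful of instructions.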