Date: Mon, 05 Aug 2002 11:50:59 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: "John S. Bucy" <bucy@ece.cmu.edu>
Cc: freebsd-hackers@freebsd.org
Subject: Re: weird npxintr
Message-ID: <3D4EC913.528452C2@mindspring.com>
References: <20020805182753.GD494@catalepsy.pdl.cmu.edu>

"John S. Bucy" wrote:
> We're playing with disk request scheduling as part of a research
> project; we've introduced a lot of new code to 4.4 and are now getting
> a weird npxintr that's killing us. My understanding is that npxintr
> has to do with the x87 fpu interface for ia32s and that you get it
> when fp instructions issued from the kernel are interrupted and then
> restarted.
>
> We are pretty sure that all of our code is fp free and are trying to
> figure out what's going on. We're using long long a lot and I've
> heard that gcc generates buggy code for long long sometimes. But I'd
> expect an integer arithmetic exception instead for a problem there.

The "multimedia" instructions also use the FPU registers, because
they overlay their registers on top of the FPU. If you are using
the CPU specific bcopy code, this could be the source of your
problem.

On a hunch: are you using an AMD K6 or similar and enabling the
CPU specific options within the config file?

Copies occurring at interrupt time can result in this behaviour
due to an inability to obtain a process context for a current
process that's the real current process when the FPU state is
switched out via late-binding.

> We mask some interrupts for a relatively long period of time doing
> some computation; could that cause this? I don't own the piece of the
> code that manipulates interrupts; is there some way to misuse
> splx/... that might cause this?
>
> We're getting
>
> npxintr: npxproc = 0, curproc = 0, npx_exists = 1
> panic: npxintr from nowhere
>
> right after we do an splbio() (I think)

The copy you are doing at that point is attempting a lazy bind
without a process context (because it's happening at interrupt).

If you can, move the large data manipulation, etc., out of the
interrupt handler itself, and do it via pullup instead. That
type of thing should only ever be in the upper level interrupt
handler (e.g. via software interrupt, or in the user process
context on behalf of which the work is being done, after the
wakeup of the user process which is waiting on an operation).

It's a bad idea to do a lot of work in the interrupt handler,
in any case, unless there is a technical reason for it, like
quenching interrupts on purpose for network cards to avoid
receiver livelock.

An example (pseudocode) would be:

bad:
	user process makes request
	sleep user process
	...
	take interrupt
	copy data from card memory to user memory
	ack interrupt
	wake user process
	user process request complete

good:
	user process makes request
	sleep user process
	...
	take interrupt
	ack interrupt
	wake user process
	copy data from card memory to user memory
	user process request complete

Not always possible, but the best bet, if the card doesn't support
proper DMA, like God intended (most hardware designers are heretics).

-- Terry
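
The "good" ordering in the pseudocode above can be sketched as a
small userland simulation in C. This is only an illustration, not
FreeBSD kernel code: struct fake_card, card_raise_irq(), card_intr()
and card_read() are hypothetical names, a plain flag stands in for
the sleep/wakeup pair, and memcpy() stands in for the copy from card
memory. The point it shows is that the handler only acknowledges and
signals, while the copy runs later in the caller's own context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CARD_BUF_SIZE 64

/* Hypothetical device state; every name here is illustrative. */
struct fake_card {
    char   card_mem[CARD_BUF_SIZE]; /* stands in for on-card memory */
    size_t data_len;                /* bytes currently held in card_mem */
    bool   irq_pending;             /* "hardware" has raised an interrupt */
    bool   data_ready;              /* set by the handler; stands in for wakeup */
};

/* Simulate the hardware: place data in card memory and raise an interrupt. */
void card_raise_irq(struct fake_card *c, const char *data, size_t len)
{
    assert(len <= CARD_BUF_SIZE);
    memcpy(c->card_mem, data, len);
    c->data_len = len;
    c->irq_pending = true;
}

/* Interrupt handler, "good" ordering: ack and wake only, no bulk copy. */
void card_intr(struct fake_card *c)
{
    c->irq_pending = false; /* ack interrupt */
    c->data_ready = true;   /* wake the waiting process */
}

/*
 * Process-context side: runs after the wakeup, so the copy happens on
 * behalf of the requesting process. Returns the number of bytes copied,
 * or 0 if no data is ready yet.
 */
size_t card_read(struct fake_card *c, char *dst, size_t dstlen)
{
    if (!c->data_ready)
        return 0;
    size_t n = c->data_len < dstlen ? c->data_len : dstlen;
    memcpy(dst, c->card_mem, n); /* copy from card memory to user memory */
    c->data_ready = false;
    return n;
}
```

A caller would invoke card_intr() when the simulated interrupt
fires, then call card_read() from the requesting side; in a real
driver the flag would be replaced by the kernel's own sleep/wakeup
or software interrupt mechanism.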