Date: Tue, 7 Dec 1999 23:10:39 +0100 (MET)
From: Gerard Roudier <groudier@club-internet.fr>
To: Peter Wemm <peter@netplex.com.au>
Cc: Ed Hall <edhall@screech.weirdnoise.com>, Matthew Dillon <dillon@apollo.backplane.com>, "Jonathan M. Bresler" <jmb@hub.freebsd.org>, kris@hub.freebsd.org, freebsd-hackers@FreeBSD.ORG
Subject: Re: PCI DMA lockups in 3.2 (3.3 maybe?)
Message-ID: <Pine.LNX.3.95.991207224951.943A-100000@localhost>
In-Reply-To: <19991207120139.869F01CC6@overcee.netplex.com.au>

On Tue, 7 Dec 1999, Peter Wemm wrote:

> I might add that others have found that using sym + fxp on the N440BX
> motherboards didn't solve their problems, or moved the problem elsewhere,
> eg: to the sbdrop() etc routines. One other interesting variable.. an ahc
> + pn driver combination on a 440BX motherboard under -current in late may
> 99 had the exact same problems we saw a number of times with ncr + fxp (ie:
> sbdrop, sbflush, m_copym etc panics). The same motherboard with ahc + de or
> fxp did not have the problems.

(ncr || sym || ahc) && fxp = TRUE makes the fxp a better culprit. :-)

If the corruption comes from some DMA over the bus, then it may well be
that some chip grabbed a stale address or length value and performed DMA
into the corresponding area. This can happen, for example, if a bus address
is not passed atomically to a device, or for numerous other reasons.
Note that with an atomicity problem, the chip could either be the cause,
by performing a non-DWORD access, or be the victim of a PCI transaction
terminated with data between 2 DWORDs (note that in this latter case,
something is wrong with the alignment in memory).

The 'link_addr' handling in the C code looks to me like a candidate for
such an atomicity problem, for example, but since I do not know the fxp
device this might be just a quite wrong idea of mine.

> In all cases the panics were extremely "strange". The original fxp+ncr
> combination changed it's crash pattern when we put extra debugging in it to
> sanity check and check conditions. The results varied from registers getting
> clobbered (as though an interrupt happened and the trapframe on the stack got
> changed by the interrupt handler and then returned with garbage contents in
> some registers.. this is what seems to be happening in the fxp_add_rfabuf()
> panics - %esi was getting loaded earlier on and when it got to do the
> vtophys() it was zero.

Some DMA performed using a stale but still valid address may lead to
similar effects, btw, since any register value comes from memory. (For
example, a difference of less than 65536 from a valid address is unlikely
to make the address invalid.)

> People have printed the contents of "rfa" on the stack
> and seen garbage - in fact it's a register variable under normal circumstances.
> Adding debugging caused it to be stored in the local variable rather than
> being left in %esi, and then the panics moved elsewhere (!).)
>
> It had the markings of "something trashing something somewhere and then crashing
> quite a bit later". :-(

Gérard.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
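
The atomicity concern raised in this message can be illustrated with a small model in C. This is only a sketch under stated assumptions, not code from the fxp driver: the names `link_field`, `device_read` and `torn_update` are hypothetical, and the split into two 16-bit stores is just one way a non-DWORD access could expose a half-updated address. It shows a device that samples a 32-bit address between the two stores seeing a value that mixes the old and new halves, yet stays within 65536 of the old, valid address.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit bus address held in host memory
 * and fetched by a device. Not taken from any real driver. */
struct link_field {
    uint16_t lo;   /* low 16 bits of the address */
    uint16_t hi;   /* high 16 bits of the address */
};

/* What the device would see if it fetched the whole field right now. */
static uint32_t device_read(const struct link_field *f)
{
    return ((uint32_t)f->hi << 16) | f->lo;
}

/* Non-atomic update: the address is written as two 16-bit stores.
 * The return value models a device fetch that lands between them. */
static uint32_t torn_update(struct link_field *f, uint32_t new_addr)
{
    uint32_t seen;

    f->lo = (uint16_t)(new_addr & 0xFFFFu);   /* first half written */
    seen = device_read(f);                    /* device samples here */
    f->hi = (uint16_t)(new_addr >> 16);       /* second half written */
    return seen;
}
```

For instance, starting from the address 0x00123F00 and updating to 0x00450010, the mid-update fetch in this model yields 0x00120010: neither the old nor the new address, but only 0x3EF0 bytes away from the old one, so it would most likely still point into valid memory.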