From: Parag Patel
To: Mike Smith
Cc: Gerard Roudier, Ed Hall, freebsd-hackers@FreeBSD.ORG
Subject: Re: PCI DMA lockups in 3.2 (3.3 maybe?)
In-Reply-To: Message from Mike Smith of "Mon, 06 Dec 1999 13:28:40 PST." <199912062128.NAA01671@mass.cdrom.com>
Date: Mon, 06 Dec 1999 14:41:57 -0800
Message-ID: <6268.944520117@pinhead.parag.codegen.com>

Regarding the PCI DMA problems and corruption, it reminds me of a similar PCI and DMA-related problem we had when porting OpenBSD to a now-defunct NKK MIPS chipset. It may not be related, but here it is.

The port was up and running, but under heavy load, say a compile, running apps (specifically one of the compiler passes) would start dying with seg-faults, and then the system would lock up. We finally had to get out the logic-analyzer and MIPS probe, and even then we still couldn't watch everything due to the MIPS on-chip cache.

The support chipset was locking up. This chip had to handle memory access from the MIPS CPU, handle DRAM directly, and handle DMA access from the PCI bus.
It bridged all three (CPU, RAM, PCI) and seemed to us to be hosing itself in some funky meta-stable condition. Heavy simultaneous memory access, typically PCI DMA bursts from different devices, usually triggered the lockup.

So it's quite possible that the host-to-PCI-to-memory controller chipset may be the real culprit and not the drivers or specific PCI devices.

In the process, we discovered a very interesting thing about the NCR/Symbios chips, at least the 810 and 825 series. Turns out that when they are executing their scripts, and the scripts access an on-board PCI register, that access actually negotiates for the PCI bus and uses it to read the register! That's right - it uses the PCI bus to talk to itself - even when it's not DMA-ing anything! Freaked us out when we saw it, 'cause the CPU wasn't anywhere near any code that was accessing the NCR's registers.

Of course it slows down script execution, but it could also slow down the PCI bus, depending on the script. And this is all without the CPU being involved. Certainly it'll cause more PCI-bus activity than most other chips, and perhaps this is why NCR controllers tend to trigger the DMA condition.

It seems that whoever designed the NCR's script-engine glommed it onto the original programmed I/O SCSI core using the PCI bus instead of redesigning the chip. Cheap short-cut. Dunno if any other NCR chips exhibit this behavior, but I wouldn't be surprised.

--
Parag Patel