Date: Mon, 6 Dec 1999 10:34:35 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Ed Hall <edhall@screech.weirdnoise.com>
Cc: "Jonathan M. Bresler" <jmb@hub.freebsd.org>, kris@hub.freebsd.org, freebsd-hackers@FreeBSD.ORG
Subject: Re: PCI DMA lockups in 3.2 (3.3 maybe?)
Message-ID: <199912061834.KAA71206@apollo.backplane.com>
References: <199912051944.LAA17720@screech.weirdnoise.com>
:You write:
:: we can not identify the specific problem from this message.
:: without sufficient information to indentify and hopefully reproduce
:: the problem, we can not address it. please provide this information
:: if it is available to you. if it is not, please provide us contact
:: information for the commercial entities experiencing the problem.
:
:I work at Yahoo. My address there is "edhall@yahoo-inc.com".
:
:On a recent project I encountered two show-stopping bugs with 3.3-release
:that did not exist in 2.2.8-release:
:
:1) Random crashes in FXP interrupt or low-level IP code. Something is
:   clobbering the kernel stack--possibly the NCR driver, since using an
:   Adaptec made the problem stop, as did a backport of the CAM driver
:   Peter Wemm tried. This was on an N440BX, which is becoming quite
:   common in server applications. Other installations are apparantly
:   seeing the same problem on this hardware.
:
:2) A hard loop in the pagedaemon. This was especially egregious, since
:   it meant the system had to be rebooted from the console--and since
:   the application could elicit the problem within a few minutes.
:   Disabling the use of mmap() for file update in the application
:   prevented the problem. After spending a day trying to cook up a
:   test program that elicited the same behavior that the application
:   did, I gave up for lack of time. But there have been other reports
:   of late that sound like this problem, mostly in high VM/RAM situations.
:
:That's two serious bugs that exist in 3.3-release but not in 2.2.8-release.
:Looking back through the archives, I can see that I'm not the only one who
:has experienced them. I came away from the experience with the feeling that
:the FreeBSD project has some serious Q/A problems... and I can assure you,
:I'm not alone in this feeling.
:
: -Ed

Well, #2 at least should be fixed in -current. Unfortunately the
changes to the VM system were too extensive to backport to 3.x.
Or, I should say, that at the time I started working on the VM system,
core was not interested in allowing me to backport the changes, and
then later it was simply too late - too many changes had been made.

#1 has come up a couple of times. There was a conversation in October
that closely relates to your problem:

:From: Joe McGuckin <joe@monk.via.net>
:Subject: fxp related kernel panic
:
:I have a 3.3-stable machine that I use as a news router (running diablo). The
:fxp0 interface averages 10-15 Mbps bandwidth continously.
:
:About once a week the machine crashes & reboots. We enabled the debugger this time
:and captured the following debug output:
:
:Fatal trap 12: page fault while in kernel mode
:fault virtual address   = 0x382e4641
:fault code              = supervisor write, page not present
:instruction pointer     = 0x8:0xc01a372e
:stack pointer           = 0x10:0xc02523b0
:frame pointer           = 0x10:0xc02523c0
:code segment            = base 0x0, limit 0xfffff, type 0x1b
:                        = DPL 0, pres 1, def32 1, gran 1
:processor eflags        = interrupt enabled, resume, IOPL = 0
:current process         = Idle
:interrupt mask          = net
:kernel: type 12 trap, code=0
:Stopped at fxp_add_rfabuf+0x1de: movw %ax,0x4(%esi)
:db>
:
:%uname -a
:FreeBSD feeder.via.net 3.3-STABLE FreeBSD 3.3-STABLE #7: Mon Oct 18 17:14:40 PDT 1999 lewis@feeder.via.net:/usr/src/sys/compile/DIABLO i386
:
:%dmesg
:Copyright (c) 1992-1999 FreeBSD Inc.
:Copyright (c) 1982, 1986, 1989, 1991, 1993
: The Regents of the University of California. All rights reserved.
:FreeBSD 3.3-STABLE #7: Mon Oct 18 17:14:40 PDT 1999

To which DG responded:

:From: David Greenman <dg@root.com>
:Subject: Re: fxp related kernel panic
:To: Joe McGuckin <joe@monk.via.net>
:Cc: hackers@FreeBSD.ORG, lewis@lppi.com
:Date: Tue, 26 Oct 1999 11:43:02 -0700
:
:
:   Let me guess...your system has an Intel N440BX motherboard, right? If so,
:then it's a known problem with no solution yet.
:
:-DG
:
:David Greenman
:Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org
:Creator of high-performance Internet servers - http://www.terasolutions.com
:Pave the road of life with opportunities.

And he also said:

:From: David Greenman <dg@root.com>
:Subject: Re: fxp related kernel panic
:To: Lew Payne <lew@lppi.com>
:Cc: hackers@FreeBSD.ORG, Joe McGuckin <joe@monk.via.net>
:Date: Tue, 26 Oct 1999 13:19:45 -0700
:
:
:>Hi David -- What if I install a *real* EtherExpress Pro-100B (or
:>whatever it's known as today) in the PCI slot, and use it instead
:>of the on-board (N440BX motherboard) fxp0 interface?
:>
:>Judging that you probably know the nature of the problem, do you
:>think this might circumvent it?
:
:   I think it is caused by the NCR/Symbios controller. It might be a side
:effect of the NCR just using up a lot of PCI bandwidth, with the real bug
:being in the fxp driver (although I've looked and haven't found one). So
:I don't think putting in a real Pro/100 will have any effect on the problem.
:Of course I don't really know what is causing it, so just about anything
:is possible.
:
:-DG
:
:David Greenman

And that, I'm afraid, is where it has been left. Nobody is sure where
the problem is. I suspect that it may be a DMA synchronization problem
with either the NCR or the FXP driver, or perhaps heavy PCI bandwidth
usage is generating a FIFO overrun error during the FXP DMA that the
driver is not handling properly. I just don't know.

The only current solution is to use an Adaptec controller. I have
personally had *extremely* good luck with Adaptecs: the 2940UW, the
7896 (or 97) U2W (on-motherboard), and the 7890 (or 91) U2W (PCI card).

I think part of the reason the problem has not been fixed is that many
of the hardcore developers are using Adaptec controllers rather than
NCR controllers and simply cannot reproduce it.

                                        -Matt
                                        Matthew Dillon
                                        <dillon@backplane.com>


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message