Date: Sun, 05 Dec 1999 17:10:20 -0800
From: Mike Smith <msmith@freebsd.org>
To: Ed Hall <edhall@screech.weirdnoise.com>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: PCI DMA lockups in 3.2 (3.3 maybe?)
Message-ID: <199912060110.RAA09520@mass.cdrom.com>
In-Reply-To: Your message of "Sun, 05 Dec 1999 11:44:57 PST." <199912051944.LAA17720@screech.weirdnoise.com>

> On a recent project I encountered two show-stopping bugs with 3.3-release
> that did not exist in 2.2.8-release:
>
> 1) Random crashes in FXP interrupt or low-level IP code.  Something is
>    clobbering the kernel stack--possibly the NCR driver, since using an
>    Adaptec made the problem stop, as did a backport of the CAM driver
>    Peter Wemm tried.  This was on an N440BX, which is becoming quite
>    common in server applications.  Other installations are apparently
>    seeing the same problem on this hardware.

So far the problem appears to require a combination of the 440BX
chipset, an Intel EtherExpress and the 'fxp' driver, and an
NCR/Symbios/LSI SCSI adapter and either the 'ncr' or 'sym' driver.

We've tried on a number of occasions to diagnose this problem, but there
have been many issues that have prevented its resolution.  These have
included lack of interest on the driver developers' parts, lack of
access to or cooperation from people complaining of the bug, and an
inability to reproduce it in a useful fashion.

It's been an eye-opening exercise and we're trying to learn what we can
from it, as well as actually fix it for good.

> 2) A hard loop in the pagedaemon.  This was especially egregious, since
>    it meant the system had to be rebooted from the console--and since
>    the application could elicit the problem within a few minutes.
>    Disabling the use of mmap() for file update in the application
>    prevented the problem.  After spending a day trying to cook up a
>    test program that elicited the same behavior that the application
>    did, I gave up for lack of time.  But there have been other reports
>    of late that sound like this problem, mostly in high VM/RAM situations.
>
> That's two serious bugs that exist in 3.3-release but not in 2.2.8-release.
> Looking back through the archives, I can see that I'm not the only one who
> has experienced them.  I came away from the experience with the feeling that
> the FreeBSD project has some serious Q/A problems... and I can assure you,
> I'm not alone in this feeling.

Neither are we.  But, since FreeBSD is a volunteer-developed project,
and since you admit above that you have contributed to the lack of QA,
I'm not entirely sure what your point is.

We need this feedback in a timely fashion in order to do something with
it.  3 months after a release is not "timely" by any stretch of the
imagination, and without that sort of assistance, I have no idea what
you think we can do to improve the situation.

Yes, we want to improve our QA.  But when customers come up months after
the fact and complain about something that we could never possibly have
either known or even guessed about during the development process, the
best we can do is try to fix the problem then and there.

If you want to improve that situation, you can; in your position you
have plenty of opportunities to make a major contribution to the overall
quality of FreeBSD releases.  OTOH, if you choose not to do so, it's
mere honesty to observe that you need to take a share of the blame for
the current situation.

ps: The N440BX is actually being phased out; however, there are very
large numbers of them still in production, yes.

-- 
\\  Give a man a fish, and you feed him for a day.  \\  Mike Smith
\\  Tell him he should learn how to fish himself,   \\  msmith@freebsd.org
\\  and he'll hate you for a lifetime.              \\  msmith@cdrom.com


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message