Date:      Sun, 05 Dec 1999 17:10:20 -0800
From:      Mike Smith <msmith@freebsd.org>
To:        Ed Hall <edhall@screech.weirdnoise.com>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: PCI DMA lockups in 3.2 (3.3 maybe?) 
Message-ID:  <199912060110.RAA09520@mass.cdrom.com>
In-Reply-To: Your message of "Sun, 05 Dec 1999 11:44:57 PST." <199912051944.LAA17720@screech.weirdnoise.com> 

> On a recent project I encountered two show-stopping bugs with 3.3-release
> that did not exist in 2.2.8-release:
> 
> 1) Random crashes in FXP interrupt or low-level IP code.  Something is
>    clobbering the kernel stack--possibly the NCR driver, since using an
>    Adaptec made the problem stop, as did a backport of the CAM driver
>    Peter Wemm tried.  This was on an N440BX, which is becoming quite
>    common in server applications.  Other installations are apparently
>    seeing the same problem on this hardware.

So far the problem appears to require a combination of the 440BX chipset, 
an Intel EtherExpress and the 'fxp' driver, and an NCR/Symbios/LSI SCSI 
adapter and either the 'ncr' or 'sym' driver.  We've tried on a number of 
occasions to diagnose this problem, but there have been many issues that 
have prevented its resolution.  These have included lack of interest on 
the driver developers' parts, lack of access to or cooperation from 
people complaining of the bug, and an inability to reproduce it in a 
useful fashion.  It's been an eye-opening exercise and we're trying to 
learn what we can from it, as well as actually fix it for good.

> 2) A hard loop in the pagedaemon.  This was especially egregious, since
>    it meant the system had to be rebooted from the console--and since
>    the application could elicit the problem within a few minutes.
>    Disabling the use of mmap() for file update in the application
>    prevented the problem.  After spending a day trying to cook up a
>    test program that elicited the same behavior that the application
>    did, I gave up for lack of time.  But there have been other reports
>    of late that sound like this problem, mostly in high VM/RAM situations.
> 
> That's two serious bugs that exist in 3.3-release but not in 2.2.8-release.
> Looking back through the archives, I can see that I'm not the only one who
> has experienced them.  I came away from the experience with the feeling that
> the FreeBSD project has some serious Q/A problems... and I can assure you,
> I'm not alone in this feeling.

Neither are we.  But, since FreeBSD is a volunteer-developed project, and 
since you admit above that you have contributed to the lack of QA, I'm 
not entirely sure what your point is.  We need this feedback in a timely 
fashion in order to do something with it.  Three months after a release is 
not "timely" by any stretch of the imagination, and without that sort of 
assistance, I have no idea what you think we can do to improve the 
situation.

Yes, we want to improve our QA.  But when customers come up months after 
the fact and complain about something that we could never possibly have 
either known or even guessed about during the development process, the 
best we can do is try to fix the problem then and there.  If you want to 
improve that situation, you can; in your position you have plenty of 
opportunities to make a major contribution to the overall quality of 
FreeBSD releases.  OTOH, if you choose not to do so, it's mere honesty to 
observe that you need to take a share of the blame for the current 
situation.

ps: The N440BX is actually being phased out; however, there are very large 
    numbers of them still in production, yes.
-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  msmith@freebsd.org
\\ and he'll hate you for a lifetime.             \\  msmith@cdrom.com