Date: Sat, 4 Dec 1999 22:46:01 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Wes Peters <wes@softweyr.com>
Cc: Kris Kennaway <kris@hub.freebsd.org>, freebsd-hackers@FreeBSD.ORG
Subject: Re: PCI DMA lockups in 3.2 (3.3 maybe?)
Message-ID: <199912050646.WAA59445@apollo.backplane.com>
References: <Pine.BSF.4.21.9912041823460.94804-100000@hub.freebsd.org> <199912050514.VAA58998@apollo.backplane.com> <3849FD95.F0434263@softweyr.com>

:>
:> He didn't say this until after the situation had started to degrade.
:>
:> Besides, he's right. 3.x has serious problems.
:
:All running software has serious problems, that's why it is never considered
:done. Taking the time to enumerate specific problems that are currently
:plaguing an installation is the only way anyone can possibly hope to help.
:Problems reports of "It don't work" are helpful to absolutely noone.

This simply isn't true. I have written plenty of software (large projects) that does not have serious problems and, in fact, some of it does not have any known problems at all. I have written several operating systems, one of them at least as complex as the FreeBSD core (but not as complex if you count drivers), which are bug-free (that is, there have been no recorded crashes and, except for feature updates, they have never been rebooted).

FreeBSD can become 'bug free' insofar as it is possible to become bug free. You have to believe that it can happen or it won't. I believe it can -- my personal goal for the project is to make the core bug free and uncrashable (and here I mean only with a network and disk driver, and not all the other drivers out there, which would be an impossible task). Since I've actually *written* bug-free and uncrashable OS cores I am confident that it is possible to do with FreeBSD.

Many of the issues relating to FreeBSD's instability and the many bugs in the core have nothing to do with continuing development work per se, but instead have to do with an attitude that allows major pollution to be introduced into the code to optimize very specific cases (which destabilizes the source at the same time), and with the lack of proper documentation within the source code. It is precisely these two things which I have concentrated on the most - by rewriting where necessary, generalizing optimizations (and ripping quite a few out of the VM system entirely), and documenting the hell out of any procedure I modify with succinct comments.

There are two good examples of code pollution and, needless to say, they have been responsible for a huge number of bugs over the years. Hundreds of bugs at least.

The first example is all the VM hacking that was done to accommodate partial cache instantiation and, most notably, partial byte-range writes for NFS. So far this year I have managed to rip about half of those hacks out at relatively little cost (a few esoteric NFS write cases will be slower is all, and buffer cache writing is slightly slower due to the extra system process, but that is hopefully made up for by the move to an O(1) algorithm from what was previously an O(N^2) algorithm).

The second example is the VFS layer implementation and, most especially, VOP_LOOKUP(). VOP_LOOKUP() has caused no end of trouble, but the VFS layer implementation with all of its locking assumptions and return requirements has made filesystem design problematic at best. There is enormous complexity in the lookup, directory scanning, and VFS cache code that hides bugs and that could be removed with a rewrite.

In general, it is possible to fix these problems but some of those fixes require significant rewriting. You have to be willing to rewrite and take your lumps up front or you may be faced with a situation where new problems are found with a subsystem for years to come. The best example of this in my case is the getnewbuf() code. The code was originally optimized with so many 'hacks' that it created at least half a dozen serious bugs in the system. When I first rewrote it I encountered a huge amount of resistance from certain people who believed (wrongly) that rewriting would create more bugs than it fixed. While a few bugs were introduced (that's the 'taking your lumps' part), the generalization of the code made finding and fixing them much, much easier and this will ultimately lead to a better track record down the road.

I applaud the removal of dead code that has been going on, though I have major problems with the way some of it has been done. Compared to what some committers have been doing recently, the dead code removal that Alan and I had done to the VM system earlier in the year was a walk in the park. I am dead set against 'hiding' bugs by trying to cache around them instead of fixing them, which is essentially the category in which I put most of the recent changes to procfs and /bin/ps.

It may seem counter-productive, but in order to fix bugs and make the system stable we actually need to cause the bugs to come to light more quickly and in a manner that is so blazingly obvious that we can fix them more quickly. Hence the reason for putting KASSERT()'s all throughout the VM system (which led to the discovery that VM pages were being put on the cache queue while still dirty and led to a fix for a serious filesystem corruption bug, among other things). When I did that some people screamed at me because they thought it would make the system unstable, but how many panics have we ever seen from it? I am happy to see other people start to do the same thing.

So, I think it *IS* possible to make FreeBSD sufficiently bug-free that people become 'surprised' when they are able to crash a box running it.

					-Matt

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
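
As a rough illustration of the KASSERT() approach described above (make a violated invariant fail loudly at the point where it happens), here is a minimal userspace sketch in C. It is not the FreeBSD VM code. The names sketch_panic, sketch_page, sketch_page_cache, panic_count and panic_is_fatal are all hypothetical, and the KASSERT macro here is a simplified stand-in: in the FreeBSD kernel the real macro calls panic() and is only active when the kernel is built with the INVARIANTS option.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bookkeeping so the sketch can be exercised outside a kernel. */
static int panic_count;         /* number of assertion failures seen so far */
static int panic_is_fatal = 1;  /* set to 0 to observe failures without aborting */

/* Hypothetical stand-in for the kernel's panic(): report loudly, then stop. */
static void sketch_panic(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    panic_count++;
    if (panic_is_fatal)
        abort();
}

/*
 * Simplified stand-in for KASSERT(exp, msg). As in the kernel macro, msg is
 * a parenthesized printf-style argument list.
 */
#define KASSERT(exp, msg) do { if (!(exp)) sketch_panic msg; } while (0)

/* Hypothetical, heavily reduced page structure. */
enum sketch_queue { SKETCH_Q_NONE, SKETCH_Q_CACHE };

struct sketch_page {
    int dirty;                  /* nonzero if the page has unwritten changes */
    enum sketch_queue queue;    /* which queue the page currently sits on */
};

/*
 * Move a page to the cache queue. The invariant is that only clean pages may
 * be cached; the assertion exposes a violation immediately instead of letting
 * the dirty data be silently discarded later.
 */
static void sketch_page_cache(struct sketch_page *m)
{
    KASSERT(m->dirty == 0,
        ("sketch_page_cache: page %p is still dirty", (void *)m));
    m->queue = SKETCH_Q_CACHE;
}
```

Calling sketch_page_cache() on a clean page simply moves it to the cache queue. Calling it on a dirty page reports the offending page and, with panic_is_fatal left at its default, aborts on the spot, which is the "blazingly obvious" failure mode argued for above.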