From owner-freebsd-hackers Fri Jun 4 1:26:53 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from storm.FreeBSD.org.uk (storm.freebsd.org.uk [194.242.128.198])
    by hub.freebsd.org (Postfix) with ESMTP id 66EC9153B2
    for ; Fri, 4 Jun 1999 01:26:45 -0700 (PDT)
    (envelope-from brian@keep.lan.Awfulhak.org)
Received: from keep.lan.Awfulhak.org (keep.lan.Awfulhak.org [172.16.0.8])
    by storm.FreeBSD.org.uk (8.9.3/8.9.3) with ESMTP id JAA23529;
    Fri, 4 Jun 1999 09:26:11 +0100 (BST)
    (envelope-from brian@keep.lan.Awfulhak.org)
Received: from keep.lan.Awfulhak.org (localhost [127.0.0.1])
    by keep.lan.Awfulhak.org (8.9.3/8.9.3) with ESMTP id JAA00510;
    Fri, 4 Jun 1999 09:13:30 +0100 (BST)
    (envelope-from brian@keep.lan.Awfulhak.org)
Message-Id: <199906040813.JAA00510@keep.lan.Awfulhak.org>
X-Mailer: exmh version 2.0.2 2/24/98
To: Matthew Dillon
Cc: Brian Somers , dyson@iquest.net, ahasty@mindspring.com (Amancio Hasty),
    crossd@cs.rpi.edu (David E. Cross), freebsd-hackers@FreeBSD.ORG,
    schimken@cs.rpi.edu
Subject: Re: 3.2-stable, panic #12
In-reply-to: Your message of "Thu, 03 Jun 1999 19:23:53 PDT."
    <199906040223.TAA01897@apollo.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 04 Jun 1999 09:13:30 +0100
From: Brian Somers
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> :> It wasn't the "dark side" of core, it was the panicked and worried
> :> part of core that was seeing things happening without careful review.
> :
> :The system was becoming unstable due to Matt's changes. Whether the
> :instabilities were in Matt's code or somewhere else is irrelevant.
> :The reaction was (IMHO) the right thing to do.
> :
> :--
> :Brian
>
> This is silly. The system was not becoming unstable due to my commits.
> Where'd you get that from?
>
> I might have occasionally glitched something for a day or two.
> I think I broke build world once, blew write efficiency temporarily
> one time, and of the asserts added I recall only one or two panics
> that were due to an incorrect assert, and several that were due to
> unrelated bugs uncovered by the assert ( i.e. the assert did its job ).
> There might have been minor problems with 'pstat' output after the VM
> switchover due to the new swapper. We broke madvise() once or twice
> but that doesn't count because it was already badly broken before we
> tried to fix it, and we found several bugs in the course of trying to
> fix it. Not much else.
> Considering the amount of code committed I'd say that's a pretty good
> record. Most people blow things up committing just a few lines of code.

Sure, and I'm no exception.

I'm actually referring to my personal environment. I have a ``last
release'' box used for things like 'net connectivity, storing company
accounts etc etc. As it happens, this box is an nfs server, providing
filesystems for distfiles, home directories and that sort of thing.
3.0-release and 3.1-release could not do the NFS side of things. I now
have a -stable machine (I may go back to 3.2-release) because it was
the only way to defeat the problem.

I'm sure that the fact that -release ended up with such obvious
instabilities was out of your control (IMHO RELENG_3 shouldn't have
been dubbed -stable 'till 3.2 was tagged), but I'd bet that this did
FreeBSD's public image some serious damage. I've had to say to many
people since ``wait for 3.2 - don't install 2.* because the upgrade is
too difficult and don't install 3.1 if you want nfs''.

[.....]

> This is what I mean about rumors turning into supposed facts. The fact
> of the matter is that what mistakes I made ( and I would argue that a
> certain number of mistakes are unavoidable ) were all minor. Point to
> one major mistake! The only alternative would have been to not touch
> the code at all, meaning that nothing would have gotten fixed or made
> more maintainable.
> A lot of the rewrites that supposedly contained no
> meaningful fixes actually do: They make the code more readable in
> preparation for future changes coming down the line. There is a point
> where emplacing a hack on top of a hack on top of a hack leads to
> diminishing returns and a rewrite is necessary to reset the clock. I
> would argue that a good chunk of the code I rewrote ( which itself is only
> a portion of the commits made ) falls into that category.
>
> The biggest mistake that programmers working on a large project make is
> when they do *not* rewrite portions of the code that need to be rewritten.
> For a case in point you need look no further than the buffer cache and
> device I/O code. It's so messed up that even I could only add hacks to
> portions of it to implement necessary VM pager functions properly, but
> I sure do not intend those hacks to remain in there forever! The I/O
> subsystem is a holy mess. The only reason I'm not working on it right now
> is because I think Poul is intending to work on it later in the year.

I buy into this argument whole-heartedly, but I also agree with what
John D reckons about understanding the code and doing the requisite
amount and type of testing.

I'm a case-in-point with the ppp stuff. I can't say that I actually
understood *all* the code 'till about 2 months ago, and even then, my
layering commit of about a month ago introduced about 7 or 8 distinct
bugs (although I can't claim that they weren't my bugs). This wasn't
even due to a lack of knowledge but was due to the wrong sort of
testing.

But, the end result is that I can now bring ppp into the kernel in a
way that supports Multi-link, does demand-dialing properly (unlike the
pppd hack) and does synchronous stuff properly (farewell sppp).

Same thing, different scale, and of course people are a lot more
forgiving because none of it has been MFCd.

> -Matt
> Matthew Dillon
>

--
Brian
Don't _EVER_ lose your sense of humour !