Date:      Fri, 04 Jun 1999 09:13:30 +0100
From:      Brian Somers <brian@Awfulhak.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Brian Somers <brian@Awfulhak.org>, dyson@iquest.net, ahasty@mindspring.com (Amancio Hasty), crossd@cs.rpi.edu (David E. Cross), freebsd-hackers@FreeBSD.ORG, schimken@cs.rpi.edu
Subject:   Re: 3.2-stable, panic #12 
Message-ID:  <199906040813.JAA00510@keep.lan.Awfulhak.org>
In-Reply-To: Your message of "Thu, 03 Jun 1999 19:23:53 PDT." <199906040223.TAA01897@apollo.backplane.com> 

> :> It wasn't the "dark side" of core, it was the panic'ed and worried
> :> part of core that was seeing things happening without careful review.
> :
> :The system was becoming unstable due to Matt's changes.  Whether the 
> :instabilities were in Matt's code or somewhere else is irrelevant.  
> :The reaction was (IMHO) the right thing to do.
> :
> :-- 
> :Brian <brian@Awfulhak.org>                        <brian@FreeBSD.org>
> 
>     This is silly.  The system was not becoming unstable due to my commits.
>     Where'd you get that from? 
> 
>     I might have occasionally glitched something for a day or two.  I think
>     I broke build world once, blew write efficiency temporarily one time,
>     and of the asserts added I recall only one or two panics that were due
>     to an incorrect assert, and several that were due to unrelated bugs 
>     uncovered by the assert ( i.e. the assert did its job ).  There might have
>     been minor problems with 'pstat' output after the VM switchover due to 
>     the new swapper.  We broke madvise() once or twice but that doesn't 
>     count because it was already badly broken before we tried to fix it, and
>     we found several bugs in the course of trying to fix it.  Not much else. 
>     Considering the amount of code committed I'd say that's a pretty good
>     record.  Most people blow things up committing just a few lines of code.

Sure, and I'm no exception.

I'm actually referring to my personal environment.  I have a ``last 
release'' box used for things like 'net connectivity, storing company 
accounts etc etc.  As it happens, this box is an nfs server, 
providing filesystems for distfiles, home directories and that sort 
of thing.

3.0-release and 3.1-release could not do the NFS side of things.  I 
now have a -stable machine (I may go back to 3.2-release) because it 
was the only way to defeat the problem.

I'm sure that the fact that -release ended up with such obvious 
instabilities was out of your control (IMHO RELENG_3 shouldn't have 
been dubbed -stable 'till 3.2 was tagged), but I'd bet that this did 
FreeBSD's public image some serious damage.  I've had to say to many 
people since ``wait for 3.2 - don't install 2.* because the upgrade 
is too difficult and don't install 3.1 if you want nfs''.

[.....]
>     This is what I mean about rumors turning into supposed facts.  The fact
>     of the matter is that what mistakes I made ( and I would argue that a
>     certain number of mistakes are unavoidable ) were all minor.  Point to
>     one major mistake!  The only alternative would have been to not touch 
>     the code at all, meaning that nothing would have gotten fixed or made 
>     more maintainable.  A lot of the rewrites that supposedly contained no
>     meaningful fixes actually do:  They make the code more readable in 
>     preparation for future changes coming down the line.  There is a point 
>     where emplacing a hack on top of a hack on top of a hack leads to 
>     diminishing returns and rewrite is necessary to reset the clock.  I
>     would argue that a good chunk of the code I rewrote ( which itself is only
>     a portion of the commits made ) fall into that category.
> 
>     The biggest mistake that programmers working on a large project make is
>     when they do *not* rewrite portions of the code that need to be rewritten.
>     For a case in point you need look no further than the buffer cache and
>     device I/O code.  It's so messed up that even I could only add hacks to
>     portions of it to implement necessary VM pager functions properly, but
>     I sure do not intend those hacks to remain in there forever!  The I/O
>     subsystem is a holy mess.  The only reason I'm not working on it right now
>     is because I think Poul is intending to work on it later in the year.

I buy into this argument whole-heartedly, but I also agree with what 
John D reckons about understanding the code and doing the requisite 
amount and type of testing.  I'm a case-in-point with the ppp stuff.
I can't say that I've actually understood *all* the code 'till about 
2 months ago, and even then, my layering commit of about a month ago 
introduced about 7 or 8 distinct bugs (although I can't claim that 
they weren't my bugs).

This wasn't even due to a lack of knowledge but was due to the wrong 
sort of testing.

But, the end result is that I can now bring ppp into the kernel in a 
way that supports Multi-link, does demand-dialing properly (unlike 
the pppd hack) and does synchronous stuff properly (farewell sppp).

Same thing, different scale, and of course people are a lot more 
forgiving because none of it has been MFCd.

> 					-Matt
> 					Matthew Dillon 
> 					<dillon@backplane.com>

-- 
Brian <brian@Awfulhak.org>                        <brian@FreeBSD.org>
      <http://www.Awfulhak.org>                    <brian@OpenBSD.org>
Don't _EVER_ lose your sense of humour !          <brian@uk.FreeBSD.org>







