Date:      Thu, 05 Jun 2008 17:53:01 +0100
From:      Tom Evans <tevans.uk@googlemail.com>
To:        Paul Schmehl <pschmehl_lists_nada@tx.rr.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: challenge: end of life for 6.2 is premature with buggy 6.3
Message-ID:  <1212684781.10665.81.camel@localhost>
In-Reply-To: <CE0D857CF3C54017B29052F0@utd65257.utdallas.edu>
References:  <9B7FE91B-9C2E-4732-866C-930AC6022A40@netconsonance.com> <200806051023.56065.jhb@freebsd.org> <CE0D857CF3C54017B29052F0@utd65257.utdallas.edu>

On Thu, 2008-06-05 at 11:14 -0500, Paul Schmehl wrote:
> --On Thursday, June 05, 2008 10:23:55 -0400 John Baldwin <jhb@freebsd.org>
> wrote:
> >
> > FWIW, at Y! 6.3 is more stable than 6.2 (I had a list of about 10 patches for
> > known deadlocks and kernel panics that were errata candidates for 6.2 that
> > never made it into RELENG_6_2 but all of them are in 6.3).  We also have many
> > machines with bge(4) and from our perspective 6.3 has less issues with bge0
> > devices than 6.2.
> >
>
> I'm glad to hear that.  I have a server that uses bce, and it was completely
> non-functional until I hunted down some beta code that made it usable.  I'd
> like to upgrade, but this is a critical server with no redundancy (and it's a
> hobby site with no money to pay for expensive support), and I'm not about to
> upgrade unless I know for certain the problems won't reoccur, because I have to
> upgrade remotely and pay money if the system goes down.
>
> The problems with that driver were bad enough when the server was being
> configured in my study.  (The system would lock up, and only a hard reboot
> would restore networking.)  It would be hell trying to troubleshoot problems if
> I had to drive the 45 miles to the hosting site and spend a night there trying
> to get the server back up, then go to work the next day.
>
> # uname -a
> FreeBSD www.stovebolt.com 6.1-RELEASE-p10 FreeBSD 6.1-RELEASE-p10 #2: Mon Oct
> 16 15:38:02 CDT 2006     root@www.stovebolt.com:/usr/obj/usr/src/sys/GENERIC
> i386
>
> # grep bce /var/run/dmesg.boot
> bce0: <Broadcom NetXtreme II BCM5708 1000Base-T (B1), v0.9.6> mem
> 0xf4000000-0xf5ffffff irq 16 at device 0.0 on pci9
> bce0: ASIC ID 0x57081010; Revision (B1); PCI-X 64-bit 133MHz
> miibus0: <MII bus> on bce0
> bce0: Ethernet address: 00:13:72:fb:2a:ad
> bce1: <Broadcom NetXtreme II BCM5708 1000Base-T (B1), v0.9.6> mem
> 0xf8000000-0xf9ffffff irq 16 at device 0.0 on pci5
> bce1: ASIC ID 0x57081010; Revision (B1); PCI-X 64-bit 133MHz
> miibus1: <MII bus> on bce1
> bce1: Ethernet address: 00:13:72:fb:2a:ab
>
> # grep bce0 /var/log/messages
> May  2 09:10:31 www kernel: bce0: link state changed to DOWN
> May  2 09:10:39 www kernel: bce0: link state changed to UP
> May 25 07:49:49 www kernel: bce0: link state changed to DOWN
> May 25 07:50:31 www kernel: bce0: link state changed to UP
> May 26 21:28:36 www kernel: bce0: link state changed to DOWN
> May 26 21:28:40 www kernel: bce0: link state changed to UP
> May 27 13:13:21 www kernel: bce0: link state changed to DOWN
> May 27 13:13:31 www kernel: bce0: link state changed to UP
>
> It's been like that since the server was installed.
>
> So, if I upgrade to 6.3 or 7.0, am I still going to experience these problems?
> Is the server going to stop working entirely?  How can I know that for sure
> before starting an upgrade?
>
> Because, I have a 7.0 STABLE workstation (I'm sending this email from it) with
> a serious problem with umass, and no fix seems to be forthcoming.  On a
> workstation, I can work around problems.  On a critical server, not so much.
>
> Look, I know this is open source, all volunteer (hell, I'm a port maintainer
> myself) and guys' time is extremely valuable (whose isn't?), but it seems to me
> there needs to be better communication between the folks who know the code and
> those who only run boxes.  You might be able to read diffs and say, "Aha,
> they've fixed the problem", but I can't.  I don't know, if I upgrade to 6.3, if
> the server will stop passing packets or not.  And I can't take the chance that
> it will.
>
> Saying put up or shut up isn't going to win many friends.  I can't use the
> server for testing.  It's a website with 5 to 7 million hits per month.
>
> Mind you, I haven't complained about this and I'm not complaining now.  I'm
> simply saying it would be more productive if folks *listened* to what people
> say about a particular problem and gave it some thought before firing salvos at
> the "complainers" and demanding that they contribute to solving the problem
> somehow.
>
> --
> Paul Schmehl

I think that, especially with open source products, there is a large
emphasis on testing in your own environment, and choosing the 'correct'
version of a particular software package is important. For example, at
$JOB we had a lot of servers running 6.1, as it was an extended lifetime
release, so there was no point jumping to 6.2; instead we waited for 6.3
to pass our integration testing.

We usually buy the same chassis for all our servers, and test
extensively before deploying to a new chassis/OS/anything. This is the
definition of change management, which is expensive, takes lots of time
and planning, and doesn't guarantee zaroo bugs - just a high likelihood
of not hitting them. It also isn't smooth: when we tested 6.1, we found
a multitude of bugs in bce(4), which we worked with net@ and David
Christensen of Broadcom to get fixed (they work lovely now :).

If you don't want to do this sort of work, then yes, things may fail
unexpectedly (sort of unexpectedly; I would consider things failing a
logical consequence of not doing any testing). It is usually up to
$BOSS to decide whether to invest the resources (time, people and
money) locally to do change management planning, outsource your support
to a 3rd party, or ignore the problem (this is a polite way of saying
'put up or shut up', in my mind).

If you just have <5 servers, the expense of getting duplicates for
testing, change management and release management may be too much to
handle. If you've got >100 servers, you really have no excuse not to do
CM; in my mind, not doing it is reckless.

This isn't a cost specific to OSS; we also do this for Windows updates
and service packs etc.

Tom
