From: Tom Evans
To: Paul Schmehl
Cc: freebsd-stable@freebsd.org
Subject: Re: challenge: end of life for 6.2 is premature with buggy 6.3
Date: Thu, 05 Jun 2008 17:53:01 +0100
Message-Id: <1212684781.10665.81.camel@localhost>
References: <9B7FE91B-9C2E-4732-866C-930AC6022A40@netconsonance.com> <200806051023.56065.jhb@freebsd.org>
List-Id: Production branch of FreeBSD source code

On Thu, 2008-06-05 at 11:14 -0500, Paul Schmehl wrote:
> --On Thursday, June 05, 2008 10:23:55 -0400 John Baldwin wrote:
> >
> > FWIW, at Y! 6.3 is more stable than 6.2 (I had a list of about 10 patches for
> > known deadlocks and kernel panics that were errata candidates for 6.2 that
> > never made it into RELENG_6_2 but all of them are in 6.3). We also have many
> > machines with bge(4) and from our perspective 6.3 has less issues with bge0
> > devices than 6.2.
> >
>
> I'm glad to hear that. I have a server that uses bce, and it was completely
> non-functional until I hunted down some beta code that made it usable.
> I'd like to upgrade, but this is a critical server with no redundancy (and
> it's a hobby site with no money to pay for expensive support), and I'm not
> about to upgrade unless I know for certain the problems won't reoccur,
> because I have to upgrade remotely and pay money if the system goes down.
>
> The problems with that driver were bad enough when the server was being
> configured in my study. (The system would lock up, and only a hard reboot
> would restore networking.) It would be hell trying to troubleshoot problems if
> I had to drive the 45 miles to the hosting site and spend a night there trying
> to get the server back up, then go to work the next day.
>
> # uname -a
> FreeBSD www.stovebolt.com 6.1-RELEASE-p10 FreeBSD 6.1-RELEASE-p10 #2: Mon Oct 16 15:38:02 CDT 2006 root@www.stovebolt.com:/usr/obj/usr/src/sys/GENERIC i386
>
> # grep bce /var/run/dmesg.boot
> bce0: mem 0xf4000000-0xf5ffffff irq 16 at device 0.0 on pci9
> bce0: ASIC ID 0x57081010; Revision (B1); PCI-X 64-bit 133MHz
> miibus0: on bce0
> bce0: Ethernet address: 00:13:72:fb:2a:ad
> bce1: mem 0xf8000000-0xf9ffffff irq 16 at device 0.0 on pci5
> bce1: ASIC ID 0x57081010; Revision (B1); PCI-X 64-bit 133MHz
> miibus1: on bce1
> bce1: Ethernet address: 00:13:72:fb:2a:ab
>
> # grep bce0 /var/log/messages
> May 2 09:10:31 www kernel: bce0: link state changed to DOWN
> May 2 09:10:39 www kernel: bce0: link state changed to UP
> May 25 07:49:49 www kernel: bce0: link state changed to DOWN
> May 25 07:50:31 www kernel: bce0: link state changed to UP
> May 26 21:28:36 www kernel: bce0: link state changed to DOWN
> May 26 21:28:40 www kernel: bce0: link state changed to UP
> May 27 13:13:21 www kernel: bce0: link state changed to DOWN
> May 27 13:13:31 www kernel: bce0: link state changed to UP
>
> It's been like that since the server was installed.
>
> So, if I upgrade to 6.3 or 7.0, am I still going to experience these problems?
> Is the server going to stop working entirely? How can I know that for sure
> before starting an upgrade?
>
> Because, I have a 7.0 STABLE workstation (I'm sending this email from it) with
> a serious problem with umass, and no fix seems to be forthcoming. On a
> workstation, I can work around problems. On a critical server, not so much.
>
> Look, I know this is open source, all volunteer (hell, I'm a port maintainer
> myself) and guys' time is extremely valuable (whose isn't?), but it seems to me
> there needs to be better communication between the folks who know the code and
> those who only run boxes. You might be able to read diffs and say, "Aha,
> they've fixed the problem", but I can't. I don't know, if I upgrade to 6.3, if
> the server will stop passing packets or not. And I can't take the chance that
> it will.
>
> Saying put up or shut up isn't going to win many friends. I can't use the
> server for testing. It's a website with 5 to 7 million hits per month.
>
> Mind you, I haven't complained about this and I'm not complaining now. I'm
> simply saying it would be more productive if folks *listened* to what people
> say about a particular problem and gave it some thought before firing salvos at
> the "complainers" and demanding that they contribute to solving the problem
> somehow.
>
> --
> Paul Schmehl

I think that, especially with open source products, there is a large
emphasis on testing in your own environments, and choosing the 'correct'
version of a particular software package is important. For example, at
$JOB, we had a lot of servers running 6.1 as it was an extended lifetime
release, so there was no point jumping to 6.2; instead we waited for 6.3
to pass our integration testing.
We usually buy the same chassis for all our servers, and test
extensively before deploying to a new chassis/OS/anything.

This is the definition of change management, which is expensive, takes
lots of time and planning, and doesn't guarantee zaroo bugs - just a
high likelihood of not hitting them. It also isn't smooth: when we
tested 6.1, we found a multitude of bugs in bce(4), which we worked with
net@ and David Christensen of Broadcom to get fixed (they work lovely
now :).

If you don't want to do this sort of work, then yes, things may fail
unexpectedly (sort of unexpectedly; I would consider things failing
after no testing at all to be a logical consequence). It is usually up
to $BOSS to decide whether to invest the resources (time, people and
money) locally to do change management planning, or outsource your
support to a 3rd party, or ignore the problem (this is a polite way of
saying 'put up or shut up', in my mind.)

If you just have <5 servers, the expense of getting duplicates for
testing, change management and release management may be too much to
handle. If you've got >100 servers, you really have no excuse not to do
CM; in my mind, not doing it is reckless. This isn't a cost of OSS: we
also do this for Windows updates, service packs etc.

Tom