From: Devin Teske
Date: Wed, 16 Jan 2013 11:04:54 -0800
To: Steven Hartland
Cc: 'Ian Lepore', dteske@freebsd.org, freebsd-hackers@freebsd.org
Reply-To: Devin Teske
Subject: Re: kgzip(1) is broken
Message-ID: <23AAEBCB-6438-42EB-9B2E-E657CFC3BA1B@fisglobal.com>
References: <09b701cdf367$12737530$375a5f90$@freebsd.org> <1358291098.32417.134.camel@revolution.hippie.lan> <0a0001cdf375$60ddbc40$229934c0$@freebsd.org> <0a2301cdf37d$ebe705a0$c3b510e0$@fisglobal.com> <1358296967.32417.137.camel@revolution.hippie.lan> <0a4601cdf384$4ff98e40$efecaac0$@freebsd.org>
List-Id: Technical Discussions relating to FreeBSD

On Jan 15, 2013, at 5:07 PM, Steven Hartland wrote:

>
> ----- Original Message ----- From:
> To: "'Ian Lepore'"
> Cc: ;
> Sent: Wednesday, January 16, 2013 12:56 AM
> Subject: RE: kgzip(1) is broken
>
>
>>> -----Original Message-----
>>> From: Ian Lepore [mailto:freebsd@damnhippie.dyndns.org]
>>> Sent: Tuesday, January 15, 2013 4:43 PM
>>> To: Devin Teske
>>> Cc: dteske@freebsd.org; freebsd-hackers@freebsd.org
>>> Subject: RE: kgzip(1) is broken
>>> On Tue, 2013-01-15 at 16:10 -0800, Devin Teske wrote:
>>>>
>>>>> -----Original Message-----
>>>>> From: Devin Teske [mailto:devin.teske@fisglobal.com] On Behalf Of
>>>>> dteske@freebsd.org
>>>>> Sent: Tuesday, January 15, 2013 3:10 PM
>>>>> To: 'Ian Lepore'
>>>>> Cc: freebsd-hackers@freebsd.org; dteske@freebsd.org
>>>>> Subject: RE: kgzip(1) is broken
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ian Lepore [mailto:freebsd@damnhippie.dyndns.org]
>>>>>> Sent: Tuesday, January 15, 2013 3:05 PM
>>>>>> To: dteske@freebsd.org
>>>>>> Cc: freebsd-hackers@freebsd.org
>>>>>> Subject: Re: kgzip(1) is broken
>>>>>>
>>>>>> On Tue, 2013-01-15 at 13:27 -0800, dteske@freebsd.org wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have been sad of-late because kgzip(1) no longer produces a usable
>>>>>>> kernel.
>>>>>>>
>>>>>>> All versions of 9.x suffer this.
>>>>>>>
>>>>>>> And somewhere between 8.3-RELEASE-p1 and 8.3-RELEASE-p5 this recently
>>>>>>> broke in the 8.x series.
>>>>>>>
>>>>>>> I haven't tried the 7 series lately, but if whatever is making the
>>>>>>> rounds gets MFC'd that far back, I expect the problem to percolate
>>>>>>> there too.
>>>>>>>
>>>>>>> The symptom is that the machine reboots immediately and unexpectedly
>>>>>>> the moment the kernel is executed by the loader.
>>>>>>>
>>>>>>> This is quite troubling and I am looking for someone to help find the
>>>>>>> culprit.
>>>>>>> I don't know where to start looking.
>>>>>>
>>>>>> Here are some possible candidates from the things that were MFC'd to 8
>>>>>> in that timeframe. I haven't looked at what these do, they're just
>>>>>> changes that affect files related to booting.
>>>>>>
>>>>>> r233211
>>>>>> r233377
>>>>>> r233469
>>>>>> r234563
>>>>>>
>>>>>
>>>>> Thanks Ian!
>>>>>
>>>>> I'll test each one individually to see if regressing any one (or all)
>>>>> addresses the problem.
>>>>
>>>> Progress...
>>>>
>>>> Looks like I found the culprit.
>>>>
>>>> Turns out it's a back-ported bxe(4) driver (back-ported from 9 -- where
>>>> kgzip seems to never work).
>>>>
>>>> I wonder why back-porting bxe(4) from stable/9 to releng/8.3 would cause
>>>> kgzip to produce non-working kernels.
>>>>
>>> Yeah, it'll be interesting to see how a device driver can lead to "the
>>> machine reboots immediately and unexpectedly the moment the kernel is
>>> executed by the loader," which I took to mean "before seeing the
>>> copyright or anything."
>> Indeed... loader throws up the syms and upon execution *KABOOM* (screen goes
>> black and back to POST)
>> The copyright never appears.
>>>> I'm emailing the maintainers (davidch + other Broadcom folk)
>> The current dossier is even more interesting... the back-ported driver (with
>> zero modifications mind you from stable/9 to stable/8) exhibits memory
>> failures (example below), and causes terminals to become wedged when
>> attempting to (for example) scp a file over an existing configured network
>> (igb-based -- presumably unrelated to bxe but in practice loading bxe causes
>> igb to misbehave).
>> $ ifconfig bxe0 inet 192.168.1.5/24
>> bxe0: ../../../dev/bxe/if_bxe.c(10939): Memory allocation failure! Cannot fill fp[00] RX chain.
>> bxe0: ../../../dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
>> $ ifconfig bxe1 inet 192.168.1.6/24
>> bxe1: ../../../dev/bxe/if_bxe.c(10939): Memory allocation failure! Cannot fill fp[00] RX chain.
>> bxe1: ../../../dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
>> (as expected, also sent mail off to maintainers w/respect to above
>> notes/errors)
>
> Sounds like you may be out of mbufs which is easy, on a box with 4 igb's simply
> booting without tuning will cause this so, if you have igb's and bxe's this
> could be your cause.
>
> Try adding the following to loader.conf and see if it helps:-
> kern.ipc.nmbclusters=51200
>

Sorry for the delayed response -- we had to go through a power cycle.

I haven't yet tried bumping the value as suggested, but I suspect it will
indeed help greatly -- I noticed that I got 18% into the scp before things took
a dive for the worse (hanging terminals and such).

Another thing worth noting about the uplifted bxe(4) plopped into RELENG_8...
when we rebooted:

bxe0: ../../../dev/bxe/if_bxe.c(6419): Slowpath queue is full!
bxe0: ---------- Begin crash dump ----------
bxe0: ---------- End crash dump ----------
bxe0: ../../../dev/bxe/if_bxe.c(6419): Slowpath queue is full!
bxe0: ---------- Begin crash dump ----------
bxe0: ---------- End crash dump ----------
bxe0: ../../../dev/bxe/if_bxe.c(3262): fp[01] client ramrod halt failed!

Heh. The machine had to be hard cycled.
--
Devin