From owner-freebsd-hackers@FreeBSD.ORG  Wed Jan 16 19:05:28 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 764EA63E;
 Wed, 16 Jan 2013 19:05:28 +0000 (UTC)
 (envelope-from Devin.Teske@fisglobal.com)
Received: from mx1.fisglobal.com (mx1.fisglobal.com [199.200.24.190])
 by mx1.freebsd.org (Postfix) with ESMTP id 436F236B;
 Wed, 16 Jan 2013 19:05:27 +0000 (UTC)
Received: from smtp.fisglobal.com ([10.132.206.15])
 by ltcfislmsgpa01.fnfis.com (8.14.5/8.14.5) with ESMTP id r0GJ4xhS018207
 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT);
 Wed, 16 Jan 2013 13:04:59 -0600
Received: from [10.0.0.102] (10.14.152.24) by smtp.fisglobal.com
 (10.132.206.15) with Microsoft SMTP Server (TLS) id 14.2.309.2; Wed, 16 Jan
 2013 13:04:59 -0600
Subject: Re: kgzip(1) is broken
MIME-Version: 1.0 (Apple Message framework v1283)
Content-Type: text/plain; charset="windows-1252"
From: Devin Teske <devin.teske@fisglobal.com>
In-Reply-To: <B22DF1755E60453F939EB6D7A7361622@multiplay.co.uk>
Date: Wed, 16 Jan 2013 11:04:54 -0800
Content-Transfer-Encoding: quoted-printable
Message-ID: <23AAEBCB-6438-42EB-9B2E-E657CFC3BA1B@fisglobal.com>
References: <09b701cdf367$12737530$375a5f90$@freebsd.org>	
 <1358291098.32417.134.camel@revolution.hippie.lan>	
 <0a0001cdf375$60ddbc40$229934c0$@freebsd.org>	
 <0a2301cdf37d$ebe705a0$c3b510e0$@fisglobal.com>
 <1358296967.32417.137.camel@revolution.hippie.lan>
 <0a4601cdf384$4ff98e40$efecaac0$@freebsd.org>
 <B22DF1755E60453F939EB6D7A7361622@multiplay.co.uk>
To: Steven Hartland <killing@multiplay.co.uk>
X-Mailer: Apple Mail (2.1283)
X-Originating-IP: [10.14.152.24]
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:5.9.8327, 1.0.431,
 0.0.0000
 definitions=2013-01-16_06:2013-01-16,2013-01-16,1970-01-01 signatures=0
Cc: 'Ian Lepore' <freebsd@damnhippie.dyndns.org>, dteske@freebsd.org,
 freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: Devin Teske <dteske@freebsd.org>
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Jan 2013 19:05:28 -0000


On Jan 15, 2013, at 5:07 PM, Steven Hartland wrote:

>
> ----- Original Message ----- From: <dteske@freebsd.org>
> To: "'Ian Lepore'" <freebsd@damnhippie.dyndns.org>
> Cc: <freebsd-hackers@freebsd.org>; <dteske@freebsd.org>
> Sent: Wednesday, January 16, 2013 12:56 AM
> Subject: RE: kgzip(1) is broken
>
>
>>> -----Original Message-----
>>> From: Ian Lepore [mailto:freebsd@damnhippie.dyndns.org]
>>> Sent: Tuesday, January 15, 2013 4:43 PM
>>> To: Devin Teske
>>> Cc: dteske@freebsd.org; freebsd-hackers@freebsd.org
>>> Subject: RE: kgzip(1) is broken
>>> On Tue, 2013-01-15 at 16:10 -0800, Devin Teske wrote:
>>>>
>>>>> -----Original Message-----
>>>>> From: Devin Teske [mailto:devin.teske@fisglobal.com] On Behalf Of
>>>>> dteske@freebsd.org
>>>>> Sent: Tuesday, January 15, 2013 3:10 PM
>>>>> To: 'Ian Lepore'
>>>>> Cc: freebsd-hackers@freebsd.org; dteske@freebsd.org
>>>>> Subject: RE: kgzip(1) is broken
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ian Lepore [mailto:freebsd@damnhippie.dyndns.org]
>>>>>> Sent: Tuesday, January 15, 2013 3:05 PM
>>>>>> To: dteske@freebsd.org
>>>>>> Cc: freebsd-hackers@freebsd.org
>>>>>> Subject: Re: kgzip(1) is broken
>>>>>>
>>>>>> On Tue, 2013-01-15 at 13:27 -0800, dteske@freebsd.org wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have been sad of-late because kgzip(1) no longer produces a usable
>>>>>>> kernel.
>>>>>>>
>>>>>>> All versions of 9.x suffer this.
>>>>>>>
>>>>>>> And somewhere between 8.3-RELEASE-p1 and 8.3-RELEASE-p5 this recently
>>>>>>> broke in the 8.x series.
>>>>>>>
>>>>>>> I haven't tried the 7 series lately, but if whatever is making the
>>>>>>> rounds gets MFC'd that far back, I expect the problem to percolate
>>>>>>> there too.
>>>>>>>
>>>>>>> The symptom is that the machine reboots immediately and unexpectedly
>>>>>>> the moment the kernel is executed by the loader.
>>>>>>>
>>>>>>> This is quite troubling and I am looking for someone to help find the
>>>>>>> culprit. I don't know where to start looking.
>>>>>>
>>>>>> Here are some possible candidates from the things that were MFC'd to 8
>>>>>> in that timeframe.  I haven't looked at what these do, they're just
>>>>>> changes that affect files related to booting.
>>>>>>
>>>>>> r233211
>>>>>> r233377
>>>>>> r233469
>>>>>> r234563
>>>>>>
>>>>>
>>>>> Thanks Ian!
>>>>>
>>>>> I'll test each one individually to see if regressing any one (or all)
>>>>> addresses the problem.
>>>>
>>>> Progress...
>>>>
>>>> Looks like I found the culprit.
>>>>
>>>> Turns out it's a back-ported bxe(4) driver (back-ported from 9 -- where
>>>> kgzip seems to never work).
>>>>
>>>> I wonder why back-porting bxe(4) from stable/9 to releng/8.3 would cause
>>>> kgzip to produce non-working kernels.
>>>>
>>> Yeah, it'll be interesting to see how a device driver can lead to "the
>>> machine reboots immediately and unexpectedly the moment the kernel is
>>> executed by the loader," which I took to mean "before seeing the
>>> copyright or anything."
>> Indeed... loader throws up the syms and upon execution *KABOOM* (screen goes
>> black and back to POST)
>> The copyright never appears.
>>>> I'm emailing the maintainers (davidch + other Broadcom folk)
>> The current dossier is even more interesting... the back-ported driver (with
>> zero modifications mind you from stable/9 to stable/8) exhibits memory
>> failures (example below), and causes terminals to become wedged when
>> attempting to (for example) scp a file over an existing configured network
>> (igb-based -- presumably unrelated to bxe but in practice loading bxe causes
>> igb to misbehave).
>> $ ifconfig bxe0 inet 192.168.1.5/24
>> bxe0: ../../../dev/bxe/if_bxe.c(10939): Memory allocation failure! Cannot fill fp[00] RX chain.
>> bxe0: ../../../dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
>> $ ifconfig bxe1 inet 192.168.1.6/24
>> bxe1: ../../../dev/bxe/if_bxe.c(10939): Memory allocation failure! Cannot fill fp[00] RX chain.
>> bxe1: ../../../dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
>> (as expected, also sent mail off to maintainers w/respect to above notes/errors)
>
> Sounds like you may be out of mbufs, which is easy to do: on a box with 4
> igb's, simply booting without tuning will cause this, so if you have igb's
> and bxe's this could be your cause.
>
> Try adding the following to loader.conf and see if it helps:
> kern.ipc.nmbclusters=51200
>

Sorry for delayed response -- we had to go through a power cycle.

I haven't yet tried bumping the value as suggested, but I suspect it will
indeed help greatly -- I noticed that I got 18% into the scp before things
took a turn for the worse (hanging terminals and such).
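
For anyone following along, here is a rough way to see how close a box is to
the cluster limit before touching loader.conf. This is only a sketch: the
helper name cluster_usage is made up for illustration, and it assumes the
usual "current/cache/total/max ... mbuf clusters in use" line that netstat -m
prints on FreeBSD.

```shell
# cluster_usage: hypothetical helper, not part of the base system.
# Reads `netstat -m` output on stdin and reports how much of the
# mbuf cluster limit is currently in use.
cluster_usage() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")                 # fields: current/cache/total/max
        if (f[4] > 0)
            printf "%d/%d clusters in use (%d%%)\n", f[1], f[4], f[1] * 100 / f[4]
    }'
}

# On the affected box (FreeBSD):
#   netstat -m | cluster_usage
#   sysctl kern.ipc.nmbclusters
```

If the in-use figure sits near the maximum while the NICs are being brought
up, that would be consistent with the mbuf exhaustion theory; it would not by
itself explain the kgzip(1) failure.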

Another thing worth noting about the uplifted bxe(4) plopped into RELENG_8...
when we rebooted:

bxe0: ../../../dev/bxe/if_bxe.c(6419): Slowpath queue is full!
bxe0: ---------- Begin crash dump ----------
bxe0: ----------  End crash dump  ----------
bxe0: ../../../dev/bxe/if_bxe.c(6419): Slowpath queue is full!
bxe0: ---------- Begin crash dump ----------
bxe0: ----------  End crash dump  ----------
bxe0: ../../../dev/bxe/if_bxe.c(3262): fp[01] client ramrod halt failed!

Heh. The machine had to be hard cycled.
-- 
Devin
