Date:      Wed, 16 Jan 2013 11:04:54 -0800
From:      Devin Teske <devin.teske@fisglobal.com>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        'Ian Lepore' <freebsd@damnhippie.dyndns.org>, dteske@freebsd.org, freebsd-hackers@freebsd.org
Subject:   Re: kgzip(1) is broken
Message-ID:  <23AAEBCB-6438-42EB-9B2E-E657CFC3BA1B@fisglobal.com>
In-Reply-To: <B22DF1755E60453F939EB6D7A7361622@multiplay.co.uk>
References:  <09b701cdf367$12737530$375a5f90$@freebsd.org>	 <1358291098.32417.134.camel@revolution.hippie.lan>	 <0a0001cdf375$60ddbc40$229934c0$@freebsd.org>	 <0a2301cdf37d$ebe705a0$c3b510e0$@fisglobal.com> <1358296967.32417.137.camel@revolution.hippie.lan> <0a4601cdf384$4ff98e40$efecaac0$@freebsd.org> <B22DF1755E60453F939EB6D7A7361622@multiplay.co.uk>


On Jan 15, 2013, at 5:07 PM, Steven Hartland wrote:

>
> ----- Original Message ----- From: <dteske@freebsd.org>
> To: "'Ian Lepore'" <freebsd@damnhippie.dyndns.org>
> Cc: <freebsd-hackers@freebsd.org>; <dteske@freebsd.org>
> Sent: Wednesday, January 16, 2013 12:56 AM
> Subject: RE: kgzip(1) is broken
>
>
>>> -----Original Message-----
>>> From: Ian Lepore [mailto:freebsd@damnhippie.dyndns.org]
>>> Sent: Tuesday, January 15, 2013 4:43 PM
>>> To: Devin Teske
>>> Cc: dteske@freebsd.org; freebsd-hackers@freebsd.org
>>> Subject: RE: kgzip(1) is broken
>>> On Tue, 2013-01-15 at 16:10 -0800, Devin Teske wrote:
>>>>
>>>>> -----Original Message-----
>>>>> From: Devin Teske [mailto:devin.teske@fisglobal.com] On Behalf Of
>>>>> dteske@freebsd.org
>>>>> Sent: Tuesday, January 15, 2013 3:10 PM
>>>>> To: 'Ian Lepore'
>>>>> Cc: freebsd-hackers@freebsd.org; dteske@freebsd.org
>>>>> Subject: RE: kgzip(1) is broken
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ian Lepore [mailto:freebsd@damnhippie.dyndns.org]
>>>>>> Sent: Tuesday, January 15, 2013 3:05 PM
>>>>>> To: dteske@freebsd.org
>>>>>> Cc: freebsd-hackers@freebsd.org
>>>>>> Subject: Re: kgzip(1) is broken
>>>>>>
>>>>>> On Tue, 2013-01-15 at 13:27 -0800, dteske@freebsd.org wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have been sad of-late because kgzip(1) no longer produces a usable
>>>>>>> kernel.
>>>>>>>
>>>>>>> All versions of 9.x suffer this.
>>>>>>>
>>>>>>> And somewhere between 8.3-RELEASE-p1 and 8.3-RELEASE-p5 this recently
>>>>>>> broke in the 8.x series.
>>>>>>>
>>>>>>> I haven't tried the 7 series lately, but if whatever is making the rounds
>>>>>>> gets MFC'd that far back, I expect the problem to percolate there too.
>>>>>>>
>>>>>>> The symptom is that the machine reboots immediately and unexpectedly the
>>>>>>> moment the kernel is executed by the loader.
>>>>>>>
>>>>>>> This is quite troubling and I am looking for someone to help find the
>>>>>>> culprit. I don't know where to start looking.
>>>>>>
>>>>>> Here are some possible candidates from the things that were MFC'd to 8
>>>>>> in that timeframe.  I haven't looked at what these do, they're just
>>>>>> changes that affect files related to booting.
>>>>>>
>>>>>> r233211
>>>>>> r233377
>>>>>> r233469
>>>>>> r234563
>>>>>>
>>>>>
>>>>> Thanks Ian!
>>>>>
>>>>> I'll test each one individually to see if regressing any one (or all)
>>>>> addresses the problem.
>>>>
>>>> Progress...
>>>>
>>>> Looks like I found the culprit.
>>>>
>>>> Turns out it's a back-ported bxe(4) driver (back-ported from 9 -- where kgzip
>>>> seems to never work).
>>>>
>>>> I wonder why back-porting bxe(4) from stable/9 to releng/8.3 would cause
>>>> kgzip to produce non-working kernels.
>>>>
>>> Yeah, it'll be interesting to see how a device driver can lead to "the
>>> machine reboots immediately and unexpectedly the moment the kernel is
>>> executed by the loader," which I took to mean "before seeing the
>>> copyright or anything."
>> Indeed... loader throws up the syms and upon execution *KABOOM* (screen goes
>> black and back to POST)
>> The copyright never appears.
>>>> I'm emailing the maintainers (davidch + other Broadcom folk)
>> The current dossier is even more interesting... the back-ported driver (with
>> zero modifications mind you from stable/9 to stable/8) exhibits memory failures
>> (example below), and causes terminals to become wedged when attempting to (for
>> example) scp a file over an existing configured network (igb-based -- presumably
>> unrelated to bxe but in practice loading bxe causes igb to misbehave).
>> $ ifconfig bxe0 inet 192.168.1.5/24
>> bxe0: ../../../dev/bxe/if_bxe.c(10939): Memory allocation failure! Cannot fill
>> fp[00] RX chain.
>> bxe0: ../../../dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
>> $ ifconfig bxe1 inet 192.168.1.6/24
>> bxe1: ../../../dev/bxe/if_bxe.c(10939): Memory allocation failure! Cannot fill
>> fp[00] RX chain.
>> bxe1: ../../../dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
>> (as expected, also sent mail off to maintainers w/respect to above notes/errors)
>
> Sounds like you may be out of mbufs which is easy, on a box with 4 igb's simply
> booting without tuning will cause this so, if you have igb's and bxe's this
> could be your cause.
>
> Try adding the following to loader.conf and see if it helps:-
> kern.ipc.nmbclusters=51200
>

Sorry for the delayed response -- we had to go through a power cycle.

I haven't yet tried bumping the value as suggested, but I suspect it will indeed help greatly -- I noticed that I got 18% into the scp before things took a turn for the worse (hanging terminals and such).
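
For anyone wanting to check whether mbuf clusters really are the bottleneck before and after changing the tunable, here is a rough sketch. It assumes that netstat -m prints a line of the form "current/cache/total/max mbuf clusters in use (current/cache/total/max)"; cluster_pct is a made-up helper name, not something shipped with FreeBSD, and none of this has been run on the machine discussed in this thread.

```shell
#!/bin/sh
# Sketch only: report mbuf cluster usage as a percentage of the configured max.
# Assumption: the input contains a line shaped like
#   "<current>/<cache>/<total>/<max> mbuf clusters in use (current/cache/total/max)"
# cluster_pct is a hypothetical helper name used for illustration.
cluster_pct() {
    awk '/mbuf clusters in use/ {
        # Split the first field "current/cache/total/max" on slashes.
        split($1, a, "/")
        # Print total as an integer percentage of max (skip if max is 0).
        if (a[4] > 0) printf "%d\n", (a[3] * 100) / a[4]
    }'
}

# On a FreeBSD host one might then run:
#   netstat -m | cluster_pct
#   sysctl kern.ipc.nmbclusters
```

A figure close to 100 would be consistent with the "Memory allocation failure! Cannot fill fp[00] RX chain." messages, though it would not prove that cluster exhaustion is the only problem.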

Another thing worth noting about the uplifted bxe(4) plopped into RELENG_8... when we rebooted:

bxe0: ../../../dev/bxe/if_bxe.c(6419): Slowpath queue is full!
bxe0: ---------- Begin crash dump ----------
bxe0: ----------  End crash dump  ----------
bxe0: ../../../dev/bxe/if_bxe.c(6419): Slowpath queue is full!
bxe0: ---------- Begin crash dump ----------
bxe0: ----------  End crash dump  ----------
bxe0: ../../../dev/bxe/if_bxe.c(3262): fp[01] client ramrod halt failed!

Heh. The machine had to be hard cycled.
-- 
Devin



