Date:      Fri, 30 Mar 2018 21:10:31 +0300
From:      Toomas Soome <tsoome@me.com>
To:        Stefan Esser <se@freebsd.org>
Cc:        "M. Warner Losh" <imp@freebsd.org>, Kyle Evans <kevans@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: Boot failure: panic: No heap setup
Message-ID:  <A80EB69F-ADDC-46B8-80E0-D82B6ACC9C6A@me.com>
In-Reply-To: <838e40f6-2f05-9251-e5a9-13d52ba510b7@freebsd.org>
References:  <79d2bd72-f8b2-6476-9589-ebad9716698f@freebsd.org> <CACNAnaEwq41PqQATGLF2OAaL6mnRpGgwqYQaux1gZ_kzp4DxoA@mail.gmail.com> <d4304b55-d265-2488-62e4-6117a7a33502@freebsd.org> <CACNAnaGpB434Mca9DdjnPJz_Mt4WhzrCbt=qu5AUGrgD2C6YOQ@mail.gmail.com> <CANCZdfqtxMGuSPuX6rQrLY0Zwi5Ndzff_%2Bf47GyGLuRoRTsggQ@mail.gmail.com> <f5e17e50-362b-21e6-f922-13b504d8420e@freebsd.org> <BB3062B2-5F86-4A75-A749-8FE69D622FE9@me.com> <838e40f6-2f05-9251-e5a9-13d52ba510b7@freebsd.org>



> On 30 Mar 2018, at 18:03, Stefan Esser <se@freebsd.org> wrote:
>
> On 29.03.18 at 07:15, Toomas Soome wrote:
>>
>>
>>> On 29 Mar 2018, at 01:06, Stefan Esser <se@freebsd.org> wrote:
>>>
>>> On 28.03.18 at 22:28, Warner Losh wrote:
>>>>> Hmmm, the code references point into the boot loader code - I had
>>>>> expected that there is a problem in the kernel, not the boot loader.
>>>>>
>>>>>> [1]
>>>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>>>>
>>>>>
>>>>> Seems that setheap has either not been called or has been called with
>>>>> base=0.
>>>>
>>>>   Right, which is odd...
>>>>
>>>>>> [2]
>>>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>>>
>>>>>
>>>>> I had thought, that the zfs boot code has been initialized before the
>>>>> menu is displayed?
>>>>
>>>>   Right, all of this should be done looooong before we get to the
>>>>   interpreter. Can you break into the loader prompt and try the `heap`
>>>>   command, see what that outputs? CC'ing imp@ because he actually knows
>>>>   things.
>>>>
>>>> Totally weird. I'd add a printf to the setheap() function to display its args
>>>> and see if you get this panic before/after that printf...
>>>
>>> I'm currently using a Forth-enabled boot loader again, since this is a
>>> "production" machine (my home server, which also receives and keeps all
>>> my work email, for example).
>>>
>>> I'll build a clean world with the LUA loader and test it on one of the
>>> next days. Tests will include the "heap" loader command and I'll add the
>>> printf (though, if sbrk() has really not been called, I guess that will
>>> not go too well ...).
>>>
>>> Is it possible, that the setheap function is called a second time, just
>>> before jumping into the kernel? (In that case adding the printf might
>>> crash the loader in the first setheap call ...)
>>>
>>> Since the loader menu (and escaping from the menu) works, there must be
>>> a valid heap, at that time.
>>>
>>
>> Indeed. And assuming the message really is from the loader, it means there
>> must be memory corruption; if so, you can check which variables are located
>> close to the heap-related ones… Also, since you have a working menu, it has
>> to be related to the actual loading. Since the loading itself has been
>> working so far, it should be related to the Lua-specific bits that prepare
>> to call the load functions.
>
> Ok, some more data points:
>
> 1) A printf in setheap reported plausible values during start-up of zfsboot.
>   The menu appeared and wiped away the values so fast that I could not take
>   a photo or write them down.
>


If you got the menu and everything, it means that at that point the heap was OK. Right after setheap(), bcache_init() is called, and that too allocates memory.

What you can do is press Esc to leave the menu for the OK prompt and check the output of the heap and biosmem commands…


> 2) I have rebuilt world and kernel based on r331763. Booting resulted in the
>   same panic as reported before. There was no debug output from the patched
>   setheap call before the panic (which indicates that it was not called a
>   second time).
>
> 3) In order to get my system to boot, I interrupted loading of zfsloader and
>   forced loading of the previous version (from a world build with Forth in
>   the loader). Booting succeeded with the latest kernel ...
>
> It looks as if sbrk() was called in zfsloader before setheap() has been used
> to initialize the heap parameters, if lua is enabled instead of Forth. See
> stand/i386/loader/main.c:124 for the location of the setheap call in the
> loader.

This can only happen when something is called before main…
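
To make "called before main" concrete: in hosted C built with GCC or Clang, a function marked with the constructor attribute runs before main() is entered. The sketch below only shows that mechanism. Whether anything comparable happens in zfsloader is not established in this thread, and all demo_ names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the heap base; NULL means "no heap yet". */
static void *demo_base;

/* Counts allocation requests that arrived while demo_base was still NULL. */
static int demo_early_requests;

/* Stand-in for an allocator entry point; a loader would panic here instead. */
void
demo_alloc_request(void)
{
	if (demo_base == NULL)
		demo_early_requests++;
}

/* GCC/Clang extension: this function runs before main() is entered. */
__attribute__((constructor))
static void
demo_early_init(void)
{
	demo_alloc_request();   /* too early: the heap is not set up yet */
}

/* What main() would do as its first step. */
void
demo_heap_setup(void *base)
{
	demo_base = base;
}
```

By the time main() starts, demo_early_requests is already 1, even though main() itself calls demo_heap_setup() before doing anything else. Requests made after the setup no longer count as early.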

>
> This is obviously hard to debug, though, since printf cannot be called at that
> point. A pure write(2) should be possible without heap, but since the console
> has not been initialized at the point of the setheap invocation, there is no
> working output device, AFAIK.
>
> I do not see, how any sbrk() call could occur before setheap is called. And
> there does not appear to be any other setheap function (or macro) in the
> tree, that could overload the one defined in stand/libsa/sbrk.c ...
>
> I have no idea how to proceed from here ...
>
> But now I'm sure it is a problem in zfsloader (or loader in general?).
>
> Hmmm: How is the panic message printed by sbrk() without an initialized heap?
> The definition of panic in stand/libsa/panic.c relies on a working printf!
>
> I should be able to use printf in the same way as panic does, but I did
> not succeed when I tried to use it early in zfsloader ...
>
> Regards, STefan


rgds,
toomas



