Date: Fri, 30 Mar 2018 21:10:31 +0300
From: Toomas Soome <tsoome@me.com>
To: Stefan Esser <se@freebsd.org>
Cc: "M. Warner Losh" <imp@freebsd.org>, Kyle Evans <kevans@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: Boot failure: panic: No heap setup
Message-ID: <A80EB69F-ADDC-46B8-80E0-D82B6ACC9C6A@me.com>
In-Reply-To: <838e40f6-2f05-9251-e5a9-13d52ba510b7@freebsd.org>
References: <79d2bd72-f8b2-6476-9589-ebad9716698f@freebsd.org>
 <CACNAnaEwq41PqQATGLF2OAaL6mnRpGgwqYQaux1gZ_kzp4DxoA@mail.gmail.com>
 <d4304b55-d265-2488-62e4-6117a7a33502@freebsd.org>
 <CACNAnaGpB434Mca9DdjnPJz_Mt4WhzrCbt=qu5AUGrgD2C6YOQ@mail.gmail.com>
 <CANCZdfqtxMGuSPuX6rQrLY0Zwi5Ndzff_%2Bf47GyGLuRoRTsggQ@mail.gmail.com>
 <f5e17e50-362b-21e6-f922-13b504d8420e@freebsd.org>
 <BB3062B2-5F86-4A75-A749-8FE69D622FE9@me.com>
 <838e40f6-2f05-9251-e5a9-13d52ba510b7@freebsd.org>

> On 30 Mar 2018, at 18:03, Stefan Esser <se@freebsd.org> wrote:
>
> On 29.03.18 at 07:15, Toomas Soome wrote:
>>
>>> On 29 Mar 2018, at 01:06, Stefan Esser <se@freebsd.org> wrote:
>>>
>>> On 28.03.18 at 22:28, Warner Losh wrote:
>>>>> Hmmm, the code references point into the boot loader code - I had
>>>>> expected that there is a problem in the kernel, not the boot loader.
>>>>>
>>>>>> [1]
>>>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>>>>
>>>>> Seems that setheap has either not been called or has been called
>>>>> with base=0.
>>>>
>>>> Right, which is odd...
>>>>
>>>>>> [2]
>>>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>>>
>>>>> I had thought that the zfs boot code has been initialized before
>>>>> the menu is displayed?
>>>>
>>>> Right, all of this should be done looooong before we get to the
>>>> interpreter. Can you break into the loader prompt and try the `heap`
>>>> command, see what that outputs? CC'ing imp@ because he actually knows
>>>> things.
>>>>
>>>> Totally weird. I'd add a printf to the setheap() function to display
>>>> its args and see if you get this panic before/after that printf...
>>>
>>> I'm currently using a Forth-enabled boot loader again, since this is a
>>> "production" machine (my home server, which also receives and keeps all
>>> my work email, for example).
>>>
>>> I'll build a clean world with the LUA loader and test it in the next
>>> few days. Tests will include the "heap" loader command and I'll add the
>>> printf (though, if sbrk() has really not been called, I guess that will
>>> not go too well ...).
>>>
>>> Is it possible that the setheap function is called a second time, just
>>> before jumping into the kernel? (In that case adding the printf might
>>> crash the loader in the first setheap call ...)
>>>
>>> Since the loader menu (and escaping from the menu) works, there must be
>>> a valid heap at that time.
>>>
>>
>> indeed. and assuming the message really is from loader, it means there
>> must be memory corruption - if so, you can check which variables are
>> located close to the heap related ones… Also, since you have the working
>> menu, it has to be related to actual loading. Since the loading itself
>> has been working so far, it should be related to the lua specific bits
>> which prepare to call the load functions.
>
> Ok, some more data points:
>
> 1) A printf in setheap reported plausible values during start-up of
>    zfsboot. The menu appeared and wiped away the values so fast that I
>    could not take a photo or write them down.
>

if you got the menu and stuff, it means that at that point the heap was
all OK. just after setheap(), bcache_init() is called, and that too will
allocate memory.

what you can do is to esc out from the menu to the OK prompt and check the
output of the heap and biosmem commands…

> 2) I have rebuilt world and kernel based on r331763. Booting resulted in
>    the same panic as reported before. There was no debug output from the
>    patched setheap call before the panic (which indicates that it was not
>    called a second time).
>
> 3) In order to get my system to boot, I interrupted loading of zfsloader
>    and forced loading of the previous version (from a world build with
>    Forth in the loader). Booting succeeded with the latest kernel ...
>
> It looks as if sbrk() was called in zfsloader before setheap() has been
> used to initialize the heap parameters, if lua is enabled instead of
> Forth. See stand/i386/loader/main.c:124 for the location of the setheap
> call in the loader.

this can only happen when something is called before main…

>
> This is obviously hard to debug, though, since printf cannot be called at
> that point. A pure write(2) should be possible without heap, but since
> the console has not been initialized at the point of the setheap
> invocation, there is no working output device, AFAIK.
>
> I do not see how any sbrk() call could occur before setheap is called.
> And there does not appear to be any other setheap function (or macro) in
> the tree that could overload the one defined in stand/libsa/sbrk.c ...
>
> I have no idea how to proceed from here ...
>
> But now I'm sure it is a problem in zfsloader (or loader in general?).
>
> Hmmm: How is the panic message printed by sbrk() without an initialized
> heap? The definition of panic in stand/libsa/panic.c relies on a working
> printf!
>
> I should be able to use printf in the same way as panic does, but I did
> not succeed when I tried to use it early in zfsloader ...
>
> Regards, STefan

rgds,
toomas
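
For readers following the thread, the ordering dependency under discussion can be modelled in a few lines of C. This is only a minimal sketch under stated assumptions, not the code in stand/libsa/sbrk.c: the names demo_setheap, demo_sbrk and the demo_* variables are hypothetical, and where the thread reports a panic with "No heap setup", this model returns NULL and reports the message through an out parameter so the behaviour can be checked without aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names: a simplified model, not the libsa source. */
static char  *demo_heapbase;	/* stays NULL until demo_setheap() runs */
static size_t demo_maxheap;	/* total bytes available in the heap */
static size_t demo_heapsize;	/* bytes handed out so far */

/* Record the heap bounds; this has to run before any allocation. */
static void
demo_setheap(void *base, void *top)
{
	assert((char *)top >= (char *)base);
	demo_heapbase = base;
	demo_maxheap = (size_t)((char *)top - (char *)base);
	demo_heapsize = 0;
}

/*
 * Bump allocator. Assumption: the loader panics where this model
 * sets *err to "No heap setup" and returns NULL.
 */
static void *
demo_sbrk(size_t incr, const char **err)
{
	*err = NULL;
	if (demo_heapbase == NULL) {
		*err = "No heap setup";
		return (NULL);
	}
	if (incr > demo_maxheap - demo_heapsize) {
		*err = "out of memory";
		return (NULL);
	}
	void *ret = demo_heapbase + demo_heapsize;
	memset(ret, 0, incr);		/* hand out zeroed memory */
	demo_heapsize += incr;
	return (ret);
}
```

In this model an allocation attempted before demo_setheap() fails with the "No heap setup" message, and the same request succeeds afterwards. That matches the reasoning in the thread: either something allocates before the heap bounds are set, or the recorded base is overwritten later.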